velocyto-team / velocyto.py

RNA velocity estimation in Python
http://velocyto.org/velocyto.py/
BSD 2-Clause "Simplified" License

Velocyto ran successfully but main matrix is missing #398

Closed econnolly7 closed 4 months ago

econnolly7 commented 4 months ago

Hi there,

I ran velocyto on my 10x output like so:

    #!/bin/bash
    #SBATCH -J velocyto_p22069-s001_Con         # Job name
    #SBATCH --account=gts-ggibson3-biocluster   # Charge account
    #SBATCH --output=velocyto_%j.out            # Standard output log (%j will be replaced by the job ID)
    #SBATCH --error=velocyto_%j.err             # Standard error log (%j will be replaced by the job ID)
    #SBATCH --time=48:00:00                     # Wall time limit
    #SBATCH -q inferno
    #SBATCH --mem=256G                          # Total memory limit
    #SBATCH --cpus-per-task=16                  # Number of CPUs per task
    #SBATCH --partition=cpu-medium              # Partition name (adjust as needed)

    # Load necessary modules
    module load samtools
    module load python/3.9

    # Activate the virtual environment
    source /storage/home/hcoda1/4/econnolly7/scratch/venv_velocyto/bin/activate

    # Run velocyto for 10x Genomics data
    echo "Running velocyto..."
    velocyto run10x /storage/home/hcoda1/4/econnolly7/scratch/p22069-s001_control_2024 \
        /storage/home/hcoda1/4/econnolly7/scratch/refdata-gex-GRCm39-2024-A/genes/genes.gtf

Everything ran smoothly (10+ hours) and a loom file was generated. However, when I try to use the loom file with scvelo, I am told that the main matrix is missing. I'm not sure what I did wrong.

I would be forever grateful for any help or insight with this.

econnolly7 commented 4 months ago

I got it working!

For anyone who might encounter this problem: the main matrix in the loom output was missing because of version conflicts between numpy, loompy, and velocyto. Specifically, the versions of these packages that I initially installed were not compatible with each other, which led to incomplete or incorrect execution of velocyto and, in turn, a faulty loom file.

The following versions of the software and libraries worked together to run velocyto and generate the loom output correctly:

Python: 3.9
velocyto: 0.17.17
numpy: 1.21.6
scipy: 1.7.3
numba: 0.55.1
llvmlite: 0.38.1
loompy: 3.0.7
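
For anyone wanting to reproduce this set of versions, here is a minimal sketch that writes the pins listed above to a requirements file and checks an existing environment against them. It is an illustration rather than part of the original fix: the function names `write_velocyto_pins` and `check_pins` and the default file name are hypothetical, and the check only understands plain `name==version` lines such as those printed by `pip freeze`.

```shell
# Hypothetical helpers (names are illustrative, not part of velocyto).

# Write the version pins reported in this thread to a requirements file.
write_velocyto_pins() {
    local out="${1:-velocyto-pins.txt}"
    cat > "$out" <<'EOF'
numpy==1.21.6
scipy==1.7.3
numba==0.55.1
llvmlite==0.38.1
loompy==3.0.7
velocyto==0.17.17
EOF
}

# Read "name==version" lines on stdin (for example from `pip freeze`) and
# print every pinned package whose installed version differs or is missing.
# Returns non-zero if at least one mismatch was found.
check_pins() {
    local pins="$1" frozen line name want have status=0
    frozen="$(cat)"
    while IFS= read -r line; do
        [ -z "$line" ] && continue
        name="${line%%==*}"
        want="${line#*==}"
        # Case-insensitive match on the package name; take the first hit.
        have="$(printf '%s\n' "$frozen" |
            awk -F'==' -v n="$name" 'tolower($1) == tolower(n) { print $2; exit }')"
        if [ "$have" != "$want" ]; then
            echo "MISMATCH $name: want $want, have ${have:-none}"
            status=1
        fi
    done < "$pins"
    return "$status"
}

# Example usage inside the activated environment:
#   write_velocyto_pins pins.txt
#   pip freeze | check_pins pins.txt
```

The same file could serve as a starting point for `pip install -r`, although the order in which the packages are installed may matter.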

econnolly7 commented 4 months ago

Also, barcodes.tsv had to be gzipped (barcodes.tsv.gz) for velocyto to run for me!
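
The gzip step can be sketched as a small helper. This is only an illustration: the function name `ensure_gzipped_barcodes` is hypothetical, and the directory holding `barcodes.tsv` has to be supplied by the caller, since its location depends on the layout of the 10x output.

```shell
# Hypothetical helper: make sure a gzipped copy of barcodes.tsv exists in
# the given directory, keeping the uncompressed original in place.
ensure_gzipped_barcodes() {
    local dir="$1"
    local plain="$dir/barcodes.tsv"
    local gz="$dir/barcodes.tsv.gz"

    if [ -f "$gz" ]; then
        echo "already present: $gz"
    elif [ -f "$plain" ]; then
        # -c writes to stdout, so the original file is left untouched.
        gzip -c "$plain" > "$gz"
        echo "created: $gz"
    else
        echo "no barcodes.tsv or barcodes.tsv.gz in $dir" >&2
        return 1
    fi
}
```

Calling it a second time on the same directory is harmless: an existing `barcodes.tsv.gz` is reported and left as it is.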

The final script that worked:

    #!/bin/bash
    #SBATCH -J velocyto_PD                      # Job name
    #SBATCH --account=gts-ggibson3-biocluster   # Charge account
    #SBATCH --output=PDvelocyto%j.out           # Standard output log (%j will be replaced by the job ID)
    #SBATCH --error=PDvelocyto%j.err            # Standard error log (%j will be replaced by the job ID)
    #SBATCH --time=48:00:00                     # Wall time limit
    #SBATCH -q inferno
    #SBATCH --mem=100G                          # Total memory limit
    #SBATCH --cpus-per-task=16                  # Number of CPUs per task
    #SBATCH --partition=cpu-medium              # Partition name (adjust as needed)

    # Enable verbose output for debugging
    set -x

    # Load necessary modules
    module load samtools
    module load python/3.9

    # Activate the new virtual environment
    source /storage/home/hcoda1/4/econnolly7/miniconda3/etc/profile.d/conda.sh
    conda activate velocyto_env

    # Verify that velocyto is in the PATH
    which velocyto

    # Run velocyto for 10x Genomics data
    echo "Running velocyto..."
    velocyto run10x /storage/home/hcoda1/4/econnolly7/scratch/p22069-s004_RT-PD_2024 \
        /storage/home/hcoda1/4/econnolly7/scratch/refdata-gex-GRCm39-2024-A/genes/genes.gtf

    # Deactivate the virtual environment
    conda deactivate