velocyto-team / velocyto.py

RNA velocity estimation in Python
http://velocyto.org/velocyto.py/
BSD 2-Clause "Simplified" License

Error in SAMPLEFOLDER #344

Open skoturan opened 2 years ago

skoturan commented 2 years ago

Hi, I have been trying to run velocyto run10x, but I keep getting this error for my sample folder:

Error: Invalid value for 'SAMPLEFOLDER': Directory '~rds/.../filtered_feature_bc_matrix' does not exist.

Could you please tell me which files this command requires in the 10x output folder? barcodes.tsv, matrix.tsv and features.tsv? Thanks

hyjforesight commented 2 years ago

velocyto run10x requires the Cell Ranger output folder of the sample. Use a command like the one below:

velocyto run10x -@ 16 -m /mnt/d/KP/hg38_rmsk.gtf /home/hyjforesight/SMC05T/ /mnt/d/KP/refdata-gex-GRCh38-2020-A-modified/genes/genes.gtf
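In the command above, the SAMPLEFOLDER argument is the Cell Ranger run folder itself (the one that contains outs/), not the filtered_feature_bc_matrix subfolder. Below is a minimal sketch for checking a folder before launching run10x. The file locations it tests reflect my understanding of where run10x looks and should be treated as an assumption to verify against your velocyto version; check_run10x_layout is a hypothetical helper name, not part of velocyto.

```python
import glob
import os


def check_run10x_layout(samplefolder):
    """Return a list of problems found in a Cell Ranger sample folder.

    Assumption: run10x reads the BAM and the barcodes from fixed locations
    under <samplefolder>/outs, as listed below.
    """
    problems = []
    outs = os.path.join(samplefolder, "outs")
    if not os.path.isdir(outs):
        problems.append(f"missing folder: {outs}")
        return problems

    bam = os.path.join(outs, "possorted_genome_bam.bam")
    if not os.path.isfile(bam):
        problems.append(f"missing BAM: {bam}")

    # Older Cell Ranger layouts keep an uncompressed barcodes.tsv one level
    # deeper; newer ones keep barcodes.tsv.gz in filtered_feature_bc_matrix.
    patterns = [
        os.path.join(outs, "filtered_gene_bc_matrices", "*", "barcodes.tsv"),
        os.path.join(outs, "filtered_feature_bc_matrix", "barcodes.tsv.gz"),
    ]
    if not any(glob.glob(pattern) for pattern in patterns):
        problems.append("missing barcodes file under outs/")
    return problems
```

An empty list means the folder looks usable. A "missing folder" entry usually means the path points one or two levels too deep or contains an unexpanded shortcut.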

Sa753 commented 2 years ago

Hi hyjforesight, I have been getting this error as well. I get it only when I run on the output of Cell Ranger multi.

I assume this is because the outs folder structure is different: the sample_alignment.bam and bam.bai are found in a subfolder, 'sample1/outs/per_sample_outs/sample1/', instead of being directly in outs.

Is that correct? Could you please advise? I even tried giving the path as 'sample1/outs/per_sample_outs/sample1/' instead of just sample1, but that didn't work either. Thanks

hyjforesight commented 2 years ago

Hello @Sa753, we also did multiplexed scRNA-seq. As I remember, velocyto run10x didn't work for Cell Ranger multi. You should use velocyto run (the "run on any technique" command) on each individual BAM file.

Sa753 commented 2 years ago

Hi hyjforesight,

Thanks so much for the prompt reply. So I will have to run Cell Ranger on the individual samples and then run velocyto. Is that correct? Thanks

hyjforesight commented 2 years ago

Hello @Sa753, if your samples were prepared with the 10x CellPlex kit, you will get 2 libraries after sequencing. One is the oligo library, the other is the sequencing data (sorry, I don't remember the exact names). Then you should run Cell Ranger multi to demultiplex these sequencing data. The Cell Ranger multi output includes a folder for each individual sample, and that folder contains the sample's BAM file. Use these BAM files and run velocyto (on any technique) on them one by one. velocyto run10x is only for samples prepared with the 10x v2 or v3 kit.
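To see which per-sample BAM files a Cell Ranger multi run actually produced, something like the following sketch may help. It only assumes the outs/per_sample_outs/ layout mentioned earlier in this thread and deliberately does not assume a BAM file name, since that is exactly what is in question here. find_per_sample_bams is a hypothetical helper name.

```python
from pathlib import Path


def find_per_sample_bams(multi_run_folder):
    """Map each sample name under outs/per_sample_outs to the BAM files below it."""
    per_sample = Path(multi_run_folder) / "outs" / "per_sample_outs"
    found = {}
    if not per_sample.is_dir():
        return found
    for sample_dir in sorted(p for p in per_sample.iterdir() if p.is_dir()):
        # Search recursively: the BAM name and its subfolder may differ
        # between Cell Ranger versions, so no specific file name is assumed.
        found[sample_dir.name] = sorted(str(b) for b in sample_dir.rglob("*.bam"))
    return found
```

Each BAM path returned this way can then be passed to velocyto run separately.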

Sa753 commented 2 years ago

Hi hyjforesight,

Yes, Cell Ranger multi produces a separate BAM file for each of its samples, but velocyto is still not accepting this BAM file. Another thing: are you sure that velocyto run10x is only for v2 or v3 kits? Velocyto was established long before the v2 or v3 kits were made, and I have run it on samples prepared with v1. Thanks

hyjforesight commented 2 years ago

Hello @Sa753, try this:

# sort individual bam first
samtools sort /home/hyjforesight/sorting/Apc_Tumor.bam -o /home/hyjforesight/sorting/possorted_Apc_Tumor.bam -@ 16
# use the 10X barcodes.tsv and 10X reference genome file
velocyto run -@ 16 -b /home/hyjforesight/barcodes.tsv -o /home/hyjforesight/loom/ -m /home/hyjforesight/mm10_rmsk.gtf /home/hyjforesight/possorted_Apc_Tumor.bam /home/hyjforesight/refdata-gex-mm10-2020-A/genes/genes.gtf

Velocyto run10x is also suitable for v1.
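When the velocyto run command above has to be repeated for several per-sample BAM files, building the argument list in Python keeps the option order consistent. This is only a sketch: it uses the same flags as the command above (-b, -o, -m, -@) and nothing else, and build_velocyto_run and the file names in the usage note are hypothetical.

```python
def build_velocyto_run(bam, gtf, barcodes, outdir, mask=None, threads=None):
    """Build the argument list for `velocyto run` on a single BAM file."""
    cmd = ["velocyto", "run", "-b", str(barcodes), "-o", str(outdir)]
    if mask is not None:
        # Optional repeat-masker annotation, as in the command above.
        cmd += ["-m", str(mask)]
    if threads is not None:
        cmd += ["-@", str(threads)]
    # Positional arguments come last: the BAM file, then the annotation GTF.
    cmd += [str(bam), str(gtf)]
    return cmd
```

The resulting list can be handed to subprocess.run(cmd, check=True), for example with build_velocyto_run("sample1.bam", "genes.gtf", "barcodes.tsv", "loom", mask="rmsk.gtf", threads=16).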

Sa753 commented 2 years ago

Hi hyjforesight,

You pointed out the right thing about the error, which is that it can never find barcodes.tsv. I will try this code and update you. However, I just want to point out that in Cell Ranger multi the BAM file is not called 'possorted.bam'. It is called assigned.bam in the per_sample_outs/counts folder, which holds the filtered reads assigned to cells, and unassigned.bam in the multi folder/counts, which contains the raw reads. So I think I will use the path to the BAM in the filtered counts, not the raw one.

Thanks

hyjforesight commented 2 years ago

Hello @Sa753, please use the assigned.bam. As I remember, I renamed this BAM to the possorted_XXX.bam format so that velocyto doesn't re-sort it. Also unzip the barcodes file generated by Cell Ranger multi. Velocyto needs it to match the real cell numbers.
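Unzipping the barcodes file can be done with gunzip on the command line or, as a minimal sketch, with Python's standard library. The function name gunzip_barcodes is hypothetical; the returned count is just a quick sanity check of how many barcodes ended up in the uncompressed file.

```python
import gzip
import shutil


def gunzip_barcodes(src_gz, dest_tsv):
    """Decompress a barcodes.tsv.gz file and return the number of barcodes."""
    with gzip.open(src_gz, "rb") as src, open(dest_tsv, "wb") as dest:
        shutil.copyfileobj(src, dest)
    # Count non-empty lines: one barcode per line.
    with open(dest_tsv) as handle:
        return sum(1 for line in handle if line.strip())
```

The uncompressed file is what gets passed to velocyto run with -b.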

Sa753 commented 2 years ago

Hi hyjforesight,

Can I just clarify whether the possorted_genome.bam produced by cellranger (not Cell Ranger multi) needs sorting or not? velocyto run10x always sorts it, and the run takes around 10 h and a large amount of RAM, but in the previous reply you said that velocyto shouldn't re-sort it. Am I missing something here? Also, why should I unzip the barcodes file from Cell Ranger multi when velocyto runs on the zipped barcodes file from Cell Ranger without needing to unzip it? Thanks
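One possible explanation for the long sorting step, which nobody in this thread confirms: as far as I understand velocyto's behaviour, it needs a copy of the BAM sorted by cell barcode in addition to the position-sorted one, writes that copy next to the input as cellsorted_<original name>, and skips the sort when the file already exists. Under that assumption, the sketch below predicts whether a run will trigger the sort. Both function names are hypothetical.

```python
import os


def expected_cellsorted_path(bamfile):
    """Path where the barcode-sorted copy is assumed to be written."""
    folder, name = os.path.split(bamfile)
    # Assumption: same folder as the input BAM, with a "cellsorted_" prefix.
    return os.path.join(folder, "cellsorted_" + name)


def will_need_cell_sort(bamfile):
    """True when no barcode-sorted copy exists yet next to the input BAM."""
    return not os.path.exists(expected_cellsorted_path(bamfile))
```

If this holds, keeping the cellsorted file between runs would avoid repeating the slow step.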

Sa753 commented 2 years ago

Hi hyjforesight,

I tried the above code and it is not working. It can't find the barcodes.tsv file, and when I added -b with the path to the barcodes.tsv file, the error log said it didn't understand the argument -b.

Again, just to say, Cell Ranger multi doesn't produce possorted_XXX.bam.

Thanks

hyjforesight commented 2 years ago

@Sa753 Could you please show me the contents of your per_sample_outs folder? This is an example of our Cell Ranger outputs, but I don't remember the contents of the CKP subfolder.