satijalab / seurat

R toolkit for single cell genomics
http://www.satijalab.org/seurat

Read 10x data from cellranger count (version 3.1.0) and CreateSeuratObject #7666

yulchen810 closed this issue 1 year ago

yulchen810 commented 1 year ago

Hi, dear team: I downloaded FASTQ files from NCBI that can only be processed by an older Cell Ranger version, because newer versions cannot auto-detect the chemistry. I used cellranger count (version 3.1.0) and got a filtered_feature_bc_matrix folder containing three files: barcodes.tsv.gz, features.tsv.gz and matrix.mtx.gz. However, when I use the Read10X and CreateSeuratObject functions in R, I get an empty Seurat object. When I apply the same code to a filtered_feature_bc_matrix folder generated by cellranger count (version 7.1.0), it works well. Can you give me some suggestions? I'm wondering whether I'm using the wrong function for output from the older version.

Thank you very much. Best

samuel-marsh commented 1 year ago

Hi,

I'm not a member of the dev team, but hopefully I can be helpful. When you run Read10X on the NCBI data, do you get a proper dgCMatrix in R?

Could you also provide the NCBI GEO accession number, in case the problem is specific to these files?

Best, Sam

yulchen810 commented 1 year ago

Hi Sam, I can't get a proper dgCMatrix in R; it's completely empty. I opened the three files (barcodes.tsv.gz, features.tsv.gz, matrix.mtx.gz) in Notepad, and they seem normal.
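Rather than eyeballing the files in a text editor, a quick shell check can show whether the matrix itself is empty. This is a minimal sketch, assuming the Cell Ranger v3+ output layout (barcodes.tsv.gz, features.tsv.gz, matrix.mtx.gz); the function name check_10x_dir is hypothetical. It relies only on the MatrixMarket convention that the first non-comment line of matrix.mtx.gz gives the row count, column count and number of non-zero entries, and compares those numbers with the line counts of features.tsv.gz and barcodes.tsv.gz.

```shell
# Sanity-check a Cell Ranger filtered_feature_bc_matrix folder from the shell.
# Assumes the v3+ layout: barcodes.tsv.gz, features.tsv.gz, matrix.mtx.gz.
check_10x_dir() {
  local dir="$1" n_feat n_bc dims rows cols nnz
  # One line per feature and one line per barcode.
  n_feat=$(gzip -dc "$dir/features.tsv.gz" | wc -l | tr -d ' ')
  n_bc=$(gzip -dc "$dir/barcodes.tsv.gz" | wc -l | tr -d ' ')
  # The first line not starting with "%" is the MatrixMarket size line:
  # "<rows> <columns> <non-zero entries>".
  dims=$(gzip -dc "$dir/matrix.mtx.gz" | grep -v -m 1 '^%')
  if [ -z "$dims" ]; then
    echo "NO SIZE LINE: matrix.mtx.gz looks truncated" >&2
    return 1
  fi
  # Split the size line into its three numbers.
  set -- $dims
  rows=$1 cols=$2 nnz=$3
  echo "features=$n_feat barcodes=$n_bc matrix=${rows}x${cols} nonzero=$nnz"
  if [ "$rows" -ne "$n_feat" ] || [ "$cols" -ne "$n_bc" ]; then
    echo "MISMATCH: matrix dimensions disagree with features/barcodes" >&2
    return 1
  fi
  if [ "$cols" -eq 0 ] || [ "$nnz" -eq 0 ]; then
    echo "EMPTY: no barcodes or no counts in this matrix" >&2
    return 2
  fi
}
```

For example, check_10x_dir SRR10480618/outs/filtered_feature_bc_matrix (the path is illustrative). A column count or non-zero count of 0 would suggest that Cell Ranger called no cells, which would point at the Cell Ranger run rather than at Read10X.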

The NCBI GEO accession number is GSE140510. I plan to use six files (cellranger count version 3.1.0: SRR10480618, SRR10480620, SRR10480624, SRR10480626; cellranger count version 2.1.1: SRR10480619; SRR10480624 cannot be processed by Cell Ranger).

samuel-marsh commented 1 year ago

Hi,

So after re-reading your issue, this sounds like it's not really an issue with Seurat but more likely an issue with how Cell Ranger was run (hard to know without re-running the analysis myself). That said, if the files look OK when opened separately, you can always try reading them into R individually, adding the barcodes and features as dimnames of the matrix, and then using that matrix to create the Seurat object.

I will add, though, that I don't see any reason not to use Cell Ranger 7.1.0 instead of 3.1.0 or 2.1.1 (and I definitely wouldn't mix Cell Ranger v3 and v2 outputs in the same analysis, because of major changes in cell calling between those versions). Cell Ranger 7.1.0 should be perfectly capable of re-analyzing 5' v1 data (I've done it on my own data), so all of the samples should be fine.

However, looking at the FASTQ files, it appears they used an unusual read-length configuration (R1: 125 / R2: 151), so Cell Ranger is likely having trouble because of the uneven, non-standard read lengths. You should be able to work around this by supplying some of the optional flags to cellranger count. I think it should work to trim the reads to equal lengths and set the chemistry flag to paired-end mode:

cellranger count ... --chemistry=SC5P-PE --r1-length=125 --r2-length
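The read-length configuration can be confirmed directly from the FASTQs. This is a minimal sketch, assuming gzipped FASTQs with standard four-line records; read_lengths is a hypothetical helper name, and the file name in the usage line below is a placeholder.

```shell
# Tabulate read lengths in a gzipped FASTQ (the sequence is line 2 of every 4-line record).
# Optional second argument: stop after this many reads (default 0 = read the whole file).
read_lengths() {
  local fq="$1" n="${2:-0}"
  gzip -dc "$fq" | awk -v n="$n" '
    NR % 4 == 2 { len[length($0)]++; seen++ }
    n > 0 && seen >= n { exit }
    END { for (l in len) print l, len[l] }
  ' | sort -n
}
```

For example, read_lengths sample_R1.fastq.gz 100000 prints one "length count" pair per line for the first 100,000 reads. For the configuration described above, R1 would be expected to show 125 and R2 151.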

Best, Sam

yulchen810 commented 1 year ago

Hi Sam, the reason I don't use Cell Ranger 7.1.0 is that it failed to detect the chemistry. I actually tried setting SC5P-PE manually, and it still didn't work. Thank you anyway.

Best, Xiaoyan

yulchen810 commented 1 year ago

By the way, here is part of the log.

[error] Pipestance failed. Error log at: SRR10480619/SC_RNA_COUNTER_CS/SC_MULTI_CORE/MULTI_CHEMISTRY_DETECTOR/_GEM_WELL_CHEMISTRY_DETECTOR/DETECT_COUNT_CHEMISTRY/fork0/chnk0-ua96ad3bd27/_errors

Log message: You selected chemistry SC5P-PE, which expects the cell barcode sequence in read1. In the input data, an extremely low rate of correct barcodes was observed for this chemistry (0.0%). Please check your input data and chemistry selection. Note: manual chemistry detection is not required in most cases. Input: Sample SRR10480619 in "/SRA_download/GSE140510_fastqs"

Waiting 6 seconds for UI to do final refresh. Pipestance failed. Use --noexit option to keep UI running after failure.
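The "extremely low rate of correct barcodes" message means that almost no reads in the file Cell Ranger treated as read1 began with a known 10x cell barcode. A rough approximation of that check (not Cell Ranger's actual implementation, and exact matches only, so it will read somewhat lower than Cell Ranger's corrected rate) is to count how many reads start with a 16 bp sequence from the barcode whitelist. The helper name barcode_hit_rate is hypothetical. The whitelist location is also an assumption: 5' v1 data uses 737K-august-2016.txt, assumed here to sit under lib/python/cellranger/barcodes/ in the Cell Ranger install directory, so check your own install.

```shell
# Estimate the fraction of reads whose first 16 bases exactly match a whitelisted 10x barcode.
# Arguments: gzipped FASTQ, plain-text whitelist (one barcode per line), reads to sample (default 100000).
barcode_hit_rate() {
  local fq="$1" whitelist="$2" n="${3:-100000}"
  gzip -dc "$fq" | awk -v n="$n" '
    # First input (the whitelist): load barcodes into a lookup table.
    FNR == NR { ok[$1] = 1; next }
    # Second input (the FASTQ on stdin): test the first 16 bases of each sequence line.
    FNR % 4 == 2 {
      total++
      bc = substr($0, 1, 16)
      if (bc in ok) hits++
      if (total >= n) exit
    }
    END { printf "%d/%d reads (%.1f%%) start with a whitelisted barcode\n", hits, total, (total ? 100 * hits / total : 0) }
  ' "$whitelist" -
}
```

Running it on each FASTQ from the SRA download (index read, read 1 and read 2) would show which file, if any, actually carries the cell barcodes. A rate near 0% for every file would support asking the authors about the read scheme.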

samuel-marsh commented 1 year ago

Hmm, something definitely seems wrong with that file.

Does it help if you set the R1 and R2 lengths to trim to the same length? Or try trimming R1 to 26 bp and running with --chemistry=fiveprime.
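The 26 bp suggestion corresponds to the 16 bp cell barcode plus 10 bp UMI at the start of read 1 in 5' v1 chemistry. Cell Ranger's --r1-length flag already hard-trims read 1 internally, so writing trimmed files yourself is only needed if you want them on disk, for example to inspect them or pass them to another tool. A minimal sketch, assuming gzipped four-line FASTQ records; trim_fastq is a hypothetical helper name.

```shell
# Write a copy of a gzipped FASTQ with every read truncated to its first N bases.
trim_fastq() {
  local in="$1" out="$2" n="$3"
  gzip -dc "$in" | awk -v n="$n" '
    # Sequence (line 2) and quality (line 4) lines are cut to the same length;
    # header and "+" lines pass through unchanged.
    NR % 4 == 2 || NR % 4 == 0 { $0 = substr($0, 1, n) }
    { print }
  ' | gzip > "$out"
}
```

For example, trim_fastq sample_R1.fastq.gz sample_R1.trim26.fastq.gz 26 (file names are placeholders).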

If Cell Ranger 7.1.0 has issues, I'd be skeptical that the files generated by the older Cell Ranger versions are problem-free. If neither of those read-trimming options works, I'd suggest emailing the authors with copies of the errors and asking them to clarify the read scheme they used (it's not listed in the paper), to check that the GEO FASTQs aren't corrupted, or to provide the original FASTQs.

Best, Sam

yulchen810 commented 1 year ago

Neither --r1-length=125 --r2-length=125 nor --chemistry=fiveprime --r1-length=26 works. The log for the latter:

SRR10480618/SC_RNA_COUNTER_CS/SC_MULTI_CORE/MULTI_CHEMISTRY_DETECTOR/_GEM_WELL_CHEMISTRY_DETECTOR/DETECT_COUNT_CHEMISTRY/fork0/chnk0-u1f64d3f2ea/_errors

Log message: You selected chemistry SC5P-R2, which expects the cell barcode sequence in read1. In the input data, an extremely low rate of correct barcodes was observed for this chemistry (0.0%). Please check your input data and chemistry selection. Note: manual chemistry detection is not required in most cases.

In fact, for all six files, the log always reports that the chemistry cannot be detected when using Cell Ranger 7.1 or 6.0.1.

Yes, I may need to write to the authors and see what's going on. Thank you.