nf-core / scrnaseq

A single-cell RNAseq pipeline for 10X genomics data
https://nf-co.re/scrnaseq
MIT License

MTX_CONVERSION:MTX_TO_SEURAT fails with certain feature names #385

Open ChristopherBarrington opened 1 month ago

ChristopherBarrington commented 1 month ago

Description of the bug

When using --aligner cellranger, the matrix files are not read correctly by ReadMtx, but they are read correctly by Read10X.

Command error:
  Attaching SeuratObject
  Error: Matrix has 13931 rows but found 12861 features.
  In addition: Warning messages:
  1: In scan(file = file, what = what, sep = sep, quote = quote, dec = dec,  :
    EOF within quoted string
  2: In scan(file = file, what = what, sep = sep, quote = quote, dec = dec,  :
    number of items read is not a multiple of the number of columns
  Execution halted

The 12861st feature is:

FBgn0025724     beta'COP        Gene Expression

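It looks like the read.table function used by ReadMtx causes the problem: it only reads as far as this feature. Presumably the apostrophe in beta'COP is treated as an opening quote, since read.table's default quote characters include ', which would also explain the "EOF within quoted string" warning. The Read10X function uses read.delim, which only treats " as a quote character by default and so reads the string as-is. Maybe Read10X would be a more robust alternative, assuming all aligners output in the 10X format?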
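To see which feature names might trip a quote-aware parser before running the conversion, a check along these lines could help. This is only a rough Python sketch and not part of the pipeline or Seurat: the function name find_quoted_features is made up for illustration, and the example rows other than FBgn0025724 are invented placeholders.

```python
import io

# Characters that a quote-aware table reader may interpret as string delimiters.
QUOTE_CHARS = ("'", '"')


def find_quoted_features(handle):
    """Return (line_number, feature_id, feature_name) for every row of a
    tab-separated features file in which any field contains a quote character."""
    hits = []
    for line_number, line in enumerate(handle, start=1):
        fields = line.rstrip("\n").split("\t")
        if any(q in field for field in fields for q in QUOTE_CHARS):
            feature_id = fields[0]
            feature_name = fields[1] if len(fields) > 1 else ""
            hits.append((line_number, feature_id, feature_name))
    return hits


# Hypothetical three-row excerpt: only the second row is taken from the report,
# the other two are placeholders.
example = io.StringIO(
    "FBgn0000001\tgeneA\tGene Expression\n"
    "FBgn0025724\tbeta'COP\tGene Expression\n"
    "FBgn0000003\tgeneC\tGene Expression\n"
)

hits = find_quoted_features(example)
print(hits)
```

For a real run, the same function could be given a handle from gzip.open(path, "rt") if the features file is gzip-compressed; the path is left to the reader since it depends on the aligner output layout.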

Command used and terminal output

No response

Relevant files

No response

System information

grst commented 3 weeks ago

Hi @ChristopherBarrington,

Thanks for reporting! The whole conversion workflow will be reimplemented in https://github.com/nf-core/scrnaseq/pull/369, which will hopefully also fix this issue.

ChristopherBarrington commented 3 weeks ago

OK, thanks. In the interim, I have used a custom config that ignores errors from the conversion processes:

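process {
    // Let the run continue when a matrix conversion task fails;
    // outputs from the failed task will be missing.
    withName: 'NFCORE_SCRNASEQ:SCRNASEQ:MTX_CONVERSION:MTX_TO_(H5AD|SEURAT)' {
        errorStrategy = 'ignore'
    }
}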