yjzhang / split-seq-pipeline

MIT License

Format conversion recommendation? #17

Open hepcat72 opened 4 years ago

hepcat72 commented 4 years ago

This is my first foray into single-cell analysis. I'm working in Galaxy and I'd like to run Seurat on data from your paper to develop a pipeline, but it appears that the Galaxy wrapper takes a tsv file instead of an mtx file. I've written a quick conversion in Perl for testing purposes, but is there an established tool for doing the conversion?

yjzhang commented 4 years ago

We do not have an "official" tool for doing the mtx to tsv conversion; you can use whatever you'd like.
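For readers looking for a starting point, the following is a minimal sketch of an mtx-to-tsv conversion using only the Python standard library. It assumes a MatrixMarket coordinate file whose rows are genes and whose columns are cells, and it writes a dense table, so it is only practical for small test matrices. The function names `read_mtx` and `write_tsv`, the empty top-left header cell, and the row/column orientation are illustrative assumptions, not part of this pipeline or of the Galaxy wrapper. For real data, `scipy.io.mmread` is the more usual way to load the matrix.

```python
import io


def read_mtx(lines):
    """Parse MatrixMarket coordinate text into (n_rows, n_cols, entries).

    `entries` maps zero-based (row, col) index pairs to values.
    """
    # Drop blank lines and '%' comment lines (including the %%MatrixMarket banner).
    body = [ln for ln in lines if ln.strip() and not ln.startswith("%")]
    n_rows, n_cols, _n_nonzero = (int(x) for x in body[0].split())
    entries = {}
    for ln in body[1:]:
        i, j, value = ln.split()
        entries[(int(i) - 1, int(j) - 1)] = float(value)  # mtx indices are 1-based
    return n_rows, n_cols, entries


def write_tsv(out, n_rows, n_cols, entries, row_names, col_names):
    """Write a dense tab-separated table with one header row and one name column."""
    # Assumption: the consumer accepts an empty top-left header cell.
    out.write("\t".join([""] + list(col_names)) + "\n")
    for i in range(n_rows):
        values = [format(entries.get((i, j), 0), "g") for j in range(n_cols)]
        out.write("\t".join([row_names[i]] + values) + "\n")


if __name__ == "__main__":
    demo = [
        "%%MatrixMarket matrix coordinate integer general",
        "3 2 3",
        "1 1 5",
        "2 2 1",
        "3 1 2",
    ]
    n_rows, n_cols, entries = read_mtx(demo)
    buf = io.StringIO()
    write_tsv(buf, n_rows, n_cols, entries, ["g1", "g2", "g3"], ["cellA", "cellB"])
    print(buf.getvalue(), end="")
```

If the matrix is stored as cells by genes instead, the row and column name lists would need to be swapped, or the index pairs transposed, before writing.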

hepcat72 commented 4 years ago

So I've learned a few things while doing the conversion.

  1. The underscores at the end of the cell barcodes cause cells with the same 16 nt sequence to be grouped together by Seurat's CellsByIdentities method
  2. Gene/row names cannot contain underscores, so including the species/chromosome from genes.csv is problematic
  3. Joining multiple values from genes.csv with commas to form row names is problematic, because some steps in Seurat do not allow commas
  4. For various QC steps, row names at least have to indicate when a gene is on the mitochondrial chromosome
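The four points above could be handled during conversion with a small name-cleaning step. The sketch below is one possible approach, not an established convention of this pipeline. The helpers `clean_barcode` and `make_row_name`, the choice of a hyphen as the replacement character, the species suffix, and the `is_mito` flag are all hypothetical. The `MT-` prefix follows the pattern commonly used in Seurat QC examples (`^MT-`), and whether it suits a given dataset is an assumption to verify.

```python
import re


def _safe(text):
    """Replace runs of underscores, commas, and whitespace with a single hyphen."""
    return re.sub(r"[_,\s]+", "-", text.strip())


def clean_barcode(barcode):
    """Point 1: remove underscores so cells do not share an underscore-delimited prefix."""
    return barcode.replace("_", "-")


def make_row_name(symbol, species=None, is_mito=False):
    """Build a row name without underscores or commas (points 2 and 3).

    Mitochondrial genes get an 'MT-' prefix if they lack one (point 4), and an
    optional species label is appended after a hyphen.
    """
    name = _safe(symbol)
    if is_mito and not name.upper().startswith("MT-"):
        name = "MT-" + name
    if species:
        name = name + "-" + _safe(species)
    return name


if __name__ == "__main__":
    print(clean_barcode("ACGTACGTACGTACGT_1"))
    print(make_row_name("CO1", species="hg38", is_mito=True))
    print(make_row_name("A_B,C"))
```

Row names built this way are not guaranteed to be unique, so a duplicate check before writing the tsv would still be needed.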

I'm also not entirely certain that using gene symbols in the tsv is appropriate, or that including genes (and cells) from multiple species is supported/correct in Seurat.

So again, some guidance on conversion would be helpful.