grst / single_cell_data_integration


dropSeqPipe configuration #11

Status: open. grst opened this issue 5 years ago.

grst commented 5 years ago

Hi @Hoohm,

While creating the sample.csv files, I came up with a couple of questions:

Hoohm commented 5 years ago

For the first question, 10x provides a whitelist of all possible cell barcodes. We can use those if you want, meaning we extract all 737k barcodes and then filter.
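A minimal sketch of the "extract everything on the whitelist, then filter" idea, assuming the cell barcode of each read has already been extracted as a plain string. The toy whitelist stands in for the 10x file, and the `min_reads` cutoff is a hypothetical filtering rule chosen only for illustration; a real pipeline may filter differently.

```python
from collections import Counter


def count_whitelisted_barcodes(barcodes, whitelist):
    """Count reads per cell barcode, keeping only barcodes on the whitelist."""
    allowed = set(whitelist)
    return Counter(bc for bc in barcodes if bc in allowed)


def filter_by_min_reads(counts, min_reads):
    """The 'then filter' step: keep barcodes with at least min_reads reads.

    min_reads is a hypothetical threshold, not a dropSeqPipe parameter.
    """
    return {bc: n for bc, n in counts.items() if n >= min_reads}


if __name__ == "__main__":
    whitelist = ["AAAC", "AAAG", "CCCT", "GGGA"]  # toy stand-in for the 10x list
    reads = ["AAAC", "AAAC", "AAAC", "CCCT", "TTTT", "AAAG", "CCCT"]
    counts = count_whitelisted_barcodes(reads, whitelist)  # "TTTT" is dropped
    print(filter_by_min_reads(counts, min_reads=2))  # prints {'AAAC': 3, 'CCCT': 2}
```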

grst commented 5 years ago

> For the first question, 10x provides a whitelist of all possible cell barcodes. We can use those if you want, meaning we extract all 737k barcodes and then filter.

That's basically what I do now for the Lambrechts data from Cell Ranger, and it seems to work quite well. But what would that look like for the other protocols? Or would it work if I just specified a large, arbitrary number, say 200,000?
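To illustrate what "just specify a large, arbitrary number" would mean when no whitelist exists, here is a sketch that keeps the N barcodes with the most reads. This is an assumption about how such a cap behaves in general, not a description of dropSeqPipe's actual option: if N exceeds the number of observed barcodes, everything is kept and the real filtering is simply deferred to a later step.

```python
from collections import Counter


def top_n_barcodes(counts, n):
    """Return up to n barcodes with the most reads, most abundant first.

    Counter.most_common(n) returns all items when n exceeds their number,
    so an oversized n (e.g. 200_000) just keeps every candidate barcode.
    """
    return [bc for bc, _ in Counter(counts).most_common(n)]


if __name__ == "__main__":
    counts = {"AAAC": 50, "CCCT": 30, "GGGA": 2}  # toy reads-per-barcode table
    print(top_n_barcodes(counts, 2))  # prints ['AAAC', 'CCCT']
    print(top_n_barcodes(counts, 200_000))  # prints ['AAAC', 'CCCT', 'GGGA']
```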

> Exactly the same samples with different read lengths? That is odd. I would keep them apart and combine them at the very end based on the cell barcode.

They have the same GSM identifier on GEO but multiple SRR identifiers on SRA. I think they contain different cells, so keeping them separate definitely makes sense.
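A sketch of "keep them apart and combine them at the very end based on the cell barcode", assuming each SRR run has already been reduced to a hypothetical `{barcode: {gene: count}}` table. Because the runs probably contain different cells, the sketch also allows prefixing barcodes with the run ID so that a barcode sequence seen in two runs is not merged into one cell. The run IDs and gene names below are placeholders.

```python
from collections import Counter


def combine_runs(per_run_counts, prefix_run=False):
    """Merge per-run {barcode: {gene: count}} tables into one table.

    With prefix_run=False, counts for the same barcode are summed across runs.
    With prefix_run=True, keys become "<run_id>_<barcode>", so identical
    barcode sequences from different runs stay separate cells.
    """
    combined = {}
    for run_id, run_counts in per_run_counts.items():
        for barcode, genes in run_counts.items():
            key = f"{run_id}_{barcode}" if prefix_run else barcode
            # Counter.update adds counts rather than replacing them
            combined.setdefault(key, Counter()).update(genes)
    return combined


if __name__ == "__main__":
    runs = {  # hypothetical run IDs standing in for the real SRR accessions
        "RUN_A": {"AAAC": {"CD3E": 2}},
        "RUN_B": {"AAAC": {"CD3E": 1, "MS4A1": 4}},
    }
    print(combine_runs(runs))
    print(combine_runs(runs, prefix_run=True))
```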

grst commented 5 years ago

They actually have different read lengths within the same file. No idea how they got there; it's all Illumina HiSeq 2500.

[Screenshot: output of `zless SRRXXXXXX_2.fastq.gz`]

What will STAR do when it does not have an index for a certain read length? The authors use STAR, too, but so far I couldn't find out how they generated the index.
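Rather than inspecting the file by eye with `zless`, the mixed read lengths can be tallied directly. This is a minimal sketch using only the Python standard library, assuming standard 4-line FASTQ records (header, sequence, `+`, quality) with no line wrapping.

```python
import gzip
from collections import Counter


def read_length_histogram(path):
    """Tally sequence lengths in a FASTQ file (gzipped if it ends in .gz).

    Assumes unwrapped 4-line records: header, sequence, '+', quality.
    """
    opener = gzip.open if path.endswith(".gz") else open
    lengths = Counter()
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # the sequence line of each record
                lengths[len(line.rstrip("\n"))] += 1
    return lengths
```

Usage: `print(read_length_histogram("SRRXXXXXX_2.fastq.gz").most_common())` lists each read length with its count, most frequent first.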