Open grst opened 5 years ago
For the first question, 10x provides a whitelist of all possible cell barcodes. We can use those if you want, meaning we extract all 737k whitelisted barcodes and then filter.
That's basically what I do now for the Lambrechts data from cellranger. It seems to work quite well. But what would that look like for the other protocols? Or would it work if I just specify a large, arbitrary number, say 200,000?
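A minimal sketch of the "extract everything on the whitelist, then filter" idea, in plain Python. The function names, the `min_reads` cutoff, and the toy barcodes are hypothetical and only illustrate the logic; this is not how the pipeline implements it.

```python
from collections import Counter


def load_whitelist(path):
    """Read one barcode per line into a set (path is a hypothetical file)."""
    with open(path) as handle:
        return {line.strip() for line in handle if line.strip()}


def filter_barcodes(barcode_counts, whitelist, min_reads=1):
    """Keep barcodes that are whitelisted and have at least min_reads reads."""
    return {
        barcode: n_reads
        for barcode, n_reads in barcode_counts.items()
        if barcode in whitelist and n_reads >= min_reads
    }


# Toy example with made-up barcodes and read counts
counts = Counter({"AAAC": 120, "AAAG": 3, "TTTT": 50})
whitelist = {"AAAC", "AAAG"}
kept = filter_barcodes(counts, whitelist, min_reads=10)
```

Here `TTTT` is dropped because it is not on the whitelist, and `AAAG` is dropped by the read cutoff. Stricter cell filtering can still happen downstream.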
- Exactly the same samples with different read lengths? That is odd. I would keep them apart and combine them at the very end based on the cell barcode.
They have the same GSM identifier on GEO, but multiple SRR identifiers on SRA. I think they contain different cells. -> keeping them separate definitely makes sense.
They actually have different read lengths within the same file. No idea how they got there. It's all Illumina HiSeq 2500.
>zless SRRXXXXXX_2.fastq.gz
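To quantify the mixed read lengths instead of eyeballing them in `zless`, something like the following could be used. It is a sketch that assumes standard four-line FASTQ records (no wrapped sequence lines); `read_length_counts` and `max_reads` are names made up for this example.

```python
import gzip
from collections import Counter


def read_length_counts(fastq_gz_path, max_reads=None):
    """Count how often each read length occurs in a gzipped FASTQ file.

    Assumes four lines per record, with the sequence on the second line.
    """
    counts = Counter()
    n_reads = 0
    with gzip.open(fastq_gz_path, "rt") as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 == 1:  # sequence line of each record
                counts[len(line.rstrip("\r\n"))] += 1
                n_reads += 1
                if max_reads is not None and n_reads >= max_reads:
                    break
    return counts
```

Calling `read_length_counts(path, max_reads=100000)` on one of the files returns a `Counter` mapping read length to number of reads, which shows at a glance how many distinct lengths are present.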
What will STAR do when it does not have an index for a certain read length? The authors actually use STAR too, but so far I couldn't find out how they generate the index.
Hi @Hoohm,
while creating the sample.csv files, I came up with a couple of questions:
Is the n_cels parameter in samples.csv mandatory? I plan to filter downstream with scanpy, and I cannot find the 'expected cells' per library in the description of every dataset.
Read length only refers to R1 read length, right?
Some samples consist of multiple runs with different read lengths each (:roll_eyes:). How should I deal with them? Suggestion: treat them as individual samples.
Which brings me to the next question: does it make any difference if I (a) concatenate two fastq files and treat them as a single sample or (b) leave them as they are and treat them as two samples?
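For reference, option (a) could be sketched as below. Gzip files can be appended byte for byte, and the result is still a valid (multi-member) gzip stream, so no decompression is needed. `concatenate_fastq_gz` is a hypothetical helper, and the sketch says nothing about whether the pipeline handles mixed read lengths in the merged file.

```python
import shutil


def concatenate_fastq_gz(input_paths, output_path):
    """Append gzipped FASTQ files byte for byte into one multi-member gzip."""
    with open(output_path, "wb") as out:
        for path in input_paths:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)
```

If paired files are merged this way, R1 and R2 inputs need to be passed in the same run order so that the mates stay in sync.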