grst / single_cell_data_integration


Raw data (FASTQ) alignment & quantification #8

Open grst opened 5 years ago

grst commented 5 years ago

Status

| Dataset | platform | fastq downloaded | samples.csv | fastq renamed/concatenated | config.yml | pipeline success |
|---|---|---|---|---|---|---|
| zheng_zhang_2017 | smartseq2 | wait for EGA | | | | |
| zheng_bileas_2017 | 10x_3p_v2 | | | | | |
| guo_zhang_2018 | smartseq2 | wait for EGA | | | | |
| savas_loi_2018 | 10x_3p_v2 | | | | | |
| azizi_peer_2018 | indrop_v2 | | | | | |
| azizi_peer_2018_10x | 10x_5p | | | | | |
| lambrechts_2018_6149_v1 | 10x_3p_v1 | | | | | |
| lambrechts_2018_6149_v2 | 10x_3p_v2 | | | | | |
| lambrechts_2018_6653 | 10x_3p_v2 | | | | | |

Platforms

Strategies

Azizi/Peer (2018) suggest using a reduced GTF file containing only protein-coding genes, transcribed pseudogenes and lincRNA genes. Other features cannot be detected using dropSeq platforms, and removing them reduces the number of multi-mapped reads. Actually, Cell Ranger goes one step further and only aligns to protein-coding genes. https://www.cell.com/cell/fulltext/S0092-8674(18)30723-2?_returnURL=https%3A%2F%2Flinkinghub.elsevier.com%2Fretrieve%2Fpii%2FS0092867418307232%3Fshowall%3Dtrue#sectitle0185

@Hoohm, does dropSeqPipe/tools do any filtering on the GTF, and do you think it would make sense to implement something like this?
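To make the idea concrete, here is a rough sketch of what such a GTF filter could look like. It is not taken from dropSeqPipe or from the paper. The function name `filter_gtf_lines`, the `KEEP_BIOTYPES` set, and the exact biotype labels (Ensembl/GENCODE-style) are assumptions for illustration, and the attribute key is assumed to be `gene_biotype` (Ensembl) or `gene_type` (GENCODE).

```python
import re

# Hypothetical whitelist: biotype labels follow Ensembl/GENCODE naming and
# are an assumption, not the exact list used by Azizi/Peer (2018).
KEEP_BIOTYPES = {
    "protein_coding",
    "lincRNA",
    "transcribed_processed_pseudogene",
    "transcribed_unprocessed_pseudogene",
    "transcribed_unitary_pseudogene",
}

# Matches `gene_biotype "x"` (Ensembl) or `gene_type "x"` (GENCODE).
_BIOTYPE_RE = re.compile(r'gene_(?:bio)?type "([^"]+)"')


def filter_gtf_lines(lines, keep=KEEP_BIOTYPES):
    """Yield header lines and records whose gene biotype is in `keep`."""
    for line in lines:
        if line.startswith("#"):
            # Keep header/comment lines unchanged.
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            # Not a valid 9-column GTF record; skip it.
            continue
        match = _BIOTYPE_RE.search(fields[8])
        if match and match.group(1) in keep:
            yield line


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file names):
    #   python filter_gtf.py < annotation.gtf > annotation.filtered.gtf
    sys.stdout.writelines(filter_gtf_lines(sys.stdin))
```

Records without a recognizable biotype attribute are dropped in this sketch; whether that is the right default depends on the annotation source.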

Hoohm commented 5 years ago

Interesting. I have to read it to tell you what I think of it.

grst commented 5 years ago

@Hoohm, could you please put the preprocessing script you used for lambrechts_2018_6149_v1 in the repo (and/or the fastq files processed to run with dropSeqPipe)?

Hoohm commented 5 years ago

@grst done

grst commented 5 years ago

Actually, most of the datasets should be ready to run. I prepared the fastq files and created a samples.csv for each. @Hoohm, can we have a look at the config.yml files together? When do you have time?

Hoohm commented 5 years ago

Sure, tomorrow?

I'm almost done with the restructuring of results and such. I also added the possibility to exclude biotypes from the annotation.