icbi-lab / NeoFuse

NeoFuse is a user-friendly pipeline for the prediction of fusion neoantigens from tumor RNA-seq data.
GNU General Public License v3.0

Should I exclude the low quality bases and adapter sequences from fastq files? #10

Closed Apprentice2 closed 3 years ago

Apprentice2 commented 3 years ago

I am trying to run NeoFuse on paired-end RNA-seq FASTQ files to identify neoantigens. My FASTQ files may contain low-quality bases and adapter sequences. Should I remove the low-quality bases and adapter sequences from the FASTQ files before running NeoFuse?

I would appreciate it if you could tell me.

abyssum commented 3 years ago

Hello,

There is no definitive answer to your question. STAR handles both issues through soft-clipping, but if, for example, you have known read-through adapter contamination, you can consider trimming the reads before running NeoFuse. Keep in mind that there is a trade-off between precision and sensitivity for uniquely mapped reads: aggressive trimming might increase the precision of unique mappers, but it might also affect the proportion of split reads and discordant mates, which are the reads that support fusion calls.
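To judge whether read-through adapter contamination is a real concern before deciding to trim, one option is to estimate how many reads contain the start of the adapter. The sketch below is only an illustration and not part of NeoFuse: the function names are hypothetical, it assumes standard 4-line FASTQ records, and it assumes the common Illumina TruSeq adapter prefix `AGATCGGAAGAGC`, which may not match your library preparation kit.

```python
import gzip

# Assumption: Illumina TruSeq adapter prefix; replace with the adapter of your kit.
ADAPTER_PREFIX = "AGATCGGAAGAGC"


def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file in text mode (hypothetical helper)."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def adapter_fraction(lines, adapter=ADAPTER_PREFIX, max_reads=100_000):
    """Return the fraction of the first max_reads reads containing the adapter.

    lines is any iterable of FASTQ lines, assuming 4 lines per record
    (header, sequence, '+', qualities).
    """
    total = 0
    hits = 0
    for index, line in enumerate(lines):
        if index % 4 != 1:  # only the second line of each record is sequence
            continue
        if total >= max_reads:
            break
        total += 1
        if adapter in line:
            hits += 1
    return hits / total if total else 0.0
```

For a real file, something like `with open_fastq("sample_R1.fastq.gz") as handle: print(adapter_fraction(handle))` would report the fraction for the first 100,000 reads (the file name here is a placeholder). A very small fraction suggests that soft-clipping alone is likely sufficient, while a large one may justify testing a trimmed run.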

If for any reason you decide to trim first, I would suggest running 1-2 samples with and without trimming to check which result better explains your data (for example, whether recurrent fusion genes known for the condition/disease appear in the results of one of the two approaches).
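The comparison suggested above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not NeoFuse functionality: the function name `compare_runs` is hypothetical, and it assumes you have already extracted the fusion calls of each run as (5' gene, 3' gene) pairs, since the NeoFuse output format is not described in this thread.

```python
def compare_runs(untrimmed, trimmed, known):
    """Compare fusion calls from two runs against known recurrent fusions.

    Each argument is an iterable of (five_prime_gene, three_prime_gene) tuples.
    Returns a dict of sorted lists describing the overlap.
    """
    untrimmed_set = set(untrimmed)
    trimmed_set = set(trimmed)
    known_set = set(known)
    return {
        "shared": sorted(untrimmed_set & trimmed_set),
        "only_untrimmed": sorted(untrimmed_set - trimmed_set),
        "only_trimmed": sorted(trimmed_set - untrimmed_set),
        "known_in_untrimmed": sorted(untrimmed_set & known_set),
        "known_in_trimmed": sorted(trimmed_set & known_set),
    }


# Illustrative, made-up calls; substitute the fusions reported for your samples.
example = compare_runs(
    untrimmed=[("BCR", "ABL1"), ("GENE_A", "GENE_B")],
    trimmed=[("GENE_A", "GENE_B")],
    known=[("BCR", "ABL1")],
)
```

In this made-up example the known fusion appears only in the untrimmed run, which would argue against trimming for that sample. With only 1-2 samples the comparison is a sanity check rather than a statistical test.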

Apprentice2 commented 3 years ago

Thank you for your prompt reply. It was very helpful.