theo-allnutt-bioinformatics closed this issue 4 years ago
This is a good idea, Theo.
I would probably add it to SHI7 rather than burst, as it is a read filtering / QC step rather than part of alignment (even though some tools such as BLAST integrate this into the aligner).
I'm just not sure complexity masking is a good idea in all (or even most semi-global) cases, though. For instance, in end-to-end query alignment, one often wants to aggregate alignments after a run to calculate coverage of the reference genome(s). If low-complexity reads were dropped, this might not be possible, because gaps would be introduced in low-complexity regions. If filtering were also applied to the reference genomes, their length and class distribution would be biased, since some families are naturally more complex throughout their genomes than others. Also, for taxonomy assignment, low-complexity reads may still be of (limited) use, as they can contribute to LCA and Bayesian redistribution (even if only at less informative, broad taxonomic levels).
What was the use case you had in mind?
Thanks, Gabe
On Tue, Feb 25, 2020, 11:12 PM Theo Allnutt Bioinformatics < notifications@github.com> wrote:
Is it possible to add a low complexity filter to Burst?
Thanks,
Theo
Hi Gabe,
What I had in mind was grossly simple sequences. Despite normal QC, I still see reads that are, e.g., 150 bp of 'A's interspersed with very few other bases. Long SSRs (simple sequence repeats) may also span complete 150 bp reads.
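For illustration, a pre-alignment filter for reads like these could be sketched roughly as follows. This is not part of BURST or SHI7; it is a minimal standalone sketch that scores each read by the Shannon entropy of its k-mer composition. The function names, `k=3`, and the `min_entropy=3.0` cutoff are all assumptions chosen for the example and would need tuning on real data.

```python
import math
from collections import Counter


def kmer_entropy(seq, k=3):
    """Shannon entropy (in bits) of the k-mer composition of a read."""
    seq = seq.upper()
    n = len(seq) - k + 1  # number of overlapping k-mers
    if n <= 0:
        return 0.0
    counts = Counter(seq[i:i + k] for i in range(n))
    # Sum of p * log2(1/p) over observed k-mers; 0.0 for a homopolymer.
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def is_low_complexity(seq, k=3, min_entropy=3.0):
    """Flag a read whose k-mer entropy falls below a (hypothetical) cutoff."""
    return kmer_entropy(seq, k) < min_entropy


def filter_reads(reads, k=3, min_entropy=3.0):
    """Keep (read_id, sequence) pairs that are not flagged as low complexity."""
    return [(rid, s) for rid, s in reads
            if not is_low_complexity(s, k, min_entropy)]


if __name__ == "__main__":
    reads = [
        ("poly_a", "A" * 70 + "C" + "A" * 79),  # 150 bp, almost all 'A'
        ("ssr", "AC" * 75),                     # 150 bp dinucleotide repeat
        ("mixed", "ACGTTGCAAGCTTAGGCATCGATCCGTAAGCTAGGTCAATGCCGTA"),
    ]
    for rid, s in reads:
        print(rid, round(kmer_entropy(s), 2), is_low_complexity(s))
```

With these assumed settings, the near-homopolymer and the dinucleotide repeat are flagged while the mixed read is kept. Because the filter only looks at whole reads, it does not mask low-complexity regions inside otherwise informative reads.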
Thanks,
Theo