Closed: y9c closed this issue 1 year ago.
Hi @yech1990,
Thank you for the report and for trying Magic-BLAST. Your query 1 is rejected by Magic-BLAST's low-complexity filter. This filter should not be applied to very short sequences, and we will correct this in the next release. Unfortunately, there is no command-line option to turn this filter off.
Hi @boratyng, thank you very much for your help. I would like to know when the next version of Magic-BLAST will be released. Could you send me a nightly build for testing beforehand?
BTW, most reads shorter than 20 bp also cannot be mapped to the reference. I hope the fix will solve this problem as well.
@yech1990, the next release is not scheduled yet. At this point I can only tell you that it will not be before October 2021. We can send you a test binary before the release.
Thank you very much @boratyng!
It is taking much longer than I expected. Why isn't the Magic-BLAST source code on GitHub, so that fixes could reach users sooner?
I apologize for the late reply. Magic-BLAST is part of the NCBI C++ Toolkit, a very large code base that may be too large to migrate to git and GitHub.
Hi @boratyng, any update on this?
Hi @yech1990, unfortunately no update yet.
Hi @boratyng, still no update, correct?
Hi @yech1990, still no update on the release, sorry. But I have another solution for you. Please try adding the -validate_seqs F option to your magicblast run. With it, magicblast will filter out only extremely low-complexity sequences, such as polyA tails with a mismatch. This may work for you. I apologize for not suggesting this earlier.
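A minimal sketch of the suggested workaround, reusing the arguments from the original report and appending -validate_seqs F. It assumes magicblast is on PATH and that reads, database, and output file are passed with -query, -db, and -out; the file names reads.fastq, rRNA_db, and out.sam are hypothetical placeholders, and build_magicblast_cmd is an illustrative helper, not part of Magic-BLAST. Check magicblast -help for the options your version supports.

```python
import shlex

# Arguments taken from the original report (magicblast 1.6.0).
BASE_ARGS = [
    "-limit_lookup", "false",
    "-word_size", "14",
    "-max_db_word_count", "60",
    "-reftype", "transcriptome",
    "-score", "L,-10.0,0.8",
    "-md_tag",
    "-infmt", "fastq",
    "-outfmt", "sam",
]


def build_magicblast_cmd(query, db, out, validate_seqs=True):
    """Build a magicblast argument list (hypothetical helper)."""
    cmd = ["magicblast", "-query", query, "-db", db, "-out", out] + BASE_ARGS
    if not validate_seqs:
        # Per the maintainer, with -validate_seqs F only extremely
        # low-complexity reads (e.g. polyA tails with a mismatch)
        # are filtered out.
        cmd += ["-validate_seqs", "F"]
    return cmd


if __name__ == "__main__":
    # Hypothetical file names; replace with your own reads and database.
    cmd = build_magicblast_cmd("reads.fastq", "rRNA_db", "out.sam",
                               validate_seqs=False)
    print(shlex.join(cmd))  # shell-quoted command line to copy and run
```

To actually run it, the list can be passed to subprocess.run(cmd, check=True); building the list separately keeps the flags easy to compare between runs with and without the option.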
Hi @y9c, is the -validate_seqs F option working for you? I tried improving the low-complexity filtering, but it caused problems in other use cases. It looks like this option should fix your problem. Please let me know if you are still running into problems. Thanks.
Thank you for the fix. I'll test it with new data.
Yes, -validate_seqs F rescues more reads. Thank you.
Thank you for your response. I am closing this issue. Please reopen it if you still have problems.
For reads as short as 20 bp, some cannot be mapped by magicblast, while others can. For example, query 1 and query 2 are both part of the 18S rRNA sequence (perfect match). Running magicblast version 1.6.0 with the arguments

-limit_lookup false -word_size 14 -max_db_word_count 60 -reftype transcriptome -score L,-10.0,0.8 -md_tag -infmt fastq -outfmt sam

leaves query 1 unaligned, while query 2 is aligned properly.
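To see which reads ended up unaligned in SAM output like this, one can test bit 0x4 ("segment unmapped") of the FLAG column, as defined by the SAM specification. A minimal sketch, assuming unaligned reads appear as records in the SAM file; the example records below are made-up placeholders (hypothetical reference name ref, sequence given as "*"), not real magicblast output.

```python
import io


def read_alignment_status(sam_lines):
    """Map each query name to True if any of its records is aligned."""
    status = {}
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.rstrip("\n").split("\t")
        qname, flag = fields[0], int(fields[1])
        aligned = not (flag & 0x4)  # 0x4 = segment unmapped
        status[qname] = status.get(qname, False) or aligned
    return status


# Made-up records with the 11 mandatory tab-separated SAM columns.
EXAMPLE_SAM = "\n".join([
    "@HD\tVN:1.0",
    "query1\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
    "query2\t0\tref\t100\t255\t20M\t*\t0\t0\t*\t*",
])

if __name__ == "__main__":
    for name, ok in read_alignment_status(io.StringIO(EXAMPLE_SAM)).items():
        print(name, "aligned" if ok else "unaligned")
```

A query that is absent from the output altogether should also be treated as unaligned when comparing runs.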