liberjul / CONSTAXv2

MIT License

Blast gets stuck when 0 hits found #4

Closed. andnischneider closed this issue 2 years ago

andnischneider commented 2 years ago

Hi again

I opened another issue a few months ago and just wanted to say that I have since used CONSTAX successfully on several ITS datasets, without any further issues. Now I am trying to run some 16S data and I am running into problems.

A problem that keeps happening (on two different Linux servers with two different operating systems) is that blast sometimes gets stuck when an ASV sequence gets 0 hits. I say sometimes because when I isolate the sequences without hits and run a test with only 20 sequences, including the problematic ones, it runs through fine. But as soon as I go back to the full dataset (or even a subset of, say, 1000 sequences), it happens again. The job keeps running (I don't get an error message of any sort), but nothing happens: no further hits get added to the blast.out file.

I am running the newest version of CONSTAX through conda. I have attached an example input file I used when the problem appeared (a subset of 5000 ASVs). With this file, blast gets stuck at ASV_1112 for me, while it works normally if I isolate ASV_1112 together with a few others. I have also attached the blast.out file in the state it was in after the job froze.

Archive.zip

Best,
Andreas
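A minimal Python sketch of how the stall point described above could be narrowed down from the query FASTA and the frozen blast.out. It assumes blast.out is tab-separated with the query ID in the first column (as in BLAST's tabular `-outfmt 6`), which the thread does not confirm. The function names are hypothetical and not part of CONSTAX.

```python
from typing import Iterable, List, Optional, Tuple


def read_fasta(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse FASTA lines into (id, sequence) pairs, keeping file order."""
    records: List[Tuple[str, str]] = []
    seq_id: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_id is not None:
                records.append((seq_id, "".join(chunks)))
            header = line[1:].split()
            # The ID is taken to be the first word of the header line.
            seq_id = header[0] if header else ""
            chunks = []
        elif seq_id is not None:
            chunks.append(line)
    if seq_id is not None:
        records.append((seq_id, "".join(chunks)))
    return records


def last_reported_query(blast_lines: Iterable[str]) -> Optional[str]:
    """Return the query ID on the last data row (assumed: column 1, tab-separated)."""
    last = None
    for raw in blast_lines:
        line = raw.strip()
        if line and not line.startswith("#"):
            last = line.split("\t")[0]
    return last


def queries_after(records: List[Tuple[str, str]], last_id: Optional[str]) -> List[str]:
    """IDs that follow last_id in the query file; raises ValueError if it is absent."""
    ids = [seq_id for seq_id, _ in records]
    if last_id is None:
        return ids
    return ids[ids.index(last_id) + 1:]
```

Queries with 0 hits produce no rows in tabular output, so the stalled query is not necessarily the very next one in the file. It is somewhere among the IDs returned by `queries_after`, most likely near the front.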

liberjul commented 2 years ago

Thank you for reaching out. I am currently attempting to replicate the error, and will get back once the job has finished running. Just to replicate properly, what database did you use for the reference when using 16S sequences?

Julian

andnischneider commented 2 years ago

I used the SILVA database, formatted as suggested in the CONSTAX Read the Docs pages, and then trained with the -t option.

liberjul commented 2 years ago

This is weird behavior from blastn. From my testing, if 100 or more sequences precede the ASV with 0 hits (ASV_1112) in the query file, blastn gets stuck. With 10 or 0 preceding sequences, the behavior is normal. This happens even with the newest BLAST version (2.12.0). I'm experimenting with other parameters for the blastn command, and I may try a workaround that simply restarts blastn whenever it gets stuck, until all the sequences are processed.
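The restart idea could be sketched roughly as follows in Python. This is not the code that shipped in CONSTAX. It assumes a plain `blastn -outfmt 6` call, treats "output file stopped growing" as the stall signal, and resumes from the query after the last one that produced a row. All function names and the `stall_seconds` threshold are hypothetical.

```python
import os
import subprocess
import time
from typing import Callable, List, Sequence, Tuple

Record = Tuple[str, str]  # (query id, sequence)
RunOnce = Callable[[List[Record]], Tuple[List[str], bool]]


def run_with_stall_timeout(cmd: Sequence[str], out_path: str,
                           stall_seconds: float = 600.0,
                           poll_seconds: float = 5.0) -> bool:
    """Run cmd; kill it and return False if out_path stops growing for stall_seconds."""
    proc = subprocess.Popen(list(cmd))
    last_size, last_change = -1, time.monotonic()
    while proc.poll() is None:
        time.sleep(poll_seconds)
        size = os.path.getsize(out_path) if os.path.exists(out_path) else 0
        if size != last_size:
            last_size, last_change = size, time.monotonic()
        elif time.monotonic() - last_change > stall_seconds:
            proc.kill()
            proc.wait()
            return False
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, list(cmd))
    return True


def resume_until_done(records: Sequence[Record], run_once: RunOnce,
                      max_restarts: int = 50) -> Tuple[List[str], List[str]]:
    """Call run_once on the remaining queries until a run finishes without stalling.

    run_once returns (tabular rows, finished). Returns (all rows, skipped query ids).
    """
    remaining = list(records)
    rows: List[str] = []
    skipped: List[str] = []
    restarts = 0
    while remaining:
        new_rows, finished = run_once(remaining)
        rows.extend(new_rows)
        if finished:
            break
        restarts += 1
        if restarts > max_restarts:
            raise RuntimeError("blastn kept stalling; giving up")
        order = [seq_id for seq_id, _ in remaining]
        if new_rows:
            # Resume right after the last query that produced a row.
            cut = order.index(new_rows[-1].split("\t")[0]) + 1
        else:
            # No output at all: skip the first query so the loop makes progress.
            skipped.append(order[0])
            cut = 1
        remaining = remaining[cut:]
    return rows, skipped


def make_blastn_runner(db: str, workdir: str, stall_seconds: float = 600.0) -> RunOnce:
    """Build a run_once callable around an assumed `blastn -outfmt 6` command line."""
    def run_once(remaining: List[Record]) -> Tuple[List[str], bool]:
        query = os.path.join(workdir, "remaining.fasta")
        out = os.path.join(workdir, "remaining.blast.tsv")
        with open(query, "w") as fh:
            for seq_id, seq in remaining:
                fh.write(f">{seq_id}\n{seq}\n")
        if os.path.exists(out):
            os.remove(out)
        cmd = ["blastn", "-query", query, "-db", db, "-outfmt", "6", "-out", out]
        finished = run_with_stall_timeout(cmd, out, stall_seconds)
        rows: List[str] = []
        if os.path.exists(out):
            with open(out) as fh:
                rows = [line.rstrip("\n") for line in fh if line.strip()]
        return rows, finished
    return run_once
```

Two caveats apply to this sketch. A healthy run with a long stretch of 0-hit queries also writes nothing for a while, so `stall_seconds` has to be generous. And the last row written before a kill may be incomplete, so a careful version would re-run that query instead of trusting its rows.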

andnischneider commented 2 years ago

Thanks for testing and verifying this. At least I'm glad it's not just something wrong on my end. This wasn't the only sequence it happened with for me, but it was always sequences with 0 hits that made it get stuck. Let me know when you find a good workaround.

liberjul commented 2 years ago

I have tested and pushed a fix for this in version 2.0.15. You can download the new scripts now, or reinstall with conda once the new version is approved. Let me know if your issues persist.

andnischneider commented 2 years ago

This fixed it, thanks again.

liberjul commented 2 years ago

@andnischneider I reported this bug to NCBI, and they were able to recreate the issue. They said the developers would work to fix it in the next BLAST release.

Julian