ssnn-airr closed this issue 4 years ago
Original comment by Julian Zhou (Bitbucket: jqz, GitHub: julianqz).
Oh, there’s actually already an issue (#65) for this! I created this one after bringing it up at the subgroup meeting, and Steve said that we should make a note of it even if it’s hard to reproduce.
I haven’t tried using it with usearch.
As a workaround, I’ve been breaking my input files into chunks and passing them individually to AssemblePairs. It still hangs sometimes, or there is a core dump, and when I compare the number of input reads against the number of output reads (passed + failed), the two don’t match. But this way at least I only have to re-run one small chunk, as opposed to waiting for the entire lot to run through again.
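For anyone who wants to try the same workaround, here is a rough Python sketch of the chunking and the read-count check. It is not part of pRESTO. The function names and the chunk file naming are made up for illustration, and it assumes uncompressed FASTQ files with exactly four lines per record. For paired-end data, both read files would need to be split with the same `reads_per_chunk` so the pairs stay in sync.

```python
import itertools
from pathlib import Path


def count_fastq_reads(path):
    """Count records in an uncompressed FASTQ file (assumes 4 lines per record)."""
    path = Path(path)
    if not path.exists():
        # A missing pass/fail file is treated as containing zero reads.
        return 0
    with open(path) as handle:
        return sum(1 for _ in handle) // 4


def split_fastq(path, out_dir, reads_per_chunk=100_000):
    """Write consecutive chunks of a FASTQ file and return the chunk paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    stem = Path(path).stem
    chunks = []
    with open(path) as handle:
        for index in itertools.count(1):
            # Read the next block of whole records (4 lines each).
            lines = list(itertools.islice(handle, 4 * reads_per_chunk))
            if not lines:
                break
            chunk_path = out_dir / f"{stem}_chunk{index:04d}.fastq"
            chunk_path.write_text("".join(lines))
            chunks.append(chunk_path)
    return chunks


def reads_missing(input_path, pass_path, fail_path):
    """Return input reads minus (passed + failed); nonzero means reads were lost."""
    total_out = count_fastq_reads(pass_path) + count_fastq_reads(fail_path)
    return count_fastq_reads(input_path) - total_out
```

Any chunk where `reads_missing` is nonzero (or where the run never finished) is the only one that needs to be re-run.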
Original comment by Jason Vander Heiden (Bitbucket: javh, GitHub: javh).
Does it get stuck if you use usearch?
Also, a native Smith-Waterman (SW) implementation in Python is really slow. There are some C-coded Striped Smith-Waterman libraries for Python out there, so that would be a better approach. Last I checked, I couldn’t get them to install, but that was years ago.
The blastn/usearch wrapper setup is pretty inefficient right now. Passing bigger chunks of sequences (maybe even just 1 chunk) into blastn/usearch for reference alignment might help.
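To make the "one big chunk" idea concrete, here is a minimal sketch of writing every query sequence to a single FASTA file and calling blastn once on it. This is not how the current wrapper is written. The function names are hypothetical, the tabular output format (`-outfmt 6`) and the e-value cutoff are just example settings, and the database is assumed to have been built already. The `timeout` argument is an extra, so that a hung blastn process raises an error instead of stalling the run silently.

```python
import subprocess


def write_fasta(records, path):
    """Write (identifier, sequence) pairs to one FASTA file."""
    with open(path, "w") as handle:
        for identifier, sequence in records:
            handle.write(f">{identifier}\n{sequence}\n")
    return path


def blastn_command(query_fasta, database, out_path, evalue=1e-5):
    """Build a blastn command line for one batch of query sequences."""
    return [
        "blastn",
        "-query", str(query_fasta),
        "-db", str(database),
        "-out", str(out_path),
        "-outfmt", "6",          # tabular output, one hit per line
        "-evalue", str(evalue),  # example cutoff, not a recommendation
    ]


def run_blastn(query_fasta, database, out_path, timeout=3600):
    """Run blastn once over the whole batch.

    Raises subprocess.TimeoutExpired if blastn does not finish within
    `timeout` seconds, and CalledProcessError on a nonzero exit code.
    """
    command = blastn_command(query_fasta, database, out_path)
    subprocess.run(command, check=True, timeout=timeout)
    return out_path
```

The trade-off is that one process start-up and one database load are shared across all queries, at the cost of having to map the hits back to the individual reads afterwards.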
Original report by Julian Zhou (Bitbucket: jqz, GitHub: julianqz).
There is no reliable way to reproduce this (it doesn’t happen every time, but does happen ~9 out of 10 times in my experience [on Farnam]), but AssemblePairs.py sequential --aligner blastn tends to get stuck before finishing, anywhere between 5% and 95%. @{5a0336c6c24b5074212438b7} suggested that it might be a file system issue with blastn, with a potential alternative being to replace blastn with something like a Smith-Waterman algorithm implemented in native Python.
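For reference, a bare-bones Smith-Waterman in plain Python could look something like the sketch below. It only computes the best local alignment score, uses a simple linear gap penalty, and the scoring values are arbitrary placeholders rather than anything tuned for read-to-reference alignment. The function name is made up for illustration.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score of a and b (linear gap penalty)."""
    # Only the previous row of the scoring matrix is kept in memory.
    previous = [0] * (len(b) + 1)
    best = 0
    for char_a in a:
        current = [0]
        for j, char_b in enumerate(b, start=1):
            diagonal = previous[j - 1] + (match if char_a == char_b else mismatch)
            # Local alignment: a cell score never drops below zero.
            score = max(0, diagonal, previous[j] + gap, current[j - 1] + gap)
            current.append(score)
            best = max(best, score)
        previous = current
    return best
```

The nested loop touches every cell of a len(a) by len(b) matrix, so the work grows with the product of the two sequence lengths, all of it in interpreted Python.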