zhaoyanswill / RAPSearch2

Reduced Alphabet based Protein similarity Search

RAPsearch v2.22 overwrites its own .m8 output file #14

Closed snayfach closed 9 years ago

snayfach commented 9 years ago

When searching large volumes of reads (e.g. 20 million) I've found that RAPsearch2 will periodically overwrite the m8 output file, so that when the similarity search is complete, the output file only contains results for a subset of reads in the input file.

For example, two days ago I started a job with 20M reads using the command: rapsearch -q query -d database -v 1 -b 0 -i score -z 1 -o output. I checked in on the job yesterday and the m8 file contained results for reads 1 to about 5 million. Today, when I looked at the beginning of the m8 file it did not contain the results from the previous day. Instead it contained results for reads 10,663,355 to 14,183,860.
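One way to confirm this symptom on a finished run is to check which part of the query file the m8 output actually covers. The sketch below is illustrative only: it assumes the first tab-separated column of the m8 file holds the query ID, that this ID matches the first whitespace-delimited token of the FASTA header, and that lines starting with `#` are comments. The function names are hypothetical and not part of RAPsearch2.

```python
import io


def fasta_ids(handle):
    """Yield the first whitespace-delimited token of each FASTA header line."""
    for line in handle:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                yield fields[0]


def m8_query_ids(handle):
    """Yield the query ID (assumed to be column 1) of each m8 hit line."""
    for line in handle:
        line = line.rstrip("\n")
        # Assumption: blank lines and lines starting with '#' carry no hits.
        if not line.strip() or line.startswith("#"):
            continue
        yield line.split("\t", 1)[0]


def coverage_span(fasta_handle, m8_handle):
    """Return (first, last, n_distinct) query positions seen in the m8 file.

    Positions are 1-based record numbers in the FASTA file.
    Returns None if no m8 query ID matches a FASTA record.
    """
    position = {qid: i for i, qid in enumerate(fasta_ids(fasta_handle), start=1)}
    seen = {position[q] for q in m8_query_ids(m8_handle) if q in position}
    if not seen:
        return None
    return min(seen), max(seen), len(seen)


if __name__ == "__main__":
    fasta = io.StringIO(">r1\nAAA\n>r2\nCCC\n>r3\nGGG\n")
    m8 = io.StringIO("r2\tsubjectA\t95.0\nr3\tsubjectB\t90.0\n")
    print(coverage_span(fasta, m8))
```

Reads with no hits are legitimately absent from the m8 file, so the distinct-query count alone proves nothing. A first position far from the start of a large query file, as in the report above, is the stronger sign that earlier results were overwritten.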

zhaoyanswill commented 9 years ago

Thanks for promoting RAPSearch2!

I've been busy with another project. I'll look into it ASAP!

Sincerely, Y. Z.


snayfach commented 9 years ago

I'm just following up on this issue. Have you been able to reproduce the bug? Our group is waiting for this to be fixed before running RAPsearch2 on a large series of samples.

Thanks, Stephen

zhaoyanswill commented 9 years ago

Hi Stephen,

I'm traveling right now. I'll start looking into it next week and hopefully have a fix within a week. Thanks for your patience!

Sincerely, Yongan


ctberthiaume commented 9 years ago

I've also seen this bug when using large query sets. To get around it, I tried switching to stdout for the m8 output with -u 1, but I encountered a new bug:

terminate called after throwing an instance of 'boost::archive::archive_exception'
  what():  invalid signature

My solution was to split the query fasta into smaller pieces and concatenate results at the end.
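The split-and-concatenate workaround could be sketched roughly as follows. This is a minimal illustration, not the script used in the thread: the function names, chunk file naming, and chunk size are hypothetical, and it assumes that `#` lines in each per-chunk m8 file are comments that can be dropped when merging.

```python
from pathlib import Path


def split_fasta(fasta_path, out_dir, records_per_chunk):
    """Split a FASTA file into chunks of at most records_per_chunk records.

    Returns the list of chunk paths in input order.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    chunk_paths = []
    out = None
    n_records = 0
    with open(fasta_path) as src:
        for line in src:
            if line.startswith(">"):
                # Start a new chunk file every records_per_chunk records.
                if n_records % records_per_chunk == 0:
                    if out is not None:
                        out.close()
                    path = out_dir / f"chunk_{len(chunk_paths):04d}.fasta"
                    chunk_paths.append(path)
                    out = open(path, "w")
                n_records += 1
            # Lines before the first header are skipped (out is still None).
            if out is not None:
                out.write(line)
    if out is not None:
        out.close()
    return chunk_paths


def concat_m8(m8_paths, merged_path):
    """Concatenate per-chunk m8 files in order, dropping '#' comment lines."""
    with open(merged_path, "w") as dst:
        for path in m8_paths:
            with open(path) as src:
                for line in src:
                    if line.startswith("#"):
                        continue  # assumption: '#' lines are headers/comments
                    if not line.endswith("\n"):
                        line += "\n"  # keep records on separate lines
                    dst.write(line)
```

Each chunk would then be searched separately, for example with the same `-q`, `-d`, and `-o` options as in the original report, and the per-chunk m8 files passed to `concat_m8` in chunk order. Smaller chunks keep each run below the size at which the overwrite was observed, at the cost of launching more jobs.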

snayfach commented 9 years ago

This appears to be fixed now. Thanks, Yongan.

ghbio commented 8 years ago

Hi everyone,

I noticed the same behavior with an input file of 5.4M sequences. I also worked around it by splitting the input file into several chunks. I am using v2.22 and it doesn't seem to be fixed for me. Am I missing something?

Thanks! Pablo

ctberthiaume commented 8 years ago

No, I still see the same behavior. And if you look through the commit history there don't appear to be any code changes that would fix this. This issue should be reopened.

SamStudio8 commented 8 years ago

I believe I'm seeing this error, also.

martinjvickers commented 8 years ago

I suspect that the author isn't using GitHub for development, as the SourceForge site for RAPSearch2,

http://rapsearch2.sourceforge.net/

includes an update from April 2015 that apparently fixes this issue:

2.23 is available now (Apr 6, 2015)
Fix a bug that may overwrite results while query files are huge.
Fix a bug that may crash the program while aligning long sequences.
Fix a bug that may give negative E-value while using '-s F' option.

SamStudio8 commented 8 years ago

Thanks for the heads up, I'll try this now.
