zhaoyanswill / RAPSearch2

Reduced Alphabet based Protein similarity Search

Get 'Killed' output when using prerapsearch on large files #38

Closed. jorvis closed this issue 6 years ago.

jorvis commented 6 years ago

I was able to format smaller FASTA files for searching using prerapsearch (v 2.24) on my system (Ubuntu 17.04), but when I try to do the latest Uniref100 it fails at various times. Some attempts have run as long as 6 hours, generating a 204GB index before failing. Retries have it fail within just a few minutes. I have not run out of disk space and it happens whether I use the -s option or not.

The input FASTA file in this case is 56GB. I get very little output to work with.

$ ./RAPSearch2.24_64bits/bin/prerapsearch -d uniref100.20171025.fasta -n uniref100.20171025.rapsearch2.db -s 10
now building hash file
Killed
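
A bare "Killed" with no further output typically means the Linux out-of-memory (OOM) killer terminated the process. If that is what happened here, the kernel log should record it:

$ dmesg | grep -i -E 'out of memory|killed process'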
zhaoyanswill commented 6 years ago

Hi,

Could you please try a larger -s parameter? Thanks!

jorvis commented 6 years ago

I tried up to -s 40 with the same result. What's a practical upper limit?

zhaoyanswill commented 6 years ago

Could you please try an even larger number, for example 200, and see whether it still gets killed? Thanks!
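
For example, reusing the exact command from the report above with a larger split count:

$ ./RAPSearch2.24_64bits/bin/prerapsearch -d uniref100.20171025.fasta -n uniref100.20171025.rapsearch2.db -s 200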

jorvis commented 6 years ago

It worked with "-s 100" (I wanted to try stepping up gradually). What are the drawbacks of this?

zhaoyanswill commented 6 years ago

No drawback, as long as each piece of data (204GB/100 in your case) is not very small. The '-s' parameter denotes how many pieces the data is divided into so that each piece fits in the memory of the machine.
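
As a rough sanity check (an inference from the explanation above, not documented behavior), each piece is approximately the index size divided by s, and that figure should sit comfortably below the machine's free RAM:

$ free -g                 # free RAM in GB
$ echo $((204 / 100))     # approximate GB per piece for a 204GB index with -s 100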

jorvis commented 6 years ago

Thanks for the info. Is that how much memory is needed at index time or at search time? I still want to keep search-time memory low, so should I only build on a machine with RAM similar to my target execution machines?

zhaoyanswill commented 6 years ago

Yes, you are right. Thanks!
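
For completeness, searching the chunked database then uses the usual rapsearch invocation from the same bin/ directory (query.fasta and the output prefix here are placeholders):

$ ./RAPSearch2.24_64bits/bin/rapsearch -q query.fasta -d uniref100.20171025.rapsearch2.db -o results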
