zhaoyanswill / RAPSearch2

Reduced Alphabet based Protein similarity Search

prerapsearch database building problem #6

Open bigsparty opened 10 years ago

bigsparty commented 10 years ago

Hello Yongan,

I came across a problem when building the database for RAPSearch2. I ran the command ./prerapsearch on the RefSeq.protein database and after ca. 20min I received the following error:

    terminate called after throwing an instance of 'boost::archive::archive_exception'
      what():  output stream error
    Abort trap

Can you tell me what's wrong here?

I guess the database formatting should not be the problem, since it is the official RefSeq database. But maybe you could reupload the database source for the example in your readme.txt (the link is dead).

Best,

Felix

zhaoyanswill commented 10 years ago

Hi Felix,

Are you out of the disk quota?

Sincerely, Y. Z.
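
For readers who want to check this quickly, here is a minimal sketch that reports the free space on the filesystem where the database will be written. It assumes the output goes to the current directory, and it only sees filesystem-level free space: a per-user quota on a shared system may be lower than the number shown here.

```python
import shutil

def free_gib(path="."):
    """Return the free space, in GiB, on the filesystem containing `path`."""
    usage = shutil.disk_usage(path)
    return usage.free / 2**30

# "." is an assumption: point this at the directory prerapsearch writes to.
print(f"{free_gib('.'):.1f} GiB free")
```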

bigsparty commented 10 years ago

Hey Yongan,

No, that should not be the problem: more than 30 GB is available, and I am testing with a 200 MB Archaea-only database.

Sincerely, Felix

bigsparty commented 10 years ago

Can you reupload the nogCOG example database file?

bigsparty commented 10 years ago

OK, I narrowed the problem down a bit. I tried to build a database with a handful of sequences from the actual RefSeq sequence set, and this worked fine. However, if I use more sequences (>10,000), there appears to be a problem. I do not run out of quota even with a lot more sequences, so I have the feeling that prerapsearch stumbles over a formatting problem in the sequence file. Has this happened before?
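
One way to test the formatting suspicion is to scan the sequence file for obviously malformed records before building. The sketch below is a generic FASTA sanity check; the rules it applies (allowed characters, no empty records) are assumptions about what a clean protein FASTA file looks like, not prerapsearch's documented input requirements, and `find_fasta_problems` is a hypothetical helper name.

```python
import io

# Assumed alphabet: letters plus '*' (stop) and '-' (gap). Adjust as needed.
ALLOWED = set("ABCDEFGHIJKLMNOPQRSTUVWXYZ*-")

def find_fasta_problems(handle, max_reports=20):
    """Return up to `max_reports` (line_number, message) pairs for suspicious lines."""
    problems = []
    header_line = None   # line number of the current record's header
    residues = 0         # residues counted for the current record

    for number, raw in enumerate(handle, start=1):
        line = raw.strip()
        if not line:
            continue  # blank lines are skipped
        if line.startswith(">"):
            # A new header closes the previous record.
            if header_line is not None and residues == 0:
                problems.append((header_line, "record has no sequence"))
            header_line, residues = number, 0
            if len(line) == 1:
                problems.append((number, "empty header"))
        elif header_line is None:
            problems.append((number, "sequence data before first header"))
        else:
            bad = set(line.upper()) - ALLOWED
            if bad:
                problems.append((number, "unexpected characters: " + "".join(sorted(bad))))
            residues += len(line)

    # Close the last record in the file.
    if header_line is not None and residues == 0:
        problems.append((header_line, "record has no sequence"))
    return problems[:max_reports]

sample = ">seq1\nMKV1LA\n>seq2\n>seq3\nMKT\n"
for line_number, message in find_fasta_problems(io.StringIO(sample)):
    print(line_number, message)
```

For a real file, pass `open("your_database.faa")` instead of the in-memory sample. An empty result only means that none of these particular checks fired.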

zhaoyanswill commented 10 years ago

I haven't encountered this problem. Could you share the sequence file with me so I can try to find the cause?

Sincerely, Y. Z.

zhaoyanswill commented 10 years ago

Working on it!

Sincerely, Y. Z.

zhaoyanswill commented 10 years ago

The nogCOG example database file can now be downloaded!

Sincerely, Y. Z.

bigsparty commented 10 years ago

Sorry for the late reply, but it got late over here. I just downloaded your nogCOG example database file and tried to run prerapsearch. I ended up with the same error as before. I am confused and don't know what might be wrong. I found some information on this error at http://www.boost.org/doc/libs/1_55_0/libs/serialization/doc/exceptions.html. However, I don't understand it, since I have no programming experience.

Sincerely, Felix

zhaoyanswill commented 10 years ago

My guess is that the g++ version you are using differs from the one that was used to build your Boost library.

Could you try compiling the Boost library yourself in any folder and then copying the libraries named in the Makefile into the src folder? Here is the link: http://www.boost.org/doc/libs/1_55_0/more/getting_started/unix-variants.html

Sincerely, Y. Z.

bigsparty commented 10 years ago

Hello Yongan, sorry for bothering you with this but I really would like the program to work. I installed Ubuntu on my Mac hoping that this would make things easier for you to help me. Along with it I installed recent GCC and Boost. Then I compiled RAPSearch2 without a problem. When I tried prerapsearch on the test database nogCOGdomN95.faa I got a new error.

    ~/RAPSearch2/bin$ ./prerapsearch -d nogCOGdomN95.faa -n nogCOGdomN95
    now building hash file
    hash file saved to file nogCOGdomN95
    Main END
    *** Error in `./prerapsearch': free(): invalid pointer: 0x00007fff3cd34e40 ***
    Aborted (core dumped)

What does this mean?

Sincerely, Felix

Edit: FYI, I switched to Ubuntu because your suggestion of compiling the Boost library myself and copying the libraries named in the Makefile into the src folder didn't work out.

zhaoyanswill commented 10 years ago

Hi Felix,

"Main END" means the program has already finished indexing the reference files. I think you should be able to use rapsearch.

The error indicates that free() was called on an invalid pointer, for example one that was already freed. This is the first time I have encountered this error. I'll look into it, but it should not affect your search with RAPSearch.

Sincerely, Yongan

bigsparty commented 10 years ago

Hey Yongan,

Everything seems to be working fine now. Although I couldn't find a solution to the original problem (running RAPSearch on Mac OS X) and have to use Ubuntu for the analysis, I am happy to be able to use your software.

Thank you for your help.

Sincerely,

Felix

bigsparty commented 10 years ago

One more thing: how can I run rapsearch on multiple/split databases? I assume that the size of the database is limited by my system memory. Unfortunately, I have only 8 GB available and want to search against the RefSeq database. What do I have to look out for? Can I merge the output if I split the database?

zhaoyanswill commented 10 years ago

You can try the "-s" parameter with a large number (the number of splits you want) when running prerapsearch. This reduces the memory usage when you run rapsearch, and the results are merged automatically.

Sincerely, Yongan
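
As an illustration of this suggestion, the sketch below assembles a prerapsearch command line with a split count. The -d and -n options are taken from the command shown earlier in this thread and -s from the reply above; the file name `refseq_protein.faa`, the database name, and the helper `prerapsearch_command` are hypothetical placeholders.

```python
import shlex

def prerapsearch_command(fasta_path, db_name, splits=None, binary="./prerapsearch"):
    """Build the argument list for a prerapsearch run, optionally with -s."""
    command = [binary, "-d", fasta_path, "-n", db_name]
    if splits is not None:
        command += ["-s", str(splits)]
    return command

# Hypothetical input and output names; replace them with your own.
command = prerapsearch_command("refseq_protein.faa", "refseq_protein", splits=10)
print(shlex.join(command))
```

To actually run the build, the list can be passed to `subprocess.run(command, check=True)`.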

bigsparty commented 10 years ago

Unfortunately, my Mac freezes when I attempt to use prerapsearch on datasets bigger than approx. 1 GB. Databases smaller than 1 GB take a few minutes to build. I tried building a 3.6 GB database with -s 2 to -s 20, but after 8 h I got the error 'boost::archive::archive_exception'. Any guesses? Is there a way to split the database into pieces (without the -s option) and concatenate them afterwards?

Sincerely, Felix

zhaoyanswill commented 10 years ago

Please check the disk space. -s 10 should work.

You could also use the "split" command to split the database into as many pieces as you want, run prerapsearch and rapsearch on each piece, and then merge all the results.

Sincerely, Y. Z.
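
A caveat for anyone following the manual route: splitting a FASTA file by bytes or by a fixed number of lines can cut a record in the middle. The sketch below is one possible alternative that splits only at record boundaries, distributing records round-robin over the output files. The function name `split_fasta` and the `.partNN.faa` naming scheme are hypothetical choices, not part of RAPSearch2.

```python
import os
import tempfile

def split_fasta(in_path, out_prefix, pieces):
    """Write the records of `in_path` round-robin into `pieces` FASTA files.

    Returns the list of output paths. Records are never cut in the middle.
    """
    paths = [f"{out_prefix}.part{i + 1:02d}.faa" for i in range(pieces)]
    outputs = [open(path, "w") for path in paths]
    try:
        current = -1  # index of the file receiving the current record
        with open(in_path) as source:
            for line in source:
                if line.startswith(">"):  # a header starts a new record
                    current = (current + 1) % pieces
                if current >= 0:          # ignore anything before the first header
                    outputs[current].write(line)
    finally:
        for out in outputs:
            out.close()
    return paths

# Small self-contained demonstration on a three-record file.
with tempfile.TemporaryDirectory() as tmp:
    source_path = os.path.join(tmp, "demo.faa")
    with open(source_path, "w") as demo:
        demo.write(">a\nMKV\nLLA\n>b\nMTT\n>c\nMAA\n")
    for part in split_fasta(source_path, os.path.join(tmp, "demo"), 2):
        with open(part) as piece:
            print(os.path.basename(part), repr(piece.read()))
```

Round-robin assignment balances the number of records per piece, not the number of residues. Each piece can then be indexed with prerapsearch and searched separately.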

bigsparty commented 10 years ago

Hey Yongan,

Just to inform you: building the database finally worked for me. It was a memory issue after all, but I could get it running on a server cluster. However, the proposed splitting option, e.g. -s 10, doesn't seem to work. I always end up with one huge database file rather than several separate files.

Sincerely, Felix

zhaoyanswill commented 10 years ago

Hi Felix,

All of the split pieces are stored in one file, so don't worry about it.

Sincerely, Y. Z.
