SWittouck opened this issue 4 years ago.
Because of the limitations of multi-threading in Python, part of the parallel computation is handled by multiple processes, and all data in memory is replicated in each process. Please try running PEPPA with fewer processes (e.g., 4). I will close this issue for now, but please reopen it if you still run into an out-of-memory problem.
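A minimal sketch of why fewer processes helps, assuming (as described above) that each worker process ends up holding its own copy of the working data. The helper `max_workers_for_memory` and all memory figures are hypothetical and illustrative, not measured from PEPPA; it simply caps the worker count so that the parent process plus one data copy per worker fits in RAM, then passes that count to a standard `multiprocessing.Pool`.

```python
import multiprocessing as mp


def max_workers_for_memory(total_mem_gb: float, base_gb: float,
                           per_worker_gb: float, requested: int) -> int:
    """Hypothetical helper: cap the worker count so the estimated footprint fits.

    Assumes each worker process ends up holding its own copy of the working
    data (per_worker_gb) on top of the parent process (base_gb).
    """
    if per_worker_gb <= 0:
        raise ValueError("per_worker_gb must be positive")
    budget = total_mem_gb - base_gb
    fit = int(budget // per_worker_gb)
    # Never go below one worker, even if the estimate says nothing fits.
    return max(1, min(requested, fit))


def square(x: int) -> int:
    return x * x


if __name__ == "__main__":
    # Illustrative numbers: 16 GB machine, 2 GB parent, ~3 GB per worker, 16 requested.
    n = max_workers_for_memory(16, 2, 3, requested=16)
    with mp.Pool(processes=n) as pool:
        print(n, pool.map(square, range(5)))
```

With these illustrative numbers it prints `4 [0, 1, 4, 9, 16]`: requesting 16 workers would need roughly 2 + 16 × 3 = 50 GB under this assumption, so the pool is capped at 4.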
Dear Zhemin,
Thank you for your suggestion; I will try this.
Best wishes, Stijn
Dear Zhemin,
I tried running with fewer threads, as you suggested, even down to a single thread. Unfortunately, the issue remained. Attached is the log file with the error; it seems to occur in the BLASTn step.
Best wishes, Stijn

Attachment: peppa.log
I have pushed PEPPA to PyPI with the formal version number 1.0. The code in this version has been revisited to optimize memory performance. You can install it in Python >= 3.5 via `pip install bio-peppa`, and the executable is `PEPPA` by default. I hope this solves the memory leak problem.
Hi Zhemin,
I installed PEPPA version 1.0 using pip, as you suggested. It didn't fix the problem: I still got out-of-memory errors, no matter how many threads I used. However, I took a closer look at how PEPPA works, and it seems to me that it is not suited for datasets above the genus level, whereas my genome dataset is at the order level. I think the BLASTn searches are not sensitive enough for such divergent genomes. When I set `--clust_identity` to 0.5, `--clust_match_prop` to 0.6 and `--match_identity` to 0.5, the error no longer occurred! So I'm still not sure what caused the error, and I think my dataset is outside PEPPA's scope anyway, but at least the error is solved. Thank you for your help!
I have one additional remark: I found a bug in PEPPA_parser.py. Line 64 has one `]` too many.
Best regards, Stijn
Thank you for the bug report (again) and for the workaround you found. PEPPA allows `--match_identity` as low as 0.4, so your value of 0.5 is fine. However, the `--clust_identity` and `--clust_match_prop` values are definitely outside the range I have tested. I think the phylogeny-based paralog splitting should still be able to handle this, but I am not sure.
I will push the fix for the bug in PEPPA_parser.py later this week.
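The three thresholds discussed in this thread can be collected into PEPPA command-line arguments with a small check for the stated `--match_identity` lower limit of 0.4. This is a sketch only: `peppa_similarity_args` is a hypothetical helper, the (0, 1] range check is an assumption about how the thresholds are expressed, and the other inputs a real PEPPA run needs (genome files, output options) are deliberately not shown.

```python
from typing import List


def peppa_similarity_args(clust_identity: float, clust_match_prop: float,
                          match_identity: float) -> List[str]:
    """Hypothetical helper: build the three PEPPA flags discussed in this thread."""
    # Assumption: all three thresholds are fractions in (0, 1].
    for name, value in (("clust_identity", clust_identity),
                        ("clust_match_prop", clust_match_prop),
                        ("match_identity", match_identity)):
        if not 0 < value <= 1:
            raise ValueError(f"--{name} must be in (0, 1], got {value}")
    # Lower limit for --match_identity as stated by the maintainer.
    if match_identity < 0.4:
        raise ValueError("--match_identity below 0.4 is not supported")
    return ["--clust_identity", str(clust_identity),
            "--clust_match_prop", str(clust_match_prop),
            "--match_identity", str(match_identity)]


if __name__ == "__main__":
    # The values that avoided the out-of-memory error in this thread.
    print(peppa_similarity_args(0.5, 0.6, 0.5))
```

The returned list can be appended to the rest of a `PEPPA` invocation (e.g. via `subprocess.run`); note that only `--match_identity` has a limit confirmed here, while the clustering values of 0.5 and 0.6 remain outside the maintainer's tested range.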
Dear Zhemin,
Thank you for making PEPPA publicly available and for putting the publication on bioRxiv; it's a very nice read!
I managed to install PEPPA successfully and tried a test run on 73 genomes of the order Lactobacillales. After a few minutes, I got an out-of-memory error (memory was indeed full) and the job aborted. Is there anything I can do to solve this? I have 16 GB of memory and was using all 16 threads available to me.
Best wishes, Stijn