oushujun / LTR_retriever

LTR_retriever is a highly accurate and sensitive program for identification of LTR retrotransposons. The LTR Assembly Index (LAI) is also included in this package.
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5813529/
GNU General Public License v3.0

Support for non-threaded perl #65

Closed jebrosen closed 1 year ago

jebrosen commented 4 years ago

This is mostly a feature request.

Currently, LTR.identifier.pl uses Perl's threads mechanism, which is not available in all Perl installations. If Perl was not compiled with thread support, the log shows an error message partway through, but the run appears to continue as if it had simply found no results:


##########################
### LTR_retriever v2.7 ###
##########################

(...)

Parameters: -genome chr4_20mb.fa -inharvest ltrh.log -noanno -threads 8

Wed Mar  4 15:27:29 PST 2020    Dependency checking: All passed!
Wed Mar  4 15:27:48 PST 2020    LTR_retriever is starting from the Init step.
Wed Mar  4 15:27:48 PST 2020    Start to convert inputs...
                                Total candidates: 169
                                Total uniq candidates: 169

Wed Mar  4 15:27:48 PST 2020    Module 1: Start to clean up candidates...
                                Sequences with 10 missing bp or 0.8 missing data rate will be discarded.
                                Sequences containing tandem repeats will be discarded.

Wed Mar  4 15:27:51 PST 2020    169 clean candidates remained

Wed Mar  4 15:27:51 PST 2020    Modules 2-5: Start to analyze the structure of candidates...
                                The terminal motif, TSD, boundary, orientation, age, and superfamily will be identified in this step.

This Perl not built to support threads
Compilation failed in require at ./bin/LTR.identifier.pl line 3.
BEGIN failed--compilation aborted at ./bin/LTR.identifier.pl line 3.
Wed Mar  4 15:27:55 PST 2020    Intact LTR-RT found: 0

cp: cannot stat ‘chr4_20mb.fa.retriever.scn.adj’: No such file or directory
Wed Mar  4 15:27:55 PST 2020    No LTR-RT was found in your data.

Wed Mar  4 15:27:55 PST 2020    All analyses were finished!

I have not read all the details of that part of the code, but it seems like it should be possible (although slightly more complicated) to use fork() instead of threads.

Alternatively, it would be nice if LTR_retriever would die instead of continuing on in this situation so that other pipelines running LTR_retriever can notice something has gone wrong.
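Until such behavior exists upstream, a calling pipeline could guard against this failure itself. The following is a minimal sketch, not part of LTR_retriever: the function names are hypothetical, and it assumes the `perl` on `PATH` is the same one LTR_retriever will use. It checks thread support up front and then scans the captured log for the error message shown above.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of LTR_retriever.

# Return 0 if the given perl (default: perl on PATH) can load threads.
# On a non-threaded build, 'use threads' fails at compile time, so the
# exit status of perl is non-zero.
perl_has_threads() {
    "${1:-perl}" -e 'use threads; 1' >/dev/null 2>&1
}

# Return 0 if a captured log contains the thread-support error.
log_has_thread_error() {
    grep -q 'This Perl not built to support threads' "$1"
}

# Run a command with its output captured in a log file, and return a
# non-zero status if thread support is missing or the error was logged.
# Usage: run_checked LOGFILE COMMAND [ARGS...]
run_checked() {
    log=$1; shift
    perl_has_threads || { echo 'perl lacks thread support' >&2; return 2; }
    "$@" >"$log" 2>&1 || return 1
    if log_has_thread_error "$log"; then
        echo "thread error found in $log" >&2
        return 3
    fi
}
```

With these functions sourced, a pipeline might call `run_checked ltr.log LTR_retriever -genome chr4_20mb.fa -inharvest ltrh.log -noanno -threads 8` (the parameters from the log above) and stop on any non-zero status.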

oushujun commented 4 years ago

Hi @jebrosen ,

Thank you for your suggestion. I have not tried the fork mechanism, but the current threads module is pretty efficient in allocating resources. You are welcome to try the fork mechanism and submit a PR.

For a Perl installation without thread support, you can reinstall Perl if possible; otherwise you can use the conda quick installation, which provides all dependencies including Perl:

conda install -c bioconda ltr_retriever

If the program fails because of the threading issue, it crashes quickly and does not waste much computation time. Users should be able to spot the error messages and find answers in the issues, including this one. After you resolve the threading issue, you can specify -step Major to resume the analysis from where it crashed.

Let me know if you have further questions.

Best, Shujun

jebrosen commented 4 years ago

This is not a problem for me, but it is for end users who don't have control of their execution environment and are stuck with a non-threaded perl. I am assuming they can't use conda either.

I will look into fork when I get a chance.

oushujun commented 4 years ago

Thank you for your thoughtful ideas. As far as I know, conda can be installed by non-root users in a local directory. So if users have no control over the system default perl, they can install conda and perl locally for LTR_retriever.

I can confirm that only the LTR.identifier.pl script uses the threads module. Looking forward to the fork version of this core script.
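Anyone who wants to re-check this on their own copy could search the scripts for the relevant `use` line. This is a small sketch with a hypothetical function name; it assumes the scripts end in `.pl` and sit in a single directory such as `./bin`, as the log above suggests.

```shell
# List Perl scripts in a directory that load the threads module.
# Also matches 'use threads::shared', which depends on threads as well.
# Returns non-zero when no script matches.
scripts_using_threads() {
    grep -lE '^[[:space:]]*use[[:space:]]+threads' "$1"/*.pl 2>/dev/null
}
```

Running `scripts_using_threads ./bin` from the LTR_retriever directory prints one matching path per line.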

mdayii commented 3 years ago

Does it work if we set perl to the perl from the LTR conda repository during the RepeatModeler configure step?

oushujun commented 3 years ago

I'm not sure I understand what you mean, but you may try it and test with a small input.

Best, Shujun


oushujun commented 1 year ago

I investigated the fork() approach, and it is many times slower than the current threads version because of the overhead of creating a new process for each of the thousands of LTR candidates, which becomes a huge burden. I also tried splitting the candidate list into as many sublists as the user-specified number of CPUs and forking one process per sublist, but that was also very slow. Since installing conda and conda packages does not require root privileges, I will keep the current approach; users will need a Perl built with thread support, which is available through conda. I am closing this issue for now; please reopen it if you have a better solution.
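For readers who want to experiment with the chunked variant, the general pattern (one process per chunk rather than one per candidate) can be sketched in shell. This is only an analog for illustration, not the Perl code that was benchmarked above, and it makes no claim about speed. The function name and the worker interface (a command that takes a chunk file as its only argument) are assumptions.

```shell
# Split a line-oriented candidate list into N chunks and process each
# chunk in its own background process, then collect the outputs.
# Usage: process_in_chunks LIST N WORKER
process_in_chunks() {
    list=$1; n=$2; worker=$3
    dir=$(mktemp -d) || return 1
    # Round-robin assignment of lines to chunk.0 .. chunk.(N-1).
    awk -v n="$n" -v d="$dir" '{ print > (d "/chunk." (NR % n)) }' "$list"
    for chunk in "$dir"/chunk.*; do
        [ -e "$chunk" ] || continue      # empty list: no chunk files
        "$worker" "$chunk" > "$chunk.out" &
    done
    wait    # block until every background worker has exited
    cat "$dir"/chunk.*.out 2>/dev/null
    rm -rf "$dir"
}
```

The worker can be any command or shell function; output order across chunks is not the input order, so a caller that needs ordered results has to sort or tag them.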