Closed jebrosen closed 1 year ago
Hi @jebrosen,
Thank you for your suggestion. I have not tried the fork mechanism, but the current threads module is pretty efficient in allocating resources. You are welcome to try the fork mechanism and open a PR.
For the partial installation, you can reinstall perl if possible; otherwise you may use the conda quick installation for all dependencies, including perl:
conda install -c bioconda ltr_retriever
If the program fails due to the threading issue, it will crash quickly and won't waste much computation time. Users should be able to spot the error messages and seek answers in the issues, including this one. You may specify -step Major to pick up the analysis from where it crashed after you solve the threading issue.
Let me know if you have further questions.
Best, Shujun
This is not a problem for me, but it is for end users who don't have control of their execution environment and are stuck with a non-threaded perl. I am assuming they can't use conda either.
I will look into fork when I get a chance.
Thank you for your thoughtful ideas. As far as I know, conda can be installed by non-root users in their local directory. So if users have no control of the system default perl, they can install conda and perl locally for LTR_retriever.
I can confirm that only the LTR.identifier.pl script uses the threads module. Looking forward to the fork version of this core script.
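For users who install a local perl through conda, one way to confirm that the perl they end up running was built with thread support is to query its build configuration. The following is a minimal sketch, assuming a POSIX shell; the function name perl_has_threads is made up for illustration and is not part of LTR_retriever.

```shell
# Hypothetical helper: succeed (exit status 0) only if the given perl
# (default: the first perl on PATH) was built with interpreter threads.
perl_has_threads() {
    # $Config{useithreads} is "define" on threaded builds and unset otherwise.
    "${1:-perl}" -MConfig \
        -e 'exit(((($Config{useithreads} || "") eq "define")) ? 0 : 1)' \
        2>/dev/null
}
```

Calling `perl_has_threads && echo threaded` after activating the conda environment checks the perl that is first on PATH; passing a path as the first argument checks a specific binary. The same setting can be inspected interactively with `perl -V:useithreads`, which prints `useithreads='define';` on a threaded build.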
Does it work if we set perl to the perl from the LTR conda repository during the RepeatModeler configure step?
Not sure I understand what you mean, but you may try it and test with a small input.
Best, Shujun
On Thu, Feb 25, 2021 at 3:07 AM mdayii wrote:
does it work if we set perl as LTR conda depository perl during RepeatModeler configure set?
I investigated the fork() method, and this approach is many times slower than the current threads version because of the overhead of creating a new program instance for each of the thousands of LTR candidates, which becomes a huge burden. I also tried splitting the list of candidates into as many sublists as the user-specified number of CPUs and creating one fork() instance per sublist, and somehow it was also very slow. Since installing conda and conda packages does not require root privileges, I will keep the current approach, and users will need a Perl built with multi-threading support, which is also available on conda. I will close this issue for now; please reopen it if you have a better solution.
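The chunking idea described above (one sublist per CPU, one process per sublist) can be illustrated outside Perl. The sketch below is only a shell analogue using background jobs, not the code that was benchmarked here, and it says nothing about the speed of the Perl version. The function name run_in_chunks and the worker command are hypothetical.

```shell
# Hypothetical sketch: split LIST into at most NCPU chunk files, then run
# one background worker per chunk. The worker command receives the chunk
# file path as its last argument.
# Usage: run_in_chunks LIST NCPU WORKER [ARGS...]
run_in_chunks() {
    list=$1
    ncpu=$2
    shift 2
    workdir=$(mktemp -d) || return 1
    # Deal lines out round-robin so the chunks end up roughly equal in size.
    awk -v n="$ncpu" -v prefix="$workdir/chunk" \
        '{ print > (prefix "." ((NR - 1) % n)) }' "$list"
    pids=""
    for chunk in "$workdir"/chunk.*; do
        [ -e "$chunk" ] || continue   # empty input: no chunk files exist
        "$@" "$chunk" &
        pids="$pids $!"
    done
    # Collect every worker; report failure if any of them failed.
    status=0
    for pid in $pids; do
        wait "$pid" || status=1
    done
    rm -rf "$workdir"
    return "$status"
}
```

For example, `run_in_chunks candidates.txt 4 cat` starts four `cat` processes, one per chunk file (candidates.txt is a placeholder name). Output from the workers is interleaved in no particular order, so a real pipeline would have each worker write to its own file.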
This is mostly a feature request.
Currently LTR.identifier.pl uses the perl threads mechanism, which is not part of all perl installations. If perl was not compiled with thread support, the log will show a message partway through, but it appears to continue as if it simply found no results:
I have not read all the details of that part of the code, but it seems like it should be possible (although slightly more complicated) to use fork() instead of threads.
Alternatively, it would be nice if LTR_retriever would die instead of continuing in this situation, so that other pipelines running LTR_retriever can notice something has gone wrong.
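Until something like that exists in the tool itself, a calling pipeline could apply the same fail-fast idea from the outside. The following is a minimal sketch, assuming a POSIX shell: it refuses to start a command unless the chosen perl can load the threads module, and it returns a non-zero status so the caller notices. The function name run_with_threaded_perl and the PERL_BIN variable are hypothetical, not LTR_retriever options.

```shell
# Hypothetical wrapper: run "$@" only if perl can load the threads module.
# PERL_BIN is an assumed override for this sketch (default: perl on PATH).
run_with_threaded_perl() {
    perl_bin=${PERL_BIN:-perl}
    # On a perl built without thread support, loading threads.pm dies,
    # so this one-liner exits with a non-zero status.
    if ! "$perl_bin" -Mthreads -e 1 2>/dev/null; then
        echo "ERROR: $perl_bin cannot load the threads module; not starting: $*" >&2
        return 1
    fi
    "$@"
}
```

A pipeline would prefix its usual invocation with the wrapper, so a non-threaded perl stops the run with an error status before any work starts, instead of the run finishing with an apparently empty result.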