anandksrao closed this issue 3 years ago.
Hi Anand,
Here are my answers:
Q1. Do I need to explicitly include the '-try1' flag or, as the help menu indicates, is it already the default?
Q2. To process the failed parts of my run in salvage mode, what is the syntax I should use?
No, the default is -try1 1, as stated in the help info, so you don't need to do anything else to enter the salvage mode, unless you don't want it (-try1 0).
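For scripted runs, a small Python helper that only assembles the argument list can make this choice explicit. This is a sketch: `build_ltr_finder_cmd` and the file name `genome.fa` are hypothetical, the helper uses only the flags that appear in this thread (`-seq`, `-threads`, `-harvest_out`, `-size`, `-time`, `-try1`), and it assumes that leaving `-try1` out keeps the default salvage behaviour (`-try1 1`).

```python
import shlex


def build_ltr_finder_cmd(genome, threads=10, size=None, time_limit=None, salvage=True):
    """Assemble (but do not run) an LTR_FINDER_parallel command line.

    Hypothetical helper: salvage mode (-try1 1) is the default, so the flag
    is only added when salvage mode should be switched off (-try1 0).
    """
    cmd = ["LTR_FINDER_parallel", "-seq", genome, "-threads", str(threads), "-harvest_out"]
    if size is not None:
        cmd += ["-size", str(size)]
    if time_limit is not None:
        cmd += ["-time", str(time_limit)]
    if not salvage:
        cmd += ["-try1", "0"]  # opt out of salvage mode
    return cmd


# The command from the original post: salvage mode stays on without any extra flag.
print(shlex.join(build_ltr_finder_cmd("genome.fa", size=1_000_000, time_limit=6000)))
# Opting out of salvage mode appends "-try1 0".
print(shlex.join(build_ltr_finder_cmd("genome.fa", size=1_000_000, time_limit=6000, salvage=False)))
```

On a node where the tool is installed, the returned list could be passed directly to `subprocess.run(cmd, check=True)`.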
Q3. Will the salvage mode take shorter time by recognizing failed parts of the run and attempt to repeat just for those genomic regions? Yes, and again, it's an automatic step.
Q4. Is it theoretically possible for the salvage mode itself to fail? In that case, is the only option to use the '-try0' flag, i.e. discard that entire genomic region, OR are there other workarounds?
The logic behind it is pretty simple. If a window takes too long to finish, it has a pretty complex (or simple) structure, such as tandem repeats. There are two ways to solve this. The first is to provide more -time so that LTR_FINDER can finish all possible candidates. The second is to make the window shorter so that the number of candidates is significantly reduced. The purpose of this wrapper is fast execution, so I opted for the second solution, which further chops the original window (5 Mb) into much shorter regions (50 kb), aka the salvage mode. The pitfall of splitting sequences is also obvious: if you split too much or make a window too small, LTRs can be split across different windows and lost. In our benchmark, this effect is small and is sometimes even a net gain (see the paper).
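The splitting idea can be sketched in Python. This is an illustration, not the wrapper's actual code: it assumes non-overlapping windows with half-open `(start, end)` coordinates, uses the 5 Mb window and 50 kb salvage chunk sizes mentioned above, and `split_windows`, `salvage` and `is_split` are hypothetical names. It shows how an element that fits inside a 5 Mb window can be cut once that window is re-split.

```python
def split_windows(seq_len, window):
    """Cut [0, seq_len) into consecutive, non-overlapping (start, end) windows."""
    return [(start, min(start + window, seq_len)) for start in range(0, seq_len, window)]


def salvage(window, chunk=50_000):
    """Re-split one timed-out window into shorter chunks (genomic coordinates)."""
    start, end = window
    return [(start + s, start + e) for s, e in split_windows(end - start, chunk)]


def is_split(element, windows):
    """True if no single window fully contains the element's (start, end) span."""
    el_start, el_end = element
    return not any(w_start <= el_start and el_end <= w_end for w_start, w_end in windows)


windows = split_windows(12_000_000, 5_000_000)  # a hypothetical 12 Mb sequence
chunks = salvage(windows[0])                     # pretend the first window timed out
ltr = (40_000, 52_000)                           # a hypothetical 12 kb element
print(len(windows), len(chunks))                 # 3 100
print(is_split(ltr, windows), is_split(ltr, chunks))  # False True
```

The element survives intact in the 5 Mb window but straddles the 50 kb boundary after salvage splitting, which is exactly the loss described above.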
Q5. Is one such workaround just increasing the -time flag to much larger values? As you can see, I am currently using 6000; maybe I should simply bump it up to 12K or 24K, or could this create any other problems? No problem as far as I know. For "difficult" windows, you just need to wait longer (i.e., up to 6000 s per difficult window with your current setting).
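A rough way to budget wall time before raising -time: assume each difficult window runs for the full -time limit and that these windows are spread evenly over the threads. Both are simplifying assumptions rather than documented scheduler behaviour, and `worst_case_wait` and the count of 4 difficult windows are hypothetical.

```python
import math


def worst_case_wait(n_difficult, threads, time_limit_s):
    """Upper-bound wall time (s) spent on difficult windows.

    Assumes every difficult window uses its whole -time budget and that
    they are distributed evenly across the available threads.
    """
    rounds = math.ceil(n_difficult / threads)  # batches of difficult windows
    return rounds * time_limit_s


for limit in (6000, 12000, 24000):
    hours = worst_case_wait(n_difficult=4, threads=10, time_limit_s=limit) / 3600
    print(f"-time {limit}: up to {hours:.1f} h for 4 difficult windows on 10 threads")
```

Under these assumptions the bound scales linearly with -time, so it is a simple figure to add to the wall time requested from the cluster scheduler.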
Q6. Could another workaround be reducing the -size flag to smaller genomic sizes? As it is, I am already using 1 Mb windows rather than the default 5 Mb windows; could I reduce it further to 0.5 Mb windows, perhaps? Yes, you can. See the discussion under Q4.
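The trade-off between -size and split elements can be put into numbers. Assuming non-overlapping windows and an element of length L whose start is placed uniformly at random, L - 1 of the W possible start offsets within a window of size W make it cross a boundary, so the chance of a split is (L - 1) / W. The 15 kb element length and 100 Mb genome below are hypothetical values chosen for illustration, and the wrapper's real windowing may differ.

```python
import math


def n_windows(genome_len, window):
    """Number of non-overlapping windows needed to cover the genome."""
    return math.ceil(genome_len / window)


def split_probability(element_len, window):
    """Chance that a randomly placed element crosses a window boundary.

    Of the `window` possible start offsets, `element_len - 1` make the element
    overlap the next window; an element longer than the window is always split.
    """
    if element_len > window:
        return 1.0
    return (element_len - 1) / window


for size in (5_000_000, 1_000_000, 500_000, 50_000):
    print(f"-size {size}: {n_windows(100_000_000, size)} windows, "
          f"P(split 15 kb element) = {split_probability(15_000, size):.2%}")
```

Under these assumptions, halving -size from 1 Mb to 0.5 Mb roughly doubles the split chance, yet it stays far below the rate at 50 kb salvage chunks.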
LTR_FINDER_parallel is pretty quick and requires very little memory. You may request fewer CPUs and a longer time to get into a shorter queue.
Let me know if you have more questions.
Shujun
Dear Shujun,
I seek your help with understanding how exactly to use salvage mode of your LTR_FINDER_parallel, and also how to avoid using the salvage mode itself, if possible. Before my questions, some context.
Generic syntax I am using:
$ LTR_FINDER_parallel -seq $genome -threads 10 -harvest_out -size 1000000 -time 6000
Dependency check results:
Example snippet of STDOUT indicating LTR_FINDER_parallel works OK for the most part:
But a few of these parallel threads gave timeout messages in the same STDOUT:
So my questions to you about your LTR_FINDER_parallel and its salvage mode are as follows, please:
Q1. Do I need to explicitly include the '-try1' flag or, as the help menu indicates, is it already the default?
Q2. To process the failed parts of my run in salvage mode, what is the syntax I should use?
Q3. Will the salvage mode take shorter time by recognizing failed parts of the run and attempt to repeat just for those genomic regions?
Q4. Is it theoretically possible for the salvage mode itself to fail? In that case, is the only option to use the '-try0' flag, i.e. discard that entire genomic region, OR are there other workarounds?
Q5. Is one such workaround just increasing the -time flag to much larger values? As you can see, I am currently using 6000; maybe I should simply bump it up to 12K or 24K, or could this create any other problems?
Q6. Could another workaround be reducing the -size flag to smaller genomic sizes? As it is, I am already using 1 Mb windows rather than the default 5 Mb windows; could I reduce it further to 0.5 Mb windows, perhaps?
I could try all these ideas, but my univ HPCC is super busy these days and starting a job is a long wait, so there's not much opportunity to try different syntax! And so I am reaching out to you :) Thank you in advance!
Cheers, Anand
Help menu for the installation on my university HPC cluster: