Open ohdongha opened 1 month ago
Thx for reporting this. I just checked and we use a minScore of 25000 for all alis that we generate these days. But we have used 5000 before. I can't recall why we changed this and whether it mattered much.
Chains with a score <25000 are typically small and for most well-assembled genomes chains score 1+ million.
If you want to maximize sensitivity, pls set the threshold to 5000 or lower.
Thx for reporting this. I just checked and we use a minScore of 25000 for all alis that we generate these days. But we have used 5000 before.
Thanks for sharing this. For "chain_run" step, we have been using the default minScore
1000 (both MLC v1 and v2 have 1000 as the default) or 5000 when the scoring matrix is HoxD55. If increasing the default to 25000 during "chain_run" step does not lose much of the alignment CDS coverage while removing spurious and small alignments, that may work for our purpose as well. I will give it a try.
The score I mentioned in the post above is the default values for "fill_chain" step. MLC v1 used chain_gap_filler.py --chainMinScore 25000
while MLC v2 had chain_gap_filler.py --chainMinScore 1000
by default, reusing the same parameter used in "chain_run" step. If we are going to use the same value for the "chain_run" and "fill_chain" steps (e.g., 25000 for both steps), then the current v2.0.8 code is correct.
Still, in v2.0.8, make_chains.py
accepts --fill_chain_min_score
when setting up the pipeline, but the value does not seem to be delivered to chain_gap_filler.py
or anywhere else. If you intend to use the same chain_min_score
for both "chain_run" and "fill_chain" steps, it may be better to remove the --fill_chain_min_score
argument from make_chains.py
.
Yes, I understood. Pls keep the minscore of 1000 or 5000 for the chaining step.
For repeatFiller, this would only mean that gaps in the chains below the score threshold would not be filled (but such short chains often have pretty much no substantial gaps)
Yes, I understood. Pls keep the minscore of 1000 or 5000 for the chaining step.
For repeatFiller, this would only mean that gaps in the chains below the score threshold would not be filled (but such short chains often have pretty much no substantial gaps)
Thanks for the clarification. We will then continue to use 1000 and 5000 for the chaining ("chain_run") step, and 25000 for the repeatFiller ("fill_chain") step.
Hi again,
Recently, we have been testing
make_lastz_chains
v2.0.8 (MLC v2) to run on our SGE grid. After adding someclusterOptions
directives to theNextflow
template (execute_joblist.nf
), the MLC v2 pipeline runs well.However, the alignment results have always been slightly different from those with
make_lastz_chains
v1.0.0 (MLC v1), so we compared the temporary job scripts from each step.One difference is that MLC v1 uses
--chainMinScore 25000
when runningchain_gap_filler.py
during the "fill_chain" step, while MLC v2 uses--chainMinScore 1000
or any other value given with--min_chain_score
when setting up the pipeline.MLC v2 accepts
--fill_chain_min_score
separately, with a default value 25000.But in the
fill_chain_step.py
code, it usesparam.chain_min_score
instead ofparam.fill_chain_min_score
(#24) when building job scripts that runchain_gap_fillter.py
:I wonder if this is intended.
Replacing
param.chain_min_score
withparam.fill_chain_min_score
appears to reduce the number of final alignments slightly (after post-processing) without affecting the alignment coverage of CDS....
There are also differences in how the target and query sequences were chunked and how sequences smaller than the chunk size were treated during the
lastz
step, but for this, I think what v2 does makes more sense than v1. :)Another difference is handling the
lastz_q
(orBLASTZ_Q
) parameter during the "chain_run" step. I will write about this in another issue.