Open ohdongha opened 2 weeks ago
Thx a bunch for reporting this together with a fix? @kirilenkobm This is likely a bug, as we don't use the lastz substitution matrix in the chaining step. Can you pls incorporate the change and think about how we can simplify the parameter input?
This is likely a bug, as we don't use the lastz substitution matrix in the chaining step.
I think make_lastz_chains
v1.0.0 (MLC v1) does add the BLAST_Q
parameter during the "chain_run" step in doLastzChain.pl
in Line #534. The "fix" I posted above was trying to mimic what doLastzChain.pl
does there.
And when we used the alternative matrix (HoxD55) for a distant species pair, the alignment stat was better when -scoreScheme=HoxD55.q
was added to the axtChain
job scripts during the "chain_run" step (i.e., MLC v1) than when it was not added (i.e., MLC v2, before the "fix").
We have been adding
BLASTZ_Q
parameter when runningmake_lastz_chains
v1.0.0 (MLC v1) to use an alternative scoring matrix (HoxD55 instead of the default HoxD70) for increased sensitivity when aligning more distant species (see issue #12).Especially outside the mammalian or vertebrate clades, people quite often want to align genome pairs as far as human-chicken, human-frog, or human-zebrafish. We cannot say that HoxD55 is optimal for these pairs, but it has been better than HoxD70 and often retrieves substantially more alignment of CDS and synteny. I think the
BLAST_Q
orlastz_q
inmake_lastz_chains
v2.0.8 (MLC v2) that allows an alternative scoring matrix is a needed feature.MLC v2 uses the same
run_lastz.py
script as MLC v1 to set uplastz
job scripts. Hence, we could deliver thelastz_q
command similarly to issue #12. We created a one-line JSON file as follows and added it to the parameter set using--params_from_file ${one_line_JSON}
:With this,
lastz
was run with the HoxD55 matrix as expected. However, the MLC v2 "chain_run" step was not working the same way as MLC v1. This was because thechain_run_step.py
script does not addlastz_q
parameter to theaxtChain
job scripts as-scoreScheme=${matrix}
parameter.We tried to modify the
chain_run_step.py
as below (lines 53, 79, and 80 with "# add lastz_q if it exists" are newly added):This modification will add the matrix parameter ONLY if the
lastz_q
parameter was ever declared. With this modification, MLC v2 alignments for distant pairs with HoxD55 matrix scored as high alignment CDS coverage as those we did with MLC v1. Without it, the alignment CDS coverages were 4~10% lower than what we had with MLC v1 in multiple tests....
This was a bit of a "band-aid" fix, and we had to deliver the
lastz_q
matrix as a JSON parameter file. This seemed to work for thelastz
and "chain_run" steps, printing the job scripts equivalent to those we saw in MLC v1. I am not sure, though, if these two steps are the only place where the matrix parameter needs to be added. I wonder if the "fill_chain" or "clean_chain" step also needs the matrix parameter if we are not using the default. Anyways, delivering the HoxD55 matrix tolastz
and "chain_run" steps recovered the alignment stats (CDS coverage) very close to what we had with MLC v1.You could more formally add the
--lastz_q
parameter tomake_chains.py
. One tricky thing is that, unlike other alphabet parameters (lastz_h
,lastz_y
, etc.),lastz_q
cannot be an empty string if it is declared at all. You may have to include the default HoxD70 matrix as a file and use its location as the default value oflastz_q
.