Closed pre-mRNA closed 2 years ago
Hi,
There are unfortunately no parameters to do this. I can add it to the next version. You can however do it "manually".
prefilter_genomic_reads.py
For example, do the following:(ultra) [kris@rackham3 ~]$ which uLTRA
/proj/snic2020-15-201/anaconda3/envs/ultra/bin/uLTRA
Then I have file prefilter_genomic_reads.py
in
/proj/snic2020-15-201/anaconda3/envs/ultra/lib/python3.8/site-packages/modules/prefilter_genomic_reads.py
prefilter_genomic_reads.py
You can manually edit the prefilter_genomic_reads.py
file on line 32 (the call to minimap2).
For instance, if you want to set -w
to 5 and k
to 13, you can do:
subprocess.check_call([ 'minimap2', '-ax', 'splice', '--eqx' , '-k', '13, '-w', '5', '-t', str(nr_cores), refs_path, read_path], stdout=output_file, stderr=stderr_file)
Thanks for your quick response.
It would be great to be able to control the minimap2 flags in a future release of uLTRA.
I am a big fan of uLTRA by the way. We have done extensive benchmarking for aligning direct RNA reads and uLTRA outperforms minimap2 in all contexts we have tested. Cheers.
Ok, I will do that.
I'm really glad to hear that! Especially since we did not include dRNA sequencing in our evaluations.
Thanks, I had one last question. It says that uLTRA compares mm2 and uLTRA CIGAR strings to decide on the 'best' alignment for a read.
I am wondering what parameters are used in this comparison? Does the algorithm filter for the simplest CIGAR? Or the most aligned bases?
Good question. The computation is currently simple:
Let ultra_matches
and mm2_matches
be all the matching nucleotides in the alignment of uLTRA and minimap2, respectively. Similarly, let ultra_diff
and mm2_diff
be the sum of all substitutions, deletions, and insertions in the alignment of uLTRA and minimap2, respectively.
The, the computation is as follows: S = (ultra_matches - ultra_diff) - (mm2_matches - mm2_diff)
. uLTRA chooses the minimap2 alignment if S
is negative.
I should mention that softclips (at either 5' or 3' ends) are disregarded in the above computations. I observed that minimap2 is very good at softclipping at the right positions and may therefore get a better score according to the computation above, even though they are aligned to identical splice sites (uLTRA extends the alignment out fully in ends). The reason I mention this is that uLTRA prints a table at the end of the alignment stage that states how many alignments were preferred for either aligner. This table is not accurate for low score differences (less than 5-10 score difference) because of the above reason. There is certainly room to explore alternative decision-making on the final alignment.
Interesting, especially the last part about softclipping at read-ends.
When using direct RNA reads, one would expect that there is no need to softclip at the 3' end of the read (except for sometimes a few nt, representing sequencing/basecalling errors) since this represents the bona fide end of the RNA molecule.
Cheers!
Hi,
I understand uLTRA relies on mm2 when aligning to unannotated regions. How can I control minimap2 mapping parameters?
Cheers,