I think the information in the junc-bed file can be better utilized by minimap2 in dealing with cases that deviate from the default settings. Two such cases:
When there are non-consensus splice junctions in the junc-bed file, minimap2 should be able to use those instead of introducing small indels to generate the alignment with consensus splice sites.
When there is an intron that is >200kb (the default max for intron length) in the junc-bed file, minimap2 should use that information to generate an alignment with a large intron.
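To make the second case concrete, here is a minimal sketch of how introns above the 200 kb default could be picked out of a junction file before alignment. It assumes each line describes one intron, with the chromosome, 0-based start, and end in the first three tab-separated columns. The function name `long_introns` and the example records are hypothetical and are not taken from the attached file.

```python
import io

MAX_INTRON = 200_000  # default maximum intron length cited in this report


def long_introns(bed_lines, max_len=MAX_INTRON):
    """Yield (chrom, start, end, length) for introns longer than max_len.

    Assumes one intron per line: chrom, start, end in columns 1-3.
    """
    for line in bed_lines:
        line = line.strip()
        # Skip blank lines and common BED header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        length = end - start
        if length > max_len:
            yield chrom, start, end, length


# Made-up records for illustration only.
example = io.StringIO(
    "chrA\t1000\t1500\tj1\t0\t+\n"
    "chrA\t5000\t260000\tj2\t0\t+\n"
)
for hit in long_introns(example):
    print(*hit, sep="\t")
```

Any junction reported by a check like this is one that the aligner would need to honor beyond its default intron limit.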
A couple of specific examples to demonstrate this:
The splice junctions file and the query FASTA file are attached.
Chromosome sequence can be downloaded from NCBI FTP path as shown below:
minimap2 was executed as follows:
The query gnl|SRA|SRR1803611.121425.1 is expected to align to the subject with non-consensus splice sites. These are in the splice_junctions.bed file. However, minimap2 aligns this query with consensus splice sites by introducing a 3 nt deletion.
The query gnl|SRA|SRR1803617.262344.1 is expected to align to the subject with an intron >200 kb which, again, is in the splice_junctions.bed file. However, minimap2 aligns this query with a 570 nt unaligned tail.
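For the non-consensus case, a small sketch of how a junction's splice-site dinucleotides could be read off the chromosome sequence may help when checking the annotation. It assumes 0-based, half-open intron coordinates and treats only GT-AG as consensus, which is a simplification. The helper names and the toy sequences are hypothetical and unrelated to the attached data.

```python
def revcomp(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def splice_motif(chrom_seq, start, end, strand="+"):
    """Return the donor-acceptor dinucleotides of intron [start, end)."""
    left = chrom_seq[start:start + 2]   # first two intronic bases (genomic order)
    right = chrom_seq[end - 2:end]      # last two intronic bases (genomic order)
    if strand == "-":
        # On the minus strand the donor sits at the genomic right end.
        donor, acceptor = revcomp(right), revcomp(left)
    else:
        donor, acceptor = left, right
    return f"{donor.upper()}-{acceptor.upper()}"


def is_consensus(motif):
    """Assumption: only GT-AG counts as a consensus junction here."""
    return motif == "GT-AG"


# Toy sequences: exon AAA, intron at [3, 11), exon TTT.
print(splice_motif("AAAGTCCCCAGTTT", 3, 11))        # GT-AG
print(splice_motif("AAAGCCCCCAGTTT", 3, 11))        # GC-AG
print(splice_motif("AAACTGGGGACTTT", 3, 11, "-"))   # GT-AG
```

Running such a check over the junctions in the BED file would list the annotated introns whose boundaries are non-consensus, which are the ones this report expects the aligner to prefer over a small indel.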