Megalodon is a research command line tool to extract high accuracy modified base and sequence variant calls from raw nanopore reads by anchoring the information rich basecalling neural network output to a reference genome/transriptome.
Other
197
stars
30
forks
source link
"megalodon_extras modified_bases create_motif_bed" does not return all of the positions #335
I don't know if this is the intended behavior,
but this is because "megalodon_extras modified_bases create_motif_bed" ignores lowercase sequences in the fasta file.
This can be fixed by modifying
def compile_motif_pat(raw_motif):
ambig_pat_str = "".join(
"[{}]".format(SINGLE_LETTER_CODE[letter]) for letter in raw_motif
)
# add lookahead group to serach for overlapping motif hits
- return re.compile("(?=({}))".format(ambig_pat_str))
+ return re.compile("(?=({}))".format(ambig_pat_str),flags=re.IGNORECASE)
def compile_rev_comp_motif_pat(raw_motif):
ambig_pat_str = "".join(
"[{}]".format("".join(comp(b) for b in SINGLE_LETTER_CODE[letter]))
for letter in raw_motif[::-1]
)
# add lookahead group to serach for overlapping motif hits
- return re.compile("(?=({}))".format(ambig_pat_str))
+ return re.compile("(?=({}))".format(ambig_pat_str),flags=re.IGNORECASE)
Hi,
I used the following command to produce a bed file of the location of CpGs in hg38.
and the motif_sites.bed started from
However, the actual CpGs in hg38 start at
I don't know if this is the intended behavior, but this is because "megalodon_extras modified_bases create_motif_bed" ignores lowercase sequences in the fasta file.
This can be fixed by modifying
in megalodon_helper.py.
Best,
Zicong