Open faulk-lab opened 2 months ago
Hello @faulk-lab,
I think you want --filter-threshold
not --filter-percentile
. However, this might not do exactly what you want either. For a sanity check, could you show me what the modkit extract file.074.bam -
table looks like for a read in this window (after using --filter-threshold
)?
It might be something about how the probabilities are represented in JBrowse. In modkit
the probabilities are set to the center of their 1/256 bin, so the minimum modification probability ends up being 0.5 / 256 = .001953125. I don't know if this is what JBrowse does, but it could be why you're still seeing very low probability 6mA calls. What you could try is modkit call-mods file.bam file.074.bam --filter-percentile A:1.0 --mod-threshold a:0.74
this should "fail" all of the canonical calls and if JBrowse is interpreting the tags correctly it should reflect "no information".
Getting closer. Yes I should have said --filter-threshold
not --filter-percentile
Tried that and it also didn't work. Here's the results you asked for. I ran modkit call-mods file.bam file.074.bam --filter-percentile A:1.0 --mod-threshold a:0.74
. Then I ran modkit extract file.074.bam file.074.extract.bam
(Incidentally, modkit extract demanded an OUT bam file so I gave it one named 'extract.bam' yet that was somehow a text file). EDIT. Never mind, I see it said "OUT_PATH" not OUT Bam' sorry.
cat file.074.extract.bam
42d8496f-b39b-4daf-89db-6955772e9954 331 612 DQ294633.1 + + + 1 4 7474 0.001953125 a 33 . CAAAT A A false 0
42d8496f-b39b-4daf-89db-6955772e9954 332 613 DQ294633.1 + + + 1 4 7474 0.001953125 a 32 . AAATG A A false 0
42d8496f-b39b-4daf-89db-6955772e9954 345 626 DQ294633.1 + + + 1 4 7474 0.001953125 a 30 . CAAGC A A false 0
42d8496f-b39b-4daf-89db-6955772e9954 350 631 DQ294633.1 + + + 1 4 7474 0.001953125 a 33 . GTATT A A false 0
42d8496f-b39b-4daf-89db-6955772e9954 355 636 DQ294633.1 + + + 1 4 7474 0.001953125 a 33 . CAACA A A false 0
Next I tried modkit call-mods file.bam file.074.bam --filter-percentile A:1.0 --mod-threshold a:0.74
which gave an error error: invalid value 'A:1.0' for '--filter-percentile <FILTER_PERCENTILE>': invalid float literal
I changed the --filter-percentile A:1.0
to just --filter-percentile 1.0
and viewed it in Jbrowse. It's still showing the modifications and they are very slightly lower but definitely not filtered out.
Should I just use IGV?
Hello @faulk-lab,
You want a command like this:
modkit call-mods ${bam} - --filter-threshold A:1.0 --mod-threshold a:0.75 2> /dev/null | modkit extract - null --read-calls stdout --no-filtering 2>/dev/null | column -t | less -S
You should this does the following:
--filter-threshold A:1.0
sets the canonical A threshold to an unachievable 1.0, so all the canonical calls should fail.
--mod-threshold a:0.75
sets the 6mA threshold to 0.75.
Piping through extract
with --read-calls
gives you the calls at each position.
I think the main confusion is around --filter-percentile
vs. --filter-threshold
. The former determines which fraction of calls to remove, the latter sets the actual threshold. Sorry for the confusion around this whole thing. As an aside, if you want to keep the 6mA probabilities, you can use
modkit adjust-mods ${bam} ${outbam} --filter-threshold A:1.0 --mod-threshold a:0.75 --filter-probs
This command will also remove the canonical A calls and low confidence 6mA calls, but leave the probabilities of the latter.
I'm trying to filter a bam to clamp mod-A to either 0 (canonical) or 100 (modified) so that I can view it in Jbrowse2 showing similar percentages as I am seeing when I summarize the bedmethyl made with modkit pileup.
I ran
~/Desktop/modkit_0.3.2/modkit pileup --ref ref.fasta -t 32 file.bam file.bam.bed
which worked fine and estimated a threshold of 0.74.I summarized the bed with
awk '$4=="a" {can+=$13; mod+=$12; oth+=$14; valid+=$10} END{print (can/valid) " A canonical\n" (mod/valid) " 6mA"}' file.bam.bed
and see these results: 0.979361 A canonical 0.0206386 6mAThat seems probably right (who is to know really?) So it's telling me 2% of the A's are 6'methylated. When I visualize the bam in Jbrowse2 and color the mods, it's reflected similarly, most of the A's are very faintly highlighted, indicating low probability of modification. However it likes to summarize all of the mod_A calls as mod_A regardless of confidence. So I'd like to filter those out. I tried the following, but it doesn't seem to actually do that. What am I doing wrong?
modkit call-mods file.bam file.074.bam --filter-percentile 0.74 --mod-threshold a:0.74