nanoporetech / modkit

A bioinformatics tool for working with modified bases
https://nanoporetech.com/
Other
139 stars 8 forks source link

--call-mods not working on m6a #252

Open faulk-lab opened 2 months ago

faulk-lab commented 2 months ago

I'm trying to filter a bam to clamp mod-A to either 0 (canonical) or 100 (modified) so that I can view it in Jbrowse2 showing similar percentages as I am seeing when I summarize the bedmethyl made with modkit pileup.

I ran ~/Desktop/modkit_0.3.2/modkit pileup --ref ref.fasta -t 32 file.bam file.bam.bed which worked fine and estimated a threshold of 0.74.

I summarized the bed with awk '$4=="a" {can+=$13; mod+=$12; oth+=$14; valid+=$10} END{print (can/valid) " A canonical\n" (mod/valid) " 6mA"}' file.bam.bed and see these results: 0.979361 A canonical 0.0206386 6mA

That seems probably right (who is to know really?) So it's telling me 2% of the A's are 6'methylated. When I visualize the bam in Jbrowse2 and color the mods, it's reflected similarly, most of the A's are very faintly highlighted, indicating low probability of modification. However it likes to summarize all of the mod_A calls as mod_A regardless of confidence. So I'd like to filter those out. I tried the following, but it doesn't seem to actually do that. What am I doing wrong?

modkit call-mods file.bam file.074.bam --filter-percentile 0.74 --mod-threshold a:0.74

image
ArtRand commented 2 months ago

Hello @faulk-lab,

I think you want --filter-threshold not --filter-percentile. However, this might not do exactly what you want either. For a sanity check, could you show me what the modkit extract file.074.bam - table looks like for a read in this window (after using --filter-threshold)?

It might be something about how the probabilities are represented in JBrowse. In modkit the probabilities are set to the center of their 1/256 bin, so the minimum modification probability ends up being 0.5 / 256 = .001953125. I don't know if this is what JBrowse does, but it could be why you're still seeing very low probability 6mA calls. What you could try is modkit call-mods file.bam file.074.bam --filter-percentile A:1.0 --mod-threshold a:0.74 this should "fail" all of the canonical calls and if JBrowse is interpreting the tags correctly it should reflect "no information".

faulk-lab commented 2 months ago

Getting closer. Yes I should have said --filter-threshold not --filter-percentile Tried that and it also didn't work. Here's the results you asked for. I ran modkit call-mods file.bam file.074.bam --filter-percentile A:1.0 --mod-threshold a:0.74. Then I ran modkit extract file.074.bam file.074.extract.bam (Incidentally, modkit extract demanded an OUT bam file so I gave it one named 'extract.bam' yet that was somehow a text file). EDIT. Never mind, I see it said "OUT_PATH" not OUT Bam' sorry.

cat file.074.extract.bam

42d8496f-b39b-4daf-89db-6955772e9954    331     612     DQ294633.1      +       +       +       1       4       7474    0.001953125     a       33      .       CAAAT   A       A       false   0
42d8496f-b39b-4daf-89db-6955772e9954    332     613     DQ294633.1      +       +       +       1       4       7474    0.001953125     a       32      .       AAATG   A       A       false   0
42d8496f-b39b-4daf-89db-6955772e9954    345     626     DQ294633.1      +       +       +       1       4       7474    0.001953125     a       30      .       CAAGC   A       A       false   0
42d8496f-b39b-4daf-89db-6955772e9954    350     631     DQ294633.1      +       +       +       1       4       7474    0.001953125     a       33      .       GTATT   A       A       false   0
42d8496f-b39b-4daf-89db-6955772e9954    355     636     DQ294633.1      +       +       +       1       4       7474    0.001953125     a       33      .       CAACA   A       A       false   0

Next I tried modkit call-mods file.bam file.074.bam --filter-percentile A:1.0 --mod-threshold a:0.74 which gave an error error: invalid value 'A:1.0' for '--filter-percentile <FILTER_PERCENTILE>': invalid float literal

I changed the --filter-percentile A:1.0 to just --filter-percentile 1.0 and viewed it in Jbrowse. It's still showing the modifications and they are very slightly lower but definitely not filtered out.

Should I just use IGV?

ArtRand commented 2 months ago

Hello @faulk-lab,

You want a command like this:

modkit call-mods ${bam} - --filter-threshold A:1.0 --mod-threshold a:0.75 2> /dev/null | modkit extract - null --read-calls stdout --no-filtering 2>/dev/null | column -t | less -S

You should this does the following: --filter-threshold A:1.0 sets the canonical A threshold to an unachievable 1.0, so all the canonical calls should fail. --mod-threshold a:0.75 sets the 6mA threshold to 0.75.

Piping through extract with --read-calls gives you the calls at each position.

I think the main confusion is around --filter-percentile vs. --filter-threshold. The former determines which fraction of calls to remove, the latter sets the actual threshold. Sorry for the confusion around this whole thing. As an aside, if you want to keep the 6mA probabilities, you can use

modkit adjust-mods ${bam} ${outbam} --filter-threshold A:1.0 --mod-threshold a:0.75 --filter-probs

This command will also remove the canonical A calls and low confidence 6mA calls, but leave the probabilities of the latter.