markrobinsonuzh / DAMEfinder

Finds DAMEs - Differential Allelicly MEthylated regions
MIT License
10 stars 3 forks source link

I'd like to use bwa-meth instead of bismark BAM files, is it possible? #24

Closed katwre closed 4 years ago

katwre commented 4 years ago

Hi @sorjuela , great package! I was looking for a differential ASM detection tool like yours, however I have BAM files generated by bwa-meth and I do not want to use bismark, is there any possibility to use bwa-meth output as input of your tool? Best, Katarzyna

markrobinsonuzh commented 4 years ago

Dear @katwre,

@sorjuela is away for a couple days, so I'll give a brief answer. I think 'no', because under the hood we use methtuple from @PeteHaitch and on the main page of his repo, it says "Currently, the SAM/BAM file must have been created with Bismark." But, to be honest, I don't know what the differences are in what gets into the BAM; maybe a simple conversion can be done.

Best, Mark

PeteHaitch commented 4 years ago

methtuple parses the XM-tag of Bismark. From memory, bwa-meth doesn't produce this tag, so that's why it's limited to Bismark

katwre commented 4 years ago

@markrobinsonuzh and @PeteHaitch , Thank you very much for your quick reply!

methtuple-based approach looks great, but I think for now I will give a try with SNP-based one.

So then I have a question, if I go for using a bam file + output from BisSNP/biscuit then "MD" and "XG" tags in the alignment could be sufficient? I am asking because it looks like in DAMEfinder code "MD","XM","XR","XG" tags are extracted from a BAM file (e.g. like here https://github.com/markrobinsonuzh/DAMEfinder/blob/87f43acb506131eadfc289b054f3813bc46fc74f/R/split_bams.R#L90), but actually only MD (mismatching positions/bases) and XG (genome conversion state for the alignment, it's CT or GA from what I've just read) are used and in my bwa-meth alignment I can see MD tag and YC tag that is always either GA or CT...

ST-E00351:144:HF3:6:2219:28412:67164      163     chr1    9990    0       8M2D108M        =       9990    118     TATCATTCATAACCCTAACCCTA
ACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACC    AAAFFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ
JJJJJJJJJJJJAJJJJJFJFJJAJJFJJJJJJJJJJAJJJFJJJAJ<A-<AJF--A-AAF7AFF-AAJA    MC:Z:8M2D108M   YC:Z:GA MD:Z:1G2T0G2^CG108      YD:Z:r  PG:Z:MarkDupl
icates     RG:Z:XXX-RUNID-0144-FLOWCELL-AHF3-LANE-6__val_ NM:i:5  AS:i:108        XS:i:107
sorjuela commented 4 years ago

Hi @katwre, thanks for your interest in the package!

You’re right, for the SNP-mode I actually only use the MD and XG tags. The problem here is that bismark has a specific type of MD tag, which could look like this for example: 2C0C3C7C2C7C1C0C1C1C1C2C2C0C1C2C2C0C6C7C0C0C0C0C0C4C3C1C0C7C5C0C1C5C1C5C1C8, unlike a typical MD tag that looks more like the one in your bwa bam file. Looking at your read makes me think that maybe we could combine the MD with that MC tag, but would have to check what that is next week. My current MD parser only works for the bismark MD, sorry.

Best, Stephany

katwre commented 4 years ago

Hi @sorjuela , Thank you for your response! If you would have any idea how to use MD or combination of MC and MC tags or how actually Bismark creates MD tags that would be very useful!! (e.g. do MD tags have only Gs or Cs in their string? are MD tags calculated from CIGAR strings?) Should I use samtools calmd to recreate MD tags in order to mimic bismark-like MD tags? I tried already few tools for differential allelicly methylated regions calling that use SNP information and unfortunately they don't really work for me, so that's why I would be open to help make your tool work with bwa-meth Cheers, Katarzyna

sorjuela commented 4 years ago

Several things:

  1. Why don't you want to use bismark? I know it requires a lot of memory, but this is really a problem if you have veeery large files.
  2. The MD tag in bismark is like an addition to the CIGAR, because I can use it to do SNP calling, AND to call (allele-specific) methylation. Reason why you see all those Cs, these are mismatches generated by the bisulfite conversion, meaning if it's a mismatch, the original C was unmethylated. As PeteHaitch mentioned above, he uses the XM tag, which is a string with methylation calls (easier and maybe even better than using the MD). I could also use this, but I chose to use the MD because of the above (SNP and meth calling from one place).
  3. The read tags from your output don't seem to contain any methylation information, but it's hard to say, and I can't find any documentation on bwa-meth outputs. Extracting allele-specific methylation from there doesn't seem very straightforward...what tool do you normally use to call methylation?
  4. (side question) What other tools do you know for differential ASM? as far as I know no other tools perform differential ASM. However there are other ASM callers that do not perform differential analysis. Best, S
katwre commented 4 years ago

I have rather big files .fast.gz files (paired-end data, around 45G for each fastq.gz corresponding to a mate), it's cancer patient-derived data and it looks like bwa-meth performs better on this data than bismark

Thank you a lot for explanation of your implementation and having a look at read tags information! Since it doesn't look that there is a straight-forward and fast solution, then probably I will just give up with it for now. I call methylation using methylDacker (https://github.com/dpryan79/MethylDackel) for reads aligned using bwa-meth

I tried biscuit (https://github.com/zwdzwd/biscuit) and https://github.com/astatham/snappymeth

I think this issue can be closed then Thanks, Katarzyna