smithlabcode / methpipe

A pipeline for analyzing DNA methylation data from bisulfite sequencing.
http://smithlabresearch.org/methpipe
66 stars 27 forks source link

”bad line in MappedRead file” when using bsrate function #172

Closed xxzzwan closed 3 years ago

xxzzwan commented 3 years ago

Hi, I got an issue with bsrate function saying "bad line in MappedRead file". I had changed my bam file mapped by bismark using to-mr function, and when I used bsrate I got this problem. I wonder what is a good MappedRead file format for bsrate or to-mr. The issue is as follows. bad line in MappedRead file: chr1 10003 10122 FRAG:A00184:657:HKFNHDSXY:4:2360:5873:7686_1:N:0:TCCTGAGC+GCGATCTA:trim_of 20 + CCCTAACCCTNACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCTAACCCTAACCCTAA FFFFFFFFFFBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FF I would appreciate it a lot if somebody could give some advice.

guilhermesena1 commented 3 years ago

Hello,

That is indeed a little strange. I copied the line you provided onto an mr file and ran bsrate on the human genome and it ran to completion. Could you confirm (1) the version of methpipe you are using (cloned from repository? Release 4.1.1?) (2) the reference genome you ran bsrate on and (3) if possible, could you also provide the 10 first lines of your mr output (e.g. using head?)

Thank you!

xxzzwan commented 3 years ago

Hello, Thanks a Lot for your kind help! I’ve solved this issue on linux system, with the same command, reference and files. It’s kind of strange hhh. Before it’s on Mac OS Catalina system. Maybe it’s the gcc version that matters. Can I ask you another small question? When using bsrate, is the -c reference C-T converted genome fa file or G-A conversion? Or just the original reference genome?

Best wishes, Xuan

2021年2月26日 上午11:38,Guilherme Sena notifications@github.com 写道:

Hello,

That is indeed a little strange. I copied the line you provided onto an mr file and ran bsrate on the human genome and it ran to completion. Could you confirm (1) the version of methpipe you are using (cloned from repository? Release 4.1.1?) (2) the reference genome you ran bsrate on and (3) if possible, could you also provide the 10 first lines of your mr output (e.g. using head?)

Thank you!

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/smithlabcode/methpipe/issues/172#issuecomment-786389048, or unsubscribe https://github.com/notifications/unsubscribe-auth/AR6ADX6YSAYWZODAJM2OTQDTA4JS3ANCNFSM4YHRWRUQ.

guilhermesena1 commented 3 years ago

Hello,

Indeed it might be a compiler-specific issue. We can look into it if we have the output of your g++ --version in the mac you tested. Other users may run into the same issue. We are also working on making the runtime error messages a little more specific so both we and the users have a better idea of the specific formatting error that triggered the failure when one does occur.

For the bsrate question, you do not need to C-T or G-A convert your genome, simply pass the original four-letter reference. Bsrate calculates bisulfite conversion based on Cs in reads that are outside of the CpG context in the genome, so it needs to know where the original Cs are. If a C-T conversion was done before running the program it would probably misleadingly tell you that your conversion was perfect (as it won't find any Cs in the sequence).

Hope that clarifies it, but let us know if you have further questions!

xxzzwan commented 3 years ago

Hello, My g++ version on Mac OS shows like this. g++ --version Configured with: --prefix=/Library/Developer/CommandLineTools/usr --with-gxx-include-dir=/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/c++/4.2.1 Apple clang version 11.0.0 (clang-1100.0.33.17) Target: x86_64-apple-darwin19.6.0 Thread model: posix InstalledDir: /Library/Developer/CommandLineTools/usr/bin

There’s no problem with configuring and installing methpipe by g++ 4.2.1 hhh.

As for the bsrate, I used lambda dna as spike-in to measure the convertion rate. Should I use lambda dna genome fa file as -c input and compare my target genome with it? But when I do this, it shows “could not find chrom: chr1”. How can I fix it?

Best, Xuan

2021年2月27日 上午12:50,Guilherme Sena notifications@github.com 写道:

Hello,

Indeed it might be a compiler-specific issue. We can look into it if we have the output of your g++ --version in the mac you tested. Other users may run into the same issue. We are also working on making the runtime error messages a little more specific so both we and the users have a better idea of the specific formatting error that triggered the failure when one does occur.

For the bsrate question, you do not need to C-T or G-A convert your genome, simply pass the original four-letter reference. Bsrate calculates bisulfite conversion based on Cs in reads that are outside of the CpG context in the genome, so it needs to know where the original Cs are. If a C-T conversion was done before running the program it would probably misleadingly tell you that your conversion was perfect (as it won't find any Cs in the sequence).

Hope that clarifies it, but let us know if you have further questions!

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/smithlabcode/methpipe/issues/172#issuecomment-786763792, or unsubscribe https://github.com/notifications/unsubscribe-auth/AR6ADX7VJPC74A6NB2BYT7TTA7GNLANCNFSM4YHRWRUQ.

guilhermesena1 commented 3 years ago

Hi,

So this error occurs because you are probably giving a FASTA file containing just the spike-in DNA, and the program is reporting that this genome is not the same you mapped your reads to, as it does not contain a chromosome called "chr1".

To use lambda spike-in, you should map your reads to a concatenation of your original reference genome and the spike-in reference. Then, once you convert your SAM file to MR, you can select only the reads that mapped to the spike-in to run bsrate on. For instance, if your MR file is called output.mr, and your spike-in DNA is called "my_spike_in" in your FASTA file (i.e. there's a line that says >my_spike_in), this script should subset only the reads mapped to the spike-in into another MR file:

awk '$1 == my_spike_in' output.mr >output_spikein_only.mr

Then you can compare it to either your spike-in.fa or your reference-genome-with-spike-in.fa using

bsrate -c spike-in.fa -o output.bsrate output_spikein_only.mr

Hope that helps!

xxzzwan commented 3 years ago

Thanks a lot for your kind help! The problem is solved and this issue can be closed. Thank you again!

2021年3月1日 上午10:35,Guilherme Sena notifications@github.com 写道:

Hi,

So this error occurs because you are probably giving a FASTA file containing just the spike-in DNA, and the program is reporting that this genome is not the same you mapped your reads to, as it does not contain chromosome 1.

To use lambda spike-in, you should map your reads to a concatenation of your original reference genome and the spike-in reference. Then, once you convert your SAM file to MR, you can select only the reads that mapped to the spike-in to run bsrate on. For instance, if your MR file is called output.mr, and your spike-in DNA is called "my_spike_in" in your FASTA file, this script should subset only the reads mapped to the spike-in into another MR file:

awk '$1 == my_spike_in' output.mr >output_spikein_only.mr Then you can compare it to either your spike-in.fa or your reference-genome-with-spike-in.fa using

bsrate -c spike-in.fa -o output.bsrate output_spikein_only.mr Hope that helps!

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/smithlabcode/methpipe/issues/172#issuecomment-787593997, or unsubscribe https://github.com/notifications/unsubscribe-auth/AR6ADX5KTA6XEN47LRDO4ATTBL4QXANCNFSM4YHRWRUQ.