lidaof / methylQA

methylation sequence data quality assessment tool
6 stars 1 forks source link

CpG file for bismark mode #1

Open avilella opened 8 years ago

avilella commented 8 years ago

I am trying to use methylQA bismark mode. I am mostly interested in the .report file that comes out of it. I wonder if you could guide me in how to produce the input files needed for it:

mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -N -e "select chrom, size from hg38.chromInfo" | head -n 24 > hg38.genome

Can you tell me how I can manage to produce this file for human hg38? Am I right in thinking that this file with have approximately 28,000,000 entries?

Thanks,

lidaof commented 8 years ago

Hi @avilella, thanks for your interest in methylQA. I wrote methylQA based on bismark 0.9, if the output of of 0.14 have no big difference with 0.9, methylQA should work for you :) You are right about the size file. CpG.bed file is a bed file with each line contains coordinates of a CpG site. Unfortunately I don't have a URL for you to download right now, I can prepare one for you in a short while. The way we generate the CpG.bed file is using the oligoMatch program from UCSC kent source. And you are right, it's approximately 28M entries, at least for hg19 :)

avilella commented 8 years ago

If you could upload one for hg38 somewhere available, that would be great. A sorted bed.gz file would be of manageable size :-)

On Fri, Sep 18, 2015 at 2:19 PM, Daofeng Li notifications@github.com wrote:

Hi @avilella https://github.com/avilella, thanks for your interest in methylQA. I wrote methylQA based on bismark 0.9, if the output of of 0.14 have no big difference with 0.9, methylQA should work for you :) You are right about the size file. CpG.bed file is a bed file with each line contains coordinates of a CpG site. Unfortunately I don't have a URL for you to download right now, I can prepare one for you in a short while. The way we generate the CpG.bed file is using the oligoMatch program from UCSC kent source. And you are right, it's approximately 28M entries, at least for hg19 :)

— Reply to this email directly or view it on GitHub https://github.com/lidaof/methylQA/issues/1#issuecomment-141449796.

lidaof commented 8 years ago

I am generating the file at our server now, will upload soon after it's done.

lidaof commented 8 years ago

Hi @avilella, you could download from https://cgs.wustl.edu/~dli/methylQA_Download/hg38/. Let me know if you have any problem. Thanks.

avilella commented 8 years ago

Brilliant, I downloaded the file and will try it now...

On Fri, Sep 18, 2015 at 4:08 PM, Daofeng Li notifications@github.com wrote:

Hi @avilella https://github.com/avilella, you could download from https://cgs.wustl.edu/~dli/methylQA_Download/hg38/. Let me know if you have any problem. Thanks.

— Reply to this email directly or view it on GitHub https://github.com/lidaof/methylQA/issues/1#issuecomment-141478411.

avilella commented 8 years ago

I ran it on all my files, but it always crashes at the end of the execution (reporting?):

* CpG file provided: /home/user/hg38cpg/CpG.bed
* Insert size cutoff: 500
* Read coverage threshold: 5
* Provided 1 BAM/SAM file(s)
* Warning: will output stats over each CpG

* Processing /home/user/file.n.bam
* Processed lines: 392816538
* Oversized alignments: 0
* Quality Failed alignments: 0
* Sorting output density
* Sorting output CpG methylation call
* Preparing report file

This is a copy+paste from gdb methylQA core.NNNN:

(gdb) backtrace
#0  0x0000000000403d06 in print_bar (x=-2147483648) at generic.c:2232
#1  0x00000000004041e6 in writeReportBismark (outfile=<value optimized out>, cnt=<value optimized out>, cnt2=0xd38bb0, numFields=<value optimized out>, row=<value optimized out>, bisMode=0, genomeBase=3088269832)
    at generic.c:182
#2  0x000000000040d594 in main_bismark (argc=<value optimized out>, argv=<value optimized out>) at bismark.c:176
#3  0x000000000040f2e2 in main (argc=<value optimized out>, argv=<value optimized out>) at methylQA.c:23
lidaof commented 8 years ago

Thanks for report. Is it possible you could send me a small part of your file to test on my side?

avilella commented 8 years ago

Here is a small bam file (renamed png so that it would be accepted by github). methylQA segfaults with it, the hg38.genome.txt file attached and the cpg file for hg38 on your website.

ntc1_s11_l001_r1_001 cutb sorted bam bai ntc1_s11_l001_r1_001 cutb sorted bam

avilella commented 8 years ago

hg38 genome txt

lidaof commented 8 years ago

Looks the files have been changed some ways...and samtools couldn't open it.

[E::hts_hopen] fail to open file '1.bam' [E::hts_open] fail to open file '1.bam' samtools: failed to open "1.bam" for reading: Success

By the way, do you add -C option when you run? If not, could you add it and re-run?

Thanks.

avilella commented 8 years ago

That did the trick: add chr to the hg38.genome file, have chr in the CpG bed file, then use -C when running on a bam file with no chr.

So, summing up, chr everywhere except the bam, and -C flag when running methylqa bismark

Thx

On Sat, Sep 26, 2015 at 5:26 AM, Daofeng Li notifications@github.com wrote:

Looks the files have been changed some ways...and samtools couldn't open it.

[E::hts_hopen] fail to open file '1.bam' [E::hts_open] fail to open file '1.bam' samtools: failed to open "1.bam" for reading: Success

By the way, do you add -C option when you run? If not, could you add it and re-run?

Thanks.

— Reply to this email directly or view it on GitHub https://github.com/lidaof/methylQA/issues/1#issuecomment-143398313.

ghost commented 7 years ago

Hi, would it be possible to get the hg38-CpG.bed file? The link in the comment is old and doesn't open.

avilella commented 7 years ago

I added a script for that in my fork, may be useful to others:

https://github.com/avilella/methylQA/tree/master/scripts

On Fri, Dec 16, 2016 at 10:50 AM, swernig notifications@github.com wrote:

Hi, would it be possible to get the hg38-CpG.bed file? The link in the comment is old and doesn't open.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/lidaof/methylQA/issues/1#issuecomment-267568390, or mute the thread https://github.com/notifications/unsubscribe-auth/AAJpN6XaIr8ieLaU9OlVfCwhX524EmVCks5rIm0RgaJpZM4F_24H .

lidaof commented 7 years ago

@swernig sorry...the link on the website was updated but not in this thread, sorry for that :) you can get it from http://wangftp.wustl.edu/~dli/methylQA_Download/hg38/ or use @avilella 's script :) Thanks again.

ghost commented 7 years ago

Thank you :+1: :)