markrobinsonuzh / DAMEfinder

Finds DAMEs - Differential Allelicly MEthylated regions
MIT License
10 stars 3 forks source link

Cannot read the VCF file #42

Closed hcph closed 1 year ago

hcph commented 1 year ago

When I use this Command: bam_files <- c("7Z-1.test.bam","7Z-2.test.bam") vcf_files <- c("get.vcf","get.vcf") sample_names <- c("7Z-1", "7Z-2") reference_file <- ("/public/home/chaohe/db/CS/CS_genome/CSchr1A.fa") snp.list <- extract_bams(bam_files, vcf_files, sample_names, reference_file, coverage = 2) I got the error: Reading reference file Running sample 7Z-1 Reading VCF file Extracting methylation per SNP Error in character(nchar(a)) : vector size cannot be NA

I do not know what happened, could you please help me? Thanks a lot!

File detail 7Z-1.test.bam : image get.vcf: image CSchr1A.fa: image

sorjuela commented 1 year ago

Hi, looks like your bam files don't have an MD tag. What aligner are you using?

Can you run these steps and let me know what output you get?

#choose the first SNP from your VCF file
loc <- GRanges(seqnames = "chr1A", 
               IRanges(start = 1926, 
                       width = 1))

#select parameters to read in
params <- ScanBamParam(tag = c("MD", "XM", "XR", "XG"), which = loc)

# read in reads overlapping snp
alns <- readGAlignments(bam_files[1], 
                            use.names = TRUE, 
                            param = params)

Please paste the output for alns, just like the first 3 rows is enough.

hcph commented 1 year ago

Hi, looks like your bam files don't have an MD tag. What aligner are you using?

Can you run these steps and let me know what output you get?

#choose the first SNP from your VCF file
loc <- GRanges(seqnames = "chr1A", 
               IRanges(start = 1926, 
                       width = 1))

#select parameters to read in
params <- ScanBamParam(tag = c("MD", "XM", "XR", "XG"), which = loc)

# read in reads overlapping snp
alns <- readGAlignments(bam_files[1], 
                            use.names = TRUE, 
                            param = params)

Please paste the output for alns, just like the first 3 rows is enough.

Thank you for such quickly reply. I use BS-seeker2 to analysis the WGBS datasets, which based on the aligner of Bowtie2. And the output of 'alns' was: image The first 1 rows were : image If I directly paste the result, it was hard to read, so I upload the picture, hope it will not cause you trouble. Thank you again.

sorjuela commented 1 year ago

Thanks, yes so indeed there are no MD tags. It's a bit strange that BS-seeker removes them. Maybe there's an option to keep them?

I've only tested the package with bam files from Bismark So you can try to align with that and try again.

hcph commented 1 year ago

Thanks, yes so indeed there are no MD tags. It's a bit strange that BS-seeker removes them. Maybe there's an option to keep them?

I've only tested the package with bam files from Bismark So you can try to align with that and try again.

So sad to hear that. If I rerun the alignment step, maybe It will take about two months. The genome is wheat (16 Gb), crying..... Anyway, thank you very much!

hcph commented 1 year ago

Thank you very very……much!  I will try.  Best wishes.-------- 原始邮件 --------发件人: Stephany Orjuela @.>日期: 2022年10月17日周一 下午4:49收件人: markrobinsonuzh/DAMEfinder @.>抄送: HC @.>, Author @.>主 题: Re: [markrobinsonuzh/DAMEfinder] Cannot read the VCF file (Issue #42) You can also try running methtuple on your bams, and test the fast-mode (methtuple is not so slow to run). This mode doesn't rely on SNPs so you don't need the MD tag.

—Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you authored the thread.Message ID: @.***>