xie186 / ViewBS

ViewBS - a powerful toolkit for visualization of high-throughput bisulfite sequencing data
GNU General Public License v3.0
83 stars 27 forks source link

Tabix region lookup requires a gzipped, tabixed bedfile at /usr/local/lib/x86_64-linux-gnu/perl/5.22.1/Bio/DB/HTS/Tabix.pm line 40. #44

Closed jasonsaunderswilliams closed 4 years ago

jasonsaunderswilliams commented 4 years ago

Hi,

I am attempting to perform the MethOveRegion function of ViewBS. I sorted, gzipped and tabixed the reference file I have "hg38promsorted.bed.gz". The index file for this "hg38promsorted.bed.gz.csi" is in the same folder as "hg38promsorted.bed.gz". I also tabixed "3944Swan_CS001_hg38.fastq.gz_bismark_bt2.CpG_report.txt", its index is also in the same folder. I should mention that I'm working inside a docker:

sudo docker run -ti --rm -v /home/ubuntu/work/Jason/:/inputfiles xie186/viewbs ViewBS MethOverRegion --region /inputfiles/hg38promsorted.bed.gz --sample /inputfiles/3944Swan_CS001_hg38.fastq.gz_bismark_bt2.CpG_report.txt,T0 --outdir /inputfiles --context CG

Subcommand: MethOverRegion /inputfiles/3944Swan_CS001_hg38.fastq.gz_bismark_bt2.CpG_report.txt,T0 /inputfiles/hg38promsorted.bed.gz

It seemed that the reigons you provided are Genes. Please make sure of this.

Output directory is: /inputfiles Default output prefix is used: MethOverRegion Start calculate methylation information for target context CG Tabix region lookup requires a gzipped, tabixed bedfile at /usr/local/lib/x86_64-linux-gnu/perl/5.22.1/Bio/DB/HTS/Tabix.pm line 40.

The error suggests that the .csi needs to be in that location? I don't understand enough about how the docker works to know where to put the tabix files or tell ViewBS where to find them Any workaround suggestions for this would be appreciated.

Best wishes,

Jason

xie186 commented 4 years ago

Hi @jasonsaunderswilliams , I could successfully run it in Docker:

docker run -v ${PWD}:/data -w /data cd240cd8fcef ViewBS MethOneRegion MethOverRegion --region TAIR10_GFF3_genes_chr1.bed --sample bis_WT.tab.gz,WT --sample bis_cmt23.tab.gz,cmt23 --prefix bis_gene_chr1_sample --context CG --outdir results_conda/MethOverRegion_Docker

Could you please try the test data on your server to see what will happen? Thanks.

Best, Shaojun

jasonsaunderswilliams commented 4 years ago

Hi Shaojun,

I would like to try the test data, however I can't see the files in your command in the data folder here. Could you direct me to which test data files you are referring to and I will duly run it.

Many thanks,

Jason

xie186 commented 4 years ago

Hi Jason, Please download the test data here: https://github.com/xie186/ViewBS#download-test-data Best, Shaojun

jasonsaunderswilliams commented 4 years ago

Hi Shaojun,

I ran the test data succesfully this morning:


Subcommand: MethOverRegion
/inputfiles/bis_WT.tab.gz,WT /inputfiles/bis_cmt23.tab.gz,cmt23
/inputfiles/TAIR10_GFF3_genes_chr1.bed

It seemed that the reigons you provided are Genes. Please make sure of this.

Output directory is: /inputfiles
Output prefix: bis_gene_chr1_sample
Start calculate methylation information for target context
CG
.........
.........
Meth::OverRegion=HASH(0x2dad290): Usage: R --vanilla --slave --height <fig height> --width <fig width> --adjustXaxis <X> --input <input tab> --xlab <Gene>  --output <output> < OverRegion.R 
/usr/lib/R/bin/exec/R --slave --no-restore --file=/home/biodocker/ViewBS/lib/Meth/OverRegion.R --args --input /inputfiles/bis_gene_chr1_sample_MethOverRegion_CG.txt --xlab Gene --height 10 --width 10 --adjustXaxis 10 --output /inputfiles/bis_gene_chr1_sample_MethOverRegion_CG.pdf 

Running time:206 wallclock secs (197.35 usr  1.31 sys +  1.34 cusr  0.46 csys = 200.46 CPU)

I noticed that the test data indices are .tbi and your sample files are tab.Mine are .txt and .csi respectively.

Best wishes,

Jason

jasonsaunderswilliams commented 4 years ago

Hi again Shaojun,

I attach (google drive) the files I'm working with before processing them. I suspect there is something wrong in the way I'm using tabix and the output files I'm getting. Do you sort files before zipping and then tabixing? Could you possibly send me the command you use to generate the .tbi files in the test data folder?

I can then try and perform these commands on the CpG_report files I get as output from Bismark.

  1. region bed file (human promoters of hg38): https://drive.google.com/open?id=1VBqSmcrEj1FQbYn4Z81LUj1IjXxVQYsz
  2. CpG report sample 1 (Bismark): https://drive.google.com/open?id=15PD4NtmU7DLHvLhgBE0xmcfY-uxDOW6l
  3. CpG report sample 2 (Bismark): https://drive.google.com/open?id=1McEmEggJ7Lk9-dFZwxuoInW1K0WAMjWP

All the best,

Jason

jasonsaunderswilliams commented 4 years ago

Hi Shaojun,

I've gone back to bismark following your instructions to generate the same files as you. I'll follow these and get back to you ;)

Best,

Jason

xie186 commented 4 years ago

Thanks.

jasonsaunderswilliams commented 4 years ago

Hi again Shaojun,

So I followed the ViewBS github manual to process through Bismark. It's a bit confusing because ViewBS asks for the Genome-wide cytosine methylation report. These are generated as : bismark_bt2.CpG_report.txt files...

ubuntu@repro-group:~$ sudo head /home/ubuntu/work/Jason/bismarkgenomes/3944Swan_CS001_hg38_bismark_bt2.CpG_report.txt 
chr19   60119   +   0   0   CG  CGA
chr19   60120   -   0   0   CG  CGG
chr19   60172   +   0   0   CG  CGA
chr19   60173   -   0   0   CG  CGA
chr19   60183   +   0   0   CG  CGG
chr19   60184   -   0   0   CG  CGT
chr19   60340   +   0   0   CG  CGT
chr19   60341   -   0   0   CG  CGT
chr19   60376   +   0   0   CG  CGG
chr19   60377   -   0   0   CG  CGT

These fit the <chromosome> <position> <strand> <count methylated> <count unmethylated> <C-context> <trinucleotide context> format as described by the bismark manual and yourselves.

The .cov.gz filetype generated from the bismark_methylation_extractor step are these:

chr19   60857   60857   80  4   1
chr19   60882   60882   100 5   0
chr19   95567   95567   100 1   0
chr19   103529  103529  93.3333333333333    28  2
chr19   103585  103585  57.5    23  17
chr19   103675  103675  70  21  9
chr19   103691  103691  86.6666666666667    26  4
chr19   103735  103735  73.3333333333333    22  8
chr19   119558  119558  100 1   0
chr19   146618  146618  0   0   1

These do not fit this format, but do have the .cov file extension.

Bismark says that the Genome-wide Cytosine methylation report (.txt file above) will be sorted by chromosome coordinate, but it isn't.

I cannot use tabix on either of these files nor have I been able to sort them.

Could you clear up for me which of the above Bismark outputs I should be using as input and also help me by letting me know how I can sort and tabix them appropriately so I can feed them into ViewBS?

Thanks very much for all your help.

Best wishes,

Jason

jasonsaunderswilliams commented 4 years ago

Hi again Shaojun,

I managed to solve my problems with a little manipulation of the files. I paste my code here for anyone with the same problems. Maybe it would be useful for you to clear up that the actual genome-wide cytosine reports are text files that - in my case - needed to be sorted then tabixed. Also, when downloading bed files for the --region input some manipulation was necessary to match the format of the test data input .bed files. I paste my code here in case it is useful to anyone starting with the unsorted genome-wide cytosine report .txt files:

  1. Bismark methylation extractor
    
    bismark_methylation_extractor --multicore 5 --bedGraph /inputfiles/bismarkgenomes/4046Swan_Donor_hg38_bismark_bt2.bam --cytosine_report -o /inputfiles/bismarkgenomes --genome_folder /inputfiles/bismarkgenomes
2. Sort genome-wide Cytosine methylation report by chromosome coordinate:

sort -k1,1V -k2,2n /inputfiles/bismarkgenomes/4046Swan_Donor_hg38_bismark_bt2.CpG_report.tab -o /inputfiles/bismarkgenomes/ViewBS/4046Swan_Donor_hg38_bismark_bt2CpG_reportsorted.tab

3. bgzip through HTSlib

bgzip /inputfiles/bismarkgenomes/ViewBS/4046Swan_Donor_hg38_bismark_bt2CpG_reportsorted.tab


4. tabix through HTSlib to generate .tbi indices

tabix -p vcf /inputfiles/bismarkgenomes/ViewBS/3944.tab.gz