jlab-code / MethylStar

A fast and robust pre-processing pipeline for bulk or single-cell whole-genome bisulfite sequencing (WGBS) data.
GNU General Public License v3.0

BigWig production file issue #3

Closed El-Castor closed 4 years ago

El-Castor commented 4 years ago

Hi @shahryary,

I am trying to convert bedGraph to bigWig using your pipeline, but I get an error, as you can see below, and no bigWig file is produced at the end:

==================================================
Please choose from the menu:

    1. Convert Methimpute output to DMRCaller Format
    2. Convert Methimpute output to Methylkit Format
    3. Convert Methimpute output to bedGraph Format
    4. Convert bedGraph to BigWig Format

B. Back to main Menu

>>  4
converting bedGraph format to Bigwig format ...

Do you want continue to run? [y/n] y
Starting to convert to bigWig format ...
ls: cannot access '/NetScratch/cpichot/WGBS_analysis/Zebularine_treatment_out/rdata/*_chr_all.txt': No such file or directory
Running in single mode. (Parallel is disabled.)
Running for methylome_Mock_FDLM202341331-1a_All-CG ...
bedGraphToBigWig v 4 - Convert a bedGraph file to bigWig format.
usage:
   bedGraphToBigWig in.bedGraph chrom.sizes out.bw
where in.bedGraph is a four column file in the format:
      <chrom> <start> <end> <value>
and chrom.sizes is a two-column file/URL: <chromosome name> <size in bases>
and out.bw is the output indexed big wig file.
If the assembly <db> is hosted by UCSC, chrom.sizes can be a URL like
  http://hgdownload.soe.ucsc.edu/goldenPath/<db>/bigZips/<db>.chrom.sizes
or you may use the script fetchChromSizes to download the chrom.sizes file.
If not hosted by UCSC, a chrom.sizes file can be generated by running
twoBitInfo on the assembly .2bit file.
The input bedGraph file must be sorted, use the unix sort command:
  sort -k1,1 -k2,2n unsorted.bedGraph > sorted.bedGraph
options:
   -blockSize=N - Number of items to bundle in r-tree.  Default 256
   -itemsPerSlot=N - Number of data points bundled at lowest level. Default 1024
   -unc - If set, do not use compression.
Running for methylome_Mock_FDLM202341331-1a_All-CHG ...
[... same bedGraphToBigWig usage message as above ...]
Running for methylome_Mock_FDLM202341331-1a_All-CHH ...
[... same bedGraphToBigWig usage message as above ...]
Running for methylome_R150mM_FDLM202341332-1a_All-CG ...
[... same bedGraphToBigWig usage message as above ...]
Running for methylome_R150mM_FDLM202341332-1a_All-CHG ...
[... same bedGraphToBigWig usage message as above ...]
Running for methylome_R150mM_FDLM202341332-1a_All-CHH ...
[... same bedGraphToBigWig usage message as above ...]
Converted files. finished in 0 minutes.
You can find the results in /NetScratch/cpichot/WGBS_analysis/Zebularine_treatment_out/bigwig-format folder.

Processing files is finished.

Please, press ENTER to continue ...

Do you have any suggestions, please?

Thanks

shahryary commented 4 years ago

Hi @El-Castor, first you have to convert the Methimpute output into bedGraph, and then the bedGraph into bigWig. In other words, from the menu choose "Convert Methimpute output to bedGraph Format" (item 3), and after that select "Convert bedGraph to BigWig Format" (item 4). Please let me know if you have any questions.
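Before running item 4, it can help to confirm that step 3 actually left usable inputs behind. bedGraphToBigWig typically prints its usage text, as in the log above, when it is not called with its three expected arguments, so a missing input file is one possible cause. The function below is only a minimal sketch of such a pre-flight check: the name check_inputs and its two arguments are hypothetical and are not part of MethylStar.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check; not part of MethylStar.

# check_inputs BEDGRAPH_DIR CHROM_SIZES
# Succeeds only if CHROM_SIZES is a non-empty file and BEDGRAPH_DIR
# contains at least one non-empty *.bedGraph file.
check_inputs() {
    local bedgraph_dir=$1 chrom_sizes=$2 found=0 f

    if [ ! -s "$chrom_sizes" ]; then
        echo "missing or empty chrom.sizes: $chrom_sizes" >&2
        return 1
    fi

    for f in "$bedgraph_dir"/*.bedGraph; do
        # If nothing matches, the pattern stays literal and -s is false.
        if [ -s "$f" ]; then
            found=$((found + 1))
        fi
    done

    if [ "$found" -eq 0 ]; then
        echo "no non-empty .bedGraph files in: $bedgraph_dir" >&2
        return 1
    fi
    echo "$found bedGraph file(s) ready for conversion"
}
```

For example, `check_inputs /path/to/bedgraph-format /path/to/genome.chrom.sizes` (both paths are placeholders) prints the number of usable files, or returns a non-zero status with a message on stderr.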

El-Castor commented 4 years ago

Hi @shahryary ,

Of course, I produced the bedGraph before trying to produce the bigWig, but I still get the error. My bedGraph files are in the correct folder and are not empty. I have produced bigWig files from them with my own script, see below:

set -e
export PATH="/opt/share/FLOCAD/userspace/cpichot/miniconda3/bin/:$PATH"

# TODO: set up getopts argument parsing for automation in Snakemake

### Env loading ###

source activate bedtools

### INPUT ###

bedGraphDir="/NetScratch/cpichot/WGBS_analysis/A2_dominant_andromonoecy_out/bedgraph-format"
chromSizePath="/K/FLOCAD/DATA/OMICS/Melon/DNAseq/AdnaneB/Genome_PacBio/toulouse_assemblage/CMiso1.1/20180606/CMiso1.1_genome.fa.chrom.sizes"

### OUTPUT ###
outDir="/NetScratch/cpichot/WGBS_analysis/A2_dominant_andromonoecy_out/bigwig-format"
mkdir -p "$outDir"

### OPTION ###

### MAIN ###

# Convert every bedGraph file; variables are quoted so paths with spaces work.
for bed in "${bedGraphDir}"/*.bedGraph; do
    echo "converting bedGraph to bigWig ..."
    bedName=$(basename "${bed%.bedGraph}")
    echo "processing : ${bedName}"
    bedGraphToBigWig "$bed" "$chromSizePath" "${outDir}/${bedName}.bw"
    echo "${bedName} converted to bigWig !"
done
echo "All bedGraph files are converted"

This works well, so I think there is an issue in your bedGraph-to-bigWig script.
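For a manual conversion like the loop above, the bedGraphToBigWig usage text asks for sorted input and a two-column chrom.sizes file. The sketch below sorts one bedGraph file with the sort keys from that usage text and first checks that every chromosome name also appears in chrom.sizes. The function name prepare_bedgraph and its arguments are hypothetical, and the name check reflects my assumption that the two files are expected to use matching chromosome names.

```shell
#!/usr/bin/env bash
# Hypothetical helper; not part of MethylStar or the UCSC tools.

# prepare_bedgraph SRC.bedGraph CHROM_SIZES OUT.bedGraph
# Writes a sorted copy of SRC to OUT, or fails if SRC uses a
# chromosome name that is absent from CHROM_SIZES.
prepare_bedgraph() {
    local src=$1 chrom_sizes=$2 out=$3 unknown

    if [ ! -s "$chrom_sizes" ] || [ ! -s "$src" ]; then
        echo "missing or empty input file" >&2
        return 1
    fi

    # First file: remember known names. Second file: print unknown ones.
    unknown=$(awk 'NR == FNR { known[$1] = 1; next }
                   !($1 in known) { print $1 }' "$chrom_sizes" "$src" | sort -u)
    if [ -n "$unknown" ]; then
        echo "chromosomes not in chrom.sizes:" $unknown >&2
        return 1
    fi

    # Sort keys as given in the bedGraphToBigWig usage text.
    LC_ALL=C sort -k1,1 -k2,2n "$src" > "$out"
}
```

The sorted file can then be passed to bedGraphToBigWig in place of the original.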

Moreover, I see that the methylKit-format file that you produce contains all the different methylation contexts, but methylKit is not able to separate the different contexts from this file. Do you have any suggestions?

Thanks in advance.

shahryary commented 4 years ago

Hi @El-Castor

Thank you for the feedback. You are right: for methylKit we separate the data by the three contexts (CHH, CHG, CG) and then run the script that converts to bigWig format. After checking the scripts, the only thing that comes to mind is that the problem lies in the formatting of the file. If you have a Methimpute output file, it already contains the context, so that should not be a problem during conversion. However, if some column(s) are missing when the Methimpute file is read, the conversion will fail. I suggest going through my script to see which columns I import when converting the file. The script is "src/bash/meth-bedgraph.R".
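One way to check the columns, and to get one file per context, is sketched below. It assumes a tab-delimited file with an unquoted header line that contains a column named "context". That column name, the function name split_by_context and the output naming are assumptions for illustration, so please compare them with what "src/bash/meth-bedgraph.R" actually reads.

```shell
#!/usr/bin/env bash
# Hypothetical helper; column name and file layout are assumptions.

# split_by_context FILE OUT_PREFIX
# Writes OUT_PREFIX_<context>.txt for each value found in the
# "context" column, repeating the header line in every output file.
split_by_context() {
    local src=$1 prefix=$2
    awk -F'\t' -v prefix="$prefix" '
        NR == 1 {
            # Locate the context column by name instead of by position.
            for (i = 1; i <= NF; i++) if ($i == "context") col = i
            if (!col) {
                print "no context column in header: " $0 > "/dev/stderr"
                exit 1
            }
            header = $0
            next
        }
        {
            out = prefix "_" $col ".txt"
            if (!(out in seen)) { print header > out; seen[out] = 1 }
            print > out
        }
    ' "$src"
}
```

If the header check fails, the message on stderr shows the header that was actually found, which makes a missing or renamed column easy to spot.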

Please let me know if you have any questions.