smarco / gem3-mapper

GEM-Mapper v3
GNU General Public License v3.0
56 stars 17 forks source link

gem-2-wig output not anymore compatible with wigToBigWig #10

Closed splaisan closed 5 years ago

splaisan commented 5 years ago

I used the latest beta from sourceforge (1.423) to produce a mappability track for the latest Drosophila genome build.

My code used to work for other genomes in the past but now fails producing the final BigWig with

error: # converting to BigWIG strange var=val pair line 1 of ./BDGP6.94-mappability/BDGP6.94_150.wig

$ head -6 ./BDGP6.94-mappability/BDGP6.94_150.wig
variableStep    chrom=2L dna    span=262
1       0.0197028
variableStep    chrom=2L dna    span=67
263     0.0241825
variableStep    chrom=2L dna    span=391
330     0.0197028

$ head BDGP6.94.sizes
2L      23513712
2R      25286936
3L      28110227
3R      32079331
4       1348131
X       23542271
Y       3667352
mitochondrion_genome    19517
Unmapped_Scaffold_8     88768
3Cen_mapped_Scaffold_31 87365

what is wrong with this wig that makes wigToBigWig so unhappy?

does this have to do with the chromosome naming of droso? or with the variableStep?

Thanks for any hint.

Stephane

my simplified code

infile=some.fasta
pref=some
idxpref=some_index
thr=8
outdir=.
kmers=150
comp=yes ;# (same with 

# create index
echo "# creating GEM index ... be patient"
gem-indexer -i ${infile} \
    -o ${outdir}/${idxpref} \
    --complement ${comp} \
    -c 'dna' \
    -T ${thr}

# create mappability track
echo "# creating GEM mappability data ... be patient"
gem-mappability -T ${thr} \
    -I ${outdir}/${idxpref}.gem \
    -l ${kmer} \
    -o ${outdir}/${pref}_${kmer}

# convert to wig
echo "# converting to WIG"
gem-2-wig -I ${outdir}/${idxpref}.gem \
    -i ${outdir}/${pref}_${kmer}.mappability \
    -o ${outdir}/${pref}_${kmer}

########## FAILS HERE #############

# convert to BigWig
echo "# converting to BigWIG"
wigToBigWig ${outdir}/${pref}_${kmer}.wig \
    ${outdir}/${pref}.sizes \
    ${outdir}/${pref}_${kmer}.bw
splaisan commented 5 years ago

I found it, my chromosome names (from Ensembl) had a name containing a space and the word dna as created by the UCSC Kent tool faSize. I replaced this step by samtools faidx | cut -f1,2 to produce a valid file.sizes

# clean reference fasta file (names with space or comments)
bioawk -c fastx '{print ">"$name; print $seq}' ${infile} \
    > ${outdir}/${infile}

# create reference size list
echo "# creating reference size list"
samtools faidx ${outdir}/${infile} \
    && cut -f 1,2 ${outdir}/${infile}.fai \
    > ${outdir}/${pref}.sizes

eg '>2L dna:chromosome chromosome:BDGP6:2L:1:23513712:1 REF' => '2L dna' This let to adding a column in the output and shift data around now fixed!

timedreamer commented 5 years ago

Hi, just for those who also had this problem. I encountered the exact same problem. The faSize gave me the correct file. But the wig file has 1 dna as chromosome name. My solution is to delete the dna from wig file (sed "s/ dna//" test.wig > test2.wig). Then the wigToBigWig works fine.