kundajelab / chrombpnet

Bias factorized, base-resolution deep learning models of chromatin accessibility (chromBPNet)
https://github.com/kundajelab/chrombpnet/wiki
MIT License

Input file shifts inconsistent #174

Closed monikaheinzl closed 10 months ago

monikaheinzl commented 10 months ago

Hi,

I have a similar issue to #153 and #169 while training the bias model on my Drosophila DNase-seq data. I mapped my raw FASTQ files with Bowtie and called peaks with MACS2, and I did not apply any shifting to the data. There is also no mismatch between the genome versions of the BAM and BED files (see below). Still, I get the following error:

Traceback (most recent call last):
  File "/.conda/envs/chrombpnet/lib/python3.8/site-packages/chrombpnet/CHROMBPNET.py", line 179, in <module>
    main()
  File "/.conda/envs/chrombpnet/lib/python3.8/site-packages/chrombpnet/CHROMBPNET.py", line 38, in main
    pipelines.train_bias_pipeline(args)
  File "/.conda/envs/chrombpnet/lib/python3.8/site-packages/chrombpnet/pipelines.py", line 278, in train_bias_pipeline
    reads_to_bigwig.main(args)
  File "/.conda/envs/chrombpnet/lib/python3.8/site-packages/chrombpnet/helpers/preprocessing/reads_to_bigwig.py", line 96, in main
    plus_shift, minus_shift = auto_shift_detect.compute_shift(args.input_bam_file,
  File "/.conda/envs/chrombpnet/lib/python3.8/site-packages/chrombpnet/helpers/preprocessing/auto_shift_detect.py", line 234, in compute_shift
    plus_shift, minus_shift = compute_shift_DNASE(ref_plus_pwms, ref_minus_pwms, plus_pwm, minus_pwm)
  File "/.conda/envs/chrombpnet/lib/python3.8/site-packages/chrombpnet/helpers/preprocessing/auto_shift_detect.py", line 211, in compute_shift_DNASE
    raise ValueError("Input file shifts inconsistent. Please post an Issue")
ValueError: Input file shifts inconsistent. Please post an Issue
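The argument names in the traceback (`ref_plus_pwms`, `ref_minus_pwms`, `plus_pwm`, `minus_pwm`) suggest that the failing check compares PWMs built from plus- and minus-strand read ends against reference PWMs. As a hypothetical illustration of that kind of check (not chrombpnet's actual `compute_shift_DNASE` code), the sketch below picks the closest reference PWM for each strand and raises when the resulting pair of shifts is not an expected combination. The L1 distance, the `allowed_pairs` argument and the toy reference dictionaries are all assumptions.

```python
# Hypothetical sketch of a strand-consistency check for shift detection.
# NOT chrombpnet's implementation; metric and allowed pairs are assumptions.

def pwm_distance(a, b):
    """Sum of absolute differences between two PWMs (lists of [A, C, G, T] rows)."""
    return sum(abs(x - y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


def best_matching_shift(observed, reference_pwms):
    """Return the candidate shift whose reference PWM is closest to `observed`.

    reference_pwms: dict mapping a shift (int) to a reference PWM.
    """
    return min(reference_pwms, key=lambda s: pwm_distance(observed, reference_pwms[s]))


def detect_shifts(ref_plus, ref_minus, plus_pwm, minus_pwm, allowed_pairs):
    """Detect a shift per strand; refuse to guess if the pair is not expected."""
    plus_shift = best_matching_shift(plus_pwm, ref_plus)
    minus_shift = best_matching_shift(minus_pwm, ref_minus)
    if (plus_shift, minus_shift) not in allowed_pairs:
        # An unusual cleavage pattern (e.g. a PWM unlike any reference) ends up here.
        raise ValueError("Input file shifts inconsistent")
    return plus_shift, minus_shift
```

With toy one-position PWMs, a plus strand that matches the unshifted reference combined with a minus strand that matches a shifted reference yields a pair outside `allowed_pairs`, which is exactly the situation the error message describes.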

Here is also my command for training the bias model:

chrombpnet bias pipeline \
        -ibam input.bam \
        -d "DNASE" \
        -g $genome \
        -c $chrom.sizes \
        -p $peak_file \
        -n $negatives_file \
        -fl fold_0.json \
        -b 0.8 \
        -o $outfolder/ \
        -fp bias_model
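Before digging into shift detection, one quick way to rule out a genome-version mismatch between the files passed via `-c` and `-p` is to check that every peak lies on a chromosome listed in the chrom.sizes file and ends within its length. A minimal sketch; the helper names are hypothetical, and it checks only chromosome names and coordinates, not the BAM header.

```python
# Hypothetical sanity check: do all peaks fall inside the genome described by chrom.sizes?

def read_chrom_sizes(path):
    """Return {chrom: length} from a two-column chrom.sizes file."""
    sizes = {}
    with open(path) as fh:
        for line in fh:
            if line.strip():
                chrom, length = line.split()[:2]
                sizes[chrom] = int(length)
    return sizes


def peaks_outside_genome(peak_path, sizes):
    """Return peak lines whose chromosome is unknown or whose end exceeds its length."""
    bad = []
    with open(peak_path) as fh:
        for line in fh:
            fields = line.split()
            if not fields:
                continue  # skip blank lines
            chrom, end = fields[0], int(fields[2])
            if chrom not in sizes or end > sizes[chrom]:
                bad.append(line.rstrip("\n"))
    return bad
```

An empty result means names and coordinates are consistent; a peak on `2L` instead of `chr2L`, for example, would be reported.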

As suggested in issue #153, I have also generated the PWM for my BAM file:

[image: PWM generated from the BAM file]

Here is also an example of my peak file:

chr2L   5183    6872    peak_1  516 .   5.13902 51.60207    48.89282    662
chr2L   18506   19124   peak_2  147 .   2.66881 14.75620    12.99561    447
chr2L   19642   20053   peak_3  64  .   1.98697 6.48809 5.02063 247
chr2L   21603   21939   peak_4  79  .   2.12766 7.90964 6.38455 191
chr2L   34039   34352   peak_5  123 .   2.59835 12.32056    10.64090    152
chr2L   35402   35815   peak_6  111 .   2.51643 11.10707    9.46780 222
chr2L   41896   42336   peak_7  51  .   1.86441 5.16313 3.75423 229
chr2L   45892   47619   peak_8  218 .   3.82924 21.89166    19.91631    1413
chr2L   52497   54348   peak_9  145 .   3.19574 14.50274    12.75090    543
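The peak file is in narrowPeak format (10 columns), where the last column is the summit offset relative to the peak start; ChromBPNet-style pipelines typically center their input windows on that summit. A small parsing sketch; `Peak`, `parse_narrowpeak_line` and `summit_window` are hypothetical helpers, and the window width is left to the caller.

```python
from typing import NamedTuple


class Peak(NamedTuple):
    chrom: str
    start: int
    end: int
    name: str
    summit: int  # absolute genomic coordinate of the summit


def parse_narrowpeak_line(line):
    """Parse one narrowPeak line; column 10 is the summit offset from the start."""
    f = line.split()
    start = int(f[1])
    return Peak(f[0], start, int(f[2]), f[3], start + int(f[9]))


def summit_window(peak, width):
    """Return (start, end) of a `width` bp window centered on the peak summit."""
    half = width // 2
    return peak.summit - half, peak.summit - half + width
```

For `peak_1` above, the summit lies at 5183 + 662 = 5845.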

Many thanks for your help, Monika

panushri25 commented 10 months ago

Hello Monika,

Can you post the commands you used to generate the PNG?

monikaheinzl commented 10 months ago

I adapted them from issue #153, but here they are:

samtools view -b input.bam chr2L > out.bam

samtools view -b  -F796  -@50 out.bam | bedtools bamtobed -i stdin | awk -v OFS="\t" '{if ($6=="-"){print $1,$2,$3,$4,$5,$6} else if ($6=="+") {print $1,$2,$3,$4,$5,$6}}' | bedtools genomecov -bg -5 -i stdin -g $chrom.sizes | bedtools sort -i stdin > tmp2
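`samtools view -F` drops any read that has at least one of the given flag bits set, so decoding the mask shows exactly what is filtered. Since 796 = 4 + 8 + 16 + 256 + 512, the mask includes bit 16 (read on the reverse strand), meaning only forward-strand alignments reach `bamtobed` here; whether that is intended depends on the #153 recipe being followed. A small decoder based on the FLAG table of the SAM specification; the dictionary and function names are hypothetical.

```python
# Bit values and meanings from the SAM specification's FLAG field.
SAM_FLAG_BITS = {
    0x1: "paired",
    0x2: "proper pair",
    0x4: "unmapped",
    0x8: "mate unmapped",
    0x10: "reverse strand",
    0x20: "mate reverse strand",
    0x40: "first in pair",
    0x80: "second in pair",
    0x100: "secondary",
    0x200: "QC fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(mask):
    """Return the names of the flag bits set in `mask` (e.g. a samtools -F value)."""
    return [name for bit, name in SAM_FLAG_BITS.items() if mask & bit]
```

For comparison, a mask of 780 (4 + 8 + 256 + 512) applies the same filters but keeps reverse-strand reads.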

bedGraphToBigWig tmp2 $chrom.sizes unstranded.bw
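The bedGraph converted here comes from `bedtools genomecov -5`, which counts read 5' ends. The `awk` step before it prints the same six columns for both strands, so it only keeps records with a `+` or `-` strand and applies no shift. The sketch below mirrors that logic in Python so the effect of a non-zero shift is easy to see; `plus_shift` and `minus_shift` are illustrative parameters, not options of any tool used in the thread.

```python
from collections import Counter


def five_prime_counts(bed_records, plus_shift=0, minus_shift=0):
    """Count read 5' ends per (chrom, 0-based position), like `genomecov -5`.

    bed_records: iterable of (chrom, start, end, strand) in BED (0-based,
    half-open) coordinates. plus_shift / minus_shift are hypothetical offsets
    added to the 5' end of + / - strand reads; 0 and 0 match the awk above.
    """
    counts = Counter()
    for chrom, start, end, strand in bed_records:
        if strand == "+":
            pos = start + plus_shift          # 5' end of a + strand read
        elif strand == "-":
            pos = end - 1 + minus_shift       # 5' end of a - strand read
        else:
            continue  # the awk filter also drops records without a strand
        counts[(chrom, pos)] += 1
    return counts
```

With both shifts at zero, two `+` reads and one `-` read spanning `chr2L:100-150` give two counts at position 100 and one at position 149.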

python build_pwm_from_bigwig.py -i unstranded.bw -g $genome -o DHS_no_shift -cr "chr2L" -c $chrom.sizes
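Conceptually, a cleavage-bias PWM aggregates the genome sequence around positions that carry 5'-end counts. The sketch below shows that idea on an in-memory sequence with counts used as weights; it is a simplified stand-in, not the actual implementation of `build_pwm_from_bigwig.py` (which reads the bigWig and genome FASTA and may filter or normalize differently), and `pwm_around_sites` is a hypothetical name.

```python
def pwm_around_sites(sequence, site_counts, flank):
    """Base frequencies in a +/- `flank` bp window around weighted cut sites.

    site_counts: dict {0-based position: cut count}; each window is weighted
    by its count. Returns 2*flank+1 rows of [A, C, G, T] frequencies.
    Windows running off the sequence and non-ACGT bases are skipped.
    """
    index = {"A": 0, "C": 1, "G": 2, "T": 3}
    counts = [[0, 0, 0, 0] for _ in range(2 * flank + 1)]
    for site, weight in site_counts.items():
        if site - flank < 0 or site + flank >= len(sequence):
            continue
        for offset in range(-flank, flank + 1):
            base = sequence[site + offset].upper()
            if base in index:
                counts[offset + flank][index[base]] += weight
    # Normalize each position to frequencies (all zeros if nothing was counted).
    return [[c / sum(row) if sum(row) else 0.0 for c in row] for row in counts]
```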
panushri25 commented 10 months ago

Hello @monikaheinzl,

The DNase I cleavage logo is known to be quite variable:

[image: examples of published DNase I cleavage logos]

I was expecting to see something closer to one of these representations, but the PWM you're showing is very different. Can you check the following: (1) Is there a problem with the genome build or the preprocessing that produced the BAM? (2) Cross-check the individual BAMs that were merged: does each of them give the same PWM? (3) Confirm that the data really come from a DNase experiment.
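For the suggested cross-check of the merged BAMs, one simple way to quantify whether two PWMs (for example, one per input BAM) agree is the Pearson correlation of the flattened matrices. This is a hypothetical helper, not part of chrombpnet, and what correlation counts as "the same" is a judgment call the thread does not specify.

```python
import math


def pwm_similarity(pwm_a, pwm_b):
    """Pearson correlation between two equally sized PWMs, flattened row by row.

    Returns NaN if either PWM has zero variance (e.g. a perfectly uniform PWM).
    """
    a = [x for row in pwm_a for x in row]
    b = [x for row in pwm_b for x in row]
    if len(a) != len(b):
        raise ValueError("PWMs must have the same shape")
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return float("nan")
    return cov / math.sqrt(var_a * var_b)
```

Identical PWMs score 1.0, while PWMs with swapped preferences at each position score much lower (about -0.33 for the two-position example used in the test).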

If it is none of these, let me know; I can suggest an alternate version of the repo that bypasses this error. But be very sure that none of the above is happening.

(Image source: https://static-content.springer.com/esm/art%3A10.1186%2Fs13059-019-1642-2/MediaObjects/13059_2019_1642_MOESM1_ESM.pdf)

Best, Anu

monikaheinzl commented 10 months ago

Hi,

OK, thanks for your help so far! I will follow your suggestions and then get back to you.

Best, Monika

panushri25 commented 10 months ago

Closing this due to inactivity; feel free to reopen it if you continue to see issues.