kundajelab / chrombpnet

Bias factorized, base-resolution deep learning models of chromatin accessibility (chromBPNet)
https://github.com/kundajelab/chrombpnet/wiki
MIT License

Malformed BED entry error during chrombpnet training #211

Closed cahn20 closed 1 month ago

cahn20 commented 1 month ago

Hi,

I'm getting the following error during the chrombpnet training step:

Error: malformed BED entry at line 192517967. Start was greater than end. Exiting.
Traceback (most recent call last):
  File "/users/ahnsf9/conda_env/my_chrombpnet/bin/chrombpnet", line 8, in <module>
    sys.exit(main())
  File "/users/ahnsf9/conda_env/my_chrombpnet/lib/python3.8/site-packages/chrombpnet/CHROMBPNET.py", line 23, in main
    pipelines.chrombpnet_train_pipeline(args)
  File "/users/ahnsf9/conda_env/my_chrombpnet/lib/python3.8/site-packages/chrombpnet/pipelines.py", line 31, in chrombpnet_train_pipeline
    build_pwm_from_bigwig.main(args)
  File "/users/ahnsf9/conda_env/my_chrombpnet/lib/python3.8/site-packages/chrombpnet/helpers/preprocessing/analysis/build_pwm_from_bigwig.py", line 56, in main
    bigwig_vals = np.nan_to_num(bw.values(args.chr,0,chr_size ))
RuntimeError: Invalid interval bounds!

The stdout log looks like the following:

Estimating enzyme shift in input file
Current estimated shift: +0/+0
awk -v OFS="\t" '{if ($6=="+"){print $1,$2+4,$3,$4,$5,$6} else if ($6=="-") {print $1,$2,$3-4,$4,$5,$6}}' | sort -k1,1 | bedtools genomecov -bg -5 -i stdin -g chrom.size.regular | LC_COLLATE="C" sort -k1,1 -k2,2n 
Making BedGraph (Filter chromosomes not in reference fasta)
Making Bigwig

The error seems to occur during bigWig creation and to be related to my input BAM file, but I don't see any intermediate files I can use for debugging.

What is the full shell command being run here? I thought it would be something like the command below, but that command succeeds on the same BAM file when I run it separately.

samtools view -b -@50 $bam | bedtools bamtobed -i stdin | awk -v OFS="\t" '{if ($6=="-"){print $1,$2,$3,$4,$5,$6} else if ($6=="+") {print $1,$2,$3,$4,$5,$6}}' | bedtools genomecov -bg -5 -i stdin -g $chromSizes | bedtools sort -i stdin

Thanks in advance, and please let me know if you need any additional info from me.
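
One difference between the two commands is that the awk step in the stdout log shifts coordinates by 4 (`$2+4` on the `+` strand, `$3-4` on the `-` strand), while the standalone command prints them unchanged. The sketch below is one way to check whether that shift could produce records with start greater than end. It is only an assumption that this is the cause. The script reads the 6-column BED output of `bedtools bamtobed` on stdin, applies the same arithmetic as the logged awk command, and prints any record that ends up with start > end. The function names and the script itself are hypothetical and not part of chrombpnet.

```python
import sys
from typing import Iterable, Iterator, Optional, Tuple

SHIFT = 4  # taken from the awk command shown in the stdout log


def shift_record(fields) -> Optional[Tuple[str, int, int, str]]:
    """Mirror the logged awk step: '+' moves start by +4, '-' moves end by -4."""
    chrom, start, end, strand = fields[0], int(fields[1]), int(fields[2]), fields[5]
    if strand == "+":
        start += SHIFT
    elif strand == "-":
        end -= SHIFT
    else:
        return None  # the awk command prints nothing for other strand values
    return chrom, start, end, strand


def find_malformed(lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Yield (input line number, original line) where the shifted start > end."""
    for lineno, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # not a 6-column BED record
        rec = shift_record(fields)
        if rec is None:
            continue
        _, start, end, _ = rec
        if start > end:
            yield lineno, line.rstrip("\n")


if __name__ == "__main__":
    for lineno, line in find_malformed(sys.stdin):
        print(f"{lineno}\t{line}")
```

Saved under a hypothetical name such as `check_shifted_bed.py`, it could be run as `samtools view -b $bam | bedtools bamtobed -i stdin | python check_shifted_bed.py`. The line numbers it prints refer to the unsorted `bamtobed` output, so they will not match the line number in the bedtools error, which is reported after the `sort` step.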