Closed arodel21 closed 10 months ago
Hi Andrea, you don’t need to do any additional filtering by read length. The 2114 bp is the size of the model input sequence. Can you describe how you are performing read alignment and other preprocessing of the reads? Specifically how the reads are being shifted on the + and - strands?
Hi Surag!
Thanks for the reply.
I realized I was doing something wrong. Now I am only following the instructions on the tutorials, which means:
bedtools slop -i $blacklist -g $chrom_sizes -b 1057 > temp.bed
bedtools intersect -v -a $peaks -b temp.bed > peaks_no_blacklist.bed
chrombpnet prep nonpeaks -g $genome -p peaks_no_blacklist.bed -c $chrom_sizes -fl $splits -br $blacklist -o output
chrombpnet bias pipeline \
-ibam $bam_neg \
-d "ATAC" \
-g $genome \
-c $chrom_sizes \
-p peaks_no_blacklist.bed \
-n output_negatives.bed \
-fl fold_0.json \
-b 0.5 \
-o bias_model/ \
-fp k562 \
But I am still getting the same error.
Hi Andrea,
How did you get your bam files from fastq's? Are you using raw bam files or did you do any additional preprocessing on them?
Hi Anusri,
I did not generate the data myself and I did not process it anyhow. I was given the BAM files just as they are. I could send you a subsample of it so maybe its easier to detect if it has any issues?
What type of pre-processing could cause this error?
Thanks.
Can you run the following commands on your bam and share the atac_no_shift.png?
samtools view -b $atac_in_bam chr20 > out.bam
samtools view -b -@50 out.bam | bedtools bamtobed -i stdin | awk -v OFS="\t" '{if ($6=="-"){print $1,$2,$3,$4,$5,$6} else if ($6=="+") {print $1,$2,$3,$4,$5,$6}}' | bedtools genomecov -bg -5 -i stdin -g $chrom_sizes | bedtools sort -i stdin > tmp2
bedGraphToBigWig tmp2 $chrom_sizes unstranded.bw
python build_pwm_from_bigwig.py -i unstranded.bw -g $genome -o atac_no_shift.png -c "chr20" -cz $chrom_sizes
this is the build_pwm_from_bigwig.py - https://github.com/kundajelab/chrombpnet/blob/master/chrombpnet/helpers/preprocessing/analysis/build_pwm_from_bigwig.py
Your bam files might have been pre-shifted to a non-standard shift
@panushri25 the error is from here:
The issue is not non-standard shift (it would have printed those values otherwise). Instead, different Tn5 motifs we use as reference are suggesting different shifts. This could suggest more severe QC issues or something worse, so building PWM would make sense.
Going ahead we should run build PWM automatically if the above fails, so that people can just post the png directly.
Can you run the following commands on your bam and share the atac_no_shift.png?
samtools view -b $atac_in_bam chr20 > out.bam samtools view -b -@50 out.bam | bedtools bamtobed -i stdin | awk -v OFS="\t" '{if ($6=="-"){print $1,$2,$3,$4,$5,$6} else if ($6=="+") {print $1,$2,$3,$4,$5,$6}}' | bedtools genomecov -bg -5 -i stdin -g $chrom_sizes | bedtools sort -i stdin > tmp2 bedGraphToBigWig tmp2 $chrom_sizes unstranded.bw python build_pwm_from_bigwig.py -i unstranded.bw -g $genome -o atac_no_shift.png -c "chr20" -cz $chrom_sizes
this is the build_pwm_from_bigwig.py - https://github.com/kundajelab/chrombpnet/blob/master/chrombpnet/helpers/preprocessing/analysis/build_pwm_from_bigwig.py
This is the output atac_no_shift.png for my positive label
And this is the output png for my negative label
Sorry what do you mean by positive label and negative label?
So I have two BAM files, one is the positive label and the other is the negative. Both of them are different experiments but with the same background. So I am trying to build a bias model from the negative label (background) to correct the bias in the positive label.
Sorry can you try this new command set -
samtools view -b $atac_in_bam chr20 > out.bam
samtools view -b -F796 -@50 out.bam | bedtools bamtobed -i stdin | awk -v OFS="\t" '{if ($6=="-"){print $1,$2,$3,$4,$5,$6} else if ($6=="+") {print $1,$2,$3,$4,$5,$6}}' | bedtools genomecov -bg -5 -i stdin -g $chrom_sizes | bedtools sort -i stdin > tmp2
bedGraphToBigWig tmp2 $chrom_sizes unstranded.bw
python build_pwm_from_bigwig.py -i unstranded.bw -g $genome -o atac_no_shift.png -c "chr20" -cz $chrom_sizes
I still dont fully understand what the "positive" and "negative" means, can you provide a bit more context
Sorry for the confusion. It is just that I have to train two models for each of my conditions.
Here are the atac_no_shift.png
Positive class
Negative class
Is the positive class dataset a ATAC-seq dataset, it doesnt look like it
It is ATAC-seq
@panushri25 the error is from here:
The issue is not non-standard shift (it would have printed those values otherwise). Instead, different Tn5 motifs we use as reference are suggesting different shifts. This could suggest more severe QC issues or something worse, so building PWM would make sense.
Going ahead we should run build PWM automatically if the above fails, so that people can just post the png directly.
You mean the position weight matrix? How should I go ahead with this?
I think I found the issue...
First, one of the models was using a BAM file and a BED file where both were aligned to different genome versions. And second, my peaks didn't include the summits...
From here, what is the summit? Is it (end - start) / 2 or start + (end - start) / 2?
Thanks again for all the support.
Hello all,
Thank you so much for your work on ChromBPNet!
I was wondering if you could help me to understand an error I am getting while running the pipeline.
This error happens during the ATAC peaks shifting. By the way, I am not sure of understanding much why 2114 bp...
I would very much appreciate it if you could help me understand this error a bit more. Thank you!