lindenb / jvarkit

Java utilities for Bioinformatics
https://jvarkit.readthedocs.io/
Other
478 stars 132 forks source link

When AF=1, not all positions have changed #178

Closed LauraVP1994 closed 3 years ago

LauraVP1994 commented 3 years ago

Dear

This tool works great and is very fast, however I stumbled on a problem. I want to translate my entire bam file, so that it would contain at each position one nucleotide at 100% and the rest at 0%. However, at some positions this is not the case so I was wondering if you could help.

image

I added all of the files in this temporary link: https://we.tl/t-DLbWlMo8qB

The commands used are:

java -jar biostar404363.jar -o ERR5059083_REPL.bam -p SNP_1.vcf ERR5059083.bam
java -jar biostar404363.jar -o ERR5059083_REPL_2.bam -p SNP_2.vcf ERR5059083_REPL.bam
java -jar biostar404363.jar -o ERR5059083_REPL_3.bam -p SNP_3.vcf ERR5059083_REPL_2.bam
java -jar biostar404363.jar -o ERR5059083_REPL_4.bam -p SNP_4.vcf ERR5059083_REPL_3.bam
lindenb commented 3 years ago

sorry, I won't have the time to download those files and to look at your problem. Please try to narrow the problem, show me the VCF lines and the resulting SAM lines please.

LauraVP1994 commented 3 years ago

For each position I want to change the other existing nucleotides to one, so I made 4 vcf files with for each position the change.

To comply with the same position as the image (position 1023): I added in vcf file 1:

CHROM POS ID REF ALT QUAL FILTER INFO

Reference 1023 . A C . . AF=1

vcf file 2: Reference 1023 . T C . . AF=1

vcf file 3: Reference 1023 . G C . . AF=1

vcf file 4: Reference 1023 . N C . . AF=1

However, as you can see in the image, there remain 16 A's (at some positions this can go to sometimes 70 of a non-desired nucleotide).

I extracted the sorted bam reads at this position: filtered.zip

lindenb commented 3 years ago

your bam is not a bam but a SAM and the header is missing. I cannot work with this.

LauraVP1994 commented 3 years ago

I used this command: samtools view ERR5059083_REPL_4_sorted.bam Reference:1022-1024 > filtered.bam

I have never extracted reads like this, so if you could assist in what you would need more in particular, this would help me to provide this to you.

Kind regards Laura

lindenb commented 3 years ago

samtools view -O BAM -o filtered.bam ERR5059083_REPL_4_sorted.bam Reference:1022-1024

LauraVP1994 commented 3 years ago

filtered.zip

lindenb commented 3 years ago

Hi,

thank you for the bug report. There was a major bug in a loop, that was triggered when there was one INDEL in the read. I hope it was fixed in https://github.com/lindenb/jvarkit/commit/e376331e69de7ac7b5c91b62768d7019660b78d9

Can you please test the new version to see if it works now ?

LauraVP1994 commented 3 years ago

Thank you! It seems to be working now.