hippo-yf / bsgenova

An accurate, robust, and fast genotype caller for bisulfite-sequencing data
GNU General Public License v3.0

Output error - can't get VCF-stats from vcf.gz file #2

Open desmodus1984 opened 2 months ago

desmodus1984 commented 2 months ago

Hi,

I tried running bsgenova on some samples, and when I checked the output files for stats such as SNPs, multiallelic sites, and InDels, I got an error.

vcf-stats V02055.bsg.vcf.gz
The version "4.4" not supported, assuming VCFv4.2
Empty fields in the header line, the column 6 is empty, removing.

Then I checked the file because of the error about column 6, and it seems that the header line contains an extra space or an extra tab.

CHROM POS ID REF ALT QUAL FILTER INFO FORMAT V02055.bsg

Compare it with the header from cgmaptools:

CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NA00001

As you can see, bsgenova writes a malformed header, which makes the file incompatible with, or unreadable by, both bcftools and VCFtools. In addition, the latter only supports VCFv4.2, while the output is VCFv4.4.

Please fix this and if possible, suggest how to fix this error.

hippo-yf commented 2 months ago

[image] This is an example of bsgenova output in VCF format; it doesn't contain an extra space or tab. I will check the compatibility with vcftools later. Could you send me your VCF file (or at least some of its lines)?

hippo-yf commented 2 months ago

I have found an extra tab in line 14. Now it has been fixed.

desmodus1984 commented 2 months ago

Hi,

image

Could you please tell me how to fix it, so that I don't have to run all the samples again?

Thank you.

hippo-yf commented 2 months ago

[image] The extra tab is in line 14 of the VCF file, between "ALT" and "QUAL". You can decompress the .gz file, delete the extra tab (say, in vim), and compress it again.
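
For many samples, the manual edit described above could be scripted. The following is only a minimal sketch, assuming the sole defect is one or more empty tab-separated fields in the `#CHROM` header line. The function names `fix_header_line` and `fix_vcf` are hypothetical and are not part of bsgenova. The sketch writes plain text, because Python's `gzip` module does not produce the BGZF blocks that bcftools needs for indexing, so the result should be recompressed with `bgzip` afterwards.

```python
import gzip


def fix_header_line(line: str) -> str:
    """Drop empty tab-separated fields from a '#CHROM' header line."""
    if not line.startswith("#CHROM"):
        return line  # leave meta lines and data records untouched
    fields = line.rstrip("\n").split("\t")
    return "\t".join(f for f in fields if f != "") + "\n"


def fix_vcf(src_gz: str, dst_vcf: str) -> None:
    """Copy a gzip/bgzip-compressed VCF to plain text, repairing the header line."""
    with gzip.open(src_gz, "rt") as src, open(dst_vcf, "w") as dst:
        for line in src:
            dst.write(fix_header_line(line))
```

For example, `fix_vcf("V02055.bsg.vcf.gz", "V02055.bsg.vcf")` followed by `bgzip V02055.bsg.vcf` on the command line would give a repaired, re-compressed file. Data records are passed through unchanged, so empty fields there (if any) are not touched.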

desmodus1984 commented 2 months ago

I corrected that, but when I then tried merging the VCF files with bcftools, it failed again. The command was:

bcftools merge *.bsg.vcf.gz -Oz -o mrkdup.44.vcf.gz

and the error message was:

Error: Duplicate sample names (FORMAT), use --force-samples to proceed anyway.

Hope you can fix this.

hippo-yf commented 2 months ago

I have fixed the bugs in the VCF format and tested the bcftools view/index/merge/stats commands.
In addition, a new argument --sample-name has been added (it is used in the VCF file). You should check whether the sample names in your VCF files are duplicated.
Also, the keys GQ and GQH have been modified.
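
One way to check for duplicated sample names before merging is to read the `#CHROM` line of each file and compare the sample columns. This is a rough sketch, assuming the standard VCF layout in which sample names follow the eight fixed columns and FORMAT. The helper names `sample_names`, `read_samples`, and `find_duplicates` are hypothetical.

```python
import gzip
from collections import defaultdict


def sample_names(header_line: str) -> list[str]:
    """Return the sample columns of a '#CHROM' line (everything after FORMAT)."""
    fields = header_line.rstrip("\n").split("\t")
    return fields[9:]  # 8 fixed columns + FORMAT come first


def read_samples(path: str) -> list[str]:
    """Read the sample names from a gzip/bgzip-compressed VCF file."""
    with gzip.open(path, "rt") as fh:
        for line in fh:
            if line.startswith("#CHROM"):
                return sample_names(line)
            if not line.startswith("#"):
                break  # reached data records without finding a header line
    return []


def find_duplicates(samples_by_file: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each sample name that occurs more than once to the files containing it."""
    files_by_sample = defaultdict(list)
    for path, names in samples_by_file.items():
        for name in names:
            files_by_sample[name].append(path)
    return {name: paths for name, paths in files_by_sample.items() if len(paths) > 1}
```

Calling `find_duplicates({p: read_samples(p) for p in glob.glob("*.bsg.vcf.gz")})` (after `import glob`) would return an empty dict when all sample names are unique. A sample reported as `FORMAT` would suggest that the header columns are still shifted in that file.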

desmodus1984 commented 2 months ago

Are the bug fixes available in conda, so that I can update the conda package and run bsgenova again?

hippo-yf commented 2 months ago

bsgenova is not released on conda. In any Python environment that has the bsgenova dependencies installed, including venv and conda, you can just git clone the repository and run bsgenova.