mills-lab / dinumt

Faidx warnings + missing VCF REF due to removal of 'chr' prefix even when using option --ucsc #2

Closed johnstonmj closed 3 years ago

johnstonmj commented 4 years ago

I was getting many warnings from faidx:

        [W::fai_get_val] Reference 11:100144843-100144843 not found in FASTA file, returning empty sequence
        [faidx] Failed to fetch sequence in 11:100144843-100144843
        [W::fai_get_val] Reference 21:21708154-21708154 not found in FASTA file, returning empty sequence
        [faidx] Failed to fetch sequence in 21:21708154-21708154

These warnings occur because my reference names the contig "chr11" instead of "11".

Additionally, in the final VCF all REF entries are "N" because faidx cannot retrieve the sequence at these positions.
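A quick way to confirm which naming scheme a reference uses is to look at the first column of its .fai index, which holds the sequence names. The following is a minimal sketch of that check; the helper name `fai_uses_chr_prefix` and the example path are hypothetical and are not part of dinumt.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of dinumt.pl): returns 1 if any sequence
# name in a FASTA index (.fai) starts with "chr", and 0 otherwise.
sub fai_uses_chr_prefix {
    my ($fai_path) = @_;
    open(my $fh, '<', $fai_path) or die "Cannot open $fai_path: $!";
    while (my $line = <$fh>) {
        # .fai is tab-separated; the first column is the sequence name.
        my ($name) = split /\t/, $line;
        if (defined $name && $name =~ /^chr/) {
            close $fh;
            return 1;
        }
    }
    close $fh;
    return 0;
}
```

For example, `fai_uses_chr_prefix('reference.fa.fai')` (an assumed path) would return 1 for a reference whose contigs are named "chr11", "chr21", and so on. In that case a query for "11:100144843-100144843" cannot match any sequence name.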

The --ucsc option fixes this for most of the BAM processing, but not for the step that generates the VCF.

The problem was that line 308 of dinumt.pl removes the chr prefix with: $chrom =~ s/chr//g;

I edited this to:

        unless ( $opts{ucsc} || $opts{ensembl} ) {
            $chrom =~ s/chr//g;
        }

This fixes the faidx warnings and modifies the output VCF file so that 'CHROM' entries include the 'chr' prefix and 'REF' entries are no longer exclusively 'N'.
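The same guard can be sketched as a standalone function to show its effect on a contig name. This is an illustration only: `normalize_chrom` is a hypothetical helper, the options are passed as a hash reference rather than read from dinumt's `%opts`, and the substitution is anchored with `^` (a deliberate variation on the original `s/chr//g`) so that only a leading "chr" is removed.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of dinumt.pl): returns the contig name
# to use when querying the reference FASTA.
sub normalize_chrom {
    my ($chrom, $opts) = @_;
    # Same condition as the edit above: leave the name untouched
    # when --ucsc or --ensembl is set.
    return $chrom if $opts->{ucsc} || $opts->{ensembl};
    # Assumption: anchored so only a leading "chr" is stripped,
    # unlike the unanchored, global s/chr//g in the original line.
    (my $stripped = $chrom) =~ s/^chr//;
    return $stripped;
}

print normalize_chrom('chr11', { ucsc => 1 }), "\n";  # prints chr11
print normalize_chrom('chr11', {}), "\n";             # prints 11
```

With the prefix kept, a region string such as "chr11:100144843-100144843" matches a reference whose contigs are named "chr11", which is what lets faidx return the base instead of an empty sequence.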

WeichenZhou commented 3 years ago

Thank you, @johnstonmj. We tested it and made some edits to our scripts.

Best, Arthur

raimondsre commented 2 years ago

Hi, the same issue is still present if --ucsc is not passed.

Best, Raimonds