tfwillems / HipSTR

Genotype and phase short tandem repeats using Illumina whole-genome sequencing data
GNU General Public License v2.0

Contigs missing in output VCF #36

Closed nh13 closed 7 years ago

nh13 commented 7 years ago

Current behavior: only the contigs found in the input regions file are written to the output VCF file.

Expected behavior: all contigs in the input FASTA file should be written to the output VCF. Tools check that the contigs in the header match across VCFs (e.g., GATK's ValidateVariants or Picard's GatherVcfs).

The write_contigs_to_vcf function in https://github.com/tfwillems/HipSTR/blob/master/src/fasta_reader.cpp#L65-L71 should write ALL chromosomes, not just the ones passed in!
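
To illustrate the expected behavior, here is a minimal sketch that emits one `##contig` header line for every entry in a samtools-style `.fai` index, regardless of which regions were requested. It is not HipSTR's actual code: `ContigInfo`, `read_fai_contigs`, and `write_all_contigs_to_vcf` are hypothetical names, and the sketch assumes the index is available as a plain tab-separated text stream whose first two columns are the contig name and its length.

```cpp
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record type: one contig from a .fai index.
struct ContigInfo {
  std::string name;
  long long length;
};

// Parse every line of a .fai index (tab-separated: NAME, LENGTH, OFFSET, ...).
// Lines that are empty or lack a numeric length are skipped.
std::vector<ContigInfo> read_fai_contigs(std::istream& fai) {
  std::vector<ContigInfo> contigs;
  std::string line;
  while (std::getline(fai, line)) {
    if (line.empty()) continue;
    std::istringstream fields(line);
    ContigInfo c;
    if (std::getline(fields, c.name, '\t') && (fields >> c.length))
      contigs.push_back(c);
  }
  return contigs;
}

// Write one ##contig header line per contig, in index order.
// No filtering by requested regions: every contig in the FASTA is emitted.
void write_all_contigs_to_vcf(const std::vector<ContigInfo>& contigs,
                              std::ostream& out) {
  for (const ContigInfo& c : contigs)
    out << "##contig=<ID=" << c.name << ",length=" << c.length << ">\n";
}
```

Driving the header from the index rather than from the regions file means that two runs over different regions of the same reference should produce identical contig headers, which is what tools that compare headers across VCFs check for.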

nh13 commented 7 years ago

I also couldn't figure out why FastaReader wants more than one index. Are there multiple FASTAs? That seems wrong.

tfwillems commented 7 years ago

We used to allow a directory as input to --fasta instead of a single file. In that case, FastaReader wrapped a directory of indexed FASTA files and provided the same functionality as if they were all in a single indexed FASTA file. That behavior has since been deprecated.