ksamuk / pixy

Software for painlessly estimating average nucleotide diversity within and between populations
https://pixy.readthedocs.io/
MIT License

RuntimeError: VCF file is missing mandatory header line ("#CHROM...") #98

Closed: ChenJuiYANG closed this issue 6 months ago

ChenJuiYANG commented 6 months ago

I got an error message when I ran pixy:

"RuntimeError: VCF file is missing mandatory header line ("#CHROM...")"

However, I am pretty sure my input VCF file has the "#CHROM..." line in its header. In fact, I ran pixy in parallel on many small chunks. The input files were all made at the same time with the same protocol. Most runs were fine, but about 3 of the 56 runs reported the error message above. In those failed runs, pixy also gave an extra warning message: "[W::hts_idx_load2] The index file is older than the data file: [CSI INDEX FILE PATH]"

I have both .csi and .tbi index files for each VCF file. The .csi index was produced at the same time as the VCF file, so it is not older. I think the error occurs because pixy tried to read the .csi index rather than the .tbi index in those failed runs, and pixy may have trouble reading .csi-format indexes. So I moved all the .csi indexes elsewhere and ran pixy again. Interestingly, it worked well.
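A minimal sketch of the workaround described above, for anyone who wants to script it. It assumes each index sits next to its VCF and follows the default `<file>.vcf.gz.csi` / `<file>.vcf.gz.tbi` naming. The function name `stash_csi_indexes` and the directory arguments are hypothetical and not part of pixy. A .csi index is only moved when a matching .tbi index exists, so no VCF is left without an index.

```shell
# Hypothetical helper: move .csi indexes aside so that only .tbi indexes
# remain next to the VCF files.
# Assumption: indexes are named <vcf>.csi and <vcf>.tbi (the usual default).
stash_csi_indexes() {
    local dir="$1"     # directory containing the .vcf.gz files
    local stash="$2"   # directory that will receive the .csi files
    local csi tbi

    mkdir -p "$stash"
    for csi in "$dir"/*.vcf.gz.csi; do
        [ -e "$csi" ] || continue          # the glob matched nothing
        tbi="${csi%.csi}.tbi"
        if [ -f "$tbi" ]; then
            mv -- "$csi" "$stash"/         # a .tbi exists, so the .csi can go
        else
            echo "skipping $csi: no matching .tbi index" >&2
        fi
    done
}
```

For example, `stash_csi_indexes vcfs csi_backup` would move the .csi files from `vcfs/` into `csi_backup/` (both directory names are placeholders), after which pixy can be rerun on the same VCF files.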

If this issue has already been solved in a newer version, feel free to close it.

Command used:

pixy --stats pi fst dxy \
--vcf ${vcf} --output_folder pixyOutput/ --output_prefix pixy.${chr}.out \
--populations ID_SP.list --chromosomes "${chr}" --bypass_invariant_check yes \
--window_size 100000 --n_cores 4 

Version: 1.2.5.beta1

OS: linux

ksamuk commented 6 months ago

Hi There,

Yes, .csi indexes are not currently supported, although they are a planned feature.

Cheers,

Kieran

ChenJuiYANG commented 6 months ago

Thanks for the prompt reply.

I mean that this should be considered a bug. If .csi indexes are not supported, pixy should always look for and use the .tbi index when both index files exist. Otherwise, errors can occur intermittently. As I mentioned, most tasks worked well, but a few did not.

Chen-Jui

ksamuk commented 6 months ago

Hi Chen-Jui,

I agree, although this is a bit of an unusual edge case, since usually there would be a single index. I'd be happy to review a pull request if you'd like to implement a fix for this.

Cheers,

Kieran

ChenJuiYANG commented 6 months ago

Hi Kieran,

I agree this is probably a rare case, and the error can be solved simply by moving the .csi index somewhere else. I think I will just leave this issue here, and hopefully someone who has the same problem in the future will find this discussion.

The reason I have both index formats is that I used bcftools to concatenate the invariant and variant parts. bcftools can produce an index file while concatenating VCF files, but that index is in .csi format by default. Unfortunately, some programs, including pixy, don't support the .csi format, so I re-indexed the VCF files in .tbi format and left the .csi indexes in the same place.

Chen-Jui
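For readers in the same situation, here is a rough sketch of the re-indexing step. It assumes bgzip-compressed VCFs, htslib's `tabix` on the `PATH`, and the default `<file>.vcf.gz.tbi` index name. The function name `ensure_tbi` is hypothetical. It rebuilds the .tbi index only when the index is missing or older than the VCF, which is the condition the "index file is older than the data file" warning describes.

```shell
# Hypothetical helper: make sure a bgzipped VCF has an up-to-date .tbi index.
# Assumptions: tabix (htslib) is installed, and the index is named <vcf>.tbi.
ensure_tbi() {
    local vcf="$1"
    local tbi="${vcf}.tbi"

    # Re-index when the .tbi is missing or the VCF is newer than its index.
    if [ ! -f "$tbi" ] || [ "$vcf" -nt "$tbi" ]; then
        tabix -f -p vcf "$vcf"   # -f overwrites an existing index, -p vcf selects the VCF preset
    fi
}
```

Calling `ensure_tbi "${vcf}"` before the pixy command would be one way to use it. `bcftools index -t` also writes a .tbi index and could be substituted for the `tabix` call.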