nhansen / SVanalyzer

Tools for the analysis of structural variation in genomes
http://svanalyzer.readthedocs.io/

NA vs NaN #9

Closed: proinde closed this issue 4 years ago

proinde commented 4 years ago

Not sure whether this is an issue with your code or with pyvcf. I'm reading your output files with pyvcf, and it throws an error because some INFO entries are set to "NA" in fields that are numerically typed. Evidently pyvcf does not accept "NA" as a placeholder for missing values in either int or float fields (it reports that it cannot convert "NA" to a float), but it does accept "NaN" once I use sed to replace every NA with NaN. The VCF specification describes how to encode NaN values in BCF, but, to my surprise, it does not mention them with regard to VCF. I've looked at a few other packages for handling VCF files and see a wild mixture of NaN and NAN, but I only ever see NA used in fields that are string-typed.
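A blanket sed replacement of NA with NaN also rewrites string-typed fields that legitimately contain "NA". A narrower workaround is sketched below in Python: it rewrites "NA" to "NaN" only inside a chosen set of numeric INFO keys and leaves header lines and every other field alone. The key names in `NUMERIC_KEYS` are hypothetical placeholders; in practice they would be read from the `##INFO=<...Type=Float...>` and `Type=Integer` header lines. Note that Python's own `float()` accepts "NaN" (case-insensitively) but rejects "NA", which is consistent with the error pyvcf reports.

```python
import math

# Hypothetical numeric INFO keys to patch; take the real ones from the VCF header.
NUMERIC_KEYS = {"ClusterMaxShiftDist", "ClusterMaxSizeDiff"}


def patch_info(info: str, numeric_keys=NUMERIC_KEYS) -> str:
    """Replace NA with NaN, but only in the listed numeric INFO fields."""
    fields = []
    for entry in info.split(";"):
        key, sep, value = entry.partition("=")
        if sep and key in numeric_keys:
            # Handle multi-valued fields such as "KEY=NA,3.5" element by element.
            value = ",".join("NaN" if v == "NA" else v for v in value.split(","))
        fields.append(key + sep + value)  # flags like "PRECISE" pass through unchanged
    return ";".join(fields)


def patch_line(line: str) -> str:
    """Patch the INFO column (8th field) of a VCF data line; leave headers untouched."""
    if line.startswith("#"):
        return line
    newline = "\n" if line.endswith("\n") else ""
    cols = line.rstrip("\n").split("\t")
    if len(cols) > 7:
        cols[7] = patch_info(cols[7])
    return "\t".join(cols) + newline


if __name__ == "__main__":
    print(math.isnan(float("NaN")))  # "NaN" parses as a float
    try:
        float("NA")
    except ValueError as err:
        print("float('NA') fails:", err)
    print(patch_info("SVTYPE=DEL;ClusterMaxShiftDist=NA;NOTE=NA"))
```

Only the numeric key is rewritten; a string field such as `NOTE=NA` keeps its value, which is the case the sed approach cannot distinguish.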

nhansen commented 4 years ago

Thanks again for reporting this! There's an easy fix: I've changed the "NA" reported for the maximum cluster distances to "NaN". Please let me know if you see this issue with other SVanalyzer tools, and I'll keep an eye out for it in the future too.

proinde commented 4 years ago

Sure thing! Thanks again!