Open Hoeze opened 3 years ago
I ran through a test, and it looks like the VCF header was being parsed properly. My best guess is that because there is a period in the INFO ID, Spark is acting as if it's a nested struct. Without an error stack, I'm not quite sure what error you're encountering and where. Could you provide more detail?
I'll have to ask my colleague whether he can reproduce it, but the error was indeed due to Spark treating a field as a struct. The error message was caused by an expression like:
'`INFO_AAChange.refGene`'
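For anyone hitting the same thing, here is a minimal sketch of two possible workarounds, not a confirmed fix for this issue. It assumes the problem is only that Spark reads an unquoted period in a column name as struct field access. The helper names `quote_column` and `flatten_names` are hypothetical and not part of Glow or PySpark.

```python
def quote_column(name: str) -> str:
    """Wrap a column name in backticks so Spark reads it as a single identifier."""
    # Inside a backtick-quoted identifier, Spark SQL escapes a literal
    # backtick by doubling it.
    return "`" + name.replace("`", "``") + "`"


def flatten_names(names):
    """Return the names with periods replaced by underscores (hypothetical helper)."""
    # Caveat: two different names can collide after replacement,
    # e.g. "a.b" and "a_b" both become "a_b".
    return [n.replace(".", "_") for n in names]


quoted = quote_column("INFO_AAChange.refGene")
renamed = flatten_names(["contigName", "INFO_AAChange.refGene"])
print(quoted)
print(renamed)
```

With a loaded DataFrame `df`, these could be used as `df.select(quote_column("INFO_AAChange.refGene"))` or `df.toDF(*flatten_names(df.columns))`. Whether either avoids the failure reported here is untested, since the failing expression may be generated inside the reader rather than in user code.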
Hi, I just tried to load a VCF file whose INFO field names contain periods:
The following script fails on this VCF:
I'm using PySpark 3.1.1 with "io.projectglow:glow-spark3_2.12:1.0.0"