Closed (project-defiant closed this issue 2 years ago)
Hey @PROJECT-DEFIANT, what version of Glow are you using?
I think this issue was fixed in a recent pull request; please take a look at this line.
Hey, I have tried glow.py 1.1.1 and also glow.py 1.1.2, but the issue seems to persist across all of them.
We saw the same issue before, where the rank or total is represented as a range ("6-8") for indels instead of an integer (6) as it is for SNPs. Converting the schema from IntegerType to StringType resolved it.
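To illustrate why a range breaks an integer-typed field, here is a minimal sketch in plain Python. It does not use Glow or Spark, and the helper name `parse_rank_total` is hypothetical. It only mimics the behaviour described above: a strict integer parse of "6-8" fails and yields nothing, while keeping the parts as strings preserves the value.

```python
from typing import Optional, Tuple


def parse_rank_total(value: str, rank_as_int: bool) -> Optional[Tuple[object, object]]:
    """Split a 'rank/total' annotation such as '6/10' or '6-8/10'.

    With rank_as_int=True both parts are parsed as integers, mimicking an
    integer-typed schema; a range such as '6-8' cannot be parsed, so the
    function returns None (analogous to the null seen in this thread).
    With rank_as_int=False both parts are kept as strings.
    """
    rank, sep, total = value.partition("/")
    if not sep:
        return None  # not in 'rank/total' form at all
    if not rank_as_int:
        return rank, total
    try:
        return int(rank), int(total)
    except ValueError:
        return None


print(parse_rank_total("6/10", rank_as_int=True))     # (6, 10)
print(parse_rank_total("6-8/10", rank_as_int=True))   # None
print(parse_rank_total("6-8/10", rank_as_int=False))  # ('6-8', '10')
```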
We figured it out by deleting those INFO fields from an annotated VCF; after that, the rows can be read without getting null for the annotation.
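As a rough sketch of that debugging step, the snippet below removes selected keys from the INFO column of a single VCF data line. It is plain Python, not part of Glow. The function name `drop_info_keys` and the sample line are hypothetical, and it assumes "deleting those INFO fields" means dropping whole INFO keys such as CSQ.

```python
from typing import Set


def drop_info_keys(vcf_line: str, keys: Set[str]) -> str:
    """Return a VCF line with the given INFO keys removed.

    Header lines (starting with '#') and lines with fewer than eight
    tab-separated columns are returned unchanged, apart from the
    trailing newline, which is always stripped.
    """
    line = vcf_line.rstrip("\n")
    if line.startswith("#"):
        return line
    cols = line.split("\t")
    if len(cols) < 8:
        return line
    # INFO is the eighth column: semicolon-separated KEY=VALUE pairs or flags.
    kept = [e for e in cols[7].split(";") if e.split("=", 1)[0] not in keys]
    cols[7] = ";".join(kept) if kept else "."
    return "\t".join(cols)


# Hypothetical data line with a range-style INTRON value inside CSQ.
sample = "1\t100\t.\tA\tAT\t.\tPASS\tDP=10;CSQ=AT|intron_variant||5-6/12"
print(drop_info_keys(sample, {"CSQ"}))
```

If the rows read correctly once the key is removed, the annotation field is the likely cause of the null.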
CNVs should be the same as indels... unless you have exposed another edge case in the schema. But from the way you describe the problem, it seems the same as what we have seen before.
Please confirm the version of the Maven jars you are using for Glow...
Thank you for the response. It turned out I had an outdated jar file.
Ah ok, great. It took us a couple of days to figure out that issue initially, but it was never clearly documented on GitHub; apologies for that.
Dear maintainers,
I am running Glow in Spark standalone mode to save VEP-annotated VCF files to Parquet. I am using Jupyter notebooks for this process.
I have found that some of the variants parsed by
are not preserving the INFO_CSQ field from the annotations. Here is the output of the show command on some of the variants:
It was created by parsing the ascat_vep.vcf file.
After some digging I found that both the EXON and INTRON fields do not fit the schema, as they have the format X-Y/Z rather than X/Z. I have changed these fields in my file, saved as ascat_annot.vcf. vcf_files are here
Could you check whether you can reproduce it? Is it supposed to happen?
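For anyone who wants to check their own file for the same pattern, here is a minimal sketch in plain Python that flags EXON and INTRON values written as X-Y/Z. The function name `range_style_fields` and the field order in the example are assumptions for illustration; in a real file the order comes from the CSQ description in the VCF header.

```python
import re
from typing import Dict, List

# Matches range-style values such as '5-6/12', but not plain '5/12'.
RANGE_RANK = re.compile(r"^\d+-\d+/\d+$")


def range_style_fields(csq_entry: str, field_names: List[str]) -> Dict[str, str]:
    """Return the EXON/INTRON values of one pipe-delimited CSQ entry
    that use the X-Y/Z form instead of X/Z."""
    found = {}
    for name, value in zip(field_names, csq_entry.split("|")):
        if name in ("EXON", "INTRON") and RANGE_RANK.match(value):
            found[name] = value
    return found


# Hypothetical, shortened field order; real VEP output has many more fields.
fields = ["Allele", "Consequence", "EXON", "INTRON"]
print(range_style_fields("AT|intron_variant||5-6/12", fields))  # {'INTRON': '5-6/12'}
print(range_style_fields("T|missense_variant|3/12|", fields))   # {}
```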