Closed project-defiant closed 1 month ago
For context, the original scope of this function was unnested arrays. From the docs: "The function assumes the array columns have the same schema. Otherwise, the function will fail."
The issue is present in the VEP parsing step because we are unioning arrays of structs.
To me, it says that the schema has to be the same; there is no mention of non-nested structs.
Furthermore, the fix will still only work for one additional nesting level.
Describe the bug
The safe_array_union function from gentropy cannot merge arrays of structs with differently ordered fields, for example array<struct<a,b>> with array<struct<b,a>>.
Observed behaviour
An AnalysisException is raised.
Expected behaviour
Merging the arrays should not raise an AnalysisException.
To Reproduce
raises
Additional context
This issue was discovered when trying to merge the transcriptConsequences fields between the VEP-based VariantIndex and the gnomAD-based VariantIndex. See the dataproc job and the conversation about the schema issues in https://github.com/opentargets/gentropy/pull/790#issuecomment-2378784157
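To illustrate the field-order mismatch and the nesting concern raised in the comments, here is a minimal plain-Python sketch. It is not gentropy's implementation and does not use Spark. The schema model (a dict for a struct, a one-element list for an array, a string for a leaf type) and the helper names normalize and same_layout are hypothetical, introduced only for this example. The idea is to sort struct fields by name at every nesting level before comparing or merging, so the alignment is not limited to one level of nesting.

```python
from typing import Union

# Hypothetical schema model (an assumption for illustration only):
#   struct -> dict mapping field name to field type (insertion order = field order)
#   array  -> one-element list holding the element type
#   leaf   -> type name as a string, e.g. "string"
Schema = Union[str, list, dict]


def normalize(schema: Schema) -> Schema:
    """Return a copy of the schema with struct fields sorted by name at every level."""
    if isinstance(schema, dict):
        return {name: normalize(schema[name]) for name in sorted(schema)}
    if isinstance(schema, list):
        return [normalize(schema[0])]
    return schema


def same_layout(left: Schema, right: Schema) -> bool:
    """True when both schemas have the same fields in the same order.

    Plain dict equality ignores key order, so field order is compared explicitly.
    """
    if isinstance(left, dict) and isinstance(right, dict):
        return list(left) == list(right) and all(
            same_layout(left[name], right[name]) for name in left
        )
    if isinstance(left, list) and isinstance(right, list):
        return same_layout(left[0], right[0])
    return left == right


# array<struct<a,b>> versus array<struct<b,a>>, as in the report
flat_left = [{"a": "string", "b": "int"}]
flat_right = [{"b": "int", "a": "string"}]

# The same mismatch two levels deep
nested_left = [{"a": "string", "c": [{"x": "int", "y": "int"}]}]
nested_right = [{"c": [{"y": "int", "x": "int"}], "a": "string"}]

print(same_layout(flat_left, flat_right))
print(same_layout(normalize(flat_left), normalize(flat_right)))
print(same_layout(normalize(nested_left), normalize(nested_right)))
```

In a Spark setting the same recursion would have to rebuild the column expressions rather than only the schema description; that part is left out here because it depends on how safe_array_union is written.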