Closed mufernando closed 4 months ago
Thanks for the bug report @mufernando!
I think the problem is with your call_PL
field:
DEBUG Initialised <zarr.core.Array '/call_PL' (24739, 746, 66) int32>
Your PL field leads to an inner dimension of 66! :face_with_open_eyes_and_hand_over_mouth:
So, each variant chunk of this array is 10000 746 66 * 4 which is more than 2147483647. The immediate bug here then is that we're not giving a usable error message. More generally, PL fields are a major problem (#185 and linked discussion) but we know how they should be dealt with.
In the short term, I'd suggest creating a schema and dropping the call_PL field (assuming you're not using it):
vcf2zarr mkschema chrY_mcc-746-kray-filtered.exploded > schema.json
... edit JSON to remove delete the call_PL field
vcf2zarr encode chrY_mcc-746-kray-filtered.exploded -s schema.json chrY_mcc-746-kray-filtered.zarr
I confirmed removing the PL field it works! For future reference, you can remove it from the schema using jq
:
vcf2zarr explode example.vcf.gz example.exploded
vcf2zarr mkschema example.exploded | jq 'del(.fields[] | select(.name == "call_PL"))' > example.json
vcf2zarr encode example.exploded example.zarr --schema example.json
I just merged a fix which should raise a more helpful error message @mufernando - would you mind trying it out on your data please? The message will need some links to as-yet unwritten docs about PL fields, but hopefully it's pointing people in the right direction.
I have a 75Mb VCF that I tried converting using
vcf2zarr
CLI.Following the instructions, I first created the intermediate ICF using
which worked out fine.
Then, I tried converting to the final vcfzarr using
encode
:This is the verbose output I get https://gist.github.com/mufernando/805a39a636c71b21c1b25a8ff6417e49
Ideas on what might be going on here? I'm happy to share the VCF.