Open eselimnl opened 2 days ago
Thanks @eselimnl, I'll take a look asap :)
I couldn't add my GOOGLE_CREDENTIALS that is needed by the workflow, please advise if you would like me to do this.
Credentials are supplied automatically via a secret; however, this is disabled for PRs coming from a fork. Basically, it would be easier if you recreate this PR from a branch within the malariagen/malariagen-data-python repo, rather than from a fork, so that all checks will run properly. I've given you write access to this repo so you should be able to do that now.
Variant/altlen Field Handling: We observed that the variant/altlen field contains a single value rather than one per ALT, despite the number of ALT being set to 6. The cause of this behavior is unclear, but it may relate to the comment “special case (for altlen, number depends on ALT)” in scikit-allel. Since it is possible to calculate this value from REF and ALT, we are not concerned about it now. Therefore, this field is excluded from data_vars after reading the Pf8 zarr file.
Sorry about this, don't know what's going on there. Seems like a reasonable approach for now, but happy to investigate if there's a problem upstream in scikit-allel.
Hi @alimanfoo, here is the draft PR for Pf8(). I couldn't add my GOOGLE_CREDENTIALS that is needed by the workflow, please advise if you would like me to do this.
Pf8(): The Pf8() function is built upon the Pf7() template, and should not disrupt the functionality of Pf7() (verified via test_pf7_integration.py).

Sanger S3 Configuration: Updates have been made in malariagen_data/util.py to support data access on the Sanger S3 storage. For privacy reasons, the bucket name is not included in the PR (for example, in tests/test_pf8_integration.py).

Variant/altlen Field Handling: We observed that the variant/altlen field contains a single value rather than one per ALT, despite the number of ALT being set to 6. The cause of this behavior is unclear, but it may relate to the comment “special case (for altlen, number depends on ALT)” in scikit-allel. Since it is possible to calculate this value from REF and ALT, we are not concerned about it now. Therefore, this field is excluded from data_vars after reading the Pf8 zarr file.
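
For reference, here is a minimal sketch of how a per-ALT altlen could be recomputed from REF and ALT if it is needed later. It is not code from this PR: the function name compute_altlen is hypothetical, and it assumes that REF is a 1-D array of strings, that ALT is a 2-D array of strings with one column per ALT slot, that missing ALT alleles are stored as empty strings, and that missing alleles should get a value of 0.

```python
import numpy as np


def compute_altlen(ref, alt):
    """Return len(ALT) - len(REF) for every ALT slot (hypothetical helper).

    Assumptions (not confirmed by the PR): `ref` has shape (n_variants,),
    `alt` has shape (n_variants, n_alt), and a missing ALT allele is an
    empty string, for which 0 is returned.
    """
    ref = np.asarray(ref, dtype=str)
    alt = np.asarray(alt, dtype=str)

    # Element-wise string lengths.
    ref_len = np.char.str_len(ref)
    alt_len = np.char.str_len(alt)

    # Broadcast the REF length across the ALT axis.
    diff = alt_len - ref_len[:, np.newaxis]

    # Missing ALT alleles (empty strings) are assigned 0.
    return np.where(alt_len > 0, diff, 0)


# Small made-up example: three variants with up to three ALT alleles each.
example_ref = ["A", "AT", "G"]
example_alt = [["T", "AT", ""], ["A", "", ""], ["GTT", "C", ""]]
example_altlen = compute_altlen(example_ref, example_alt)
```

With this convention, insertions give positive values, deletions give negative values, and SNPs give 0.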