The Alliance Gene to Phenotype ingest needs to keep only the gene records from a file that also contains phenotype associations for other object types (alleles, at least). Currently, we handle this by running a jq command as a hardcoded step after the download, which builds the list of Alliance gene IDs to filter against.
We can remove this step either by adding yaml-only support to Koza for building maps from JSON files, or by using Koza's custom mapping feature. Since this is the only instance we've seen so far of wanting to make a map from a JSON file, and since we're mostly de-emphasizing the use of mapping files, I think it would probably make more sense to tackle this as a custom map.
The shell command that we're currently running to make the mapping file is
zcat data/alliance/BGI_*.gz | jq '.data[].basicGeneticEntity.primaryId' | gzip > data/alliance/alliance_gene_ids.txt.gz
I believe this means we'd simply need to make a custom map that only adds keys.
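As a rough sketch of what that keys-only map would need to do, here is the jq step written in plain Python. It deliberately makes no Koza calls: the function name build_gene_id_map and the choice of an empty dict as the value are assumptions for illustration, and how the result gets handed to Koza would depend on what the custom map hook actually expects.

```python
import glob
import gzip
import json
from typing import Dict, Iterable


def build_gene_id_map(paths: Iterable[str]) -> Dict[str, dict]:
    """Collect every .data[].basicGeneticEntity.primaryId as a key.

    Hypothetical helper: values are empty dicts because only key
    membership matters for the gene filter.
    """
    gene_ids: Dict[str, dict] = {}
    for path in paths:
        # Each BGI file is a gzipped JSON document with a top-level "data" list.
        with gzip.open(path, "rt", encoding="utf-8") as handle:
            document = json.load(handle)
        for entry in document["data"]:
            gene_ids[entry["basicGeneticEntity"]["primaryId"]] = {}
    return gene_ids


if __name__ == "__main__":
    # Same file pattern as the shell command above.
    gene_ids = build_gene_id_map(sorted(glob.glob("data/alliance/BGI_*.gz")))
    print(f"{len(gene_ids)} gene ids loaded")
```

With a map like this, the ingest would only need a membership check (for example, `if subject_id in gene_ids`) to skip the non-gene associations, and the intermediate alliance_gene_ids.txt.gz file would no longer be needed.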
The Koza repository has an example of creating a custom map. I think that if a .py file exists with a name matching the .yaml of the map config, the Python code will be executed to create the map. For a (non-JSON) example, check out:
https://github.com/monarch-initiative/koza/blob/main/examples/maps/custom-entrez-2-string.yaml