nih-cfde / update-content-registry

Code and workflows for adding content to the content registry.
https://app-staging.nih-cfde.org/
BSD 3-Clause "New" or "Revised" License

[wip] filter genes by list of genes in the portal #77

Open raynamharris opened 1 year ago

raynamharris commented 1 year ago

This PR uses the new list of IDs to filter the gene inputs. This is very important because previously a handful of genes that are not in the portal caused the upload command to crash. This is why #75 is so important.

Now, instead of an error, you get a warning message:

WARNING: requested input id ENSG00000204616 not found in ref_id_list
WARNING: requested input id ENSG00000262302 not found in ref_id_list
...
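The filter-and-warn behaviour described above can be sketched as follows. This is a minimal illustration, not the code in this PR: the helper name filter_ids, its signature, and the use of an in-memory set for ref_id_list are assumptions. Only the warning text is copied from the output shown above.

```python
import sys
from typing import Iterable, List, Set


def filter_ids(input_ids: Iterable[str], ref_id_list: Set[str]) -> List[str]:
    """Keep only IDs present in the reference set; warn about the rest.

    Hypothetical helper: the name and signature are illustrative and are
    not taken from the update-content-registry scripts.
    """
    kept = []
    for input_id in input_ids:
        if input_id in ref_id_list:
            kept.append(input_id)
        else:
            # Warn instead of raising, so a gene with no portal page is
            # skipped rather than crashing the whole run.
            print(
                f"WARNING: requested input id {input_id} not found in ref_id_list",
                file=sys.stderr,
            )
    return kept


if __name__ == "__main__":
    # Illustrative reference set; the real list comes from the portal (#75).
    ref = {"ENSG00000141510", "ENSG00000012048"}
    requested = ["ENSG00000141510", "ENSG00000204616", "ENSG00000012048"]
    print(filter_ids(requested, ref))
```

Using a set for the reference IDs keeps each membership check constant-time, which matters when filtering roughly 20,000 gene inputs.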

This works for 2 scripts so far; the others still need to be updated.

raynamharris commented 1 year ago

Stats generated by running make log and cat logs/chunks.txt on this branch and on the retrieve-ids branch:

new length: output_pieces_gene/01-appyter/ 19962

old length: output_pieces_gene/01-appyter/ 19971

This filters out 9 genes (19971 - 19962) that do not have pages in the portal.
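To list which genes were dropped, rather than only how many, one could diff the two output directories. This is a minimal sketch under a loud assumption: it treats each file in output_pieces_gene/01-appyter/ as one gene, named after its gene ID (for example ENSG00000204616.json). The actual file naming may differ, and the helper dropped_genes is hypothetical.

```python
from pathlib import Path
from typing import List


def dropped_genes(old_dir: str, new_dir: str) -> List[str]:
    """Return file stems present in old_dir but missing from new_dir.

    Hypothetical helper. It assumes one output file per gene, named after
    the gene ID; check the real naming in output_pieces_gene/01-appyter/.
    """
    old_ids = {p.stem for p in Path(old_dir).iterdir() if p.is_file()}
    new_ids = {p.stem for p in Path(new_dir).iterdir() if p.is_file()}
    # Set difference: IDs that only the old run produced.
    return sorted(old_ids - new_ids)
```

Point it at the output directories from the two branches. If the new output is a strict subset of the old one, the length of the result should equal the difference in counts, which is 9 here.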