Open joeflack4 opened 2 years ago
For the sake of documentation, can you summarise exactly what makes one more reliable than the other? And also any reasons against making this change that come to mind?
Added reasoning about the EBI option's unreliability to the OP. Also, to expand on reason (2): since OMIM uses the HGNC mappings from NCBI, I'm assuming they do this because they find it the best option, rather than as an arbitrary decision. I can't be absolutely sure that the HGNC mappings in this NCBI file are more recent than EBI's, but I think the fact that OMIM uses it makes it more justifiable.
This file from NCBI also has HGNC::OMIM mappings, but I think it best to get those from OMIM. So to recap, the HGNC_ID::OMIM mappings will come from OMIM, and the HGNC_ID::HGNC_Symbol mappings will come from NCBI.
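For illustration, here is a minimal sketch of how the HGNC_ID::HGNC_Symbol mappings could be pulled out of the NCBI file. It is not the ingest's actual code. It assumes the file is tab-separated with a header row starting with `#tax_id` that names `Symbol` and `dbXrefs` columns, and that HGNC cross-references appear in `dbXrefs` as pipe-separated entries such as `HGNC:HGNC:5`. The function name `hgnc_id_to_symbol` and the sample rows are hypothetical.

```python
import csv
import io


def hgnc_id_to_symbol(lines):
    """Build an HGNC ID -> gene symbol map from gene_info-style rows.

    `lines` is any iterable of tab-separated text lines whose first line is
    the header. The column names and the dbXrefs layout are assumptions
    about the NCBI file, not something confirmed in this thread.
    """
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader)
    header[0] = header[0].lstrip("#")  # header line is assumed to start with '#'
    symbol_col = header.index("Symbol")
    xrefs_col = header.index("dbXrefs")

    mapping = {}
    for row in reader:
        if len(row) <= max(symbol_col, xrefs_col):
            continue  # skip short or malformed rows
        for xref in row[xrefs_col].split("|"):
            # Assumed entry shape: "HGNC:HGNC:5" (database prefix + HGNC CURIE)
            if xref.startswith("HGNC:"):
                mapping[xref[len("HGNC:"):]] = row[symbol_col]
    return mapping


# Hypothetical sample in the assumed layout (only the first six columns shown).
SAMPLE = (
    "#tax_id\tGeneID\tSymbol\tLocusTag\tSynonyms\tdbXrefs\n"
    "9606\t1\tA1BG\t-\tA1B|ABG\tMIM:138670|HGNC:HGNC:5|Ensembl:ENSG00000121410\n"
    "9606\t999999\tFAKE1\t-\t-\t-\n"
)

if __name__ == "__main__":
    print(hgnc_id_to_symbol(io.StringIO(SAMPLE)))
```

Rows with no HGNC cross-reference are simply left out of the map. If the file's separate nomenclature-authority symbol column turns out to be a better match for the HGNC symbol than `Symbol`, only the `header.index(...)` lookup would need to change.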
Ok, I trust your judgement! Thanks for the explanation :)
I've been using data/hgnc/hgnc_complete_set.txt, which is provided by EBI. But it is not reliable. I should use data/hgnc/Homo_sapiens.gene_info, which can be obtained at https://ftp.ncbi.nlm.nih.gov/gene/DATA/GENE_INFO/Mammalia/Homo_sapiens.gene_info.gz. Ideally, I'll want to download a fresh copy of this and unzip it when the ingest runs. I should throw a warning if it fails to download, and use the cached version instead in that case.

Reasons to make this change: