x-atlas-consortia / ubkg-etl

A framework that combines data from the UMLS with assertions from other data sources into a set of CSV files that can be imported into neo4j to build a Unified Biomedical Knowledge Graph (UBKG)
MIT License

Modify RefSeq extract to be in blocks of 50K #130

Closed by AlanSimmons 6 months ago

AlanSimmons commented 8 months ago

Statement of Problem

The program that uses the NCBI E-utilities API to extract RefSeq data from the gene database fails with HTTP 500 errors. The failures appear to be related to the number of calls being made.

The total number of RefSeq entries for Ensembl entries in the gene database is currently over 194K.

Solution

Break the extract into blocks of 50K entries.
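
A minimal sketch of the blocking step, not the project's actual implementation. The names `iter_blocks`, `BLOCK_SIZE`, and `gene_ids` are hypothetical, and the list of 194,000 placeholder identifiers only stands in for the "over 194K" entries mentioned above. The API calls themselves are left out; the sketch only shows how the full list could be divided so that each block is extracted separately.

```python
from typing import Iterator, Sequence

BLOCK_SIZE = 50_000  # block size proposed in this issue


def iter_blocks(ids: Sequence[str], block_size: int = BLOCK_SIZE) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of ids, each at most block_size long."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    for start in range(0, len(ids), block_size):
        yield ids[start:start + block_size]


# Placeholder identifiers (assumption): stand-ins for the real entries.
gene_ids = [str(i) for i in range(194_000)]

blocks = list(iter_blocks(gene_ids))
block_sizes = [len(block) for block in blocks]
print(block_sizes)  # [50000, 50000, 50000, 44000]
```

Each block could then be extracted in its own run, so a failure partway through would require repeating only the affected block rather than the whole extract.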