Batch querying

Fnyasimi commented 2 years ago

Hi @krassowski, thanks for the easy API!

I was wondering if there is a way to query in batches. I have a list of 1000 coordinates that I want to query for rsids. I would have done it in a for-loop, but the API limits requests to 3 queries per second, which makes this impractical.
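For scale: at 3 requests per second, 1000 one-term searches take about 1000 / 3 ≈ 333 seconds, roughly five and a half minutes, so a loop is slow rather than impossible. NCBI also documents a higher limit of 10 requests per second for requests that include an API key. Below is a minimal sketch of a client-side throttle, assuming you already have some per-coordinate lookup function. `throttled_map` and `lookup_rsid` are hypothetical names, not part of easy_entrez, and the throttle itself performs no network access.

```python
import time
from typing import Callable, Iterable, List, Optional, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def throttled_map(
    func: Callable[[T], R],
    items: Iterable[T],
    max_per_second: float = 3.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> List[R]:
    """Apply func to each item, starting at most max_per_second calls per second."""
    min_interval = 1.0 / max_per_second
    results: List[R] = []
    last_start: Optional[float] = None
    for item in items:
        if last_start is not None:
            # Wait until min_interval has elapsed since the previous call started.
            wait = min_interval - (clock() - last_start)
            if wait > 0:
                sleep(wait)
        last_start = clock()
        results.append(func(item))
    return results


if __name__ == "__main__":
    # Hypothetical stand-in for a real per-coordinate search (e.g. one ESearch call).
    def lookup_rsid(coordinate: str) -> str:
        return f"result for {coordinate}"

    coordinates = ["chr1:12345", "chr2:67890"]  # illustrative values only
    print(throttled_map(lookup_rsid, coordinates, max_per_second=3.0))
```

The `clock` and `sleep` parameters exist only so the pacing can be tested without real waiting; in normal use the defaults are fine.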

My main question: is there a method I can use to get the rsids for all 1000 coordinates without a loop? I believe this would be more efficient and faster, besides getting around the rate limit set by NCBI.

krassowski commented 2 years ago

Thank you for trying out this package. I don't see an easy way out. This is not a limitation of the easy_entrez package but simply how the Entrez API was designed: the EFetch, ELink and ESummary endpoints support multi-item requests (and easy_entrez supports batching when a collection is larger than the API allows; see in_batches_of in the reference and the demo notebook), but the ESearch endpoint does not: it accepts only one term.
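Because ESummary and EFetch accept a comma-separated list of UIDs in their id parameter, 1000 ids need only a handful of requests instead of 1000. The sketch below only builds request URLs to illustrate the batching idea. It is not easy_entrez's in_batches_of implementation, `in_chunks` and `esummary_urls` are hypothetical helpers, and the batch size of 200 is an assumption based on NCBI's advice to switch to HTTP POST for more than roughly 200 UIDs. ESearch has no equivalent, so coordinate searches still go one term per request.

```python
from typing import Iterator, List, Sequence
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def in_chunks(ids: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` ids."""
    if size < 1:
        raise ValueError("size must be positive")
    for start in range(0, len(ids), size):
        yield list(ids[start:start + size])


def esummary_urls(ids: Sequence[str], db: str = "snp", batch_size: int = 200) -> List[str]:
    """Build one ESummary URL per batch; each URL requests many ids at once."""
    return [
        EUTILS_BASE + "esummary.fcgi?" + urlencode({"db": db, "id": ",".join(chunk)})
        for chunk in in_chunks(ids, batch_size)
    ]


if __name__ == "__main__":
    # Illustrative UIDs only; 450 ids become 3 requests with batch_size=200.
    uids = [str(i) for i in range(1, 451)]
    for url in esummary_urls(uids):
        print(url[:80])
```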

I guess this is because searching is the most expensive of these operations. I would either accept that it will take time or use a different tool: something from the vcf/bcf tools family, or the new NCBI Variation API (its SPDI-to-rsid endpoint, https://api.ncbi.nlm.nih.gov/variation/v0/).
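A sketch of preparing coordinates for the Variation API route, assuming the endpoint path spdi/{spdi}/rsids under the v0 base URL above and VCF-style input (RefSeq accession, 1-based position, reference and alternate alleles). SPDI positions are 0-based, hence the `- 1`. The helper names are hypothetical, the accession in the example (NC_000001.11, chromosome 1 in GRCh38) is only illustrative, and the code builds URLs without sending any requests; check the API documentation for the response format and usage limits before relying on this.

```python
from typing import List, Tuple
from urllib.parse import quote

VARIATION_API = "https://api.ncbi.nlm.nih.gov/variation/v0/"


def to_spdi(seq_id: str, pos_1based: int, ref: str, alt: str) -> str:
    """Convert a 1-based VCF-style coordinate into an SPDI string (0-based position)."""
    if pos_1based < 1:
        raise ValueError("VCF positions are 1-based")
    return f"{seq_id}:{pos_1based - 1}:{ref}:{alt}"


def spdi_rsid_url(spdi: str) -> str:
    """URL of the assumed SPDI-to-rsid lookup for a single variant."""
    # Keep ':' readable; SPDI strings otherwise contain only URL-safe characters.
    return VARIATION_API + "spdi/" + quote(spdi, safe=":") + "/rsids"


def urls_for(variants: List[Tuple[str, int, str, str]]) -> List[str]:
    """One lookup URL per (accession, 1-based position, ref, alt) tuple."""
    return [spdi_rsid_url(to_spdi(*v)) for v in variants]


if __name__ == "__main__":
    # Illustrative variant only; replace with your own coordinates.
    for url in urls_for([("NC_000001.11", 12346, "A", "G")]):
        print(url)
```

Each variant still maps to one request, so a throttle like the one discussed earlier in the thread may still be needed.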

Fnyasimi commented 2 years ago

@krassowski Thank you for the feedback. I will explore this further.

krassowski / easy-entrez

Batch querying #5