Tangbbmc opened 4 months ago
Is the PubChem CID the end of the story, or is the eventual goal, for instance, a SMILES/InChI string, an .sdf file, or even a check for other CAS numbers relating to the same structure? If so, cirpy (likewise initiated by Matt Swain, and likewise a client for an NIH service, the NCI/CADD Chemical Identifier Resolver) might be worth considering.
As an illustration of how this might help you:
```python
>>> import cirpy
>>> cirpy.resolve("Aspirin", "cas")
['50-78-2', '11126-35-5', '11126-37-7', '2349-94-2', '26914-13-6',
'98201-60-6']
>>> cirpy.resolve("50-78-2", "smiles")
'CC(=O)Oc1ccccc1C(O)=O'
>>> cirpy.resolve("50-78-2", "cas")
['50-78-2', '11126-35-5', '11126-37-7', '2349-94-2', '26914-13-6',
'98201-60-6']
```
i.e. each request supplies an identifier of your choice and states the desired output format.
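To extend the illustration from one identifier to a list, a minimal sketch could wrap the `cirpy.resolve(identifier, representation)` call shown above in two helpers: one that builds a table of requested representations per identifier, and one that groups identifiers resolving to the same Standard InChIKey (the "other CAS numbers relating to the same structure" case). The helper names `cirpy_resolver`, `resolve_table` and `group_by_structure` are hypothetical; it is assumed that the resolver accepts the representation names `smiles` and `stdinchikey` and returns `None` for identifiers it cannot resolve. The resolver is passed in as a function, so the grouping logic can be tried without network access.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Optional

# A resolver takes (identifier, representation) and returns a result or None.
Resolver = Callable[[str, str], Optional[object]]


def cirpy_resolver() -> Resolver:
    """Return cirpy.resolve (requires `pip install cirpy` and network access)."""
    import cirpy  # imported lazily so the helpers can run with a stand-in

    return cirpy.resolve


def resolve_table(identifiers: Iterable[str], representations: Iterable[str],
                  resolve: Optional[Resolver] = None) -> Dict[str, Dict[str, object]]:
    """Map each identifier to {representation: result}; one request per cell."""
    resolve = resolve or cirpy_resolver()
    reps = list(representations)
    return {ident: {rep: resolve(ident, rep) for rep in reps} for ident in identifiers}


def group_by_structure(identifiers: Iterable[str], resolve: Optional[Resolver] = None,
                       key: str = "stdinchikey") -> Dict[Optional[object], List[str]]:
    """Group identifiers that resolve to the same structure key.

    Identifiers the resolver could not handle are collected under None.
    """
    groups: Dict[Optional[object], List[str]] = defaultdict(list)
    for ident, row in resolve_table(identifiers, [key], resolve).items():
        groups[row[key]].append(ident)
    return dict(groups)
```

For example, `group_by_structure(["50-78-2", "11126-35-5"])` would send one `stdinchikey` request per CAS number and list together the CAS numbers that share a key, i.e. that point at the same structure.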
Two additional cents:
- To keep PubChem's database accessible to many users and to reduce the chance of a denial-of-service attack, the number of permitted requests per unit of time is limited (PubChem's usage policy asks for no more than 5 requests per second). This is similar to other APIs, e.g. GitHub's. With a large list of entries to check, pace your requests accordingly; where a service offers authenticated access with a token, that can raise the permitted rate.
- Because CAS can retract registry numbers at any time at its discretion (and actually does so; see e.g. its public commonchemistry.cas.org, with about 500k records), it would be sensible to wrap the queries in a try/except clause.
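Both points can be combined in one loop. The following is a minimal, library-agnostic sketch (the function `throttled_lookup` and its parameters are hypothetical): it starts calls at least `min_interval` seconds apart, with 0.25 s chosen as an assumption to stay below 5 requests per second, and catches exceptions per key, so a retracted or unknown CAS number is reported instead of aborting the whole batch. `fetch` stands for whatever single-item lookup you use (e.g. a PubChemPy or cirpy call); `sleep` and `clock` are injectable only to make testing easy.

```python
import time
from typing import Callable, Dict, Iterable, Optional, Tuple, TypeVar

T = TypeVar("T")


def throttled_lookup(
    keys: Iterable[str],
    fetch: Callable[[str], T],
    min_interval: float = 0.25,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[Dict[str, T], Dict[str, str]]:
    """Call fetch(key) for each key, starting calls at least min_interval s apart.

    Returns (results, errors); a failing key is recorded in errors and the
    loop continues with the next key.
    """
    results: Dict[str, T] = {}
    errors: Dict[str, str] = {}
    last_start: Optional[float] = None
    for key in keys:
        if last_start is not None:
            wait = min_interval - (clock() - last_start)
            if wait > 0:
                sleep(wait)  # rate limit: keep calls spaced out
        last_start = clock()
        try:
            results[key] = fetch(key)
        except Exception as exc:  # e.g. an HTTP error for a retracted CAS number
            errors[key] = f"{type(exc).__name__}: {exc}"
    return results, errors
```

Catching `Exception` broadly is deliberate here, since a batch job should not stop at the first failure; the collected `errors` dict can be inspected or retried afterwards.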
Ok. Thank you for your kind suggestions!
In reply to the message of 05/23/2024 17:50, Subject: Re: [mcs07/PubChemPy] Ask for help about how to find Pubchem CID of many compounds by CAS numbers (Issue #86)
Dear authors, could you share a script that queries the PubChem CIDs of thousands of compounds by CAS number (reading a text file that contains a list of CAS numbers) through PubChemPy? Thank you! I look forward to hearing from you soon.
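A minimal sketch of such a script, under stated assumptions: `pubchempy` is installed, the input is a plain-text file with one CAS number per line, and CAS numbers are looked up as PubChem synonyms via `pubchempy.get_cids(cas, namespace="name")`, which can return several CIDs for one number. The file names and helper names (`is_valid_cas`, `read_cas_numbers`, `write_cids`, `main`) are hypothetical. As an addition not asked for above, it checks the CAS check digit (the last digit equals the sum of the other digits weighted 1, 2, 3, ... from the right, modulo 10) before spending a request, pauses between requests, and wraps each lookup in try/except so one failure does not stop the batch.

```python
"""Look up PubChem CIDs for a text file of CAS numbers (one per line).

Sketch under assumptions: pubchempy is installed and CAS numbers are searched
as PubChem synonyms (namespace "name"). File and helper names are hypothetical.
"""
import csv
import re
import time
from typing import Callable, Iterable, List, Optional, TextIO

CAS_PATTERN = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")


def is_valid_cas(cas: str) -> bool:
    """Check the CAS format and its check digit."""
    match = CAS_PATTERN.match(cas)
    if not match:
        return False
    body = match.group(1) + match.group(2)
    # Weights 1, 2, 3, ... applied from the rightmost digit of the body.
    total = sum(int(d) * w for w, d in enumerate(reversed(body), start=1))
    return total % 10 == int(match.group(3))


def read_cas_numbers(lines: Iterable[str]) -> List[str]:
    """Strip whitespace, skip blank and '#' lines, drop duplicates (order kept)."""
    seen, result = set(), []
    for line in lines:
        cas = line.strip()
        if cas and not cas.startswith("#") and cas not in seen:
            seen.add(cas)
            result.append(cas)
    return result


def pubchem_get_cids() -> Callable[[str], List[int]]:
    """Return a cas -> [CID, ...] function backed by PubChemPy (needs network)."""
    import pubchempy as pcp  # imported lazily so the helpers work without it

    return lambda cas: pcp.get_cids(cas, namespace="name")


def write_cids(cas_numbers: Iterable[str], out: TextIO,
               get_cids: Optional[Callable[[str], List[int]]] = None,
               delay: float = 0.25,
               sleep: Callable[[float], None] = time.sleep) -> None:
    """Write CSV rows: CAS number, ';'-joined CIDs, status."""
    get_cids = get_cids or pubchem_get_cids()
    writer = csv.writer(out)
    writer.writerow(["cas", "cids", "status"])
    requested = False
    for cas in cas_numbers:
        if not is_valid_cas(cas):
            # Malformed or wrong check digit: don't spend a request on it.
            writer.writerow([cas, "", "invalid CAS number"])
            continue
        if requested:
            sleep(delay)  # pause between requests to respect the rate limit
        requested = True
        try:
            cids = get_cids(cas) or []
            status = "ok" if cids else "not found"
        except Exception as exc:  # e.g. pubchempy.PubChemHTTPError, network errors
            cids, status = [], f"error: {exc}"
        writer.writerow([cas, ";".join(str(c) for c in cids), status])


def main(in_path: str = "cas_list.txt", out_path: str = "cas_to_cid.csv") -> None:
    """Hypothetical default file names; adjust to your own files."""
    with open(in_path, encoding="utf-8") as src:
        cas_numbers = read_cas_numbers(src)
    with open(out_path, "w", newline="", encoding="utf-8") as out:
        write_cids(cas_numbers, out)
```

Calling `main()` (or `main("my_list.txt", "out.csv")`) writes a CSV with one row per CAS number: the CAS number, the matching CIDs joined by `;`, and a status of `ok`, `not found`, `invalid CAS number` or `error: ...`.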