mcs07 / PubChemPy

Python wrapper for the PubChem PUG REST API.
http://pubchempy.readthedocs.io
MIT License

how to download several compounds at once? #26

Closed: UnixJunkie closed this issue 2 years ago

UnixJunkie commented 6 years ago

Hello, thanks for this very useful library! Is it possible to download several (possibly many) compounds at once using their InChIs? I only know how to fetch them one by one, which is quite slow. Thanks, F.
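
One possible approach, sketched here rather than taken from the thread: as far as I know, PUG REST resolves only one InChI per request, so the InChI-to-CID step stays one by one, but the CIDs can then be fetched in batches, because PubChemPy accepts a list of CIDs as the identifier. The helper names (`chunked`, `inchis_to_cids`, `fetch_properties`), the batch size of 100 and the chosen properties are all assumptions made for illustration.

```python
from itertools import islice


def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def inchis_to_cids(inchis):
    """Resolve each InChI to its first matching CID (one request per InChI)."""
    # Imported lazily so that chunked() has no third-party dependency.
    import pubchempy as pcp

    cids = []
    for inchi in inchis:
        hits = pcp.get_cids(inchi, 'inchi')
        if hits:  # skip InChIs that PubChem does not know
            cids.append(hits[0])
    return cids


def fetch_properties(cids, batch_size=100):
    """Fetch a few properties for many CIDs, one request per batch."""
    import pubchempy as pcp

    rows = []
    for batch in chunked(cids, batch_size):
        # A list of CIDs is sent as a single comma-separated identifier.
        rows.extend(pcp.get_properties(['InChIKey', 'MolecularFormula'],
                                       batch, 'cid'))
    return rows
```

Calling `fetch_properties(inchis_to_cids(my_inchis))` would then need one request per InChI plus one request per batch of CIDs, instead of a full compound download for each molecule.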

UnixJunkie commented 6 years ago

related to https://github.com/mcs07/PubChemPy/issues/24

UnixJunkie commented 6 years ago

related to https://github.com/mcs07/PubChemPy/issues/25

Akhila-Mettu commented 2 years ago

May I know the answer to this? How can I download all structures from an assay file?

UnixJunkie commented 2 years ago

Maybe you can try using my script:

#!/usr/bin/env python3
#
# extract all molecules with a pchembl_value
# for a given ChEMBL target id

import sys
import statistics

from chembl_webresource_client.new_client import new_client
activities = new_client.activity

target_id = sys.argv[1]

pIC50s = activities.filter(target_chembl_id = target_id,
                           pchembl_value__isnull = False).only(
                               ['canonical_smiles',
                                'molecule_chembl_id',
                                'pchembl_value'])
output_fn = target_id + '.smi'
with open(output_fn, 'w') as output:
    print('pIC50s (w/ dups): %d' % len(pIC50s))
    smi2pIC50 = {}
    for x in pIC50s:
        cano_smi = x['canonical_smiles']
        chembl_id = x['molecule_chembl_id']
        pIC50 = float(x['pchembl_value'])
        # look for duplicates by canonical smiles; if there are some;
        # use the median pIC50
        if cano_smi in smi2pIC50:
            # duplicate: the stored list is mutable, so appending is enough
            smi2pIC50[cano_smi][1].append(pIC50)
        else:
            smi2pIC50[cano_smi] = (chembl_id, [pIC50])
    # if there are dups, compute median pIC50
    print('pIC50s (no dups): %d' % len(smi2pIC50))
    for cano_smi, (name, p_chembls) in smi2pIC50.items():
        p_chembl = statistics.median(p_chembls)
        print('%s\t%s_%s' % (cano_smi, name, p_chembl),
              file = output)

Akhila-Mettu commented 2 years ago

Hi, thanks for your reply. However, I was asking how to download all structures in SDF format for a PubChem assay, given its AID. Do you have any idea how to do that?

UnixJunkie commented 2 years ago

Just point and click in the PubChem web interface: there is a download button, and you can choose SDF or SMILES as the output format.
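
For anyone who would rather script this than click, here is a minimal sketch using only the Python standard library and the PUG REST API directly. It assumes that `assay/aid/<AID>/cids/TXT` returns the CIDs tested in the assay, one per line, and that `compound/cid/SDF` accepts a comma-separated CID list via POST; that is my understanding of PUG REST, not something confirmed in this thread. The function names, the batch size of 100 and the pause between requests are illustrative choices.

```python
import time
import urllib.parse
import urllib.request

PUG = 'https://pubchem.ncbi.nlm.nih.gov/rest/pug'


def assay_cids_url(aid):
    """URL listing the CIDs tested in assay `aid` (assumed endpoint)."""
    return '%s/assay/aid/%d/cids/TXT' % (PUG, aid)


def parse_cid_list(text):
    """Parse a whitespace-separated list of CIDs into integers."""
    return [int(tok) for tok in text.split()]


def sdf_request(cids):
    """Build a POST request asking for the SDF records of `cids`."""
    body = urllib.parse.urlencode({'cid': ','.join(map(str, cids))})
    return urllib.request.Request('%s/compound/cid/SDF' % PUG,
                                  data=body.encode())


def download_assay_sdf(aid, path, batch_size=100, pause=0.25):
    """Write all structures of assay `aid` to `path`; return the CID count."""
    with urllib.request.urlopen(assay_cids_url(aid)) as resp:
        cids = parse_cid_list(resp.read().decode())
    with open(path, 'wb') as out:
        for i in range(0, len(cids), batch_size):
            batch = cids[i:i + batch_size]
            with urllib.request.urlopen(sdf_request(batch)) as resp:
                # Each SDF record ends with "$$$$", so batches concatenate.
                out.write(resp.read())
            time.sleep(pause)  # stay well under PubChem's request limits
    return len(cids)
```

For example, `download_assay_sdf(1000, 'aid_1000.sdf')` would write one multi-record SDF file for assay 1000. For very large assays, the download button in the web interface is probably still the simpler route.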