openforcefield / qca-dataset-submission

Data generation and submission scripts for the QCArchive ecosystem.

Potential dataset: Molecules from pharma partners in BindingDB #44

jchodera opened 5 years ago

jchodera commented 5 years ago

BindingDB provides a way to query molecule sets by the patents the data were curated from, and each patent entry has a field populated with the name of the filing organization: https://www.bindingdb.org/bind/ByPatent.jsp

If we can grab this dataset and filter by the Organization field, we could easily create a new dataset that covers some well-studied areas of chemical space from our partners.
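As a rough sketch of the filtering step, assume the patent query results have been exported as a list of dicts with an `Organization` key. The same company may appear with different legal suffixes or punctuation (e.g. "Inc." vs. nothing), so the helper below normalizes names before matching whole words. The function names, the suffix list, and the partner name `ExamplePharma` are hypothetical placeholders, not part of BindingDB.

```python
import re
import unicodedata

# Legal suffixes to ignore when comparing names (an assumption; extend this
# after inspecting the real Organization values).
_SUFFIXES = {"inc", "incorporated", "corp", "corporation", "co", "ltd",
             "limited", "llc", "gmbh", "ag", "plc", "company"}


def normalize_org(name: str) -> str:
    """Lower-case, strip accents and punctuation, and drop legal suffixes."""
    text = unicodedata.normalize("NFKD", name)
    text = text.encode("ascii", "ignore").decode("ascii").lower()
    words = re.findall(r"[a-z0-9]+", text)
    return " ".join(w for w in words if w not in _SUFFIXES)


def filter_by_partner(records, partners, field="Organization"):
    """Yield records whose organization contains a partner name as whole words."""
    targets = [normalize_org(p) for p in partners]
    for record in records:
        org = " " + normalize_org(record.get(field, "")) + " "
        # Padding with spaces makes this a whole-word match, so a partner
        # called "examplepharma" does not match "examplepharmaceuticals".
        if any(t and f" {t} " in org for t in targets):
            yield record
```

Whole-word matching is a deliberate middle ground: exact string equality would miss suffix variants, while plain substring matching could pull in unrelated organizations whose names happen to contain a partner's name.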

We may need to download the complete dataset to do this filtering.

jchodera commented 5 years ago

The Downloads page has an option to download only the data curated from patents.

For example, BindingDB_USPatent_3D_2019m8.sdf.zip contains an Institution field that can be queried for our partners.
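A minimal, stdlib-only sketch of streaming records out of the zipped SDF and keeping those whose `Institution` field mentions a partner. It assumes the archive holds an `.sdf` member and that the data tag is literally `<Institution>`. Neither assumption was checked against the actual file, and all helper names are hypothetical. Instead of using a cheminformatics toolkit such as RDKit, it parses the standard SD layout by hand: the molfile block ends at `M  END`, followed by `> <Tag>` data items, each terminated by a blank line, with `$$$$` between records.

```python
import io
import re
import zipfile
from typing import Dict, Iterable, Iterator, List

# SD data-item header, e.g. "> <Institution>" or ">  <Ki (nM)>  (1)".
_HEADER = re.compile(r">.*?<([^>]*)>")


def _parse_record(lines: List[str]) -> Dict[str, str]:
    """Split one SD record into its molfile block and its data items."""
    record: Dict[str, str] = {}
    i = 0
    # The connection table ends at the "M  END" line; data items follow it.
    while i < len(lines):
        i += 1
        if lines[i - 1].startswith("M  END"):
            break
    record["molblock"] = "\n".join(lines[:i])
    name, values = None, []
    for line in lines[i:]:
        if name is None:
            match = _HEADER.match(line)
            if match:
                name, values = match.group(1), []
        elif line.strip() == "":
            # A blank line closes the current data item.
            record[name] = "\n".join(values)
            name = None
        else:
            values.append(line)
    if name is not None:  # last item not followed by a blank line
        record[name] = "\n".join(values)
    return record


def iter_sd_records(stream: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per record, splitting on the "$$$$" terminator."""
    lines: List[str] = []
    for raw in stream:
        line = raw.rstrip("\r\n")
        if line.strip() == "$$$$":
            yield _parse_record(lines)
            lines = []
        else:
            lines.append(line)
    if any(ln.strip() for ln in lines):  # final record without "$$$$"
        yield _parse_record(lines)


def records_from_zip(path_or_file, member=None) -> Iterator[Dict[str, str]]:
    """Stream records from the first .sdf member (or a named one) of a zip."""
    with zipfile.ZipFile(path_or_file) as zf:
        if member is None:
            member = next(n for n in zf.namelist() if n.lower().endswith(".sdf"))
        with zf.open(member) as raw:
            # errors="replace" is a defensive assumption about the encoding.
            text = io.TextIOWrapper(raw, encoding="utf-8", errors="replace")
            yield from iter_sd_records(text)


def by_institution(records, partners, field="Institution"):
    """Keep records whose field value contains any partner name (case-insensitive)."""
    wanted = [p.lower() for p in partners]
    for record in records:
        value = record.get(field, "").lower()
        if any(p in value for p in wanted):
            yield record
```

Usage might look like `hits = list(by_institution(records_from_zip("BindingDB_USPatent_3D_2019m8.sdf.zip"), ["ExamplePharma"]))`, where `ExamplePharma` is a placeholder. Streaming record by record avoids loading the whole archive into memory. The plain substring match is loose and may need tightening once the real Institution values have been inspected.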