volkamerlab / opencadd

A Python library for structural cheminformatics
https://opencadd.readthedocs.io
MIT License

Different results from different KLIFS APIs #105

Closed (schallerdavid closed 2 years ago)

schallerdavid commented 2 years ago

Hey @dominiquesydow,

By accident, I bumped into an inconsistency between the two available KLIFS APIs. It might come from KLIFS itself, but I thought I'd post it here first before I message Albert.

You get different results depending on whether you use opencadd version 0.1.1 or version 0.2.0:

Steps to reproduce

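from opencadd.databases.klifs import setup_remote
# Connect to the remote KLIFS API and fetch all structures
remote = setup_remote()
structures = remote.structures.all_structures()
# Is structure 13896 among the returned KLIFS structure IDs?
13896 in structures["structure.klifs_id"].unique()
# Total number of returned structures
len(structures)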

Expected behavior

Observed behavior

Can you please double-check whether this inconsistency comes from KLIFS?

Thanks, David

dominiquesydow commented 2 years ago

Hi @schallerdavid,

Interesting.

Just checked the two websites and it seems to be something on the KLIFS side:

https://klifs.net/details.php?structure_id=13896 > returns a structure
https://dev.klifs.net/details.php?structure_id=13896 > returns no structure

schallerdavid commented 2 years ago

I did a set comparison of the available structure KLIFS IDs. Here are the IDs that are missing from the results of the new version:

13854, 13855, 13856, 13857, 13858, 13859, 13860, 13861, 13862, 13863, 13864, 13865, 13866, 13867, 13868, 13869, 13870, 13871, 13872, 13873, 13874, 13875, 13876, 13877, 13878, 13879, 13880, 13881, 13882, 13883, 13884, 13885, 13886, 13887, 13888, 13889, 13890, 13891, 13892, 13893, 13894, 13895, 13896, 13897, 13898, 13899, 13900, 13901, 13902, 13903, 13904
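A set comparison like the one described above could be sketched roughly as follows. This is only an illustration: the helper name `missing_ids` and the toy ID lists are hypothetical, and in practice the two inputs would be the `structure.klifs_id` columns obtained with each opencadd version.

```python
from typing import Iterable, List


def missing_ids(old_ids: Iterable[int], new_ids: Iterable[int]) -> List[int]:
    """Return the IDs present in old_ids but absent from new_ids, sorted ascending.

    Hypothetical helper for illustration; not part of opencadd.
    """
    return sorted(set(old_ids) - set(new_ids))


# Toy data (assumption): stands in for the structure KLIFS IDs
# returned by the old and the new API, respectively.
old_api_ids = [13852, 13853, 13854, 13855, 13856]
new_api_ids = [13852, 13853]

missing = missing_ids(old_api_ids, new_api_ids)
print(missing)  # [13854, 13855, 13856]
```

Converting both inputs to sets first also removes duplicate IDs, so the result does not depend on how many rows each structure has.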

Those are actually the highest structure KLIFS IDs. So it might be that the new API is simply updated a little later?
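One way to check this hypothesis would be to test whether every missing ID is larger than every ID the new API does return. The sketch below assumes that KLIFS IDs are assigned in increasing order over time, which the thread suggests but does not confirm. The function name and the `returned_by_new_api` values are hypothetical; only the missing range 13854 to 13904 is taken from the list above.

```python
from typing import Iterable


def missing_are_newest(missing: Iterable[int], returned: Iterable[int]) -> bool:
    """Return True if every missing ID is larger than every returned ID.

    Hypothetical helper for illustration; not part of opencadd.
    """
    missing_set, returned_set = set(missing), set(returned)
    if not missing_set or not returned_set:
        return False  # nothing to compare
    return min(missing_set) > max(returned_set)


# Missing IDs reported in this thread: 13854 through 13904 (inclusive).
missing_from_new_api = range(13854, 13905)

# Assumption: placeholder for the IDs the new API actually returned.
returned_by_new_api = [1, 2, 3, 13852, 13853]

print(missing_are_newest(missing_from_new_api, returned_by_new_api))  # True
```

If the check returned False, some missing structures would sit in between existing IDs, and a delayed update of the new API would be a less convincing explanation.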

dominiquesydow commented 2 years ago

Oh yes, that seems likely!