Request for the Splitted Dataset

pgmikhael / clipzyme

Reaction-Conditioned Virtual Screening of Enzymes

Apache License 2.0

Thanks for your excellent work and contributions.

I noticed that when I split the dataset by rule_id, the numbers I obtain differ slightly from those reported in your paper. Could this be due to the random splitting?
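To illustrate why a seed-dependent split could produce slightly different counts, here is a minimal sketch of a group-wise split keyed on rule_id. It is not the repository's code: the function name `split_by_group`, the 80/10/10 fractions, the seed, and the dict-based sample format are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def split_by_group(samples, key="rule_id", fractions=(0.8, 0.1, 0.1), seed=0):
    """Assign whole groups to train/dev/test so no group spans two splits.

    Hypothetical helper: the fractions and seed are illustrative assumptions.
    """
    # Collect samples under their group id (here: rule_id).
    groups = defaultdict(list)
    for sample in samples:
        groups[sample[key]].append(sample)

    # Sort before shuffling so the order depends only on the seed.
    group_ids = sorted(groups)
    random.Random(seed).shuffle(group_ids)

    n_train = int(fractions[0] * len(group_ids))
    n_dev = int(fractions[1] * len(group_ids))
    assignment = {
        "train": group_ids[:n_train],
        "dev": group_ids[n_train:n_train + n_dev],
        "test": group_ids[n_train + n_dev:],
    }
    # Expand each split's group ids back into its samples.
    return {
        name: [s for gid in ids for s in groups[gid]]
        for name, ids in assignment.items()
    }
```

Because groups differ in size, the number of samples per split depends on which rule_ids land in which split, so two runs with different seeds (or a different shuffle order) can yield slightly different sample, reaction, protein, and EC counts even when the number of groups per split is identical.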

My splitting:

TRAIN DATASET CREATED FOR ENZYMEMAP_REACTION_GRAPH. 
        * Number of samples: 34160
        * Number of reactions: 12568
        * Number of proteins: 9751
        * Number of ECs: 2248

DEV DATASET CREATED FOR ENZYMEMAP_REACTION_GRAPH. 
        * Number of samples: 7268
        * Number of reactions: 2668
        * Number of proteins: 1962
        * Number of ECs: 464

TEST DATASET CREATED FOR ENZYMEMAP_REACTION_GRAPH. 
        * Number of samples: 4614
        * Number of reactions: 1546
        * Number of proteins: 1399
        * Number of ECs: 317

The splitting in the paper:

So that we can accurately reproduce your experimental results, would you mind providing the specific split IDs or the exact split files used in the paper?

Thank you very much for your assistance!

TRAIN DATASET CREATED FOR ENZYMEMAP_REACTION_GRAPH.
        * Number of samples: 34427
        * Number of reactions: 12629
        * Number of proteins: 9794
        * Number of ECs: 2251

DEV DATASET CREATED FOR ENZYMEMAP_REACTION_GRAPH.
        * Number of samples: 7287
        * Number of reactions: 2669
        * Number of proteins: 1964
        * Number of ECs: 465

TEST DATASET CREATED FOR ENZYMEMAP_REACTION_GRAPH.
        * Number of samples: 4642
        * Number of reactions: 1554
        * Number of proteins: 1407
        * Number of ECs: 319

Request for the Splitted Dataset #4