pgmikhael / clipzyme

Reaction-Conditioned Virtual Screening of Enzymes
Apache License 2.0

Request for the Split Dataset #4

Closed by zw-SIMM 3 months ago

zw-SIMM commented 3 months ago

Thanks for your excellent work and contributions.

I noticed that when I split the dataset by rule_id, the results differ slightly from the numbers reported in your paper. Could this be due to random splitting?
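To illustrate why an unseeded group-wise split can drift between runs, here is a minimal sketch of a seeded split that assigns whole groups to one partition. It is not the repository's actual splitting code: the function name `split_by_group`, the sample dictionaries, and the 0.8/0.1/0.1 fractions are assumptions made for illustration only.

```python
import random
from collections import defaultdict


def split_by_group(samples, group_key, fractions=(0.8, 0.1, 0.1), seed=0):
    """Assign every sample sharing a group value to the same partition.

    `samples` is a list of dicts; `group_key` names the grouping field
    (hypothetically "rule_id"). The fractions are placeholder values.
    """
    # Sort first so the result depends only on the seed, not on input order.
    groups = sorted({s[group_key] for s in samples})
    rng = random.Random(seed)
    rng.shuffle(groups)

    n_train = int(fractions[0] * len(groups))
    n_dev = int(fractions[1] * len(groups))

    # Map each group to exactly one partition.
    assignment = {}
    for i, group in enumerate(groups):
        if i < n_train:
            assignment[group] = "train"
        elif i < n_train + n_dev:
            assignment[group] = "dev"
        else:
            assignment[group] = "test"

    splits = defaultdict(list)
    for s in samples:
        splits[assignment[s[group_key]]].append(s)
    return dict(splits)


# Toy data: 20 hypothetical rule ids with 3 samples each.
toy = [{"rule_id": r, "idx": i} for r in range(20) for i in range(3)]
first = split_by_group(toy, "rule_id", seed=0)
second = split_by_group(toy, "rule_id", seed=0)
```

With a fixed seed the partitions are identical across runs. Without one, or with a different seed, the groups land in different partitions, so the per-split counts shift slightly even though the totals stay the same.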

My split:

TRAIN DATASET CREATED FOR ENZYMEMAP_REACTION_GRAPH. 
        * Number of samples: 34160
        * Number of reactions: 12568
        * Number of proteins: 9751
        * Number of ECs: 2248

DEV DATASET CREATED FOR ENZYMEMAP_REACTION_GRAPH. 
        * Number of samples: 7268
        * Number of reactions: 2668
        * Number of proteins: 1962
        * Number of ECs: 464

TEST DATASET CREATED FOR ENZYMEMAP_REACTION_GRAPH. 
        * Number of samples: 4614
        * Number of reactions: 1546
        * Number of proteins: 1399
        * Number of ECs: 317

The split reported in the paper: [image]

To ensure that we can accurately reproduce your experimental results, would you mind providing the specific dataset split IDs or the exact split files used in the paper?

Thank you very much for your assistance!

zw-SIMM commented 3 months ago

Sorry for bothering you. I found that you have already provided the file files/cached_enzymemap.p, which fully reproduces your split.

TRAIN DATASET CREATED FOR ENZYMEMAP_REACTION_GRAPH. 
        * Number of samples: 34427
        * Number of reactions: 12629
        * Number of proteins: 9794
        * Number of ECs: 2251

DEV DATASET CREATED FOR ENZYMEMAP_REACTION_GRAPH. 
        * Number of samples: 7287
        * Number of reactions: 2669
        * Number of proteins: 1964
        * Number of ECs: 465

TEST DATASET CREATED FOR ENZYMEMAP_REACTION_GRAPH. 
        * Number of samples: 4642
        * Number of reactions: 1554
        * Number of proteins: 1407
        * Number of ECs: 319
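
For anyone comparing their own split against the cached one, per-split counts like those above can be recomputed with a small helper. This is only a sketch: the field names `reaction`, `protein_id`, and `ec` are hypothetical and may not match the keys the repository actually uses.

```python
def summarize(samples):
    """Count samples and unique reactions, proteins, and EC numbers.

    Each sample is assumed to be a dict with the hypothetical keys
    "reaction", "protein_id", and "ec".
    """
    return {
        "samples": len(samples),
        "reactions": len({s["reaction"] for s in samples}),
        "proteins": len({s["protein_id"] for s in samples}),
        "ecs": len({s["ec"] for s in samples}),
    }


def diff_summaries(mine, reference):
    """Return reference minus mine for every count, to spot mismatches."""
    return {key: reference[key] - mine[key] for key in reference}


# Toy example with made-up records, not data from the repository.
toy_split = [
    {"reaction": "A>>B", "protein_id": "P1", "ec": "1.1.1.1"},
    {"reaction": "A>>B", "protein_id": "P2", "ec": "1.1.1.1"},
    {"reaction": "C>>D", "protein_id": "P1", "ec": "2.7.1.1"},
]
toy_summary = summarize(toy_split)
toy_diff = diff_summaries(toy_summary, toy_summary)
```

A non-zero entry in the difference points to the count that diverges, which helps tell a different random assignment apart from a different underlying set of samples.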