longlongman / DESERT

Zero-Shot 3D Drug Design by Sketching and Generating (NeurIPS 2022)

issue with get_training_data #4

Closed by orgw 1 year ago

orgw commented 1 year ago

After running get_fragment_vocab I get two .pkl files.

When I run get_training_data with the path to BRICS_RING_R.vocab.pkl, I get an error:

Traceback (most recent call last):

frag_idx = vocab[frag_smi][2]

TypeError: 'Mol' object is not subscriptable

orgw commented 1 year ago

Fixed by using the vocab file you provided. After examining the vocab.pkl file I generated, it seems that each value is only a Mol object. What could be causing this?
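A quick way to confirm this kind of mismatch is to check which vocabulary values cannot be indexed the way the failing line `vocab[frag_smi][2]` expects. The sketch below is only an illustration: `summarize_vocab` and `FakeMol` are hypothetical names, and the tuple layout of the "good" entry is an assumption, not the actual layout used by DESERT.

```python
from collections import Counter


def summarize_vocab(vocab, index=2):
    """Return (type counts of the values, keys whose value fails value[index])."""
    type_counts = Counter(type(v).__name__ for v in vocab.values())
    bad_keys = []
    for key, value in vocab.items():
        try:
            value[index]  # mirrors the access in vocab[frag_smi][2]
        except (TypeError, IndexError, KeyError):
            bad_keys.append(key)
    return type_counts, bad_keys


class FakeMol:
    """Hypothetical stand-in for an RDKit Mol, which cannot be indexed."""


# Toy vocabulary: one indexable entry (assumed layout) and one bare object.
example = {
    "c1ccccc1": ("mol-placeholder", 10, 0),
    "C1CCCCC1": FakeMol(),
}
type_counts, bad_keys = summarize_vocab(example)
print(dict(type_counts))
print(bad_keys)
```

To run the same check on a real file, load it first with `pickle.load` (with RDKit installed so the Mol objects can be unpickled) and pass the resulting dictionary to `summarize_vocab`.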

orgw commented 1 year ago

I have a few more questions. Thank you in advance for answering.

  1. In shape pretraining, what does no_regression mean?
  2. What are shard and pocket?
  3. Why is there a vocab path for each of training, valid, and test? The link has only one vocab pickle file.

Thanks.

longlongman commented 1 year ago

Some of the questions are not clear to me (please give me more details), so I can only answer the following:

Q: In shape pretraining, what does no_regression mean? A: As mentioned in Appendix 1.2 and 2.2, we discretize two continuous variables (the translation vector and the rotation quaternion) and convert the regression problem into a classification problem. 'no_regression' means that neither of these two variables is handled as a regression problem.
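As a rough illustration of turning regression into classification, the sketch below maps one scalar (for example, a single component of a translation vector) to a bin index that can serve as a class label. The range, the number of bins, and the helper names `to_bin` and `bin_center` are assumptions chosen for the example; the actual discretization used in DESERT is described in the paper's appendix and is not reproduced here.

```python
import math


def to_bin(value, low, high, num_bins):
    """Map a continuous value to an integer class label in [0, num_bins - 1]."""
    width = (high - low) / num_bins
    idx = math.floor((value - low) / width)
    return min(max(idx, 0), num_bins - 1)  # clamp out-of-range values


def bin_center(idx, low, high, num_bins):
    """Recover a representative continuous value for a predicted class."""
    width = (high - low) / num_bins
    return low + (idx + 0.5) * width


# Assumed settings for illustration only.
LOW, HIGH, NUM_BINS = -10.0, 10.0, 20

translation = (1.3, -4.7, 9.99)
labels = [to_bin(t, LOW, HIGH, NUM_BINS) for t in translation]
print(labels)  # [11, 5, 19]
print(bin_center(labels[0], LOW, HIGH, NUM_BINS))  # 1.5
```

A model trained this way predicts a class per variable, and the bin center gives back an approximate continuous value, at the cost of a quantization error of at most half a bin width.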

Q: What are shard and pocket? A: A shard is a slice (500K molecules) of the total dataset (100M molecules). Because the total dataset is too large, we can only train the model on one shard at a time. Pocket is a term from protein science: it refers to the local region of a protein that a drug binds to.
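With the sizes quoted above, 100M molecules at 500K per shard works out to 200 shards. The sketch below shows that arithmetic as index ranges; `shard_ranges` is a hypothetical helper, and how DESERT actually stores or orders its shards is not stated in this thread.

```python
def shard_ranges(total, shard_size):
    """Yield (shard_index, start, stop) tuples covering range(total) in order."""
    for shard_index, start in enumerate(range(0, total, shard_size)):
        yield shard_index, start, min(start + shard_size, total)


TOTAL = 100_000_000   # total dataset size quoted in the answer
SHARD_SIZE = 500_000  # molecules per shard quoted in the answer

shards = list(shard_ranges(TOTAL, SHARD_SIZE))
print(len(shards))  # 200
print(shards[0])    # (0, 0, 500000)
print(shards[-1])   # (199, 99500000, 100000000)
```

Training one shard at a time then means loading only the molecules in one such range into memory before moving on to the next.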

Q: Why is there a vocab path for each of training, valid, and test? The link has only one vocab pickle file. A: They (training, valid, and test) share the same vocabulary.