"prob" parameter in dataset source

insilicomedicine / GENTRL

Generative Tensorial Reinforcement Learning (GENTRL) model


Hi, the 'prob' parameter controls how often samples are drawn from each data source. For example, suppose you have two sources, A.csv and B.csv, with probabilities 0.8 and 0.2 respectively (the probabilities should sum to 1):

import gentrl

# Both files go into the 'sources' list of a single dataset;
# 'prob' sets how often each source is sampled relative to the other.
md = gentrl.MolecularDataset(sources=[
        {'path': 'A.csv',
         'smiles': 'SMILES',
         'prob': 0.8,
         'plogP': 'plogP',
        },
        {'path': 'B.csv',
         'smiles': 'SMILES',
         'prob': 0.2,
         'plogP': 'plogP',
        }],
    props=['plogP'])

So, when you train on this data, roughly 80% of the training samples will be drawn from A and roughly 20% from B.

In the original example, 'prob' is set to 1 because there is only one source, so 100% of the training data comes from train_plogp_plogpm.csv.
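To make the 80/20 split concrete, here is a minimal standalone sketch of probability-weighted source selection. It is not GENTRL's code and does not call the library; the function name `pick_source` and the sample count are assumptions chosen for illustration only.

```python
import random
from collections import Counter


def pick_source(probs, rng):
    """Return the index of a source chosen in proportion to its 'prob' value."""
    total = sum(probs)
    # Scaling the draw by the total means the weights need not sum to exactly 1.
    trial = rng.random() * total
    cumulative = 0.0
    for index, prob in enumerate(probs):
        cumulative += prob
        if trial < cumulative:
            return index
    # Guard against floating-point round-off on the last bin.
    return len(probs) - 1


rng = random.Random(0)      # fixed seed so the run is repeatable
probs = [0.8, 0.2]          # hypothetical sources: index 0 is A.csv, index 1 is B.csv
n_draws = 10_000

counts = Counter(pick_source(probs, rng) for _ in range(n_draws))
fraction_a = counts[0] / n_draws
print(f"A: {fraction_a:.2f}, B: {1 - fraction_a:.2f}")
```

With a single source and 'prob' set to 1, the same function always returns index 0, which matches the single-file case described above.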

"prob" parameter in dataset source #13