Dinxin closed this issue 5 years ago
Tokenization is done here: https://github.com/jrwnter/cddd/blob/8c1f90d1d927962658be6b9086f2cb5fa3403ae9/cddd/input_pipeline.py#L139-L150 The SMILES string is split into individual tokens such as C, N, Cl, 1, [, etc. Each of these tokens is then one-hot encoded.
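To illustrate the idea, here is a minimal sketch of that kind of tokenization followed by one-hot encoding. It is not the code from input_pipeline.py: the pattern `TOKEN_PATTERN`, the helper names `tokenize`, `build_vocab` and `one_hot`, and the vocabulary built from a single example string are all assumptions made for this example. The real pipeline may use a different token set and a fixed vocabulary.

```python
import re

# Hypothetical token pattern (an assumption, not copied from cddd):
# match the two-letter halogens first, then fall back to any single character.
TOKEN_PATTERN = re.compile(r"Cl|Br|.")


def tokenize(smiles):
    """Split a SMILES string into tokens such as 'C', 'N', 'Cl', '1', '['."""
    return TOKEN_PATTERN.findall(smiles)


def build_vocab(token_lists):
    """Map every distinct token to an integer index (sorted for stability)."""
    tokens = sorted({tok for toks in token_lists for tok in toks})
    return {tok: idx for idx, tok in enumerate(tokens)}


def one_hot(tokens, vocab):
    """Return one row per token, with a single 1 at the token's vocab index."""
    rows = []
    for tok in tokens:
        row = [0] * len(vocab)
        row[vocab[tok]] = 1
        rows.append(row)
    return rows


tokens = tokenize("ClC1=CC=CN=C1")
vocab = build_vocab([tokens])
encoded = one_hot(tokens, vocab)
print(tokens)
print(len(encoded), "tokens x", len(vocab), "vocabulary entries")
```

Listing `Cl` and `Br` before the single-character fallback matters: otherwise `Cl` would be split into `C` and `l` and read as a carbon followed by an unknown symbol.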