maxhodak / keras-molecules

Autoencoder network for learning a continuous representation of molecular structures.
MIT License

add smilesparser.py for parsing SMILES strings. #41

Closed dakoner closed 7 years ago

dakoner commented 7 years ago

This will be useful if people want to train with charsets whose entries are element names like 'Br', rather than single characters.

dakoner commented 7 years ago

I've decided to put this in my own repo, as it's not strictly required for reproducing the paper's results.

pechersky commented 7 years ago

How do you envision utilizing the parser? Is there a set of terminal symbols that the parser implicitly encodes and that you can use to generate a "charset"?

dakoner commented 7 years ago

I explicitly encoded a few terminals (see OrganicSymbol, AromaticSymbol, and ElementSymbol), but it really should use the full list of elements. The OpenSMILES spec (http://opensmiles.org/opensmiles.html) explicitly lists all the terminals, so I think (using your SMILESDataGenerator as an example) you could use that list of terminals as your charset.
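As a rough illustration of the idea, here is a minimal sketch that builds a charset from a list of terminals and splits a SMILES string into those terminals instead of single characters. It is not the code in smilesparser.py or SMILESDataGenerator: the names `CHARSET`, `tokenize`, and `encode` are hypothetical, and the terminal list is only an illustrative subset (the organic and aromatic symbols plus a few structural characters), not the full element list from the spec.

```python
import re

# Illustrative subset of OpenSMILES terminals (assumption: NOT the full element list).
ORGANIC_SYMBOLS = ["B", "C", "N", "O", "S", "P", "F", "Cl", "Br", "I"]
AROMATIC_SYMBOLS = ["b", "c", "n", "o", "s", "p"]
OTHER_SYMBOLS = list("()[]=#-+/\\@.%:H0123456789")

# The charset is simply the list of terminals; each entry gets one index.
CHARSET = ORGANIC_SYMBOLS + AROMATIC_SYMBOLS + OTHER_SYMBOLS

# Try longer symbols first so that 'Br' is matched before 'B'.
_TOKEN_RE = re.compile(
    "|".join(re.escape(t) for t in sorted(CHARSET, key=len, reverse=True))
)


def tokenize(smiles):
    """Split a SMILES string into terminals; raise on unknown symbols."""
    tokens = []
    pos = 0
    while pos < len(smiles):
        match = _TOKEN_RE.match(smiles, pos)
        if match is None:
            raise ValueError(
                "unknown symbol at position %d: %r" % (pos, smiles[pos])
            )
        tokens.append(match.group())
        pos = match.end()
    return tokens


def encode(smiles):
    """Map a SMILES string to a list of charset indices, one per terminal."""
    index = {t: i for i, t in enumerate(CHARSET)}
    return [index[t] for t in tokenize(smiles)]
```

With this sketch, `tokenize("CCBr")` returns `['C', 'C', 'Br']`, so bromine occupies a single position in the encoded sequence. Note that a flat longest-match like this only works for the small subset above. With the full element list, context matters: `Sc` is scandium inside brackets but sulfur followed by an aromatic carbon outside them, which is where a real parser helps.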
