Hi,
The model is currently working OK for me, but I am curious how two-letter atoms (like Cl and Br) are handled in encoding/decoding. I have looked at the one_hot_encoder module, and it seems they are treated as two tokens (e.g. "C" and "l" for a chlorine atom). Please correct me if I am wrong: I could not find them being handled the way I expected, i.e. by replacing each two-letter atom with a single dummy character before the one-hot encoding. If chlorine is indeed treated as two tokens, wouldn't that confuse the network, since the "C" token would then also stand for aliphatic carbon?
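To make the question concrete, here is a minimal sketch of the substitution I had in mind. It is not taken from the repository: the function names and the placeholder characters "Q" and "J" are my own assumptions, chosen because neither letter occurs in any element symbol.

```python
import re

# Hypothetical mapping (an assumption, not from the repository): each
# two-letter atom gets a one-character placeholder that does not occur
# in ordinary SMILES strings.
TWO_LETTER_ATOMS = {"Cl": "Q", "Br": "J"}


def char_tokens(smiles):
    """Naive per-character tokenization: 'Cl' is split into 'C' and 'l'."""
    return list(smiles)


def substitute_two_letter_atoms(smiles):
    """Replace every two-letter atom with its one-character placeholder."""
    pattern = re.compile("|".join(TWO_LETTER_ATOMS))
    return pattern.sub(lambda m: TWO_LETTER_ATOMS[m.group(0)], smiles)


def restore_two_letter_atoms(encoded):
    """Undo the substitution after decoding."""
    reverse = {v: k for k, v in TWO_LETTER_ATOMS.items()}
    return "".join(reverse.get(ch, ch) for ch in encoded)


smiles = "ClCCBr"
print(char_tokens(smiles))                  # chlorine and bromine each span two tokens
print(substitute_two_letter_atoms(smiles))  # one character per atom
```

With this kind of preprocessing, each atom would map to exactly one position in the one-hot matrix, and restore_two_letter_atoms would be applied to the decoder output to get a valid SMILES string back.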
Albert