Question regarding dataset generation and training

orgw commented 4 months ago

Hi guys, thanks for the code. I have a few questions.

Can you elaborate on how to create the dataset for training? It's hard to follow. After receiving the building blocks, what should I do (which script should I run)? How do you get the modified Hartenfeller-Button reaction templates?
Can you expand the reaction templates as well as the starting reactants?
What are the suggested parameters for default training?
Also, do you have any plans to release a checkpoint?

thanks...!!

orgw commented 4 months ago

I've created a text file with the SMILES for the corresponding building blocks, took the first 6000, and then created a mask for the data. When training with the script, I get the following error:

At line 489 of synthesis_building_env.py:

reaction = self.ctx.bimolecular_reactions[action[1]]
*** IndexError: list index out of range

It seems that action[1] is 3309, which exceeds the number of reaction templates.
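A quick way to confirm this kind of mismatch is to compare the index against the template count before the lookup. The following is only a minimal sketch, assuming an action is a tuple whose second element is used as the index into the list of bimolecular reactions; `check_reaction_index` is a hypothetical helper and not part of synflownet, and the template count of 100 is an illustrative number, not the real one.

```python
def check_reaction_index(action, num_templates):
    """Return True if action[1] is a valid index into a list of
    num_templates reaction templates (hypothetical helper)."""
    idx = action[1]
    return idx is not None and 0 <= idx < num_templates


# Illustrative values only: 100 is NOT the real number of templates.
NUM_TEMPLATES = 100

for action in [(2, 14, 3245), (3, 3309), (0, None, None)]:
    ok = check_reaction_index(action, NUM_TEMPLATES)
    print(action, "valid reaction index" if ok else "not a valid reaction index")
```

Running a check like this on each sampled action before the environment step would show which action types carry an out-of-range value in position 1.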

orgw commented 4 months ago

Action type 4 seems to add the first reactant. Type 3 shouldn't appear, because according to your paper the second action level consists of the AddReactant action, which is sampled only if a ReactBi (which seems to be type 2) is sampled at the first level. It's also weird that the type 3 actions are two-element tuples, for instance (3, 1107), so I guess this is what causes:

reaction = self.ctx.bimolecular_reactions[action[1]]
*** IndexError: list index out of range

(Pdb) actions
[(4, 565, None), (4, 2335, None), (4, 2488, None), (4, 3619, None), (4, 298, None), (4, 5957, None), (4, 2468, None), (4, 2214, None), (4, 1031, None), (4, 5771, None), (4, 729, None), (4, 5044, None), (4, 2557, None), (4, 3644, None), (4, 146, None), (4, 85, None), (4, 1090, None), (4, 636, None), (4, 3627, None), (4, 5248, None), (4, 3614, None), (4, 5772, None), (4, 2624, None), (4, 5281, None), (4, 561, None), (4, 4161, None), (4, 3201, None), (4, 1592, None), (4, 1383, None), (4, 4008, None), (4, 3204, None), (4, 3642, None), (4, 3756, None), (4, 5384, None), (4, 5896, None), (4, 1348, None), (4, 5026, None), (4, 3673, None), (4, 5002, None), (4, 2565, None), (4, 5709, None), (4, 4939, None), (4, 2335, None), (4, 2798, None), (4, 2940, None), (4, 1386, None), (4, 146, None), (4, 4470, None), (4, 4876, None), (4, 4074, None), (4, 761, None), (4, 4841, None), (4, 2596, None), (4, 648, None), (4, 5849, None), (4, 4689, None), (4, 4284, None), (4, 3095, None), (4, 5147, None), (4, 871, None), (4, 1257, None), (4, 2115, None), (4, 2918, None), (4, 976, None)]

/synflownet/src/gflownet/algo/reaction_sampling.py(74)sample_from_model()

-> for i, j in zip(not_done(range(n)), range(n)):
(Pdb) actions
[(2, 14, 3245), (0, None, None), (2, 31, 1721), (0, None, None), (2, 11, 5420), (2, 13, 1007), (2, 50, 4364), (2, 30, 668), (2, 11, 1793), (2, 14, 3382), (2, 26, 3098), (0, None, None), (3, 1107), (2, 47, 2565), (2, 12, 3570), (2, 9, 2706), (2, 10, 2649), (2, 9, 2538), (0, None, None), (2, 28, 52), (3, 361), (0, None, None), (2, 19, 1826), (3, 244), (2, 50, 3332), (2, 28, 1183), (2, 14, 4945), (0, None, None), (2, 29, 2053), (2, 28, 1256), (0, None, None), (0, None, None), (2, 6, 553), (0, None, None), (1, 11, None), (2, 8, 5687), (2, 28, 1348), (2, 6, 432), (3, 1636), (2, 14, 2550), (2, 6, 119), (2, 47, 370), (2, 29, 1234), (3, 648), (2, 47, 2858), (0, None, None), (2, 9, 2443), (2, 6, 909), (2, 26, 2541), (0, None, None), (2, 12, 4573), (2, 6, 138), (0, None, None), (2, 26, 2658), (2, 13, 3193), (2, 6, 353), (3, 4397), (2, 14, 2602), (0, None, None), (2, 10, 4958), (2, 56, 5420), (0, None, None), (1, 11, None), (2, 8, 1985)]
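Dumps like the one above are easier to read when grouped by action type. The sketch below is not taken from the repository: `summarize_actions` is a hypothetical helper, and it only assumes that each action is a tuple whose first element is the action type and whose second element (if present and not None) is an integer index. It reports, per type, how many actions there are, which tuple lengths occur, and the largest value seen in position 1.

```python
from collections import defaultdict


def summarize_actions(actions):
    """Group action tuples by type (first element) and collect, per type,
    the count, the set of tuple lengths, and the max value at position 1."""
    summary = defaultdict(
        lambda: {"count": 0, "lengths": set(), "max_index": None}
    )
    for action in actions:
        entry = summary[action[0]]
        entry["count"] += 1
        entry["lengths"].add(len(action))
        idx = action[1] if len(action) > 1 else None
        if idx is not None and (
            entry["max_index"] is None or idx > entry["max_index"]
        ):
            entry["max_index"] = idx
    return dict(summary)


# A few tuples copied from the dump above, for illustration.
sample = [(2, 14, 3245), (0, None, None), (3, 1107), (1, 11, None), (3, 361)]
for action_type, info in sorted(summarize_actions(sample).items()):
    print(action_type, info)
```

If one type shows a maximum in position 1 that is larger than the number of reaction templates, that type is a likely source of the IndexError.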

545487677 commented 4 months ago

Hi Sir, I am currently unable to download the dataset of the building blocks as it has not been sent to me. Could you kindly share the data with me? @orgw

mirunacrt commented 1 month ago

Hi, we have updated the code to process and index actions differently. We recommend that you try again and make sure that you generate a precomputed building block masks file compatible with the templates and building blocks you are using. The current implementation should not run into such errors. Please let us know if you have any other questions!
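As a rough illustration of what "compatible" could mean here, the sketch below checks that a mask has one entry per template and per building block. The layout is an assumption made for this example, not something stated in this thread: the mask is taken to be a rectangular nested list of shape (number of templates, 2 reactant positions, number of building blocks). `mask_shape` and `mask_is_compatible` are hypothetical helpers and not part of synflownet.

```python
def mask_shape(mask):
    """Return the shape of a rectangular nested-list mask as a tuple."""
    shape = []
    level = mask
    while isinstance(level, list):
        shape.append(len(level))
        level = level[0] if level else None
    return tuple(shape)


def mask_is_compatible(mask, num_templates, num_building_blocks):
    """Check the ASSUMED layout (templates, 2, building blocks)."""
    return mask_shape(mask) == (num_templates, 2, num_building_blocks)


# Toy example: 3 templates, 4 building blocks, all entries allowed.
toy_mask = [[[1] * 4 for _ in range(2)] for _ in range(3)]
print(mask_shape(toy_mask))
print(mask_is_compatible(toy_mask, num_templates=3, num_building_blocks=4))
print(mask_is_compatible(toy_mask, num_templates=3, num_building_blocks=6000))
```

A mismatch in either dimension would suggest that the mask was computed for a different set of templates or building blocks than the one used for training.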

mirunacrt / synflownet

Question regarding dataset generation and training #1