mirunacrt / synflownet

A GFlowNet with a chemical synthesis action space.
MIT License

Question regarding dataset generation and training #1

Open orgw opened 3 weeks ago

orgw commented 3 weeks ago

Hi guys, thanks for the code. I have a few questions.

  1. Can you elaborate on how to create the dataset for training? It's hard to follow. After receiving the building blocks, what should I do (which script should I run)? How do you get the modified Hartenfeller-Button reaction templates?

  2. Can you expand the reaction templates as well as the starting reactants?

  3. What are the suggested parameters for default training?

  4. Also, do you have any plans to release a checkpoint?

Thanks!

orgw commented 2 weeks ago

I've created a text file with the SMILES for the corresponding building blocks, took the first 6000, and then created a mask for the data. When training with the script, I get the following error at line 489 of synthesis_building_env.py:

reaction = self.ctx.bimolecular_reactions[action[1]] *** IndexError: list index out of range

It seems that action[1] is 3309, which exceeds the number of reaction templates.
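A quick way to narrow this down is to scan a batch of sampled actions for any whose second element cannot index the template list, before the lookup that raises here. The sketch below is only an illustration: `find_out_of_range_actions`, the `reaction_types` argument, and the template count of 60 are hypothetical and not taken from the synflownet code. Which type codes actually index `bimolecular_reactions` is an assumption you would need to check against the repository.

```python
from typing import Iterable, List, Optional, Sequence, Tuple

Action = Sequence[Optional[int]]  # e.g. (type, index, index) as printed in pdb


def find_out_of_range_actions(
    actions: Iterable[Action],
    n_reactions: int,
    reaction_types: frozenset = frozenset({2}),
) -> List[Tuple[int, tuple]]:
    """Return (position, action) pairs whose action[1] cannot index the template list."""
    bad = []
    for pos, action in enumerate(actions):
        if action[0] not in reaction_types:
            continue  # assumed: this action type does not look up a reaction
        idx = action[1] if len(action) > 1 else None
        if idx is None or not 0 <= idx < n_reactions:
            bad.append((pos, tuple(action)))
    return bad


# Made-up inputs for illustration; 60 is NOT the real number of templates.
sample = [(2, 14, 3245), (0, None, None), (3, 1107), (2, 56, 5420)]
offenders = find_out_of_range_actions(
    sample, n_reactions=60, reaction_types=frozenset({2, 3})
)
print(offenders)
```

Running this on a dumped batch shows which positions would fail, which makes it easier to tell a wrong mask from a wrong action encoding.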

orgw commented 2 weeks ago

Action type 4 seems to add the first reactant. Type 3 shouldn't appear, because according to your paper the second action level consists of the AddReactant action, which is sampled only if a ReactBi (which seems to be type 2) is sampled at the first level. It's also weird that type 3 actions carry only one index, for instance (3, 1107), so I guess this is what causes:

reaction = self.ctx.bimolecular_reactions[action[1]] *** IndexError: list index out of range

(Pdb) actions [(4, 565, None), (4, 2335, None), (4, 2488, None), (4, 3619, None), (4, 298, None), (4, 5957, None), (4, 2468, None), (4, 2214, None), (4, 1031, None), (4, 5771, None), (4, 729, None), (4, 5044, None), (4, 2557, None), (4, 3644, None), (4, 146, None), (4, 85, None), (4, 1090, None), (4, 636, None), (4, 3627, None), (4, 5248, None), (4, 3614, None), (4, 5772, None), (4, 2624, None), (4, 5281, None), (4, 561, None), (4, 4161, None), (4, 3201, None), (4, 1592, None), (4, 1383, None), (4, 4008, None), (4, 3204, None), (4, 3642, None), (4, 3756, None), (4, 5384, None), (4, 5896, None), (4, 1348, None), (4, 5026, None), (4, 3673, None), (4, 5002, None), (4, 2565, None), (4, 5709, None), (4, 4939, None), (4, 2335, None), (4, 2798, None), (4, 2940, None), (4, 1386, None), (4, 146, None), (4, 4470, None), (4, 4876, None), (4, 4074, None), (4, 761, None), (4, 4841, None), (4, 2596, None), (4, 648, None), (4, 5849, None), (4, 4689, None), (4, 4284, None), (4, 3095, None), (4, 5147, None), (4, 871, None), (4, 1257, None), (4, 2115, None), (4, 2918, None), (4, 976, None)]

/synflownet/src/gflownet/algo/reaction_sampling.py(74)sample_from_model()

-> for i, j in zip(not_done(range(n)), range(n)):

(Pdb) actions [(2, 14, 3245), (0, None, None), (2, 31, 1721), (0, None, None), (2, 11, 5420), (2, 13, 1007), (2, 50, 4364), (2, 30, 668), (2, 11, 1793), (2, 14, 3382), (2, 26, 3098), (0, None, None), (3, 1107), (2, 47, 2565), (2, 12, 3570), (2, 9, 2706), (2, 10, 2649), (2, 9, 2538), (0, None, None), (2, 28, 52), (3, 361), (0, None, None), (2, 19, 1826), (3, 244), (2, 50, 3332), (2, 28, 1183), (2, 14, 4945), (0, None, None), (2, 29, 2053), (2, 28, 1256), (0, None, None), (0, None, None), (2, 6, 553), (0, None, None), (1, 11, None), (2, 8, 5687), (2, 28, 1348), (2, 6, 432), (3, 1636), (2, 14, 2550), (2, 6, 119), (2, 47, 370), (2, 29, 1234), (3, 648), (2, 47, 2858), (0, None, None), (2, 9, 2443), (2, 6, 909), (2, 26, 2541), (0, None, None), (2, 12, 4573), (2, 6, 138), (0, None, None), (2, 26, 2658), (2, 13, 3193), (2, 6, 353), (3, 4397), (2, 14, 2602), (0, None, None), (2, 10, 4958), (2, 56, 5420), (0, None, None), (1, 11, None), (2, 8, 1985)]
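To see at a glance which action types come out with an unexpected shape, the dumped list can be grouped by type code and tuple length. This is a minimal sketch, not part of synflownet: `summarize_actions` is a hypothetical helper, and the sample is a handful of tuples copied from the dump above.

```python
from collections import Counter


def summarize_actions(actions):
    """Count actions by (type code, tuple length)."""
    return Counter((action[0], len(action)) for action in actions)


# A few tuples copied from the pdb output in this thread.
sample = [(2, 14, 3245), (0, None, None), (3, 1107), (1, 11, None), (3, 361)]
for (kind, length), count in sorted(summarize_actions(sample).items()):
    print(f"type={kind} length={length} count={count}")
```

Any type that shows up with a length different from the others (here type 3, with length 2) is a candidate for the action whose second element is later used as a reaction index.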

545487677 commented 1 week ago

Hi Sir, I am currently unable to download the building-block dataset, as it has not been sent to me. Could you kindly share the data with me? @orgw