uiocompcat / tmcinvdes

Public repository for preprint "Deep Generative Model for the Dual-Objective Inverse Design of Metal Complexes"
MIT License

Bare minimum repo #3

Closed: Strandgaard96 closed this 4 months ago

Strandgaard96 commented 4 months ago

I added the data files used to create all the plots in the main text of the preprint. I also added the functionality to create the training sets, a link to the forked JT-VAE repo, and the conditional optimization results for the monodentate generator. All of this should now be available to people, and we can always build on it. I also simplified README.md.

tlinjordet commented 4 months ago

Thanks for your efforts @Strandgaard96!

Note: I was not able to run the code as-is with the present instructions. I have investigated how to make the code run and have included solutions (and recommendations where appropriate) in the list of fixes below.

Please make and push the following fixes on the current feature branch and PR, and then I will test and review again:

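```python
def load_ligand_xyz(ligand_xyzs_path):
    """Load ligand XYZ blocks into a dict keyed by each block's comment line."""
    xyzs_dict = {}
    # Bug fix: the original opened `ligands_xyzs_path`, an undefined name.
    with open(ligand_xyzs_path, "r") as f_in:
        # Blocks are separated by blank lines; strip() drops trailing
        # newlines so no empty final block is produced.
        for xyz in f_in.read().strip().split("\n\n"):
            # Line 0 is the atom count; line 1 (the XYZ comment line) is the key.
            key = xyz.split("\n")[1]
            xyzs_dict[key] = xyz
    return xyzs_dict
```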
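Splitting on "\n\n" only works if every block is separated by exactly one blank line. Below is a minimal alternative sketch that instead walks the file using each block's atom-count line. It assumes standard XYZ blocks (atom count, comment line, then one line per atom) and keys on the comment line, as the snippet above does. The function name `load_ligand_xyz_by_count` is hypothetical and is not part of the repo.

```python
from pathlib import Path


def load_ligand_xyz_by_count(ligand_xyzs_path):
    """Parse a multi-block XYZ file using each block's atom-count line.

    Assumes standard XYZ blocks: line 0 = atom count, line 1 = comment
    (used as the dict key), then one line per atom. Blank lines between
    blocks are skipped, so separators are optional.
    """
    lines = Path(ligand_xyzs_path).read_text().splitlines()
    xyzs_dict = {}
    i = 0
    while i < len(lines):
        if not lines[i].strip():
            i += 1  # skip separator or trailing blank lines
            continue
        n_atoms = int(lines[i].strip())
        block = lines[i : i + n_atoms + 2]
        if len(block) < n_atoms + 2:
            raise ValueError(f"Truncated XYZ block starting at line {i + 1}")
        xyzs_dict[block[1]] = "\n".join(block)
        i += n_atoms + 2
    return xyzs_dict
```

This version also raises on a truncated block instead of silently keying on the wrong line.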

For now, we can defer making the planned Bash approach work with the Python scripts. That work may later require changes to some import statements, so that a script such as tmcinvdes/subdir/script.py can be run with the pattern `python -m tmcinvdes.subdir.script -i input_file ...`.
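Below is a minimal sketch of a script that would work with the `python -m` pattern. The module path and the `-i` flag come from the comment above, and `parse_args`/`main` are hypothetical names. The sketch assumes that `tmcinvdes` and its subfolders are importable packages and that the command is run from the repository root. Run this way, `-m` puts the current directory on `sys.path`, so absolute imports such as `from tmcinvdes.<module> import ...` resolve. Running `python tmcinvdes/subdir/script.py` directly puts only the script's own folder on `sys.path`.

```python
"""Hypothetical tmcinvdes/subdir/script.py.

Intended invocation (from the repository root):
    python -m tmcinvdes.subdir.script -i input_file
"""
import argparse


def parse_args(argv=None):
    # argv=None makes argparse read sys.argv; passing a list eases testing.
    parser = argparse.ArgumentParser(description="Example tmcinvdes script")
    parser.add_argument("-i", "--input", required=True, help="Path to the input file")
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    print(f"Processing {args.input}")
    return args.input


if __name__ == "__main__":
    main()
```

Keeping the logic in `main(argv)` lets the same code be called from a Bash wrapper via `-m` or imported and tested directly.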

Strandgaard96 commented 4 months ago

@tlinjordet Appreciate the detailed review!

I cleaned up the paths a bit. I still think hardcoded paths are fine if the user knows what to fill in instead. We just need to get the code out there ASAP.

I fixed most of these issues, and select_initial_ligands.py should now be runnable, since ligand_dict is created on the fly.

I think the folder names are clearer this way. Keeping README.md as clean as possible is important. With these names, the user can immediately identify where the training sets, models, and generated ligands are, which is better IMO.

The column labels should be mostly self-explanatory. Making the data plot-ready is something we can implement in later steps, and it should be low priority. As long as it is clear what each column contains, we can always build on that.

I don't see anywhere that I write to .txt files?

I will test a fresh install as a new user tomorrow morning, and then we should merge.

Strandgaard96 commented 4 months ago

I can confirm that installing this version from scratch works if the tmqmg-l path is set correctly.
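Since the fresh install depends on the tmqmg-l path being set correctly, a hardcoded path could be guarded with an early check. The sketch below raises a clear error instead of failing deep inside a script. `TMQMG_L_PATH`, the placeholder value, and `check_data_path` are all hypothetical and are not taken from the repo.

```python
from pathlib import Path

# Hypothetical placeholder: the user edits this to point at their tmqmg-l data.
TMQMG_L_PATH = Path("/path/to/tmqmg-l")


def check_data_path(path, required=()):
    """Raise FileNotFoundError early if a hardcoded data path is wrong.

    `required` is an optional iterable of file or folder names expected
    directly inside `path`.
    """
    path = Path(path)
    if not path.is_dir():
        raise FileNotFoundError(
            f"Data directory not found: {path}. Edit the hardcoded path before running."
        )
    missing = [name for name in required if not (path / name).exists()]
    if missing:
        raise FileNotFoundError(f"Missing expected files in {path}: {missing}")
    return path
```

A script would call `check_data_path(TMQMG_L_PATH)` at startup, so a misconfigured install fails with a message that names the path to fix.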