Closed Strandgaard96 closed 4 months ago
Thanks for your efforts @Strandgaard96!
Note: I was not able to run the code as-is with the present instructions. I have investigated how to make the code run and have included solutions (and recommendations where appropriate) in the list of fixes below.
Please make and push the following fixes on the current feature branch+PR, then I will test and review again:
datasets
back to the planned naming system, instead add bullet points with a brief explanation+pointer about the files for (respectively) training sets, trained models, and generated ligands as needed in datasets/README.md
. Also, remember to link to datasets/DETAILS.md
(please rename detailed_info.md
) from datasets/README.md
.environment.yml
here matches the common environment from the private development repo. Update the latter environment.yml
if needed by loading that environment, installing extra packages, and then exporting to file with package versions (but not builds) to the updated environment.yml
both here and there. Let us keep the environment name tmcinvdes
.
conda install -c conda-forge xyz2mol
), perhaps it would be sufficient to add xyz2mol
to the common environment.yml
? If not, see how the uxtbpy
package is added by the common environment.yml
by pointing at the uxtbpy
GitHub repository within the environment.yml
pip clause. This could even be extended to point to a specific commit of the repository to be pip install
'ed. Relying on conda would be preferable, however.pre-commit run --all-files
until all hooks pass with our common configuration before pushing changes. tmcinvdes/ligand_generation
:
select_initial_ligands.py
, utils.py
, and xyz2mol_functionality.py
have hard-coded paths or have placeholders for hard-coded paths that require user edits. utils.py
hard-coded paths include one pointing to a local clone of the private repo xyz2mol_tm
, specifically the file ligands_dict_xyz.json
. Also, paths to both tmQMg-L
and tmqmg-l
, where the latter might be a modified version of former. Unclear what differences are expected here, but all necessary file paths can be set as function arguments and passed from argparse.input_dir
-derived paths when the utility functions are called by running select_initial_ligands.py
as __main__
. select_initial_ligands.py
and its local dependencies do not run as described. Possibly something like the following version of the function load_ligand_xyz
in ligand_generation/utils.py
would work: def load_ligand_xyz(ligand_xyzs_path):
# load ligand xyz dict
xyzs_dict = {}
with open(ligands_xyzs_path, "r") as f_in:
for xyz in f_in.read().split("\n\n"):
key = xyz.split("\n")[1]
xyzs_dict[key]=xyz
return xyzs_dict
ligand_xyzs_path
could be derived from argparse.input_dir
, assuming the necessary file could be read from the local clone of the tmQMg-L
repository. .xyz
files in the tmQMg-L
repository or easily derivable from those, then ligand_xyzs_path
should be given via another argparse
flag in select_initial_ligands.py
..txt
files) dataframe to file with df.to_csv(header=True, index=False)
as described at the bottom in datasets/DETAILS.md
. tmcinvdes/quantum_chemistry/orca_parsers.py
please change line 13 from .cm5 import get_cm5
into from cm5 import get_cm5
. This makes orca_parsers.py
run from its subdirectory as described under Usage in the docstring. Please add such a usage example to tmcinvdes/quantum_chemistry/README.md
in the same way as tmcinvdes/ligand_generation/README.md
. For now we can defer making the planned Bash approach work with the Python scripts, but these may later affect some of the import statements to let us run Python scripts tmcinvdes/subdir/script.py
by the pattern python -m tmcinvdes.subdir.script -i input_file ...
.
@tlinjordet Appreciate the detailed review!
I cleaned the paths a bit. I still think that hardcoded paths are fine if the user knows what to fill in instead. We just need to get the code out there ASAP.
Fixed most of these things and select_initial_ligands.py should be runnable as the ligand_dict is created on the fly.
I think the naming of the folders is more clear in this way. Keeping the README.md as clean as possible is important. With these names the user can immediately identify where the training sets, models, and generated ligands are, which is better IMO.
The column labels should be mostly self-explanatory. Plot readiness is something we can implement in the next steps and should be low priority. As long as it is clear what a column is then we can always build on that.
I dont see anywhere where i write to .txt files?
Will check a new user install tomorrow morning and then we should merge.
Can confirm that installing this version from scratch works if path tmqmg-l is set correct.
I added the relevant data files used to create all the plots in the preprint main text. Also added the functionality to create the training sets and a link to the JT-VAE forked repo. Added conditional optimization results for the monodentate generator. All this should be available to people now and then we can always build on this. Also simplified README.md