Evaluating invalid and known smiles generated by the model

skinniderlab / CLM

MIT License


Evaluating invalid and known smiles generated by the model #173

Closed anushka255 closed 1 month ago

vineetbansal commented 1 month ago

Points we discussed:

- Possibly group by `inchikey` instead of the `smiles` column for `known_smiles`?
- We may wish to specify dtypes explicitly instead of relying on `df = pd.DataFrame(data, columns=columns).convert_dtypes()` (for example, by having a tuple as the value in the `molecular_properties` dict).
- Line 214 in `calculate_outcomes`: `"% unique": len(bin_df) / len(bin_df),` always evaluates to 1 and should be `"% unique": len(bin_df) / bin_df["size"].sum()`.
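To make the three points concrete, here is a minimal sketch on a toy table. It is not the code from `calculate_outcomes`: the rows are illustrative, and it assumes that `size` is a numeric column counting how often each row was sampled, so the denominator is taken as the sum of that column directly.

```python
import pandas as pd

# Toy data (assumption): two different SMILES strings for ethanol share one
# InChIKey; `size` is assumed to be the number of times each row was sampled.
rows = [
    ("CCO", "LFQSCWFLJHTTHZ-UHFFFAOYSA-N", 3),
    ("OCC", "LFQSCWFLJHTTHZ-UHFFFAOYSA-N", 2),
    ("c1ccccc1", "UHOVQNZJYSORNB-UHFFFAOYSA-N", 5),
]
columns = ["smiles", "inchikey", "size"]

# Point 2: declare dtypes explicitly rather than inferring them
# with convert_dtypes().
dtypes = {"smiles": "string", "inchikey": "string", "size": "int64"}
df = pd.DataFrame(rows, columns=columns).astype(dtypes)

# Point 1: grouping by SMILES keeps both ethanol spellings apart,
# grouping by InChIKey merges them into one structure.
by_smiles = df.groupby("smiles", as_index=False)["size"].sum()
by_inchikey = df.groupby("inchikey", as_index=False)["size"].sum()

# Point 3: dividing a length by itself is always 1.0; dividing the number
# of unique rows by the total sampled count gives the intended ratio.
bin_df = by_inchikey
unique_buggy = len(bin_df) / len(bin_df)
unique_fixed = len(bin_df) / bin_df["size"].sum()

print(len(by_smiles), len(by_inchikey), unique_buggy, unique_fixed)
```

On this toy table the SMILES grouping yields three groups and the InChIKey grouping two, which is the kind of difference the first point is concerned with.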

vineetbansal commented 1 month ago

The diff for `calculate_outcomes.csv` is now as we would expect. The last bullet point above is probably worth fixing before we merge; the remaining two can be made into separate issues.