Previously, after the `top_n` flag was added to the `write_structural_prior` function, the `tc` file listed each test molecule followed by its generated molecules ranked by `top_n`. In the resulting CSV file, the test-molecule details (‘smiles’, ‘mass’, ‘formula’, ‘mass_known’, ‘formula_known’) were filled in only for the first-ranked row. The subsequent rows left these fields empty.
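Filling the empty fields amounts to carrying the first-ranked row's test-molecule values down to the remaining rows of the same test molecule. The snippet below is a minimal stdlib sketch, not the code in `write_structural_prior`. It assumes that rows arrive as dicts in file order, that a non-empty ‘smiles’ value marks the first-ranked row of a new test molecule, and that an empty string or `None` marks an empty slot. The helper names `fill_test_info` and `is_missing` are hypothetical.

```python
# Test-molecule columns named in this PR; the real CSV may contain more.
TEST_COLS = ["smiles", "mass", "formula", "mass_known", "formula_known"]


def is_missing(value):
    """Treat None and the empty string as an empty CSV slot (an assumption)."""
    return value is None or value == ""


def fill_test_info(rows, key="smiles"):
    """Copy test-molecule fields from each block's first-ranked row to the rows below it."""
    filled = []
    current = {}
    for row in rows:
        row = dict(row)  # shallow copy so the caller's rows are not modified
        if not is_missing(row.get(key)):
            # A populated key column marks the first-ranked row of a new test molecule.
            current = {col: row.get(col) for col in TEST_COLS}
        else:
            # Later rows of the same block inherit the first-ranked row's values.
            for col in TEST_COLS:
                if is_missing(row.get(col)):
                    row[col] = current.get(col)
        filled.append(row)
    return filled
```

Only the listed test-molecule columns are filled, so per-candidate columns (for example a generated molecule's SMILES or Tc) are left untouched.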
The top-k tc graph shows top-k accuracy with each min_tc threshold (0.4, 0.675, 1) in turn defining what counts as 'accurate'. To compute this, a match must be identified for every test molecule at every min_tc threshold.
Exact changes in this PR:

- The `tc` file now includes the complete test-molecule information on every row, not only on the first-ranked one.
- For test molecules that do not have a single match at a given min_tc threshold, 'NaN' is assigned to all columns except the min_tc threshold, which makes these cases easy to separate.
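One way to picture the per-threshold rows and the resulting top-k accuracy is the minimal sketch below. It is an illustration under stated assumptions, not the project's implementation. The input layout (a dict mapping each test SMILES to `(rank, tc)` pairs), the column names `rank` and `Tc`, and the helpers `match_rows` and `top_k_accuracy` are all hypothetical. The sketch also assumes that the match at a threshold is the best-ranked candidate with Tc at or above that threshold.

```python
import math

NAN = float("nan")
# Thresholds named in this PR.
MIN_TC_THRESHOLDS = (0.4, 0.675, 1.0)


def match_rows(candidates, thresholds=MIN_TC_THRESHOLDS):
    """Build one row per (test molecule, min_tc threshold)."""
    rows = []
    for test_smiles, ranked in candidates.items():
        ranked = sorted(ranked)  # ascending rank, so the first hit is the best-ranked
        for t in thresholds:
            hit = next(((r, tc) for r, tc in ranked if tc >= t), None)
            if hit is None:
                # No match at this threshold: NaN in every column except min_tc.
                rows.append({"smiles": NAN, "rank": NAN, "Tc": NAN, "min_tc": t})
            else:
                rows.append({"smiles": test_smiles, "rank": hit[0],
                             "Tc": hit[1], "min_tc": t})
    return rows


def top_k_accuracy(rows, threshold, k):
    """Fraction of test molecules whose match at `threshold` has rank <= k."""
    at_t = [r for r in rows if r["min_tc"] == threshold]
    # NaN rows (no match) count toward the denominator but never as hits.
    hits = sum(1 for r in at_t if not math.isnan(r["rank"]) and r["rank"] <= k)
    return hits / len(at_t)
```

Because each test molecule contributes exactly one row per threshold, the unmatched molecules stay in the denominator, and filtering on `math.isnan(row["rank"])` isolates them.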