get_ranked_topk cannot be reproduced

A usage example for get_ranked_topk.py can be found in the interaction folder. The script retrieves the top-k predictions for a given molecular SMILES input.

During inference for a molecule, each augmented SMILES is decoded independently, and each one yields beam (e.g., 20) predictions. In total there are therefore beam × augs (e.g., 20 × 20 = 400) candidates. The purpose of get_ranked_topk.py is to compute a score (a proxy for the probability) for each candidate and to rank the candidates by that score.

A key observation is that the same prediction may appear for several different augmented SMILES. The scoring function therefore takes into account both the rank and the frequency of each candidate. For a more in-depth analysis, please refer to the Augmented Transformer paper [1].
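
To make the idea of combining rank and frequency concrete, here is a minimal sketch. It is not the code in get_ranked_topk.py: the function name `rank_candidates`, the weight 1 / (1 + alpha * rank), and the `alpha` parameter are assumptions chosen for illustration, and the predictions are assumed to be already canonicalized so that identical molecules compare equal as strings.

```python
from collections import defaultdict

def rank_candidates(beams, topk=10, alpha=1.0):
    """Hypothetical helper: merge per-augmentation beams into one ranked list.

    beams[a][r] is the prediction at 0-based rank r for augmentation a.
    Each occurrence adds 1 / (1 + alpha * rank) to the candidate's score
    (an assumed weighting), so candidates that appear often and near the
    top of their beams score highest.
    """
    scores = defaultdict(float)
    for beam in beams:
        seen = set()
        for rank, cand in enumerate(beam):
            if cand in seen:
                continue  # count a candidate once per beam, at its best rank
            seen.add(cand)
            scores[cand] += 1.0 / (1.0 + alpha * rank)
    # Sort by descending score; break ties by string for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:topk]

# Toy input: 3 augmentations, beam size 3.
beams = [
    ["CCO", "CCN", "CCC"],
    ["CCN", "CCO", "CCC"],
    ["CCO", "CCC", "CCN"],
]
top = rank_candidates(beams, topk=2)
```

In this toy input, "CCO" is ranked first in two of the three beams and second in the other, so it comes out ahead of "CCN". The real script may use a different weighting or tie-breaking rule.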

[1] Tetko, Igor V., et al. "State-of-the-art augmented NLP transformer models for direct and single-step retrosynthesis." Nature Communications 11.1 (2020): 5575.

