ANI-2x training data set

Hi,

How can one find out exactly which data set was used to train ANI-2x?

The original ANI-2x paper says that the training data set is composed of molecules from a variety of sources, including the GDB-11 database, the ChEMBL database, the S66x8 benchmark, and some randomly generated amino acids and dipeptides. However, as I understand it, these data sets are not included in full, because specific sampling techniques are applied to them.

Is it possible to find out exactly which molecules were used for training?

Thank you. Best, João

isayev/ASE_ANI, issue #39