On the ChEMBL 25 data set, with the current data cleaning/grouping (53ec926b2e320ff1afa85390611f649c5b215213), the sklearn random forest with 10 trees gets 77% on top-1 precision, recall, and F1 score, and 89-92% top-5 accuracy.
In 833ece7aa220818c332c6c7c905b9941a462a70d I added a Jupyter notebook that implements a very basic neural network classifier. It gets 79% top-1 accuracy and 97% top-5 accuracy.
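
For reference, here is a minimal sketch of how a top-k accuracy figure like the ones quoted here can be computed from per-class scores. It is not the code from the repository or the notebook: the helper name `top_k_accuracy` and the toy numbers are made up for illustration, and it assumes labels are integer class indices that line up with the score columns (with sklearn, the columns of `predict_proba` follow the order of the model's `classes_`).

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest-scoring classes.

    Hypothetical helper for illustration; assumes each label is an integer
    index into the corresponding row of scores.
    """
    hits = 0
    for row, true_label in zip(scores, labels):
        # Class indices sorted by descending score; keep the first k.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if true_label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Toy scores for 4 samples over 3 classes (made-up numbers, not ChEMBL data).
scores = [
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
    [0.4, 0.5, 0.1],
    [0.2, 0.3, 0.5],
]
labels = [0, 1, 0, 0]

top1 = top_k_accuracy(scores, labels, k=1)
top2 = top_k_accuracy(scores, labels, k=2)
print(top1, top2)  # 0.25 0.75
```

With k=1 this is ordinary accuracy; a larger k counts a prediction as correct whenever the true class appears anywhere in the k best guesses, which is why the top-5 numbers are so much higher than the top-1 numbers.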