hachmannlab / chemml

ChemML is a machine learning and informatics program suite for the chemical and materials sciences.
https://hachmannlab.github.io/chemml
BSD 3-Clause "New" or "Revised" License

select the test set for the Active Learning manually #13

Open GuptaVishu2002 opened 3 years ago

GuptaVishu2002 commented 3 years ago

At the moment the EMC active learning package initializes the training and test set randomly using tr_ind, te_ind = al.initialize(). Is it possible to initialize the test set manually?

aditya1707 commented 3 years ago

The initialize function returns two NumPy ndarrays containing the randomly assigned training and test indices. To set them manually, skip this function call, define your own lists or ndarrays of indices, assign them to tr_ind and te_ind, and pass them to the deposit function as described on the active learning documentation page.
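Before passing hand-written indices along, it may help to sanity-check them. The sketch below is only an illustration: check_split is a hypothetical helper, not part of ChemML, and the rules it enforces (no duplicates, indices within range, no overlap between the two sets) are assumptions about what a sensible split looks like, not documented requirements of deposit.

```python
def check_split(tr_ind, te_ind, n_samples):
    """Hypothetical helper (not a ChemML function): validate manual indices.

    Accepts any iterable of integers (lists, ranges, NumPy arrays) and
    returns the two index sets as plain Python lists.
    """
    tr = [int(i) for i in tr_ind]
    te = [int(i) for i in te_ind]

    for name, idx in (("training", tr), ("test", te)):
        # Assumed rule: each sample appears at most once per set.
        if len(set(idx)) != len(idx):
            raise ValueError(f"duplicate indices in {name} set")
        # Assumed rule: every index refers to an existing sample.
        if any(i < 0 or i >= n_samples for i in idx):
            raise ValueError(f"{name} index outside [0, {n_samples})")

    # Assumed rule: training and test sets do not share samples.
    overlap = set(tr) & set(te)
    if overlap:
        raise ValueError(f"indices in both sets: {sorted(overlap)}")
    return tr, te


# Example: 20 samples, first 10 for training, last 10 for testing.
tr_ind, te_ind = check_split(range(0, 10), range(10, 20), n_samples=20)
```

If the check passes, tr_ind and te_ind can be used in place of the values returned by al.initialize().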

GuptaVishu2002 commented 3 years ago

If I understand correctly, instead of using tr_ind, te_ind = al.initialize(), I simply have to do something like

tr_ind = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
te_ind = np.array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19])

and then perform the following?

al.deposit(tr_ind, y[tr_ind])
al.deposit(te_ind, y[te_ind])

I tried the following, but it returns False. Is there anything I am missing here?