ntucllab / libact

Pool-based active learning in Python
http://libact.readthedocs.org/
BSD 2-Clause "Simplified" License

QueryByCommittee 'kl_divergence' method error #174

Open tsiakmaki opened 4 years ago

tsiakmaki commented 4 years ago

Hi, thank you for your contribution. When testing QueryByCommittee with the 'kl_divergence' disagreement method, I get the following error:

mytest.py", line 90, in run
    ask_id = qs.make_query()
  File ".../python3.7/site-packages/libact/query_strategies/query_by_committee.py", line 207, in make_query
    np.where(np.isclose(avg_kl, np.max(avg_kl)))[0])
  File "mtrand.pyx", line 902, in numpy.random.mtrand.RandomState.choice
ValueError: 'a' cannot be empty unless no samples are taken

In my case, avg_kl (https://github.com/ntucllab/libact/blob/master/libact/query_strategies/query_by_committee.py#L205) seems to contain NaN, so its max is NaN as well. There are also cases where all values are NaN, and then I get:

ValueError: zero-size array to reduction operation maximum which has no identity
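A minimal sketch of the first failure, using made-up scores rather than libact's real output: np.max propagates NaN, np.isclose never treats NaN as close to anything (its equal_nan argument defaults to False), so np.where returns an empty index array and RandomState.choice raises the ValueError seen in the traceback.

```python
import numpy as np

# Hypothetical per-sample disagreement scores; one entry is NaN,
# as reported above.
avg_kl = np.array([0.12, np.nan, 0.30])

# np.max propagates NaN, so the "best" score itself becomes NaN ...
best = np.max(avg_kl)

# ... and nothing compares close to NaN, so no candidate index survives.
candidates = np.where(np.isclose(avg_kl, best))[0]

rng = np.random.RandomState(0)
message = None
try:
    rng.choice(candidates)  # choosing from an empty array fails
except ValueError as err:
    message = str(err)

print(best, candidates, message)
```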

Just a note, in case it is useful to anyone else.

yangarbiter commented 4 years ago

I think the NaN comes from here: if the classifier's predict_proba outputs a probability of 0, the KL divergence will be NaN. https://github.com/ntucllab/libact/blob/c3babc89ee4cc7672539de46f5a9267d16dccc1b/libact/query_strategies/query_by_committee.py#L156
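A minimal sketch of how a zero probability turns the disagreement into NaN. The function name kl_disagreement and its eps parameter are hypothetical; the function only approximates the idea of the kl_divergence disagreement (each committee member's KL divergence from the average distribution) and is not libact's actual code. Clipping to eps and renormalizing is an assumed workaround, not an upstream fix.

```python
import numpy as np


def kl_disagreement(proba, eps=None):
    """Mean KL divergence of each committee member from the consensus.

    Hypothetical sketch, not libact's implementation.
    proba: shape (n_samples, n_members, n_classes), one predict_proba
    output per committee member.
    eps: if given, clip probabilities to [eps, 1] and renormalize
    (an assumed workaround).
    """
    proba = np.asarray(proba, dtype=float)
    if eps is not None:
        proba = np.clip(proba, eps, 1.0)
        proba = proba / proba.sum(axis=2, keepdims=True)
    # Consensus distribution: average over committee members.
    consensus = proba.mean(axis=1, keepdims=True)
    # For p == 0, log(p / q) is -inf and 0 * -inf evaluates to NaN.
    with np.errstate(divide="ignore", invalid="ignore"):
        kl = np.sum(proba * np.log(proba / consensus), axis=2)
    return kl.mean(axis=1)


# Two samples, two members, two classes. In sample 0 the first
# member assigns probability 0 to class 1.
proba = np.array([
    [[1.0, 0.0], [0.5, 0.5]],
    [[0.6, 0.4], [0.4, 0.6]],
])
raw = kl_disagreement(proba)              # sample 0 becomes NaN
safe = kl_disagreement(proba, eps=1e-12)  # every score stays finite
print(raw, safe)
```

Clipping keeps every logarithm finite at the cost of a tiny bias. An alternative is scipy.special.rel_entr, which defines the p = 0 term as 0 instead of NaN.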