snorkel-team / snorkel

A system for quickly generating training data with weak supervision
https://snorkel.org
Apache License 2.0

error_analysis method not working for the discriminative model #833

Closed MatthieudeR closed 6 years ago

MatthieudeR commented 6 years ago

It seems to me that the error_analysis method of the Classifier class is not working for the RNN implementation. If I try to run disc_model.error_analysis(sess, L_dev, L_gold_dev), where the arguments are sparse matrices, I get errors (possibly because len is not properly supported for sparse matrices?).

The problems seem to come mainly from the _make_tensor method.

EDIT: It seems that in some cases (i.e. in some experiments), at the prediction step, the embedding_lookup call in snorkel/snorkel/learning/disc_models/rnn/rnn_base.py, line 93, fails with

indices[0,2] = -1 is not in [0,5005)

As the training runs properly, the embedding must be properly constructed. However, I cannot find any reason why the lookup should receive a -1 index, and I cannot find any documentation on this.
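For reference, a message of this shape is what a bounds-checked lookup reports whenever an index falls outside [0, vocab_size). The following is a minimal sketch in plain Python, not TensorFlow's or Snorkel's actual implementation; the names embedding_lookup, try_lookup and VOCAB_SIZE are hypothetical, and the vocabulary size is simply taken from the upper bound in the message above.

```python
def embedding_lookup(table, batch):
    """Toy stand-in for an embedding lookup with explicit bounds checking.

    table: list of embedding vectors, one per vocabulary id.
    batch: list of rows, each row a list of integer ids.
    """
    vocab_size = len(table)
    out = []
    for i, row in enumerate(batch):
        vectors = []
        for j, idx in enumerate(row):
            # Unlike Python list indexing, negative ids are rejected here.
            if not 0 <= idx < vocab_size:
                raise IndexError(
                    f"indices[{i},{j}] = {idx} is not in [0, {vocab_size})"
                )
            vectors.append(table[idx])
        out.append(vectors)
    return out


VOCAB_SIZE = 5005  # assumption: matches the upper bound in the reported message
table = [[float(i)] for i in range(VOCAB_SIZE)]  # 1-dimensional toy embeddings


def try_lookup(batch):
    """Return the error message for a batch, or None if the lookup succeeds."""
    try:
        embedding_lookup(table, batch)
        return None
    except IndexError as exc:
        return str(exc)


message = try_lookup([[12, 7, -1]])  # third id in the first row is -1
print(message)
```

The point of the sketch is that the embedding table itself can be perfectly valid while the lookup still fails: the error depends only on the ids fed in at prediction time.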

ajratner commented 6 years ago

Hi @MatthieudeR ,

Very sorry for taking so long to get to this; I let it slip!

The issue here is that you are passing label matrices (L_dev and L_gold_dev, which are sparse matrices with elements in {-1,0,1} representing the votes of the LFs) into the end discriminative model, rather than passing in the Candidates. Please take a look at how it's done in the intro tutorial again and let me know if that clears things up!

I will also add a better error catch for this now.

Thanks, Alex
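As an illustration of the kind of input check described in the reply above, here is a minimal sketch in plain Python. It is not the check that was added to Snorkel: the Candidate class is a hypothetical stand-in for Snorkel's candidate objects, and check_candidates and describe are invented names used only for this example.

```python
class Candidate:
    """Hypothetical stand-in for a Snorkel candidate object."""

    def __init__(self, text):
        self.text = text


def check_candidates(X):
    """Raise a descriptive TypeError unless X is a list/tuple of Candidate objects."""
    if not isinstance(X, (list, tuple)):
        raise TypeError(
            f"Expected a list of Candidate objects, got {type(X).__name__}. "
            "Did you pass a label matrix instead?"
        )
    for i, x in enumerate(X):
        if not isinstance(x, Candidate):
            raise TypeError(
                f"Element {i} is {type(x).__name__}, not a Candidate. "
                "Did you pass a label matrix instead?"
            )
    return X


def describe(X):
    """Return 'ok' if X passes the check, otherwise the error message."""
    try:
        check_candidates(X)
        return "ok"
    except TypeError as exc:
        return str(exc)


candidates = [Candidate("first example"), Candidate("second example")]
label_rows = [[1, 0, -1], [0, 0, 1]]  # LF votes in {-1, 0, 1}, not candidates

print(describe(candidates))
print(describe(label_rows))
```

Failing early with a message that names the likely mistake avoids the much less informative index error raised later inside the embedding lookup.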