Closed TimRepke closed 7 years ago
Hi @TimRepke ,
Thanks for the detailed post and for calling attention to this! I haven't seen this before, and I agree that the error is a bit lacking in specificity...
It does, however, seem like an error that would occur with null data passed in; i.e. I think the error is that TensorFlow is checking the sparse indices against an empty data array? Could you double-check that both of your feature matrices (F_train and F_dev) are fully populated? Another minor thing would be to change print_freq to a lower value (in either or both calls) so that we can see whether this is happening during training or during dev set eval.
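The populated-matrix check above can be done directly on the scipy sparse matrices. A minimal sketch (is_populated is a hypothetical helper for illustration, not part of Snorkel):

```python
import scipy.sparse as sp

# An empty feature matrix like the one reported later in this thread:
# 228 candidates, zero feature columns, zero stored elements.
F_train = sp.csr_matrix((228, 0), dtype='int64')

def is_populated(F):
    """A feature matrix is usable only if it has columns and stored values."""
    return F.shape[1] > 0 and F.nnz > 0

print(F_train.shape, F_train.nnz)  # (228, 0) 0
print(is_populated(F_train))       # False
```

If either matrix fails this check, the problem is upstream in featurization, not in the TensorFlow model.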
Also, potentially worth noting that we're about to pull in a refactor of the TensorFlow bindings ( #681 ), so there's a small chance that could help here; at the very least we'd love to integrate any fixes required here into that PR. So let us know re: the above sanity-check questions to start.
Thanks, Alex
Hi @ajratner ,
yes, in fact F_train and F_dev are empty (<228x0 sparse matrix of type '<class 'numpy.int64'>' with 0 stored elements in Compressed Sparse Row format>); I guess that wasn't clear from my previous comment. That's where I started going up and down the chain of called functions.
As far as I understand it, the FeatureAnnotator is supposed to apply feature functions to the candidates:
featurizer = FeatureAnnotator()
F_train = featurizer.apply(split=0)
but apparently (skipping a few calls) the anno_generator doesn't yield any items. Again skipping a few things, I ended up in the get_binary_span_feats function, where things are handed over to treedlib, and where, finally, nodes (candidates?) are filtered out. I'm not familiar enough with the codebase to put my finger on the issue, but that's where the data stream ends and nothing is returned, so I think that might be the problem.
Found the issue!!!
Here min() is called on a generator, which iterates over it and leaves behind an exhausted iterator.
I fixed that and will open a PR soon with other small fixes I came across.
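A minimal reproduction of this bug class: calling min() on a generator consumes it, so any later iteration over the same object sees nothing.

```python
# min() iterates the generator to find the minimum...
gen = (x for x in [3, 1, 2])
smallest = min(gen)
remaining = list(gen)  # ...so the generator is now exhausted

print(smallest)   # 1
print(remaining)  # []

# The fix is to materialize the items first, then take the minimum;
# the list remains available for later use.
items = list(x for x in [3, 1, 2])
smallest = min(items)
print(items)      # [3, 1, 2]
```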
Hi @TimRepke great sleuthing! I am still confused as to why this error happens for you with Python 2.7, as it doesn't for us; but either way great catch, seems worth fixing, and thanks for the PR!!
Any idea what that second data array is that TensorFlow is comparing against?
I'm getting a similar error while running _, _, _, _ = disc_model.score(session, F_test, L_gold_test) and test_predict = disc_model.predict(F_test). The difference is that my matrix is not empty:
InvalidArgumentError: indices[30] = 28784 is not in [0, 20993)
The trace includes this TensorFlow call.
Hi @gabcbrown,
The second data array here is a parameter matrix. Usually, this error indicates that the train and test sets have different feature spaces; specifically here, that the model was trained on a dataset with 20,993 features, and the test set has additional features (extending to at least index 28,784) that the model doesn't know anything about. (Ideally, a lookup like this should just return a parameter of zero... this is an annoying nit with TF here...)
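A hedged sketch (plain NumPy, not the actual TensorFlow internals) of why the error message mentions "[0, 20993)": after training, the model holds one parameter per training feature, and a test-set feature index beyond that range has no corresponding parameter to look up.

```python
import numpy as np

n_train_features = 20993           # the upper bound from the error message
weights = np.zeros(n_train_features)

def lookup(idx):
    """Illustrative parameter lookup with the same bounds check TF performs."""
    if not (0 <= idx < len(weights)):
        raise IndexError(f"index {idx} is not in [0, {len(weights)})")
    return weights[idx]

lookup(100)      # a known training feature: fine
# lookup(28784)  # a test-only feature: raises, mirroring the TF error
```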
Either way, did you create F_test using apply_existing as in the tutorial? This is important and should solve this.
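A sketch of the underlying idea (this is illustrative pure Python, not the Snorkel apply_existing API): when featurizing the test set, indices must be resolved against the existing training feature space, with unseen features dropped rather than appended.

```python
def build_feature_index(docs):
    """Assign each distinct training feature a stable column index."""
    index = {}
    for doc in docs:
        for feat in doc:
            index.setdefault(feat, len(index))
    return index

def featurize_existing(doc, index):
    """Keep only features the model already knows; ignore unseen ones."""
    return sorted(index[f] for f in doc if f in index)

train_index = build_feature_index([["a", "b"], ["b", "c"]])
print(featurize_existing(["c", "zzz", "a"], train_index))  # [0, 2]
```

If the test set were instead indexed independently, "zzz" would get a fresh column beyond the trained range, producing exactly the kind of out-of-bounds lookup reported above.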
Let us know, and either way we can at the very least try to produce a better error message here (in the short run)!
Closing for now; hopefully v0.6 helps as well! If this is still an issue, please re-open!
I'm currently trying to get familiar with Snorkel, so I ran the tutorial notebooks. During training in the fifth, I get the error listed below.
Before someone asks: yes, I also tried it with Python 2.7, same error. Unfortunately I couldn't figure out exactly where it comes from or pinpoint it to a particular commit. However, it appears to me to be caused upstream.
Let me know if you need further details or what I can do to help fix that.
Update: I think this comes down to an issue in treedlib, since compile_relation_feature_generator doesn't produce any items.