snorkel-team / snorkel

A system for quickly generating training data with weak supervision
https://snorkel.org
Apache License 2.0

InvalidArgumentError: indices[0] = 0 is not in [0, 0) #682

Closed TimRepke closed 7 years ago

TimRepke commented 7 years ago

I'm currently trying to get familiar with Snorkel, so I ran the tutorial notebooks. During training in the fifth notebook, I get the error listed below.

disc_model.train(F_train, train_marginals, n_epochs=20, lr=0.001)
# same happens with
searcher.fit(F_dev, L_gold_dev, n_epochs=50, rebalance=0.5, print_freq=25)

Before someone asks: yes, I also tried it with Python 2.7 and got the same error. Unfortunately, I couldn't figure out exactly where it comes from or pinpoint it to a particular commit. However, it appears to me to be caused upstream.

Let me know if you need further details or what I can do to help fix that.

[SparseLR] lr=0.001 l1=0.0 l2=0.0
[SparseLR] Building model
[SparseLR] Training model
[SparseLR] #examples=3710  #epochs=20  batch size=100
---------------------------------------------------------------------------
InvalidArgumentError                      Traceback (most recent call last)
~/workspace/snorkel/senv/lib/python3.6/site-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
   1138     try:
-> 1139       return fn(*args)
   1140     except errors.OpError as e:

~/workspace/snorkel/senv/lib/python3.6/site-packages/tensorflow/python/client/session.py in _run_fn(session, feed_dict, fetch_list, target_list, options, run_metadata)
   1120                                  feed_dict, fetch_list, target_list,
-> 1121                                  status, run_metadata)
   1122 

/usr/lib64/python3.6/contextlib.py in __exit__(self, type, value, traceback)
     88             try:
---> 89                 next(self.gen)
     90             except StopIteration:

~/workspace/snorkel/senv/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py in raise_exception_on_not_ok_status()
    465           compat.as_text(pywrap_tensorflow.TF_Message(status)),
--> 466           pywrap_tensorflow.TF_GetCode(status))
    467   finally:

InvalidArgumentError: indices[0] = 0 is not in [0, 0)
     [[Node: embedding_lookup_sparse/embedding_lookup = Gather[Tindices=DT_INT64, Tparams=DT_FLOAT, _class=["loc:@Variable"], validate_indices=true, _device="/job:localhost/replica:0/task:0/cpu:0"](Variable/read, _arg_Placeholder_2_0_2)]]

During handling of the above exception, another exception occurred:

InvalidArgumentError                      Traceback (most recent call last)
<ipython-input-4-634e2f438b6b> in <module>()
----> 1 disc_model.train(F_train, train_marginals, n_epochs=20, lr=0.001)

~/workspace/snorkel/snorkel/learning/logistic_regression.py in train(self, X, training_marginals, n_epochs, lr, batch_size, l1_penalty, l2_penalty, print_freq, rebalance, seed)
    162             for i in range(0, n, batch_size):
    163                 r = min(n-1, i+batch_size)
--> 164                 loss, _, nnz = self._run_batch(X_train, y_train, i, r, nnz)
    165                 epoch_loss += loss
    166             # Print training stats

~/workspace/snorkel/snorkel/learning/logistic_regression.py in _run_batch(self, X_train, y_train, i, r, last_nnz)
    326             self.ids:     ids,
    327             self.weights: weights,
--> 328             self.Y:       y_batch,
    329         })
    330 

~/workspace/snorkel/senv/lib/python3.6/site-packages/tensorflow/python/client/session.py in run(self, fetches, feed_dict, options, run_metadata)
    787     try:
    788       result = self._run(None, fetches, feed_dict, options_ptr,
--> 789                          run_metadata_ptr)
    790       if run_metadata:
    791         proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

~/workspace/snorkel/senv/lib/python3.6/site-packages/tensorflow/python/client/session.py in _run(self, handle, fetches, feed_dict, options, run_metadata)
    995     if final_fetches or final_targets:
    996       results = self._do_run(handle, final_targets, final_fetches,
--> 997                              feed_dict_string, options, run_metadata)
    998     else:
    999       results = []

~/workspace/snorkel/senv/lib/python3.6/site-packages/tensorflow/python/client/session.py in _do_run(self, handle, target_list, fetch_list, feed_dict, options, run_metadata)
   1130     if handle is None:
   1131       return self._do_call(_run_fn, self._session, feed_dict, fetch_list,
-> 1132                            target_list, options, run_metadata)
   1133     else:
   1134       return self._do_call(_prun_fn, self._session, handle, feed_dict,

~/workspace/snorkel/senv/lib/python3.6/site-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
   1150         except KeyError:
   1151           pass
-> 1152       raise type(e)(node_def, op, message)
   1153 
   1154   def _extend_graph(self):

InvalidArgumentError: indices[0] = 0 is not in [0, 0)
     [[Node: embedding_lookup_sparse/embedding_lookup = Gather[Tindices=DT_INT64, Tparams=DT_FLOAT, _class=["loc:@Variable"], validate_indices=true, _device="/job:localhost/replica:0/task:0/cpu:0"](Variable/read, _arg_Placeholder_2_0_2)]]

Caused by op 'embedding_lookup_sparse/embedding_lookup', defined at:
  File "/usr/lib64/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib64/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/username/workspace/snorkel/senv/lib/python3.6/site-packages/ipykernel_launcher.py", line 16, in <module>
    app.launch_new_instance()
  File "/home/username/workspace/snorkel/senv/lib/python3.6/site-packages/traitlets/config/application.py", line 658, in launch_instance
    app.start()
  File "/home/username/workspace/snorkel/senv/lib/python3.6/site-packages/ipykernel/kernelapp.py", line 477, in start
    ioloop.IOLoop.instance().start()
  File "/home/username/workspace/snorkel/senv/lib/python3.6/site-packages/zmq/eventloop/ioloop.py", line 177, in start
    super(ZMQIOLoop, self).start()
  File "/home/username/workspace/snorkel/senv/lib/python3.6/site-packages/tornado/ioloop.py", line 888, in start
    handler_func(fd_obj, events)
  File "/home/username/workspace/snorkel/senv/lib/python3.6/site-packages/tornado/stack_context.py", line 277, in null_wrapper
    return fn(*args, **kwargs)
  File "/home/username/workspace/snorkel/senv/lib/python3.6/site-packages/zmq/eventloop/zmqstream.py", line 440, in _handle_events
    self._handle_recv()
  File "/home/username/workspace/snorkel/senv/lib/python3.6/site-packages/zmq/eventloop/zmqstream.py", line 472, in _handle_recv
    self._run_callback(callback, msg)
  File "/home/username/workspace/snorkel/senv/lib/python3.6/site-packages/zmq/eventloop/zmqstream.py", line 414, in _run_callback
    callback(*args, **kwargs)
  File "/home/username/workspace/snorkel/senv/lib/python3.6/site-packages/tornado/stack_context.py", line 277, in null_wrapper
    return fn(*args, **kwargs)
  File "/home/username/workspace/snorkel/senv/lib/python3.6/site-packages/ipykernel/kernelbase.py", line 283, in dispatcher
    return self.dispatch_shell(stream, msg)
  File "/home/username/workspace/snorkel/senv/lib/python3.6/site-packages/ipykernel/kernelbase.py", line 235, in dispatch_shell
    handler(stream, idents, msg)
  File "/home/username/workspace/snorkel/senv/lib/python3.6/site-packages/ipykernel/kernelbase.py", line 399, in execute_request
    user_expressions, allow_stdin)
  File "/home/username/workspace/snorkel/senv/lib/python3.6/site-packages/ipykernel/ipkernel.py", line 196, in do_execute
    res = shell.run_cell(code, store_history=store_history, silent=silent)
  File "/home/username/workspace/snorkel/senv/lib/python3.6/site-packages/ipykernel/zmqshell.py", line 533, in run_cell
    return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
  File "/home/username/workspace/snorkel/senv/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2698, in run_cell
    interactivity=interactivity, compiler=compiler, result=result)
  File "/home/username/workspace/snorkel/senv/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2808, in run_ast_nodes
    if self.run_code(code, result):
  File "/home/username/workspace/snorkel/senv/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2862, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-4-634e2f438b6b>", line 1, in <module>
    disc_model.train(F_train, train_marginals, n_epochs=20, lr=0.001)
  File "/home/username/workspace/snorkel/snorkel/learning/logistic_regression.py", line 138, in train
    self._build()
  File "/home/username/workspace/snorkel/snorkel/learning/logistic_regression.py", line 256, in _build
    self._build_sigmoid(sparse_ids, sparse_vals)
  File "/home/username/workspace/snorkel/snorkel/learning/logistic_regression.py", line 214, in _build_sigmoid
    sp_weights=sparse_vals, combiner='sum')
  File "/home/username/workspace/snorkel/senv/lib/python3.6/site-packages/tensorflow/python/ops/embedding_ops.py", line 323, in embedding_lookup_sparse
    params, ids, partition_strategy=partition_strategy, max_norm=max_norm)
  File "/home/username/workspace/snorkel/senv/lib/python3.6/site-packages/tensorflow/python/ops/embedding_ops.py", line 122, in embedding_lookup
    return maybe_normalize(_do_gather(params[0], ids, name=name))
  File "/home/username/workspace/snorkel/senv/lib/python3.6/site-packages/tensorflow/python/ops/embedding_ops.py", line 42, in _do_gather
    return array_ops.gather(params, ids, name=name)
  File "/home/username/workspace/snorkel/senv/lib/python3.6/site-packages/tensorflow/python/ops/gen_array_ops.py", line 1179, in gather
    validate_indices=validate_indices, name=name)
  File "/home/username/workspace/snorkel/senv/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
    op_def=op_def)
  File "/home/username/workspace/snorkel/senv/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2506, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/home/username/workspace/snorkel/senv/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1269, in __init__
    self._traceback = _extract_stack()

InvalidArgumentError (see above for traceback): indices[0] = 0 is not in [0, 0)
     [[Node: embedding_lookup_sparse/embedding_lookup = Gather[Tindices=DT_INT64, Tparams=DT_FLOAT, _class=["loc:@Variable"], validate_indices=true, _device="/job:localhost/replica:0/task:0/cpu:0"](Variable/read, _arg_Placeholder_2_0_2)]]

Update: I think this comes down to an issue in treedlib, since compile_relation_feature_generator doesn't produce any items.

ajratner commented 7 years ago

Hi @TimRepke ,

Thanks for the detailed post and for calling attention to this! I haven't seen this before, and I agree that the error message is lacking in specificity...

It does, however, seem like an error that would occur when null data is passed in; i.e., I think TensorFlow is checking the sparse indices against an empty data array. Could you double-check that both of your feature matrices (F_train and F_dev) are fully populated? Another minor thing would be to lower print_freq (in either or both calls) so that we can see whether this happens during training or during dev set evaluation.
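A quick way to run that sanity check might look like the sketch below. It assumes the feature matrix exposes `shape` and `nnz` the way a SciPy sparse matrix does; `describe_feature_matrix` and the `FakeSparse` stand-in are hypothetical names used only so the snippet runs on its own, and the shapes are made-up illustration values.

```python
from collections import namedtuple

# Hypothetical stand-in exposing the two attributes the check reads.
# A SciPy sparse matrix provides both `shape` and `nnz`.
FakeSparse = namedtuple("FakeSparse", ["shape", "nnz"])


def describe_feature_matrix(F, name="F"):
    """Return a list of human-readable problems found in a feature matrix."""
    n_rows, n_cols = F.shape
    problems = []
    if n_rows == 0:
        problems.append(f"{name} has no rows (no candidates)")
    if n_cols == 0:
        problems.append(f"{name} has no columns (no features were generated)")
    if F.nnz == 0:
        problems.append(f"{name} stores no nonzero entries")
    return problems


# Illustration values only: one matrix without features, one populated.
empty = FakeSparse(shape=(100, 0), nnz=0)
populated = FakeSparse(shape=(100, 50), nnz=120)

empty_problems = describe_feature_matrix(empty, "F_train")
populated_problems = describe_feature_matrix(populated, "F_dev")
print(empty_problems)
print(populated_problems)
```

A matrix with zero columns means the parameter vector built from it has length zero, which would be consistent with a lookup range of [0, 0).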

Also, it's potentially worth noting that we're about to pull in a refactor of the TensorFlow bindings ( #681 ), so there's a small chance that could help here, or at least we'd love to integrate any fixes required here into that PR. So let us know about the sanity-check questions above to start.

Thanks, Alex

TimRepke commented 7 years ago

Hi @ajratner , yes, in fact F_train and F_dev are empty (<228x0 sparse matrix of type '<class 'numpy.int64'>' with 0 stored elements in Compressed Sparse Row format>), I guess that wasn't clear from my previous comment. That's where I started going up and down the chain of called functions. As far as I understand it, the FeatureAnnotator is supposed to apply feature functions to the candidates

featurizer = FeatureAnnotator()
F_train = featurizer.apply(split=0)

but apparently (skipping a few calls), the anno_generator doesn't yield any items. Again, skipping a few things, I ended up in the get_binary_span_feats function, where things are handed over to treedlib, where finally nodes (candidates?) are filtered out. I'm not familiar enough with the codebase to put my finger on the issue, but that's where the data stream ends and nothing is returned, so I think it might be the problem.

TimRepke commented 7 years ago

Found the issue!!! Here min() is called on a generator, which iterates over it and leaves an exhausted (empty) iterator. I fixed that and will open a PR soon with other small fixes I came across.
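For anyone hitting the same thing, the pitfall can be reproduced in isolation. This is a minimal sketch, not the actual treedlib code: `node_depths` is a hypothetical generator, and materializing into a list is just one possible fix, not necessarily the one used in the PR.

```python
def node_depths():
    """Hypothetical generator standing in for the one consumed in the thread."""
    yield from [3, 1, 2]


# Buggy pattern: min() iterates the generator to the end,
# so a later loop over the same object sees nothing.
gen = node_depths()
smallest = min(gen)
leftover = list(gen)

# One possible fix: materialize the items once, then reuse the list.
items = list(node_depths())
smallest_fixed = min(items)
reusable = list(items)

print(smallest, leftover)
print(smallest_fixed, reusable)
```

A Python 3 `map` or `filter` object behaves the same way as the generator here, since those are also one-shot iterators.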

ajratner commented 7 years ago

Hi @TimRepke great sleuthing! I am still confused as to why this error happens for you with Python 2.7, as it doesn't for us; but either way great catch, seems worth fixing, and thanks for the PR!!

gabcbrown commented 7 years ago

Any idea what that second data array is that TensorFlow is comparing against?

I'm getting a similar error while running _, _, _, _ = disc_model.score(session, F_test, L_gold_test) and test_predict = disc_model.predict(F_test). The difference is that it's not empty:

InvalidArgumentError: indices[30] = 28784 is not in [0, 20993)

The trace includes this TensorFlow call.

ajratner commented 7 years ago

Hi @gabcbrown,

The second data array here is a parameters matrix. Usually, this error would be indicative of the train and test sets having different feature spaces--specifically here, that the model was trained with a dataset having 20,993 features, and then the test set has additional features (extending to at least 28,784) which the model doesn't know anything about. (Ideally, a lookup like this should just return a parameter of zero... this is an annoying nit w/ TF here...)

Either way, did you create F_test using apply_existing as in the tutorial? This is important and should solve the problem.
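To illustrate why reusing the training feature space matters, here is a small sketch of the idea in plain Python. It is an analogy under stated assumptions, not Snorkel's implementation of apply_existing: `build_feature_index`, `encode`, and the `extend` flag are hypothetical names, and the feature strings are made up.

```python
def build_feature_index(rows):
    """Assign a column id to each feature name seen in the training rows."""
    index = {}
    for row in rows:
        for feat in row:
            index.setdefault(feat, len(index))
    return index


def encode(rows, index, extend=False):
    """Map feature names to column ids.

    With extend=False, names missing from `index` are dropped, so every id
    stays below the number of trained parameters. With extend=True, unseen
    names receive new ids (and `index` is mutated).
    """
    encoded = []
    for row in rows:
        ids = []
        for feat in row:
            if feat in index:
                ids.append(index[feat])
            elif extend:
                index[feat] = len(index)
                ids.append(index[feat])
        encoded.append(ids)
    return encoded


train_rows = [["w=a", "w=b"], ["w=b", "w=c"]]
test_rows = [["w=a", "w=z"]]  # "w=z" never appeared in training

index = build_feature_index(train_rows)
n_params = len(index)  # size of the trained parameter vector

# Reusing the training index keeps all test ids inside [0, n_params).
safe_ids = encode(test_rows, index)

# Growing the index on test data yields an id outside [0, n_params).
unsafe_ids = encode(test_rows, dict(index), extend=True)

print(n_params, safe_ids, unsafe_ids)
```

In the second case the extra id plays the role of index 28784 in the error above: it points past the end of the parameters the model was trained with.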

Let us know, and either way we can try to add a better error message here at the very least (in the short run)!

ajratner commented 7 years ago

Closing for now; hopefully v0.6 helps as well! If this is still an issue, please re-open!