nateraw / Lda2vec-Tensorflow

Tensorflow 1.5 implementation of Chris Moody's Lda2vec, adapted from @meereeum
MIT License

File "ops.pyx", line 111, in thinc.neural.ops.Ops.flatten IndexError: list index out of range #6

Closed dbl001 closed 6 years ago

dbl001 commented 6 years ago

I'm trying to run 'run_20newsgroups.py' from the latest update. I'm getting this error: File "ops.pyx", line 111, in thinc.neural.ops.Ops.flatten IndexError: list index out of range

I am running Python 3.6 in an Anaconda environment named 'spacy'. I changed the call to NlpPipeline as follows:

SP = NlpPipeline(path_to_file, 50, merge=True, num_threads=8, context=False, vectors="google_news_model")
    # SP = NlpPipeline(path_to_file, 50, merge=True, num_threads=8, context=True, usecols=["texts"], vectors="google_news_model")
(spacy) David-Laxers-MacBook-Pro:Lda2vec-Tensorflow davidlaxer$ python run_20newsgroups.py 
/Users/davidlaxer/anaconda/envs/spacy/lib/python3.6/importlib/_bootstrap.py:219: RuntimeWarning: compiletime version 3.5 of module 'tensorflow.python.framework.fast_tensor_util' does not match runtime version 3.6
  return f(*args, **kwds)
/Users/davidlaxer/anaconda/envs/spacy/lib/python3.6/site-packages/h5py/__init__.py:34: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Using TensorFlow backend.
made texts
about to enter that pipe
Traceback (most recent call last):
  File "run_20newsgroups.py", line 18, in <module>
    SP = NlpPipeline(path_to_file, 50, merge=True, num_threads=8, context=False, vectors="google_news_model")
  File "/Users/davidlaxer/Lda2vec-Tensorflow/lda2vec/nlppipe.py", line 101, in __init__
    self.tokenize()
  File "/Users/davidlaxer/Lda2vec-Tensorflow/lda2vec/nlppipe.py", line 176, in tokenize
    for row, doc in enumerate(self.nlp.pipe(self.texts, n_threads=self.num_threads, batch_size=1000)):
  File "/Users/davidlaxer/anaconda/envs/spacy/lib/python3.6/site-packages/spacy/language.py", line 554, in pipe
    for doc in docs:
  File "nn_parser.pyx", line 369, in pipe
  File "cytoolz/itertoolz.pyx", line 1046, in cytoolz.itertoolz.partition_all.__next__ (cytoolz/itertoolz.c:14538)
  File "nn_parser.pyx", line 376, in pipe
  File "nn_parser.pyx", line 403, in spacy.syntax.nn_parser.Parser.parse_batch
  File "nn_parser.pyx", line 724, in spacy.syntax.nn_parser.Parser.get_batch_model
  File "/Users/davidlaxer/anaconda/envs/spacy/lib/python3.6/site-packages/thinc/api.py", line 61, in begin_update
    X, inc_layer_grad = layer.begin_update(X, drop=drop)
  File "/Users/davidlaxer/anaconda/envs/spacy/lib/python3.6/site-packages/thinc/api.py", line 292, in begin_update
    X, bp_layer = layer.begin_update(layer.ops.flatten(seqs_in, pad=pad),
  File "ops.pyx", line 111, in thinc.neural.ops.Ops.flatten
IndexError: list index out of range
(spacy) David-Laxers-MacBook-Pro:Lda2vec-Tensorflow davidlaxer$ 

With the original code:

(spacy) David-Laxers-MacBook-Pro:Lda2vec-Tensorflow davidlaxer$ python run_20newsgroups.py 
/Users/davidlaxer/anaconda/envs/spacy/lib/python3.6/importlib/_bootstrap.py:219: RuntimeWarning: compiletime version 3.5 of module 'tensorflow.python.framework.fast_tensor_util' does not match runtime version 3.6
  return f(*args, **kwds)
/Users/davidlaxer/anaconda/envs/spacy/lib/python3.6/site-packages/h5py/__init__.py:34: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Using TensorFlow backend.
Traceback (most recent call last):
  File "/Users/davidlaxer/anaconda/envs/spacy/lib/python3.6/site-packages/pandas/indexes/base.py", line 2134, in get_loc
    return self._engine.get_loc(key)
  File "pandas/index.pyx", line 132, in pandas.index.IndexEngine.get_loc (pandas/index.c:4433)
  File "pandas/index.pyx", line 154, in pandas.index.IndexEngine.get_loc (pandas/index.c:4279)
  File "pandas/src/hashtable_class_helper.pxi", line 732, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13742)
  File "pandas/src/hashtable_class_helper.pxi", line 740, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13696)
KeyError: 'texts'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "run_20newsgroups.py", line 19, in <module>
    SP = NlpPipeline(path_to_file, 50, merge=True, num_threads=8, context=True, usecols=["texts"], vectors="google_news_model")
  File "/Users/davidlaxer/Lda2vec-Tensorflow/lda2vec/nlppipe.py", line 101, in __init__
    self.tokenize()
  File "/Users/davidlaxer/Lda2vec-Tensorflow/lda2vec/nlppipe.py", line 149, in tokenize
    self.texts = df[text_col_name].values.astype(str).tolist()
  File "/Users/davidlaxer/anaconda/envs/spacy/lib/python3.6/site-packages/pandas/core/frame.py", line 2059, in __getitem__
    return self._getitem_column(key)
  File "/Users/davidlaxer/anaconda/envs/spacy/lib/python3.6/site-packages/pandas/core/frame.py", line 2066, in _getitem_column
    return self._get_item_cache(key)
  File "/Users/davidlaxer/anaconda/envs/spacy/lib/python3.6/site-packages/pandas/core/generic.py", line 1386, in _get_item_cache
    values = self._data.get(item)
  File "/Users/davidlaxer/anaconda/envs/spacy/lib/python3.6/site-packages/pandas/core/internals.py", line 3543, in get
    loc = self.items.get_loc(item)
  File "/Users/davidlaxer/anaconda/envs/spacy/lib/python3.6/site-packages/pandas/indexes/base.py", line 2136, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/index.pyx", line 132, in pandas.index.IndexEngine.get_loc (pandas/index.c:4433)
  File "pandas/index.pyx", line 154, in pandas.index.IndexEngine.get_loc (pandas/index.c:4279)
  File "pandas/src/hashtable_class_helper.pxi", line 732, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13742)
  File "pandas/src/hashtable_class_helper.pxi", line 740, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13696)
KeyError: 'texts'
(spacy) David-Laxers-MacBook-Pro:Lda2vec-Tensorflow davidlaxer$ 
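
The second traceback ends in `KeyError: 'texts'`, which pandas raises when the requested column is not present in the DataFrame. A minimal sketch for checking a file's header before passing `usecols=["texts"]`, assuming the input is a tab-delimited file with a header row (the actual format of the data file is not shown in this thread). `missing_columns` is a hypothetical helper, not part of the repository:

```python
import csv
import io


def missing_columns(handle, required, delimiter="\t"):
    """Return the names in `required` that are absent from the header row."""
    reader = csv.reader(handle, delimiter=delimiter)
    header = next(reader, [])  # an empty file yields an empty header
    present = {name.strip() for name in header}
    return [name for name in required if name not in present]


# Hypothetical sample: the text column is named "text", not "texts".
sample = io.StringIO("id\ttext\n0\tfirst document\n")
print(missing_columns(sample, ["texts"]))
```

If the helper reports a missing name, either the column name passed to the pipeline or the file's header would need to change.
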
rikkuporta commented 6 years ago

How did you solve it, since you closed the issue?

nateraw commented 6 years ago

@rikkuporta did you figure it out? Is this still breaking?

Check out this post; it is an off-by-one index error: https://github.com/nateraw/Lda2vec-Tensorflow/issues/5

Kitwradr commented 5 years ago

How was this issue solved? @dbl001 @e2dubba @nateraw

dbl001 commented 5 years ago

The code in 'nlppipe.py' no longer uses 'NlpPipeline()'. Are you still getting this error?

Kitwradr commented 5 years ago

Yes, while training a model using spaCy.

dbl001 commented 5 years ago

Can you provide detailed information on what’s causing this issue?
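
One way to gather that detail is to feed the documents through the failing step one at a time and record which ones raise. A minimal sketch, assuming the step can be wrapped in a single-argument callable. `find_failing_items` and `fake_parse` are hypothetical names used only for illustration, and `fake_parse` merely stands in for the real spaCy call:

```python
def find_failing_items(items, process):
    """Run `process` on each item; return (index, item, error) per failure."""
    failures = []
    for index, item in enumerate(items):
        try:
            process(item)
        except Exception as error:  # broad on purpose: this is a diagnostic
            failures.append((index, item, error))
    return failures


def fake_parse(text):
    # Stand-in for the real parser; raises IndexError on empty input.
    return text.split()[0]


texts = ["first document", "", "third document"]
for index, item, error in find_failing_items(texts, fake_parse):
    print(index, repr(item), type(error).__name__)
```

Processing one text per call is slower than batching, but it pinpoints which input triggers the error so it can be included in a report.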

