omidrohanian / irony_detection

Code and data used for participation in SemEval-2018 Task 3: "Irony detection in English tweets"

IndexError: invalid index to scalar variable #2

Closed · amosbastian closed this 4 years ago

amosbastian commented 5 years ago

When running the feature_generator_TaskA notebook, specifically cell 9, I get the following error:


---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-9-0b9719d949b3> in <module>
      5 for tweet in corpus_preprocessed:
      6     chunks = chunkIt(tweet, 2)
----> 7     polarity_vectors.append(np.concatenate(((polarity(chunks[0])[1], polarity(chunks[1])[1])), axis=0))
      8 
      9 assert len(ekphrasis_feats) == len(polarity_vectors)

~/Documents/scriptie/irony_detection/venv/lib/python3.6/site-packages/ekphrasis/utils/nlp.py in polarity(doc, neg_comma, neg_modals)
    213 
    214     _scores = numpy.mean(numpy.array(scores), axis=0)
--> 215     _polarity = _scores[0] - _scores[1]
    216 
    217     return _polarity, _scores

IndexError: invalid index to scalar variable.

All the preceding cells seem to run fine, so I don't know what could be causing this. Any ideas?
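For reference, this particular message is what numpy raises whenever a scalar is indexed like an array; a minimal illustration, independent of ekphrasis (the score values below are made up):

import numpy as np

# If `scores` ends up one-dimensional (or empty), numpy.mean(..., axis=0)
# returns a 0-d scalar instead of a length-2 array, and indexing it fails
# with exactly this message.
scores = np.array([0.1, 0.3, 0.2])   # hypothetical per-word scores
_scores = np.mean(scores, axis=0)    # a numpy scalar, not an array
_scores[0]                           # IndexError: invalid index to scalar variable.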

ReinierKoops commented 4 years ago

I got the same issue @amosbastian

amosbastian commented 4 years ago

@ReinierKoops I didn't bother solving it, so good luck!

ReinierKoops commented 4 years ago

@amosbastian @omidrohanian @shivaat I fixed it; when installing nltk, make sure to also download the following NLTK data packages:
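The list itself did not come through here, but going by the later comments in this thread, the packages in question were presumably wordnet, sentiwordnet and averaged_perceptron_tagger, i.e. something along these lines:

import nltk

# NLTK data packages referenced later in this thread; the exact list is
# reconstructed from the follow-up comments, not from the original post.
nltk.download('wordnet')                      # needed by WordNetLemmatizer
nltk.download('sentiwordnet')                 # polarity lookups in ekphrasis
nltk.download('averaged_perceptron_tagger')   # needed by nltk.pos_tag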

shivaat commented 4 years ago

Thanks, @ReinierKoops. We should add these to the dependencies.

ReinierKoops commented 4 years ago

I've made a pull request to include all the dependencies needed to run this project from a clean slate; please see https://github.com/omidrohanian/irony_detection/pull/3 @shivaat @omidrohanian

omidrohanian commented 4 years ago

@ReinierKoops Thanks for your proposed changes. This error comes from an outside library (ekphrasis), which we used to generate the polarity scores, and we didn't get it at the time, so it's definitely a version issue. We used ekphrasis here because it computes the scores by looking up values in sentiwordnet, but it is not a key part of the program. It makes sense to include wordnet and sentiwordnet in the NLTK downloads. However, I don't remember which component depended on averaged_perceptron_tagger, or whether it is used at all. Are you sure that last one is a dependency as well?
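For context, a rough sketch of what a sentiwordnet lookup through NLTK looks like; this is only an illustration of the idea, not the actual ekphrasis code (the word 'happy' and the simple averaging are arbitrary choices):

from nltk.corpus import sentiwordnet as swn

# Each SentiWordNet synset carries a positive, a negative and an objective
# score; averaging pos/neg over the matching synsets gives a crude
# word-level polarity. Requires the wordnet and sentiwordnet data.
synsets = list(swn.senti_synsets('happy', 'a'))  # 'a' = adjective
pos = sum(s.pos_score() for s in synsets) / len(synsets)
neg = sum(s.neg_score() for s in synsets) / len(synsets)
print(pos - neg)  # positive minus negative, as in ekphrasis' _scores[0] - _scores[1]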

ReinierKoops commented 4 years ago

Without the perceptron tagger you will get the following error:

---------------------------------------------------------------------------
LookupError                               Traceback (most recent call last)
<ipython-input-8-0b9719d949b3> in <module>
      5 for tweet in corpus_preprocessed:
      6     chunks = chunkIt(tweet, 2)
----> 7     polarity_vectors.append(np.concatenate(((polarity(chunks[0])[1], polarity(chunks[1])[1])), axis=0))
      8 
      9 assert len(ekphrasis_feats) == len(polarity_vectors)

/usr/local/lib/python3.7/site-packages/ekphrasis/utils/nlp.py in polarity(doc, neg_comma, neg_modals)
    188 
    189     tagged = nltk.pos_tag([wordnet_lemmatizer.lemmatize(w)
--> 190                            for w in doc])
    191     negations = find_negations(doc, neg_comma=neg_comma, neg_modals=neg_modals)
    192     scores = []

/usr/local/lib/python3.7/site-packages/nltk/tag/__init__.py in pos_tag(tokens, tagset, lang)
    159     :rtype: list(tuple(str, str))
    160     """
--> 161     tagger = _get_tagger(lang)
    162     return _pos_tag(tokens, tagset, tagger, lang)
    163 

/usr/local/lib/python3.7/site-packages/nltk/tag/__init__.py in _get_tagger(lang)
    105         tagger.load(ap_russian_model_loc)
    106     else:
--> 107         tagger = PerceptronTagger()
    108     return tagger
    109 

/usr/local/lib/python3.7/site-packages/nltk/tag/perceptron.py in __init__(self, load)
    160         if load:
    161             AP_MODEL_LOC = 'file:' + str(
--> 162                 find('taggers/averaged_perceptron_tagger/' + PICKLE)
    163             )
    164             self.load(AP_MODEL_LOC)

/usr/local/lib/python3.7/site-packages/nltk/data.py in find(resource_name, paths)
    699     sep = '*' * 70
    700     resource_not_found = '\n%s\n%s\n%s\n' % (sep, msg, sep)
--> 701     raise LookupError(resource_not_found)
    702 
    703 

LookupError: 
**********************************************************************
  Resource averaged_perceptron_tagger not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('averaged_perceptron_tagger')

  For more information see: https://www.nltk.org/data.html

  Attempted to load taggers/averaged_perceptron_tagger/averaged_perceptron_tagger.pickle

Therefore, the averaged_perceptron_tagger resource is needed as well.
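A quick way to confirm the fix is to run the same call path that fails in the traceback above; a small smoke test, assuming wordnet and averaged_perceptron_tagger have been downloaded (the token list is a toy example):

import nltk
from nltk.stem import WordNetLemmatizer

# Mirrors the call in ekphrasis/utils/nlp.py: lemmatize each token, then
# POS-tag the lemmas. Raises LookupError if wordnet or
# averaged_perceptron_tagger is still missing.
lemmatizer = WordNetLemmatizer()
doc = ['cats', 'are', 'running']
print(nltk.pos_tag([lemmatizer.lemmatize(w) for w in doc]))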