hotpotqa / hotpot

Apache License 2.0

Processing Error: list index out of range #21

Closed zjiehang closed 5 years ago

zjiehang commented 5 years ago

I get a "list index out of range" error while preprocessing the training dataset. It happens at around question 24263. Thanks!

Traceback (most recent call last):
  File "/opt/miniconda3/lib/python3.7/site-packages/joblib/externals/loky/process_executor.py", line 418, in _process_worker
    r = call_item()
  File "/opt/miniconda3/lib/python3.7/site-packages/joblib/externals/loky/process_executor.py", line 272, in __call__
    return self.fn(*self.args, **self.kwargs)
  File "/opt/miniconda3/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 567, in __call__
    return self.func(*args, **kwargs)
  File "/opt/miniconda3/lib/python3.7/site-packages/joblib/parallel.py", line 225, in __call__
    for func, args, kwargs in self.items]
  File "/opt/miniconda3/lib/python3.7/site-packages/joblib/parallel.py", line 225, in <listcomp>
    for func, args, kwargs in self.items]
  File "/home/develop/ailab/zjiehang/HotpotQA/prepro.py", line 171, in _process_article
    best_indices = (answer_span[0], answer_span[-1])
IndexError: list index out of range
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/develop/ailab/zjiehang/HotpotQA/main.py", line 86, in <module>
    prepro(config)
  File "/home/develop/ailab/zjiehang/HotpotQA/prepro.py", line 332, in prepro
    examples, eval_examples = process_file(config.data_file, config, word_counter, char_counter)
  File "/home/develop/ailab/zjiehang/HotpotQA/prepro.py", line 191, in process_file
    outputs = Parallel(n_jobs=12, verbose=10)(delayed(_process_article)(article, config) for article in data)
  File "/opt/miniconda3/lib/python3.7/site-packages/joblib/parallel.py", line 934, in __call__
    self.retrieve()
  File "/opt/miniconda3/lib/python3.7/site-packages/joblib/parallel.py", line 833, in retrieve
    self._output.extend(job.get(timeout=self.timeout))
  File "/opt/miniconda3/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 521, in wrap_future_result
    return future.result(timeout=timeout)
  File "/opt/miniconda3/lib/python3.7/concurrent/futures/_base.py", line 425, in result
    return self.__get_result()
  File "/opt/miniconda3/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
IndexError: list index out of range

zhijuny commented 5 years ago

I ran into the same problem. Have you solved it?

zjiehang commented 5 years ago

@zhijuny Yes. It fails at question 24494, whose answer, "menid Empi", appears to be truncated. I changed it to "Achaemenid Empire" and the error went away; I checked Wikipedia, and the full answer is correct. I only saw this problem on Ubuntu (possibly Linux in general); on Windows no error occurred. The context of question 24494 also contains some special characters, so differences in character encoding between the two systems may be the cause. Hope it helps!
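For anyone who wants to look for similar entries, here is a minimal sketch that flags answers which occur in the context only as a fragment of a longer word, the way "menid Empi" sits inside "Achaemenid Empire". It assumes the usual HotpotQA JSON layout (a list of examples with `_id`, `answer`, and `context` as `[title, [sentences]]` pairs); the function name `partial_word_answers` and the toy data are made up for illustration and are not part of prepro.py.

```python
import re


def partial_word_answers(examples):
    """Return (index, _id, answer) for answers found only inside longer words.

    Assumes each example has "_id", "answer" and "context", where "context"
    is a list of [title, [sentence, ...]] pairs.
    """
    flagged = []
    for index, example in enumerate(examples):
        answer = example["answer"].strip()
        # yes/no answers have no span in the context, so skip them
        if not answer or answer.lower() in ("yes", "no"):
            continue
        parts = []
        for title, sentences in example["context"]:
            parts.append(title)
            parts.extend(sentences)
        text = " ".join(parts)
        # Match the answer only when it is not glued to word characters
        whole = re.compile(r"(?<!\w)" + re.escape(answer) + r"(?!\w)")
        if answer in text and not whole.search(text):
            flagged.append((index, example["_id"], answer))
    return flagged


# Toy data for illustration only (not taken from the real dataset)
toy_context = [["Achaemenid Empire",
                ["The Achaemenid Empire was founded by Cyrus the Great."]]]
toy_examples = [
    {"_id": "toy-0", "answer": "menid Empi", "context": toy_context},
    {"_id": "toy-1", "answer": "Cyrus the Great", "context": toy_context},
]
print(partial_word_answers(toy_examples))
```

To run it on the real file, load hotpot_train_v1.1.json with `json.load` and pass the resulting list to the function. This is only a heuristic check; it does not reproduce what prepro.py does internally.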

zhijuny commented 5 years ago

Thank you. I'll give it a try.

xptree commented 5 years ago

I also ran into this problem. Running

sed -i -e "s/\"menid Empi\"/\"Achaemenid Empire\"/g" hotpot_train_v1.1.json

before

python main.py --mode prepro --data_file hotpot_train_v1.1.json --para_limit 2250 --data_split train

solves it.
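If sed is not available (for example on Windows), the same fix can be sketched in Python. This version parses the JSON and changes only the `answer` field, rather than every matching string in the file. The helper names `patch_answer` and `patch_file` are made up for illustration, and the sketch assumes the file is a JSON list of examples that each carry an `answer` key.

```python
import json


def patch_answer(examples, old, new):
    """Replace exact-match answers in place and return how many changed."""
    changed = 0
    for example in examples:
        if example.get("answer") == old:
            example["answer"] = new
            changed += 1
    return changed


def patch_file(src_path, dst_path, old, new):
    """Load a JSON list of examples, patch the answers, write the result."""
    with open(src_path, encoding="utf-8") as f:
        examples = json.load(f)
    changed = patch_answer(examples, old, new)
    with open(dst_path, "w", encoding="utf-8") as f:
        json.dump(examples, f)
    return changed


# Toy data for illustration only
toy = [{"answer": "menid Empi"}, {"answer": "yes"}]
print(patch_answer(toy, "menid Empi", "Achaemenid Empire"), toy)
```

Calling `patch_file("hotpot_train_v1.1.json", "hotpot_train_v1.1.fixed.json", "menid Empi", "Achaemenid Empire")` writes a patched copy and leaves the original file untouched; the output file name here is just an example.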

akshay0411 commented 5 years ago

I got an error when preprocessing the data:

...........................................................................
/home/akshay/akshayproj/prepro.py in _process(sent=u'Adam Collis', is_sup_fact=False, is_title=True)
     93 flat_offsets = []
     94 start_end_facts = [] # (start_token_id, end_token_id, is_sup_fact=True/False)
     95 sent2title_ids = []
     96
     97 def _process(sent, is_sup_fact, is_title=False):
---> 98 nonlocal (text_context, context_tokens, context_chars, offsets, start_end_facts, flat_offsets)
        text_context = undefined
     99 N_chars = len(text_context)
    100
    101 sent = sent
    102 sent_tokens = word_tokenize(sent)

NameError: global name 'nonlocal' is not defined

zjiehang commented 5 years ago

Are you using Python 2.x? The `nonlocal` keyword exists only in Python 3.x, so please check which interpreter you are running.
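For reference, here is a small sketch of what `nonlocal` does in Python 3, next to a closure pattern that avoids the keyword. The function names are made up for illustration and have nothing to do with prepro.py. Note that `nonlocal` takes a plain comma-separated list of names; with parentheses, as in the traceback above, Python 2 parses the line as a call to an undefined name called `nonlocal`, which would explain the NameError.

```python
import sys


def make_counter():
    """Python 3 only: rebind a variable of the enclosing function."""
    count = 0

    def increment():
        nonlocal count  # without this, `count += 1` raises UnboundLocalError
        count += 1
        return count

    return increment


def make_counter_without_nonlocal():
    """Mutate a container instead of rebinding a name; no keyword needed."""
    state = {"count": 0}

    def increment():
        state["count"] += 1
        return state["count"]

    return increment


# Print the interpreter version, which is the first thing to check here
print(sys.version_info[:2])
counter = make_counter()
counter()
print(counter())
```

Running `python --version` (or printing `sys.version_info` as above) shows which interpreter is actually being used.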