huminghao16 / RE3QA

Retrieve, Read, Rerank: Towards End-to-End Multi-Document Reading Comprehension
Apache License 2.0

Error with multiprocessing when running build_span_corpus.py #6

Open Seohyeong opened 5 years ago

Seohyeong commented 5 years ago

Running python -m triviaqa.build_span_corpus --corpus wiki --n_processes 7 raises a ValueError. I'm posting the entire output in case it helps.

Loading verified questions
Adding answers for verified question
Completed question 0 of 38 (0.000)
Completed question 0 of 37 (0.000)
Completed question 0 of 37 (0.000)
Completed question 0 of 37 (0.000)
Completed question 0 of 37 (0.000)
Completed question 0 of 37 (0.000)
Completed question 0 of 37 (0.000)
Completed question 0 of 37 (0.000)
multiprocessing.pool.RemoteTraceback: 

Traceback (most recent call last):
  File "/home/seohyeong/anaconda3/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/home/seohyeong/anaconda3/lib/python3.6/multiprocessing/pool.py", line 47, in starmapstar
    return list(itertools.starmap(args[0], args[1]))
  File "/home/seohyeong/Dropbox/Naver/RE3QA-master/triviaqa/answer_detection.py", line 86, in _compute_answer_spans_chunk
    compute_answer_spans(questions, corpus, tokenizer, detector)
  File "/home/seohyeong/Dropbox/Naver/RE3QA-master/triviaqa/answer_detection.py", line 71, in compute_answer_spans
    raise ValueError()
ValueError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/seohyeong/anaconda3/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/seohyeong/anaconda3/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/seohyeong/Dropbox/Naver/RE3QA-master/triviaqa/build_span_corpus.py", line 210, in <module>
    main()
  File "/home/seohyeong/Dropbox/Naver/RE3QA-master/triviaqa/build_span_corpus.py", line 196, in main
    build_wiki_corpus(args.n_processes)
  File "/home/seohyeong/Dropbox/Naver/RE3QA-master/triviaqa/build_span_corpus.py", line 137, in build_wiki_corpus
    FastNormalizedAnswerDetector(), n_processes)
  File "/home/seohyeong/Dropbox/Naver/RE3QA-master/triviaqa/build_span_corpus.py", line 53, in build_dataset
    questions = compute_answer_spans_par(questions, corpus, tokenizer, answer_detector, n_process)
  File "/home/seohyeong/Dropbox/Naver/RE3QA-master/triviaqa/answer_detection.py", line 99, in compute_answer_spans_par
    [[c, corpus, tokenizer, detector] for c in chunks]))
  File "/home/seohyeong/anaconda3/lib/python3.6/multiprocessing/pool.py", line 274, in starmap
    return self._map_async(func, iterable, starmapstar, chunksize).get()
  File "/home/seohyeong/anaconda3/lib/python3.6/multiprocessing/pool.py", line 644, in get
    raise self._value
ValueError

huminghao16 commented 5 years ago

That's weird. I haven't encountered this problem. Maybe you can try a different value for the --n_processes argument to see if it works.

Gabriellamin commented 5 years ago

I have also encountered the problem.

Seohyeong commented 5 years ago

I traced the error to the exact point where it is raised: the name of the wiki document being processed there is non-English (French, with accented letters). So I suspect this error is related to non-English characters. My guess is that the default encoding on your computer (CP949, UTF-8, etc.) is different from mine. Did you specify an encoding explicitly? If not, which encoding are you using?

huminghao16 commented 5 years ago

The default encoding type of my computer is 'en_US.UTF-8'.

Gabriellamin commented 5 years ago

I changed my default encoding to 'en_US.UTF-8' as the author suggested, but the error still occurs.
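
For anyone comparing environments in this thread, here is a minimal sketch for reporting the encodings Python actually uses. It relies only on the standard library. The helper names report_encodings and same_name are hypothetical and are not part of RE3QA. The second helper rests on an unconfirmed assumption: that an accented title could fail to match because the same name is stored in two different Unicode forms (composed vs. decomposed), which changing the locale would not fix.

```python
import locale
import sys
import unicodedata


def report_encodings():
    """Return the default encodings Python uses in this environment."""
    return {
        # Used by open() when no encoding= argument is given.
        "preferred": locale.getpreferredencoding(False),
        # Used to encode and decode file names.
        "filesystem": sys.getfilesystemencoding(),
        # Default for str.encode() / bytes.decode(); always utf-8 on Python 3.
        "str_default": sys.getdefaultencoding(),
    }


def same_name(a, b):
    """Compare two names after NFC normalization.

    Hypothetical helper: checks whether two strings differ only in how
    accented characters are composed.
    """
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


for label, value in report_encodings().items():
    print(f"{label}: {value}")

composed = "Caf\u00e9"      # "é" as a single code point
decomposed = "Cafe\u0301"   # "e" followed by a combining acute accent
print(composed == decomposed)           # plain comparison
print(same_name(composed, decomposed))  # comparison after normalization
```

If the "preferred" value differs between machines, passing encoding="utf-8" explicitly to open() is one way to rule the locale out. If the values match and the error persists, comparing the failing document name in both normalized forms may help narrow things down.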