memray / seq2seq-keyphrase

MIT License

Issue when loading test data #10

Closed HagopB closed 6 years ago

HagopB commented 6 years ago

Hello,

I'm trying to run the code and extract keyphrases on at least one of the provided datasets. There seems to be an issue when reading the test sets (I have tried several datasets and the error persists). Here is what I get:

1/05/2018 13:10:27 [INFO] core: load the weights.
Loading testing dataset INSPEC from /home/ubuntu/work/nlp/seq2seq-keyphrase/dataset/keyphrase/testing-data/INSPEC
inspec
Size of test data=0
/home/ubuntu/.local/lib/python2.7/site-packages/numpy/lib/function_base.py:1110: RuntimeWarning: Mean of empty slice.
  avg = a.mean(axis)
Traceback (most recent call last):
  File "keyphrase/keyphrase_copynet.py", line 530, in <module>
    print('Avg length=%d, Max length=%d' % (np.average([len(s) for s in test_set['source']]), np.max([len(s) for s in test_set['source']])))
  File "/home/ubuntu/.local/lib/python2.7/site-packages/numpy/core/fromnumeric.py", line 2272, in amax
    out=out, **kwargs)
  File "/home/ubuntu/.local/lib/python2.7/site-packages/numpy/core/_methods.py", line 26, in _amax
    return umr_maximum(a, axis, None, out, keepdims)
ValueError: zero-size array to reduction operation maximum which has no identity

I am wondering why the size of the test data is 0. I downloaded the zip file, unzipped it in the main folder of the project, and strictly followed the instructions. I have spent a few days trying to figure out what is wrong. Have you run into this issue?
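For anyone hitting the same traceback: the ValueError is only a symptom, because the average/max reduction at line 530 is applied to an empty list. A minimal sketch of a guard that fails earlier with a clearer message is below. The helper name `length_stats` is hypothetical and not part of the repository, and it uses plain Python instead of `np.average`/`np.max` to stay dependency-free.

```python
def length_stats(sources):
    """Return (average length, max length) of the given token sequences.

    Hypothetical helper, not part of seq2seq-keyphrase. It raises a
    descriptive error when no documents were loaded instead of letting
    a reduction over an empty list fail later.
    """
    lengths = [len(s) for s in sources]
    if not lengths:
        raise ValueError(
            'Test set is empty: no documents were loaded. '
            'Check the dataset path and the file extensions being matched.'
        )
    return sum(lengths) / float(len(lengths)), max(lengths)
```

With a check like this, a run over an empty test set would stop with a message pointing at data loading rather than at NumPy internals.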

Sincere thanks.

memray commented 6 years ago

Can you check line 59 of config.py? It should look like this: config['testing_datasets']= ['nus'] # 'inspec', 'nus', 'semeval', 'krapivin', 'kp20k'

I recently changed it to a name that does not exist, and that may be causing the error.
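A quick way to rule this out is to validate the configured names before loading anything. The sketch below is only an illustration: `check_testing_datasets` and `KNOWN_TESTING_DATASETS` are hypothetical names, and the list of valid datasets is assumed from the comment on the config line above, so it may not be exhaustive.

```python
# Assumed from the comment in config.py; not necessarily the full list.
KNOWN_TESTING_DATASETS = ('inspec', 'nus', 'semeval', 'krapivin', 'kp20k')


def check_testing_datasets(config):
    """Return the configured dataset names, or raise if any is unknown.

    Hypothetical helper, not part of seq2seq-keyphrase.
    """
    names = list(config['testing_datasets'])
    unknown = [name for name in names if name not in KNOWN_TESTING_DATASETS]
    if unknown:
        raise ValueError(
            'Unknown testing dataset(s): %s; expected one of: %s'
            % (', '.join(unknown), ', '.join(KNOWN_TESTING_DATASETS))
        )
    return names
```

Calling it with `{'testing_datasets': ['nus']}` returns `['nus']`, while a misspelled name raises a ValueError that names the offending entry.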

rafaelbou commented 6 years ago

Hi, I have the same issue when I try to use the code for extraction. It occurs with both the 'nus' and 'inspec' datasets.


The error:

Loading testing dataset INSPEC from /home/student-5/PycharmProjects/rafael/seq2seq-keyphrase-master/dataset/keyphrase/testing-data/INSPEC
/usr/local/lib/python2.7/dist-packages/numpy/lib/function_base.py:1110: RuntimeWarning: Mean of empty slice.
  avg = a.mean(axis)
Traceback (most recent call last):
  File "/home/student-5/PycharmProjects/rafael/seq2seq-keyphrase-master/keyphrase/keyphrase_copynet.py", line 530, in <module>
    print('Avg length=%d, Max length=%d' % (np.average([len(s) for s in test_set['source']]), np.max([len(s) for s in test_set['source']])))
  File "/usr/local/lib/python2.7/dist-packages/numpy/core/fromnumeric.py", line 2272, in amax
    out=out, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/numpy/core/_methods.py", line 26, in _amax
    return umr_maximum(a, axis, None, out, keepdims)
ValueError: zero-size array to reduction operation maximum which has no identity
inspec
Size of test data=0

Process finished with exit code 1


Config file line 59: config['testing_datasets']= ['inspec'] # 'inspec', 'nus', 'semeval', 'krapivin', 'kp20k'


Thanks, Rafael.


Update: I found the problem. Line 218 of keyphrase_test_datasets.py is:

keyphrase_filepaths = [self.keyphrasedir + n for n in os.listdir(self.keyphrasedir) if n.endswith('.txt')]

It only matches files ending in '.txt', but the files in the gold_standard_keyphrases folder end with '.keyphrases', so no keyphrase files are picked up.
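One possible workaround, sketched under assumptions rather than taken from the repository, is to accept both extensions when listing the folder. The function name `list_keyphrase_files` and its `extensions` parameter are hypothetical. It also uses `os.path.join` instead of string concatenation and sorts the result, both of which differ from the line quoted above.

```python
import os


def list_keyphrase_files(keyphrase_dir, extensions=('.txt', '.keyphrases')):
    """List gold-standard keyphrase files with any of the given extensions.

    Hypothetical helper, not part of seq2seq-keyphrase.
    """
    # str.endswith accepts a tuple of suffixes, so one call covers both cases.
    suffixes = tuple(extensions)
    return sorted(
        os.path.join(keyphrase_dir, name)
        for name in os.listdir(keyphrase_dir)
        if name.endswith(suffixes)
    )
```

Alternatively, renaming the files in gold_standard_keyphrases to end in '.txt' would satisfy the existing filter without touching the code.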

memray commented 6 years ago

Hi Rafael,

Did you check whether the files actually exist in "/home/student-5/PycharmProjects/rafael/seq2seq-keyphrase-master/dataset/keyphrase/testing-data/INSPEC"?

Rui