nlpyang / PreSumm

code for EMNLP 2019 paper Text Summarization with Pretrained Encoders
MIT License

Exception during preprocess data #103

Open cuthbertjohnkarawa opened 4 years ago

cuthbertjohnkarawa commented 4 years ago

The tokenized stories directory /data/presum/data/preprocess contains 0 files, but it should contain the same number as /data/presum/data/raw_stories (which has 136008 files). Was there an error during tokenization?

Shanzaay commented 4 years ago

I have this library in my code:

from pyrouge.utils import log

But I am getting the following error:

ModuleNotFoundError: No module named 'pyrouge.utils'; 'pyrouge' is not a package

pyrouge is already installed on my system, though. I searched for a solution but found almost nothing. Can anyone help me here?
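For what it's worth, Python raises "'pyrouge' is not a package" when the name pyrouge resolves to a single module file instead of the installed package directory, for example when a script of your own named pyrouge.py sits in the working directory. A minimal sketch to check which file Python would pick up, without importing it (describe_module is a hypothetical helper name used only for illustration):

```python
import importlib.util


def describe_module(name):
    """Report where a top-level module would be imported from, without importing it."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return f"{name}: not found on sys.path"
    # Packages (directories with submodules) have submodule_search_locations;
    # a single .py file has None, which is what triggers "is not a package".
    kind = "package" if spec.submodule_search_locations is not None else "module"
    return f"{name}: {kind} at {spec.origin}"


if __name__ == "__main__":
    # Run this from the same directory as the script that fails to import.
    print(describe_module("pyrouge"))
```

If it reports a module whose path is a pyrouge.py inside your own project, renaming that file should let from pyrouge.utils import log resolve to the installed package instead.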

cuthbertjohnkarawa commented 4 years ago

pip install pyrouge
pyrouge_set_rouge_path /absolute/path/to/ROUGE-1.5.5/directory
python -m pyrouge.test

try this

Shanzaay commented 4 years ago

While running this command I am getting an error:

pyrouge_set_rouge_path /absolute/path/to/ROUGE-1.5.5/directory

'pyrouge_set_rouge_path' is not recognized as an internal or external command, operable program or batch file.

cuthbertjohnkarawa commented 4 years ago

uninstall everything then start again

git clone https://github.com/bheinzerling/pyrouge
cd pyrouge
pip install -e .
git clone https://github.com/andersjo/pyrouge.git rouge
pyrouge_set_rouge_path ~/pyrouge/rouge/tools/ROUGE-1.5.5/
cd rouge/tools/ROUGE-1.5.5/data
python -m pyrouge.test

Shanzaay commented 4 years ago

Thank you. I am going to try that and then share my results.

Shanzaay commented 4 years ago

pyrouge_set_rouge_path ~/pyrouge/rouge/tools/ROUGE-1.5.5/

When I run this command I get the following output. Is this what is supposed to appear, or is something wrong?

2020-01-18 21:08:10,347 [MainThread ] [INFO ] Set ROUGE home directory to /home/natsu/pyrouge/rouge/tools/ROUGE-1.5.5/.

Shanzaay commented 4 years ago

set pyrouge_set_rouge_path ~/pyrouge/rouge/tools/ROUGE-1.5.5/

I used this command. Now there is no error, but nothing else is displayed either. When I run the last command you mentioned, it shows an error:

Ran 10 tests in 1.266s

FAILED (errors=3)

Also, can you tell me what pyrouge.test is? There is no file named pyrouge.test in the data folder.

Shanzaay commented 4 years ago

https://stackoverflow.com/questions/45894212/installing-pyrouge-gets-error-in-ubuntu

I used solution number 2 and now the tests report OK. But how do I import pyrouge now? I am still getting this error: ModuleNotFoundError: No module named 'pyrouge.utils'

fabrahman commented 4 years ago

I have the same issue, but the replies so far are not related to this error. Can you share how you solved it?

Shanzaay commented 4 years ago

You have to export the CoreNLP jar first:

export CLASSPATH=/path/to/stanford-corenlp-full-2017-06-09/stanford-corenlp-3.8.0.jar

Then you will not get this error.

That is, you have to run the command above before tokenization. See the README; it links to where you can download the CoreNLP library.

If you are on Windows, I think you have to use set instead of export. I hope this helps.
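One thing worth ruling out is that the process running the tokenizer never sees the jar, for example because the path is misspelled or the variable was set in a different terminal. A small sketch, assuming the jar is supplied through the CLASSPATH environment variable as above (find_corenlp_jars is a hypothetical helper), that lists which CLASSPATH entries point at an existing stanford-corenlp*.jar:

```python
import os


def find_corenlp_jars(classpath=None):
    """Return CLASSPATH entries that name an existing stanford-corenlp*.jar file.

    Entries are separated by os.pathsep (':' on Linux/macOS, ';' on Windows).
    """
    if classpath is None:
        classpath = os.environ.get("CLASSPATH", "")
    found = []
    for entry in classpath.split(os.pathsep):
        entry = os.path.expanduser(entry.strip())
        name = os.path.basename(entry)
        # Match e.g. stanford-corenlp-3.8.0.jar and require that the file exists.
        if name.startswith("stanford-corenlp") and name.endswith(".jar") and os.path.isfile(entry):
            found.append(entry)
    return found


if __name__ == "__main__":
    jars = find_corenlp_jars()
    if jars:
        print("CoreNLP jar(s) visible to this process:", jars)
    else:
        print("No existing stanford-corenlp*.jar found in CLASSPATH; check the path "
              "and open a new terminal after editing your shell config.")
```

Run it from the same shell or notebook you launch preprocess.py from, since environment variables are only inherited by processes started after they are set.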

fabrahman commented 4 years ago

@Shanzaay Thanks. I actually followed the README and exported that path, but when running the third preprocessing step (tokenization) I still get the following error:

PreSumm/src$ python preprocess.py -mode tokenize -raw_path ../my_data/train -save_path ../my_data/train_tok/ -log_file ../my_data/logs/train.log
I0222 22:19:06.811911 140080766781248 file_utils.py:41] PyTorch version 1.2.0 available.
Preparing to tokenize /home/PreSumm/my_data/train to /home/PreSumm/my_data/train_tok...
Making list of files to tokenize...
Tokenizing 4851 files in /home/PreSumm/my_data/train and saving in /home/PreSumm/my_data/train_tok...
Adding annotator tokenize
Adding annotator ssplit

Stanford CoreNLP Tokenizer has finished.
Traceback (most recent call last):
  File "preprocess.py", line 73, in <module>
    eval('data_builder.'+args.mode + '(args)')
  File "<string>", line 1, in <module>
  File "/home/PreSumm/src/prepro/data_builder.py", line 137, in tokenize
    tokenized_stories_dir, num_tokenized, stories_dir, num_orig))
Exception: The tokenized stories directory /home/PreSumm/my_data/train_tok contains 0 files, but it should contain the same number as /home/PreSumm/my_data/train (which has 4851 files). Was there an error during tokenization?

Shanzaay commented 4 years ago

Show me how you export the CoreNLP library.

fabrahman commented 4 years ago

@Shanzaay Here it is:

export CLASSPATH=/home/stanford-corenlp-full-2018-10-05/stanford-corenlp-3.9.2.jar

Note that I used the currently available version of CoreNLP from the link provided.

Shanzaay commented 4 years ago

Do the files in your data folder have the .story extension? If you are importing the CoreNLP library correctly, this error should not occur. I would suggest using the same CoreNLP version the authors use in the code.
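To answer the .story question quickly for a large folder, here is a sketch that sorts the top-level entries of the raw stories directory into .story files, subfolders, and everything else. It assumes, as the question implies, that only files ending in .story directly inside that folder get tokenized, so stories left in a nested folder (for example one created when extracting an archive) or saved without the extension would produce no output. check_raw_stories is a hypothetical helper, not part of PreSumm.

```python
import os
import sys


def check_raw_stories(raw_dir):
    """Classify the top-level entries of raw_dir.

    Assumption: only regular files ending in '.story' directly inside raw_dir
    are passed to the tokenizer; subfolders are not searched.
    """
    stories, subdirs, others = [], [], []
    for name in sorted(os.listdir(raw_dir)):
        path = os.path.join(raw_dir, name)
        if os.path.isdir(path):
            subdirs.append(name)
        elif name.endswith(".story"):
            stories.append(name)
        else:
            others.append(name)
    return {
        "stories": len(stories),
        "subdirs": subdirs,
        "other_files": len(others),
        "other_examples": others[:5],  # a few names, to spot wrong extensions
    }


if __name__ == "__main__":
    # Pass the raw stories folder as the first argument (defaults to ".").
    report = check_raw_stories(sys.argv[1] if len(sys.argv) > 1 else ".")
    print(report)
    if report["stories"] == 0:
        print("No .story files at the top level; check for a nested folder "
              "or a missing extension.")
```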

zhaoguangxiang commented 4 years ago

git clone https://github.com/bheinzerling/pyrouge
cd pyrouge
pip3 install -e . --user
git clone https://github.com/andersjo/pyrouge.git rouge
export CLASSPATH=/home/zhaoguangxiang/stanford-corenlp-full-2018-10-05/stanford-corenlp-3.9.2.jar
pyrouge_set_rouge_path ~/pyrouge/rouge/tools/ROUGE-1.5.5/
sudo apt-get install libxml-parser-perl
cd rouge/tools/ROUGE-1.5.5/data
rm WordNet-2.0.exc.db
./WordNet-2.0-Exceptions/buildExeptionDB.pl ./WordNet-2.0-Exceptions ./smart_common_words.txt ./WordNet-2.0.exc.db
python3 -m pyrouge.test
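If python3 -m pyrouge.test still fails after these steps, here is a quick sketch for confirming that the ROUGE home directory passed to pyrouge_set_rouge_path contains the files these commands touch. The file list is an assumption: ROUGE-1.5.5.pl is the usual name of the main ROUGE script, and the two data files are the ones used by the WordNet rebuild command above. missing_rouge_files is a hypothetical helper.

```python
import os


def missing_rouge_files(rouge_home):
    """Return the expected ROUGE-1.5.5 files that are absent under rouge_home.

    os.path.isfile follows symlinks, so a broken symlink is reported as missing.
    """
    rouge_home = os.path.expanduser(rouge_home)
    expected = [
        "ROUGE-1.5.5.pl",  # main Perl script (assumed name)
        os.path.join("data", "smart_common_words.txt"),
        os.path.join("data", "WordNet-2.0.exc.db"),  # rebuilt by buildExeptionDB.pl
    ]
    return [rel for rel in expected
            if not os.path.isfile(os.path.join(rouge_home, rel))]


if __name__ == "__main__":
    # Same ROUGE home directory as in the pyrouge_set_rouge_path command above.
    print(missing_rouge_files("~/pyrouge/rouge/tools/ROUGE-1.5.5/"))
```

An empty list means all three files are present; anything listed is worth fixing before rerunning the tests.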

AanchalA commented 4 years ago

Hey, I'm on Windows, so I set CLASSPATH in the environment variables, but I'm still getting the same error. Also, in Step 1, when downloading the stories from https://cs.nyu.edu/~kcho/DMQA/, clicking on "stories" gives me a .tgz file instead of a .zip containing .story files. Am I doing something wrong?

chandanrao007 commented 4 years ago

To extract the files, use this command (the ! prefix runs it from a Colab/Jupyter cell; drop it in a regular terminal):

! tar xvf cnn_stories.tgz
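If you would rather not depend on a tar command (e.g. on Windows), or you want the stories to land straight in the raw stories folder, Python's standard tarfile module can do the extraction. A minimal sketch, assuming the archive is a gzip-compressed tar such as cnn_stories.tgz and that you want every .story file at the top level of the destination folder; extract_stories and the folder name raw_stories are illustrative choices, not part of PreSumm.

```python
import os
import shutil
import tarfile


def extract_stories(archive_path, dest_dir):
    """Extract every *.story member of a .tgz archive directly into dest_dir.

    Folders inside the archive are dropped, so all .story files end up at the
    top level of dest_dir. Returns the number of files written.
    """
    os.makedirs(dest_dir, exist_ok=True)
    count = 0
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar:
            if not (member.isfile() and member.name.endswith(".story")):
                continue
            # Keep only the base name; this also avoids writing outside dest_dir.
            target = os.path.join(dest_dir, os.path.basename(member.name))
            src = tar.extractfile(member)
            with src, open(target, "wb") as out:
                shutil.copyfileobj(src, out)
            count += 1
    return count
```

For example, extract_stories("cnn_stories.tgz", "raw_stories") returns the number of .story files written into raw_stories.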

chandanrao007 commented 4 years ago

Yes, they have the .story extension, and I am using the same library the authors used: stanford-corenlp-full-2017-06-09/stanford-corenlp-3.8.0.jar. I am still getting the same error. I am using a Google Colab notebook.

chandanrao007 commented 4 years ago

Hey, did you solve the issue?

germanenik commented 3 years ago

I had the same issue. Don't forget to restart your terminal after you add the path to your bash config (e.g. ~/.bashrc)!

WSChange commented 8 months ago

I have solved this issue. When you unzip the .zip file, put the .story files under the raw_stories folder.