hosseinfani / ReQue

A Benchmark Workflow and Dataset Collection for Query Refinement
https://hosseinfani.github.io/ReQue/

Errors in main.py when running for the first time #1

Closed Narabzad closed 3 years ago

Narabzad commented 4 years ago

I followed the instructions in the README, and while running main.py for the first time I got the following error:

```
from cmn import expander_factory as ef
  File "/home/negar/ReQue2/ReQue/qe/cmn/expander_factory.py", line 6, in <module>
    from expanders.sensedisambiguation import SenseDisambiguation
  File "/home/negar/ReQue2/ReQue/qe/expanders/sensedisambiguation.py", line 1, in <module>
    from pywsd import disambiguate
ModuleNotFoundError: No module named 'pywsd'
```

I think pywsd should be added to the requirements of the ReQue environment. The same is true for the community package.

After installing pywsd and the community package, I got the following error:

```
Traceback (most recent call last):
  File "main.py", line 35, in <module>
    from cmn import expander_factory as ef
  File "/home/negar/ReQue2/ReQue/qe/cmn/expander_factory.py", line 14, in <module>
    from expanders.termluster import Termluster
  File "/home/negar/ReQue2/ReQue/qe/expanders/termluster.py", line 3, in <module>
    from community import community_louvain
ImportError: cannot import name 'community_louvain' from 'community' (/home/negar/anaconda3/lib/python3.7/site-packages/community/__init__.py)
```

hosseinfani commented 4 years ago

pywsd, community, and python-louvain are already listed in both the README and environment.yml. Please check and let me know if this does not resolve your issue.
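For what it's worth, the ImportError above often comes from a name clash rather than a missing package: the Louvain functions ship with the python-louvain distribution, which installs a top-level community module, while the unrelated community package on PyPI installs a different module of the same name and can shadow it. A quick way to check which distribution owns the imported module (a diagnostic sketch, not part of ReQue):

```python
# Print the file backing the 'community' module. If it points at the
# unrelated 'community' package instead of python-louvain, then
# `pip uninstall community` followed by `pip install python-louvain`
# should restore `from community import community_louvain`.
import community
print(community.__file__)
```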

Narabzad commented 4 years ago

They had not been installed by creating the ReQue conda environment. After installing the requirements manually, I was able to run the code. Now I am getting the following error:

```
FileNotFoundError: [Errno 2] No such file or directory: '../ds/robust04/topics.robust04.txt'
```

Is this file supposed to be on GitHub?

hosseinfani commented 4 years ago

> They had not been installed by creating the ReQue conda environment. After installing the requirements manually, I was able to run the code.

Please re-create (delete and create again) the ReQue environment and post the conda output here to see why those libraries cannot be installed.

> Now I am getting the following error: FileNotFoundError: [Errno 2] No such file or directory: '../ds/robust04/topics.robust04.txt' Is this file supposed to be on GitHub?

No, this file is not part of our GitHub repo, which is why it is not present in ./ds/robust04. However, the link to download it is provided in the README. Please check.

kpoots commented 4 years ago

Hi Hossein,

I am making some progress with ReQue. You have covered many bases with this code.

I am having a problem with input datasets.

From existing documentation about the files that go in the ../pre directory:

```
+---pre
|   # anchor_text_en.ttl - from http://downloads.dbpedia.org/2016-10/core-i18n/en/anchor_text_en.ttl.bz2
|   # gitkeep - tells git to keep the directory when empty
|   # glove.6B.300d.txt - GloVe embeddings from http://nlp.stanford.edu/data/glove.6B.zip
|   # temp_model_Wiki
|   # temp_model_Wiki.vectors.npy
|   # wiki-anchor-text-en-ttl-300d.vec
|   # wiki-anchor-text-en-ttl-300d.vec.vectors.npy
|   # wiki-news-300d-1M.vec - fastText embeddings from https://dl.fbaipublicfiles.com/fasttext/vectors-english/wiki-news-300d-1M.vec.zip
```

This includes my understanding of where these files come from.

It seems that some of these files are generated by code, rather than loaded into the ../pre directory.

I don't see how to generate those files. Could you please help?

Thank you!

Kent

hosseinfani commented 4 years ago

> It seems that some of these files are generated by code, rather than loaded into the ../pre directory. I don't see how to generate those files. Could you please help?

Thanks, and you are right! Those embeddings are either provided to us by other authors (e.g., temp_model_Wiki) or have to be built by our code (e.g., wiki-anchor-text-en-ttl-300d.vec). I will schedule this issue to be addressed. For now, you can bypass building the expanders that use those missing embedding files by commenting out these lines in expander_factory.py:

https://github.com/hosseinfani/ReQue/blob/df3bcbdc3189a936f39ade0743450b7871e35517/qe/cmn/expander_factory.py#L40
https://github.com/hosseinfani/ReQue/blob/df3bcbdc3189a936f39ade0743450b7871e35517/qe/cmn/expander_factory.py#L41
https://github.com/hosseinfani/ReQue/blob/df3bcbdc3189a936f39ade0743450b7871e35517/qe/cmn/expander_factory.py#L49
https://github.com/hosseinfani/ReQue/blob/df3bcbdc3189a936f39ade0743450b7871e35517/qe/cmn/expander_factory.py#L50

We will provide more help on how to populate those embedding files soon, e.g., by providing a link to download the embedding file (temp_model_Wiki) or a manual on how to train the embedding vectors (in the case of wiki-anchor-text-en-ttl-300d.vec).
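For illustration, the bypass amounts to something like the following; the expander names and constructor arguments are hypothetical stand-ins (the exact lines are at the links above):

```python
# In qe/cmn/expander_factory.py, comment out the expanders that load the
# missing embedding files so the rest of the pipeline can still run, e.g.:
#
# expanders.append(AnchorText(vectorfile='../pre/wiki-anchor-text-en-ttl-300d.vec'))  # hypothetical name
# expanders.append(HierarchicalEmbedding(modelfile='../pre/temp_model_Wiki'))         # hypothetical name
```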

Narabzad commented 4 years ago

> Please re-create (delete and create again) the ReQue environment and post the conda output here to see why those libraries cannot be installed.
>
> No, this file is not part of our GitHub repo... The link to download it is provided in the README. Please check.

I re-created the environment and tried to follow the README instructions for installing ReQue. There is no installation error any more. However, some files are still missing, and I am not sure where in the README we say to download them. From Google Drive?

Here is the error:

```
FileNotFoundError: [Errno 2] No such file or directory: '../ds/qe/robust04/topics.robust04.abstractqueryexpansion.txt'
```

hosseinfani commented 4 years ago

This file should be generated by the workflow when the AbstractQExpander, i.e., the identity expander (q' = q), is executed. It is the first expander in the list, and all other expanders depend on it. When this file is missing, it means some error occurred earlier, while running the AbstractQExpander. Please share the whole output here or in a log file so I can investigate the problem. Thanks.
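To make the dependency concrete, here is a minimal sketch of what the identity expander does; the class and method names follow the tracebacks in this thread, but the bodies are illustrative rather than ReQue's exact implementation:

```python
# Sketch: the identity expander returns each query unchanged (q' = q) and
# writes qid/query pairs to the *.abstractqueryexpansion.txt file that all
# other expanders later read as their input.
class AbstractQExpander:
    def get_expanded_query(self, q, args=None):
        return q  # identity expansion: q' = q

    def write_expanded_queries(self, infile, outfile):
        with open(infile, encoding='utf-8') as fin, open(outfile, 'w', encoding='utf-8') as fout:
            for line in fin:
                qid, q = line.rstrip('\n').split('\t', 1)
                fout.write(f'{qid}\t{self.get_expanded_query(q, [qid])}\n')
```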

Narabzad commented 4 years ago

I did not put the original topic file in the ds directory... Now it is fixed. However, I am getting the following error when it comes to the thesaurus expander:

```
Traceback (most recent call last):
  File "/home/negar/ReQue2/ReQue/qe/expanders/thesaurus.py", line 49, in get_synonym
    soup = BeautifulSoup(html, 'lxml')
  File "/home/negar/anaconda3/envs/ReQue/lib/python3.7/site-packages/bs4/__init__.py", line 245, in __init__
    % ",".join(features))
bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml. Do you need to install a parser library?
```
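The bs4.FeatureNotFound error means BeautifulSoup cannot find the lxml parser in the environment; installing it (pip install lxml) fixes the call as written. Alternatively, a sketch of a fallback that needs no extra package:

```python
from bs4 import BeautifulSoup

html = '<html><body><p>example</p></body></html>'
# 'html.parser' ships with the Python standard library, so this works even
# when the faster third-party 'lxml' parser is not installed:
soup = BeautifulSoup(html, 'html.parser')
print(soup.p.text)  # -> example
```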

Narabzad commented 4 years ago

Another error found in the expanders:

```
Traceback (most recent call last):
  File "/home/negar/ReQue2/ReQue/qe/expanders/abstractqexpander.py", line 38, in write_expanded_queries
    q_ = self.get_expanded_query(q, [qid])
  File "/home/negar/ReQue2/ReQue/qe/expanders/stem.py", line 15, in get_expanded_query
    return self.stemmer.stem_query(q)
  File "/home/negar/ReQue2/ReQue/qe/stemmers/abstractstemmer.py", line 21, in stem_query
    processed_words = self.process(clean_words)
  File "/home/negar/ReQue2/ReQue/qe/stemmers/krovetz.py", line 21, in process
    new_words += subprocess.check_output('java -jar ' + self.jarfile + ' -w ' + ' '.join(words), shell=True, ).split()
  File "/home/negar/anaconda3/envs/ReQue/lib/python3.7/subprocess.py", line 411, in check_output
    **kwargs).stdout
  File "/home/negar/anaconda3/envs/ReQue/lib/python3.7/subprocess.py", line 512, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command 'java -jar stemmers/kstem-3.4.jar -w r & d drug prices' returned non-zero exit status 127.
```

Narabzad commented 4 years ago

Also, how should one download the Joint Embedding of Hierarchical Categories... pre-trained model?

```
FileNotFoundError: [Errno 2] No such file or directory: '../pre/temp_model_Wiki'
```

hosseinfani commented 4 years ago

> Another error found in the expanders: ... subprocess.CalledProcessError: Command 'java -jar stemmers/kstem-3.4.jar -w r & d drug prices' returned non-zero exit status 127.

It seems the error is due to the jar file for the Krovetz stemmer. Would you please do the following:

1. Check that the kstem-3.4.jar file exists in the 'stemmers' subfolder.
2. Check that the jar file runs correctly: "java -jar stemmers/kstem-3.4.jar -w {any sentence/query}".

I am not sure the jar file supports every kind of input sentence, for instance one containing spaces or special characters. Please try different samples and figure out whether the problem is with the way we pass in the sentence or something else.

Please let me know

hosseinfani commented 4 years ago

> Also, how should one download the Joint Embedding of Hierarchical Categories... pre-trained model? FileNotFoundError: [Errno 2] No such file or directory: '../pre/temp_model_Wiki'

By contacting the authors of the paper (as we did). If we get the authors' permission, we can release the file in a publicly available repo and provide the link. Please contact the authors to ask for permission.

@ebrahim-bagheri

Narabzad commented 3 years ago

> 2) Check the jar file is running correctly: "java -jar stemmers/kstem-3.4.jar -w {any sentence/query}"

1. Yes, the jar file was there.
2. This command works properly and it stems the input sentences.
hosseinfani commented 3 years ago
> 1. Yes, the jar file was there.
> 2. This command works properly and it stems the input sentences.

I ran "java -jar stemmers/kstem-3.4.jar -w r & d drug prices" exactly as it appeared in the error output. The problem is the '&': the Linux shell interprets whatever comes before '&' as a background task.
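A hedged sketch of one way to sidestep this whole class of bug in krovetz.py (not the committed fix): drop shell=True and pass the words as an argument list, so the shell never gets a chance to interpret '&':

```python
import subprocess

# With an argument list (and no shell=True), each word goes straight to the
# JVM; '&' is then just an ordinary argument, not a background-job operator.
words = ['r', '&', 'd', 'drug', 'prices']
new_words = subprocess.check_output(
    ['java', '-jar', 'stemmers/kstem-3.4.jar', '-w'] + words).split()
print(new_words)
```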

We do clean the expanded queries with utils.clean(), but not the original query.

We can clean the original query before calling any expanders here.

Or we can only fix this expander.

What do you think?

kpoots commented 3 years ago

Please clean the original query. That would seem to give a good starting point for further work. I will post my notes about using my own data soon.

The larger issue is that ReQue is complicated, and going somewhat out of band can be risky.

As an alternative to running things in the background, they can be run in the foreground with the output redirected to a file. Then you can start another session and use tail -f on that file to see how things develop.

hosseinfani commented 3 years ago
> 1. Yes, the jar file was there.
> 2. This command works properly and it stems the input sentences.

I fixed the issue by cleaning the query for stemmers only, here. If nothing else comes up, please close this issue.
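For reference, the kind of cleaning involved can be as small as stripping shell-unsafe punctuation before the query reaches the stemmer subprocess. A hypothetical sketch (ReQue's actual utils.clean() may do more than this):

```python
import re

def clean(query):
    # Replace anything that is not a word character or whitespace (e.g. '&')
    # with a space, then collapse repeated whitespace.
    return ' '.join(re.sub(r'[^\w\s]', ' ', query).split())

print(clean('r & d drug prices'))  # -> 'r d drug prices'
```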

kpoots commented 3 years ago

At some point, I would like to understand why there are so many stemmers. But that is not an issue, just curiosity; I am being lazy and need to read the papers. Good work!