williamleif / socialsent

Code and data for inducing domain-specific sentiment lexicons.

IOError of Lexicons #6

Closed: ndrmahmoudi closed this issue 7 years ago

ndrmahmoudi commented 7 years ago

Hi again William,

I appreciate your quick responses. Regarding the lexicons, I got the following error:

In [1]: %run example.py
Using TensorFlow backend.
Evaluting SentProp with 100 dimensional GloVe embeddings
Evaluting only binary classification performance on General Inquirer lexicon
---------------------------------------------------------------------------
IOError                                   Traceback (most recent call last)
/socialsent-master/example.py in <module>()
      8     print "Evaluting SentProp with 100 dimensional GloVe embeddings"
      9     print "Evaluting only binary classification performance on General Inquirer lexicon"
---> 10     lexicon = lexicons.load_lexicon("inquirer", remove_neutral=True)
     11     pos_seeds, neg_seeds = seeds.hist_seeds()
     12     embeddings = create_representation("GIGA", "data/example_embeddings/glove.6B.100d.txt",

/socialsent-master/socialsent/lexicons.pyc in load_lexicon(name, remove_neutral)
    163 
    164 def load_lexicon(name=constants.LEXICON, remove_neutral=True):
--> 165     lexicon = util.load_json(constants.PROCESSED_LEXICONS + name + '.json')
    166     return {w: p for w, p in lexicon.iteritems() if p != 0} if remove_neutral else lexicon
    167 

/socialsent-master/socialsent/util.pyc in load_json(fname)
     32 
     33 def load_json(fname):
---> 34     with open(fname) as f:
     35         return json.loads(f.read())
     36 

IOError: [Errno 2] No such file or directory: '/afs/cs.stanford.edu/u/wleif/sentiment/polarity_induction/data/lexicons/inquirer.json'

I know that I have to play with the paths in the constants.py file. I just wanted to let you know about the bug.

Regards, Nader
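
As a stopgap, the hard-coded paths can be overridden at runtime before the loader runs; a minimal sketch, assuming the constant names visible in the traceback and a local checkout whose data folder sits under the working directory. Since load_lexicon reads constants.PROCESSED_LEXICONS at call time, patching the module attribute first is enough:

from socialsent import constants

# Redirect the defaults to a local data folder (paths here are an
# assumption; adjust to wherever your checkout keeps data/lexicons/).
constants.DATA = "data/"
constants.PROCESSED_LEXICONS = constants.DATA + "lexicons/"

from socialsent import lexicons

lexicon = lexicons.load_lexicon("inquirer", remove_neutral=True)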

williamleif commented 7 years ago

Thanks for mentioning this! I made the default DATA path relative so that you don't need to play with the constants to load the lexicons (assuming you are running commands from the root directory of the repo).

Of course, you still need to download the embeddings and set those paths manually.
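
The relative default presumably amounts to something like the following in constants.py (a sketch, assuming the constant names quoted later in this thread; a plain relative path only resolves when Python is launched from the repository root, hence the caveat above):

# Defaults relative to the current working directory, i.e. the repo root.
DATA = "data/"
LEXICONS = DATA + "lexicon_info/"
PROCESSED_LEXICONS = DATA + "lexicons/"
POLARITIES = DATA + "polarities/"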

ndrmahmoudi commented 7 years ago

I still get the same error. I am sure that I am running from the root directory. When I install the package using pip, it does not create any folders for the lexicons or representations; that is where I get the error.

williamleif commented 7 years ago

Ahh I see. The pip installation was indeed messed up. It should now work for the lexicon data, as I have made that data part of the package distribution. However, you will still need to modify constants.py for the embedding paths (and if you install via pip, you need to make this modification wherever pip installed the source). Because of this mess, the README now recommends the python setup.py install route rather than pip install.
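
Bundling data files into a distribution is typically done with setuptools package data; a minimal sketch of what the setup.py change could look like (the repo's actual setup.py may differ):

from setuptools import setup, find_packages

setup(
    name="socialsent",
    packages=find_packages(),
    # Ship the processed lexicon JSON inside the installed package so
    # load_lexicon() works without a source checkout.
    package_data={"socialsent": ["data/lexicons/*.json"]},
    include_package_data=True,
)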

Ideally, the user should be able to pip install and then run a command to download the default word vectors, but this is a todo enhancement at this point.
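
One possible shape for such a helper, entirely hypothetical since no such command exists in the package yet: fetch the standard GloVe 6B archive that example.py points at and unpack the 100d vectors into the expected folder (Python 2, matching the codebase):

import os
import urllib
import zipfile

GLOVE_URL = "http://nlp.stanford.edu/data/glove.6B.zip"
DEST_DIR = "data/example_embeddings/"

def download_default_vectors():
    # Download the archive once, then extract only the file example.py uses.
    if not os.path.exists(DEST_DIR):
        os.makedirs(DEST_DIR)
    archive = os.path.join(DEST_DIR, "glove.6B.zip")
    if not os.path.exists(archive):
        urllib.urlretrieve(GLOVE_URL, archive)
    with zipfile.ZipFile(archive) as zf:
        zf.extract("glove.6B.100d.txt", DEST_DIR)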

ndrmahmoudi commented 7 years ago

I was trying to modify constants.py for the embedding paths and realised that the following folders are missing from the data folder:

LEXICONS = DATA + 'lexicon_info/'
PROCESSED_LEXICONS = DATA + 'lexicons/'
POLARITIES = DATA + 'polarities/'

The data folder on GitHub contains just the lexicons.

williamleif commented 7 years ago

Whoops, this minor follow-up slipped under my radar. I removed the LEXICONS folder because it contains info that is not useful to downstream users. The POLARITIES folder is just a default output folder (e.g., for the code in the historical dir) that should be created automatically; there is no actual data in it by default. I modified constants.py so that the code checks whether that directory exists and creates it if necessary on import.
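
The import-time check presumably reduces to a few lines at the bottom of constants.py; a sketch, assuming the same constant names as above:

import os

DATA = "data/"
POLARITIES = DATA + "polarities/"

# POLARITIES is a default output folder with no shipped contents, so
# create it on import if it is missing.
if not os.path.exists(POLARITIES):
    os.makedirs(POLARITIES)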