ngageoint / Nounalyzer

Analyze the nouns and entities in an RSS feed
The Unlicense

Python Dependency Versions #1

Open jeremysnyder opened 7 years ago

jeremysnyder commented 7 years ago

I have been trying to stand this service up locally and have worked through the Python dependencies, but I am getting no NLP processing of the data when reading in feeds from Google News RSS. I stepped through the code in a Python shell to check whether spaCy was actually receiving the text. It is, but it pulls nothing from the text other than tokens.

One thing I was wondering is which versions of the Python libraries you are using. I wasn't sure whether I might be using newer versions that aren't compatible with this code. My pip requirements.txt file currently looks like this:

feedparser
goose-extractor
spacy

but I would like to pin a version to each of them. Once I am sure I am running the proper versions of the dependencies, I can better troubleshoot the problems. Any help is appreciated.
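For reference, here is a minimal sketch of how the versions installed in a working environment could be turned into pinned requirements lines. It assumes Python 3.8 or newer for importlib.metadata (on older interpreters, pip freeze reports the same information), and the helper name pinned_requirements is made up for illustration. The package names are the ones from the requirements.txt above.

```python
from importlib import metadata  # standard library, Python 3.8+

# Distribution names taken from the requirements.txt above.
PACKAGES = ["feedparser", "goose-extractor", "spacy"]


def pinned_requirements(packages):
    """Return requirements.txt-style lines for the given distributions.

    Hypothetical helper: installed packages become "name==version",
    missing ones become a comment line so the gap stays visible.
    """
    lines = []
    for name in packages:
        try:
            lines.append(f"{name}=={metadata.version(name)}")
        except metadata.PackageNotFoundError:
            # Not installed in this environment, so there is nothing to pin.
            lines.append(f"# {name}: not installed")
    return lines


if __name__ == "__main__":
    print("\n".join(pinned_requirements(PACKAGES)))
```

Run inside the environment where the service works, this prints one line per dependency that can be pasted into requirements.txt.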

jeremysnyder commented 7 years ago

After reading through the spaCy documentation, I found that I needed to download a model for spaCy to run against, so I chose the English model and ran python -m spacy download en. With that in place, it is definitely processing the Google News feeds, but other errors now occur. I haven't fully tracked them down, but they seem to happen when the script tries to bucket words into categories.
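A quick way to tell the "tokens only" state apart from a working model is to look at what the parsed document exposes. The sketch below is not taken from Nounalyzer; summarize and looks_token_only are hypothetical helper names, and the sample sentence is made up. It relies only on the spaCy attributes token.text, token.pos_, doc.ents and ent.label_, and it loads the model under the name en from the download command above (newer spaCy releases use full names such as en_core_web_sm instead).

```python
def summarize(doc):
    """Collect what a spaCy Doc exposes beyond plain tokens."""
    return {
        "tokens": [t.text for t in doc],
        "pos_tags": [t.pos_ for t in doc],
        "entities": [(e.text, e.label_) for e in doc.ents],
    }


def looks_token_only(summary):
    """True when no POS tags and no entities were produced (tokenization only)."""
    has_tags = any(summary["pos_tags"])
    return not has_tags and not summary["entities"]


if __name__ == "__main__":
    try:
        import spacy
        nlp = spacy.load("en")  # model name assumed from the download command
    except (ImportError, OSError) as exc:
        # spacy.load raises OSError when the model cannot be found.
        print("spaCy or its model is not available:", exc)
    else:
        result = summarize(nlp(u"Google opened a new office in Paris."))
        print(result["entities"], "token-only:", looks_token_only(result))
```

If looks_token_only reports True, the text is reaching spaCy but no tagger or entity recognizer is running, which matches the symptom described before the model was downloaded.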

I am probably going to leave this where it is for now. I might submit a pull request that adds a Dockerfile to the repository, so that folks can just clone this and run it via Docker.

Still, any information on dependency versions (if they even matter) would help me make my requirements.txt file accurate for running Nounalyzer.