Closed MINIMALaq closed 3 years ago
Hello, thank you very much for this repo. I tried to run the lda_analysis notebook and it gave me a lot of import errors. Most of them seem to come from the removal of the wrappers from gensim, but I still get errors even with gensim==3.8.3. In 'topic_modeling_gensim.py' you use 'from gensim.models.wrappers import LdaMallet', which needs 'from gensim.models.word2vec import Vocab', and I couldn't get past this error: 'ImportError: cannot import name 'Vocab' from 'gensim.models.word2vec''. Could you please run this notebook in a fresh virtual env and update requirements.txt?
Thanks for bringing this to my attention. I updated requirements.txt and added the packages used in the Jupyter notebooks. However, I'm not sure whether the error you mentioned is related to, or will be resolved by, installing the newly added packages. Are you sure you've properly installed Mallet? If not, you may first want to take a look here and here. Please let me know if you still have a problem after installing Mallet (I'll also update the readme to mention the Mallet installation requirement).
The requirements file has an issue:
The conflict is caused by:
    The user requested numpy==1.18.1
    gensim 3.8.0 depends on numpy>=1.11.3
    yellowbrick 1.1 depends on numpy>=1.13.0
    matplotlib 3.1.3 depends on numpy>=1.11
    scikit-learn 0.24.2 depends on numpy>=1.13.3
    bokeh 2.0.2 depends on numpy>=1.11.3
    pyldavis 3.3.1 depends on numpy>=1.20.0
I commented out pandas and numpy and it works.
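To make the conflict easier to see, here is a minimal sketch that compares the pinned numpy version against the minimums listed in the pip message. The helper names `as_tuple` and `unsatisfied` are made up for illustration, and the comparison is a simplification that only handles plain dotted versions, not full pip specifiers.

```python
# Minimum numpy versions copied from the pip conflict message in this thread.
MIN_NUMPY = {
    "gensim 3.8.0": "1.11.3",
    "yellowbrick 1.1": "1.13.0",
    "matplotlib 3.1.3": "1.11",
    "scikit-learn 0.24.2": "1.13.3",
    "bokeh 2.0.2": "1.11.3",
    "pyldavis 3.3.1": "1.20.0",
}


def as_tuple(version):
    """Turn '1.18.1' into (1, 18, 1) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def unsatisfied(pinned, minimums=MIN_NUMPY):
    """Return the packages whose minimum numpy is newer than the pinned one."""
    return [pkg for pkg, low in minimums.items() if as_tuple(pinned) < as_tuple(low)]


print(unsatisfied("1.18.1"))  # ['pyldavis 3.3.1']
```

Only pyldavis 3.3.1 rejects the numpy==1.18.1 pin, which would explain why dropping the pin lets pip pick a newer numpy that satisfies every package.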
In the lda_analysis notebook I changed import pyLDAvis.gensim to import pyLDAvis.gensim_models and it seems to work!
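If the notebook has to run against both older and newer pyLDAvis releases, one possible approach is to try the new module name first and fall back to the old one. This is only a sketch: `import_first_available` is a hypothetical helper, not something from the repo or from pyLDAvis.

```python
import importlib


def import_first_available(*names):
    """Import and return the first module in `names` that can be imported."""
    for name in names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    return None


# Try the newer module name first, then the older one.
gensimvis = import_first_available("pyLDAvis.gensim_models", "pyLDAvis.gensim")
if gensimvis is None:
    print("pyLDAvis is not installed, or neither gensim module could be imported")
```

The rest of the notebook could then call `gensimvis.prepare(...)` regardless of which name was found.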
The next issue is the data/cleaned folder, which is not included in the repo. Do I need to run other code to clean the data and fill this folder before using lda_analysis?
For Mallet I used brew. On my Mac the binary goes to "/opt/homebrew/Cellar/mallet/2.0.8_1/bin/mallet". I tested it and it works.
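Since the install location differs between machines, a small lookup can avoid hard-coding the path. The sketch below assumes a hypothetical `MALLET_BIN` environment variable (not something the repo defines) and otherwise searches `PATH`.

```python
import os
import shutil


def find_mallet(env_var="MALLET_BIN"):
    """Return a path to the mallet executable, or None if it cannot be found."""
    # MALLET_BIN is an illustrative variable name, not a Mallet or gensim convention.
    candidate = os.environ.get(env_var)
    if candidate and os.path.isfile(candidate) and os.access(candidate, os.X_OK):
        return candidate
    # Fall back to whatever `mallet` resolves to on PATH, if anything.
    return shutil.which("mallet")


mallet_path = find_mallet()
print(mallet_path or "mallet not found; set MALLET_BIN or add it to PATH")
```

The resulting path is what would be handed to `LdaMallet` as its Mallet path argument in gensim 3.8.x.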
The reason for not including the data/cleaned folder is that it requires preprocessing of the raw tweets, which we cannot share for now. However, assuming you have the full JSON files of tweets (after downloading the tweets using the IDs we have shared), you can put the JSON files in the data/input folder, then use the methods of the PreProcessing class in pre_processing.py to convert the raw JSON/Excel files of tweets into a cleaned/filtered subset that will be the input for topic modeling.
There are two points to consider when using the preprocessing methods: 1) Some of these methods may be based on an older version of the Twitter API, so you may want to double-check the names and format of the fields in the tweet object to avoid key errors or formatting issues. 2) Some of the preprocessing code is based on the format of the files we receive from the server we use to download the tweets (JSON files normally have the same format as the Twitter API; Excel files might have some minor differences). So, if you cannot run the exact code, you may want to just follow the logic we used for filtering/cleaning the tweets.
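As an illustration of the first point, a defensive lookup can tolerate the different places a tweet's text may live. This is a sketch, not the repo's PreProcessing code: `tweet_text` and `load_tweets` are hypothetical helpers, and the one-JSON-object-per-line layout is an assumption about how the downloaded files are stored.

```python
import json


def tweet_text(tweet):
    """Best-effort text lookup across tweet layouts; returns '' if none match."""
    extended = tweet.get("extended_tweet") or {}
    return extended.get("full_text") or tweet.get("full_text") or tweet.get("text") or ""


def load_tweets(lines):
    """Parse one JSON tweet per line, skipping blank or malformed lines."""
    tweets = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            tweets.append(json.loads(line))
        except json.JSONDecodeError:
            continue
    return tweets


# Made-up sample records covering three common layouts plus one bad line.
sample = [
    '{"id_str": "1", "text": "short form"}',
    '{"id_str": "2", "full_text": "extended mode"}',
    '{"id_str": "3", "text": "trunc...", "extended_tweet": {"full_text": "streaming form"}}',
    "not json",
]
print([tweet_text(t) for t in load_tweets(sample)])
# ['short form', 'extended mode', 'streaming form']
```

Using `.get` with fallbacks instead of direct indexing is what avoids the key errors mentioned above when a field is missing or renamed.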
Thanks Pedram jan.