meramos / analyze_telegramgate

NLP analysis of Telegram messages exchanged between Puerto Rico governor Ricky Rosello and his government coworkers. (Telegram Gate)
MIT License
11 stars, 1 fork

Install instructions for MacOS #1

Open · pedrobmorales opened this issue 5 years ago

pedrobmorales commented 5 years ago

Hello Maria, thank you for this very impressive work. I tried to run it on my Mac and had to work through a few installation steps, which I documented below.

I tried to submit a pull request, but it was denied.

Here are my proposed additions to README.md:

MacOS Installation Steps

To run this on MacOS with a fresh Python 3 installation, use the commands below. They use Homebrew to install Jupyter and wkhtmltopdf, and pip3 to install the required Python 3 libraries. Note that this uses pip3 because pip may refer to a Python 2.x interpreter.

brew install jupyter
pip3 install tika
pip3 install matplotlib
pip3 install classifier
pip3 install spellchecker
pip3 install pdfkit
brew install --cask wkhtmltopdf

After this, download the chat PDF file from the link above and save it as dataset/telegram_gate.pdf.
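A quick way to confirm the steps above worked is to check that each module the notebook imports can be found and that the wkhtmltopdf and jupyter executables are on the PATH. This is a sketch that assumes the import names seen in this thread (tika, matplotlib, classifier, spellchecker, pdfkit); the function name `missing_dependencies` is hypothetical and not part of the repository.

```python
import importlib.util
import shutil

# Import names used by the notebook, as seen in this thread (assumption:
# these may differ from the pip package names).
REQUIRED_MODULES = ("tika", "matplotlib", "classifier", "spellchecker", "pdfkit")
# External programs: pdfkit is a wrapper that calls the wkhtmltopdf binary.
REQUIRED_EXECUTABLES = ("wkhtmltopdf", "jupyter")


def missing_dependencies(modules=REQUIRED_MODULES, executables=REQUIRED_EXECUTABLES):
    """Return (missing_modules, missing_executables) without importing anything."""
    missing_mods = [m for m in modules if importlib.util.find_spec(m) is None]
    missing_exes = [e for e in executables if shutil.which(e) is None]
    return missing_mods, missing_exes


if __name__ == "__main__":
    mods, exes = missing_dependencies()
    print("Missing Python modules:", mods or "none")
    print("Missing executables:", exes or "none")
```

Note that `find_spec` only confirms that *some* module with that name is installed, so it will not catch the wrong spellchecker package discussed later in this thread.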

pedrobmorales commented 5 years ago

I think this is an INVALUABLE tool that lets analysts and others reconstruct the full record of what was written in these chats and in others like them. Great job!

pedrobmorales commented 5 years ago

I am trying to install all the dependencies, but I cannot load the spellchecker module because it cannot find a module named indexer:

ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-5-00ec13e61853> in <module>
      2 import re
      3 import classifier as spanish_sentiment_analysis
----> 4 from spellchecker import SpellChecker

/usr/local/lib/python3.7/site-packages/spellchecker/__init__.py in <module>
      1 # -*- coding: utf-8 -*-
----> 2 from  spellchecker.core import Spellchecker,getInstance
      3 

/usr/local/lib/python3.7/site-packages/spellchecker/core.py in <module>
     24 import urllib
     25 
---> 26 from indexer import DictionaryIndex
     27 from langdetect import _detect_lang
     28 

ModuleNotFoundError: No module named 'indexer'

I tried pip3 install indexer, but it failed with this error:

Collecting indexer
  Downloading https://files.pythonhosted.org/packages/c7/2f/49ea001ccc81502fe790c6077ca0cf9c4dc98ce160e1b1225a8c881b53b1/indexer-0.6.2.tar.gz
    ERROR: Complete output from command python setup.py egg_info:
    ERROR: Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/private/var/folders/zy/cj986nys61l566ps0nw0kdym0000gp/T/pip-install-njxjtjnm/indexer/setup.py", line 107
        except OSError, ex:
                      ^
    SyntaxError: invalid syntax
    ----------------------------------------
ERROR: Command "python setup.py egg_info" failed with error code 1 in /private/var/folders/zy/cj986nys61l566ps0nw0kdym0000gp/T/pip-install-njxjtjnm/indexer/
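The SyntaxError comes from indexer's setup.py, which uses the Python 2-only form `except OSError, ex:`. Python 3 accepts only `except OSError as ex:`, so pip3 cannot install that version of indexer. For comparison, here is a minimal Python 3 example of the same pattern (the helper `remove_if_present` is illustrative only):

```python
import os


def remove_if_present(path):
    """Delete path; return the error message instead of raising if it fails."""
    try:
        os.remove(path)
    except OSError as ex:  # Python 3 syntax; Python 2 also accepted "except OSError, ex:"
        return str(ex)
    return None
```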

pedrobmorales commented 5 years ago

Never mind, I found it! The package is pyspellchecker, not spellchecker: https://github.com/barrust/pyspellchecker/issues/24
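pyspellchecker and the spellchecker package installed earlier both provide a top-level module named `spellchecker` (the traceback above shows it under site-packages/spellchecker), so one can shadow the other. Presumably the fix is `pip3 uninstall spellchecker` followed by `pip3 install pyspellchecker`. The hypothetical helper below reports which one Python actually imports, relying only on the fact that pyspellchecker exposes a `SpellChecker` class.

```python
import importlib
import importlib.util


def detect_spellchecker():
    """Report which distribution provides the top-level `spellchecker` module."""
    if importlib.util.find_spec("spellchecker") is None:
        return "not installed"
    try:
        module = importlib.import_module("spellchecker")
    except ImportError as err:  # e.g. "No module named 'indexer'"
        return f"broken install: {err}"
    # pyspellchecker's entry point is the SpellChecker class.
    if hasattr(module, "SpellChecker"):
        return "pyspellchecker"
    return "other spellchecker package"


if __name__ == "__main__":
    print(detect_spellchecker())
```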

pedrobmorales commented 5 years ago

Hmm, but now I get errors when running the parts that call spell.known and spell.unknown.

pedrobmorales commented 5 years ago

AttributeError                            Traceback (most recent call last)
<ipython-input-46-ffb2a717e66a> in <module>
----> 1 spell.known(['pr6ximo'])

AttributeError: 'dict' object has no attribute 'known'
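The AttributeError says that, when that cell ran, `spell` was a plain dict rather than a SpellChecker instance. The notebook cell isn't shown, so this is an assumption, but a common cause is reusing the name for a dictionary elsewhere in the notebook. The stdlib-only sketch below reproduces the message with a hypothetical stand-in class (`StandInChecker`, not pyspellchecker itself) and shows that a separate name avoids it.

```python
class StandInChecker:
    """Hypothetical stand-in for pyspellchecker's SpellChecker (stdlib only)."""

    def __init__(self, vocabulary):
        self.vocabulary = set(vocabulary)

    def known(self, words):
        # Roughly like SpellChecker.known: the words found in the vocabulary.
        return {w for w in words if w in self.vocabulary}

    def unknown(self, words):
        # Roughly like SpellChecker.unknown: the words not in the vocabulary.
        return {w for w in words if w not in self.vocabulary}


def reproduce_clash():
    """Rebind `spell` to a dict, then call a checker method on it."""
    spell = StandInChecker(["próximo"])
    spell = {"pr6ximo": "próximo"}  # e.g. a later cell reuses the name
    try:
        spell.known(["pr6ximo"])
    except AttributeError as err:
        return str(err)  # "'dict' object has no attribute 'known'"
    return None


def avoid_clash():
    """Keep the checker and the corrections dict under different names."""
    spell = StandInChecker(["próximo"])
    corrections = {"pr6ximo": "próximo"}
    return spell.unknown(list(corrections))  # {'pr6ximo'}
```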

meramos commented 5 years ago

The spellchecker section in the Jupyter notebook isn't complete; I was testing an alternative way to fix words with accents that Tika misread. I will update the notebook to make it cleaner and remove unused code.

Thank you for your feedback! I will incorporate your suggestion into the README.
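The spellchecker step described above is meant to repair accented words that Tika misread (this thread's example is pr6ximo for próximo). As a stdlib-only illustration of that idea, not the notebook's actual code, `difflib.get_close_matches` can map a garbled token to its closest word in a vocabulary; the vocabulary, the 0.8 cutoff, and the helper name `repair_word` are all hypothetical.

```python
import difflib


def repair_word(word, vocabulary, cutoff=0.8):
    """Return word if known, else its closest match in vocabulary, else word unchanged."""
    if word in vocabulary:
        return word
    # get_close_matches ranks candidates by SequenceMatcher.ratio() and
    # drops any that score below cutoff.
    matches = difflib.get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else word


# Hypothetical vocabulary; a real run would need a full Spanish word list.
VOCABULARY = ["próximo", "primo", "máximo"]

print(repair_word("pr6ximo", VOCABULARY))  # → próximo
```

Even with this tiny vocabulary, primo also clears the cutoff for pr6ximo (a ratio of about 0.83, versus about 0.86 for próximo), so in practice a full word list and a tuned cutoff, or a frequency-aware checker such as pyspellchecker, would matter.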