pedrobmorales opened 5 years ago
I think this is an INVALUABLE tool for analysts and others who need to reconstruct the full evidence of what is written in these chats and in others like them. Great job!
I am trying to install all the dependencies but I cannot load the spellchecker module because it says it cannot find indexer.
ModuleNotFoundError Traceback (most recent call last)
<ipython-input-5-00ec13e61853> in <module>
2 import re
3 import classifier as spanish_sentiment_analysis
----> 4 from spellchecker import SpellChecker
/usr/local/lib/python3.7/site-packages/spellchecker/__init__.py in <module>
1 # -*- coding: utf-8 -*-
----> 2 from spellchecker.core import Spellchecker,getInstance
3
/usr/local/lib/python3.7/site-packages/spellchecker/core.py in <module>
24 import urllib
25
---> 26 from indexer import DictionaryIndex
27 from langdetect import _detect_lang
28
ModuleNotFoundError: No module named 'indexer'
I tried pip3 install indexer but it gave me an error.
Collecting indexer
Downloading https://files.pythonhosted.org/packages/c7/2f/49ea001ccc81502fe790c6077ca0cf9c4dc98ce160e1b1225a8c881b53b1/indexer-0.6.2.tar.gz
ERROR: Complete output from command python setup.py egg_info:
ERROR: Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/private/var/folders/zy/cj986nys61l566ps0nw0kdym0000gp/T/pip-install-njxjtjnm/indexer/setup.py", line 107
except OSError, ex:
^
SyntaxError: invalid syntax
----------------------------------------
ERROR: Command "python setup.py egg_info" failed with error code 1 in /private/var/folders/zy/cj986nys61l566ps0nw0kdym0000gp/T/pip-install-njxjtjnm/indexer/
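For anyone hitting the same pip error: `except OSError, ex:` is the Python 2 way of binding a caught exception, and the Python 3.7 interpreter seen in the traceback rejects it, which suggests that indexer 0.6.2's setup.py was written for Python 2 only. The sketch below just asks the running interpreter whether it can compile each spelling; the `compiles` helper and the two sample strings are made up for illustration.

```python
# Python 2 spelling of the handler, as on line 107 of indexer's setup.py.
PY2_STYLE = "try:\n    pass\nexcept OSError, ex:\n    pass\n"
# Python 3 spelling of the same handler.
PY3_STYLE = "try:\n    pass\nexcept OSError as ex:\n    pass\n"


def compiles(source):
    """Return True if the running interpreter can compile `source`."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True


print("comma form compiles:", compiles(PY2_STYLE))
print("'as' form compiles:", compiles(PY3_STYLE))
```

If the comma form is reported as not compiling, no amount of retrying `pip3 install indexer` will help; the package itself would need porting.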
Never mind, I found it! The package to install is pyspellchecker, not spellchecker: https://github.com/barrust/pyspellchecker/issues/24
Hmm, then I get some errors when running through the parts that call spell.known and spell.unknown.
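Since both PyPI packages end up being imported as `spellchecker`, it can help to check which installed distribution actually provides that module. The following is a rough sketch, assuming Python 3.8+ (for `importlib.metadata`) and assuming the distributions ship a `top_level.txt` file, which not every package does; `providers` is a hypothetical helper name.

```python
from importlib import metadata  # standard library on Python 3.8+


def providers(module_name):
    """Names of installed distributions that list `module_name` as a top-level module."""
    found = set()
    for dist in metadata.distributions():
        # top_level.txt is optional metadata; treat a missing file as empty.
        top_level = dist.read_text("top_level.txt") or ""
        if module_name in top_level.split():
            found.add(dist.metadata["Name"] or "(unknown)")
    return sorted(found)


print(providers("spellchecker"))
```

An empty list only means that no installed distribution declares the module this way, not that the import is guaranteed to fail.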
AttributeError Traceback (most recent call last)
<ipython-input-46-ffb2a717e66a> in <module>
----> 1 spell.known(['pr6ximo'])
AttributeError: 'dict' object has no attribute 'known'
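The AttributeError says that, by the time this cell runs, the name `spell` is bound to a plain dict rather than to an object with `known` and `unknown` methods. A small check like the one below can confirm that before calling anything; `describe_binding` is a hypothetical helper, and the empty dict is only a stand-in for whatever the notebook assigned to `spell`.

```python
def describe_binding(obj, required=("known", "unknown")):
    """Return the object's type name and the required methods it lacks."""
    missing = [name for name in required if not callable(getattr(obj, name, None))]
    return type(obj).__name__, missing


spell = {}  # stand-in for the dict the notebook bound to `spell`
type_name, missing = describe_binding(spell)
print(f"spell is a {type_name}; missing methods: {missing}")
```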
The spellchecker section in the Jupyter notebook isn't complete; I was testing an alternative way of fixing the accented words that Tika misread. I will update the notebook so that it is cleaner and does not include unused code.
Thank you for your feedback! I will incorporate your suggestion into the README.
Hello Maria, thank you for this very impressive work. I tried to run it on my Mac and had to work through a few install steps, which I documented here:
I tried to submit a pull request but I got denied.
Here are my additions to the README.md.
macOS Installation Steps
To run this on macOS with a fresh Python 3 installation, use the commands below. These commands use Homebrew to install Jupyter and wkhtmltopdf, and pip3 to install the required Python 3 libraries. Note that this uses pip3 because pip may refer to a Python 2.x interpreter.
After this, download the chat PDF file from the link above and save it in a file called dataset/telegram_gate.pdf.
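As a quick sanity check after these steps, something like the following can report whether the tools are on the PATH and whether the PDF is where the notebook expects it. This is only a sketch: `check_setup` is a hypothetical helper, and it assumes the Homebrew installs expose executables named `jupyter` and `wkhtmltopdf`.

```python
import shutil
from pathlib import Path


def check_setup(pdf_path="dataset/telegram_gate.pdf", tools=("jupyter", "wkhtmltopdf")):
    """Map each required tool and the dataset path to True (found) or False."""
    report = {tool: shutil.which(tool) is not None for tool in tools}
    report[pdf_path] = Path(pdf_path).is_file()
    return report


for item, ok in check_setup().items():
    print(f"{'ok' if ok else 'MISSING':8} {item}")
```

Run it from the repository root so that the relative `dataset/` path resolves correctly.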