pgcorpus / gutenberg

Pipeline to generate the Standardized Project Gutenberg Corpus
https://zenodo.org/record/2422561
GNU General Public License v3.0

pandas #36

Closed iandoug closed 4 years ago

iandoug commented 4 years ago

Hi

Should pandas not be "required"?

~/data2/gutenberg $ python get_data.py
Traceback (most recent call last):
  File "get_data.py", line 9, in <module>
    from src.metadataparser import make_df_metadata
  File "/home/ian/data2/gutenberg/src/metadataparser.py", line 13, in <module>
    import pandas as pd
ModuleNotFoundError: No module named 'pandas'

Installing pandas pulls in a few more packages:

Calculating dependencies... done!
[ebuild N ] virtual/cblas-3.8
[ebuild N ] dev-python/numexpr-2.7.1
[ebuild N ] dev-python/pybind11-2.5.0
[ebuild N ] sci-libs/scipy-1.4.1
[ebuild N ] dev-python/bottleneck-1.3.2
[ebuild N ] dev-python/pandas-1.0.5

But still no joy after all that...

21:05:07 ~/data2/gutenberg $ python3 get_data.py
Traceback (most recent call last):
  File "get_data.py", line 9, in <module>
    from src.metadataparser import make_df_metadata
  File "/home/ian/data2/gutenberg/src/metadataparser.py", line 13, in <module>
    import pandas as pd
ModuleNotFoundError: No module named 'pandas'

~/data2/gutenberg $ pip install --user pandas
Requirement already satisfied: pandas in /home/ian/.local/lib/python3.7/site-packages (1.1.1)
Requirement already satisfied: python-dateutil>=2.7.3 in /usr/lib64/python3.7/site-packages (from pandas) (2.8.1)
Requirement already satisfied: numpy>=1.15.4 in /usr/lib64/python3.7/site-packages (from pandas) (1.19.0)
Requirement already satisfied: pytz>=2017.2 in /usr/lib64/python3.7/site-packages (from pandas) (2020.1)
Requirement already satisfied: six>=1.5 in /usr/lib64/python3.7/site-packages (from python-dateutil>=2.7.3->pandas) (1.15.0)

Any ideas? I don't know much about Python.

Thanks, Ian

fontclos commented 4 years ago

Closing this one as pandas is already in the requirements.txt file.

As for your installation problems, I would say you have more than one Python installed, and that can be problematic. I would suggest you first try importing pandas somewhere else, outside this repo. It could also be a python/python3 issue: have a look at the output of "which python" and "which python3". Note that your pandas is in /home/ian/.local/lib/python3.7/site-packages.
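
The check suggested above can also be done from inside Python, which sidesteps any doubt about what "python" resolves to on the PATH. This is a minimal sketch, not part of the repo: the function name describe_interpreter is made up for illustration, and it only uses the standard library's sys and importlib.util.

```python
import importlib.util
import sys


def describe_interpreter(module_name="pandas"):
    """Report which interpreter is running and whether it can see a module.

    Hypothetical helper for illustration; not part of the gutenberg repo.
    """
    # find_spec returns None when a top-level module cannot be found
    # on this interpreter's import path.
    spec = importlib.util.find_spec(module_name)
    return {
        "executable": sys.executable,
        "version": "{}.{}".format(*sys.version_info[:2]),
        "module_found": spec is not None,
        "module_location": spec.origin if spec is not None else None,
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(f"{key}: {value}")
```

Saving this to a file and running it with both "python" and "python3" should show whether each command starts the same interpreter and whether that interpreter can find pandas.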

iandoug commented 4 years ago

Thanks.

which python
/usr/bin/python
15:43:34 ~ $ which python3
/usr/bin/python3
15:43:37 ~ $ python3 --version
Python 3.6.11
15:43:53 ~/1web/keyboard-design/scripts $ python --version
Python 3.6.11

I tried renaming that folder to 3.6, but it still didn't work. I will fiddle around until it does. Google shows lots of people have similar issues with pandas.

Cheers, Ian

iandoug commented 4 years ago

Mmmm. Am on Gentoo. Python was set to 3.6. Switched it to 3.7 and now the data fetch is running.

Thanks, Ian
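
For anyone landing here with the same error: the mismatch in this thread was an interpreter at 3.6 while pip had installed pandas under a python3.7 site-packages directory. A rough sketch of that comparison is below. The helper names are hypothetical, and it assumes the usual ".../pythonX.Y/site-packages" directory layout seen in the pip output above.

```python
import re
import sys


def site_packages_version(path):
    """Extract 'X.Y' from a path such as .../lib/python3.7/site-packages.

    Hypothetical helper; assumes the directory is named pythonX.Y.
    """
    match = re.search(r"python(\d+\.\d+)", path)
    return match.group(1) if match else None


def matches_interpreter(path, version_info=sys.version_info):
    """Return True when the path's pythonX.Y matches the given interpreter version."""
    running = "{}.{}".format(version_info[0], version_info[1])
    return site_packages_version(path) == running


if __name__ == "__main__":
    pandas_dir = "/home/ian/.local/lib/python3.7/site-packages"
    print("site-packages version:", site_packages_version(pandas_dir))
    print("matches running interpreter:", matches_interpreter(pandas_dir))
```

If the two versions differ, the running interpreter will not look in that directory, which is consistent with the ModuleNotFoundError above even though pip reported pandas as installed.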