kieferk / dfply

dplyr-style piping operations for pandas dataframes
GNU General Public License v3.0
889 stars 103 forks source link

data still missing after installation #86

Open jaybundy opened 5 years ago

jaybundy commented 5 years ago

I'm having this same issue still: https://github.com/kieferk/dfply/issues/8

-I am using conda to install dfply (which I need to because that's the package manager used by the computing cluster I have access to).

conda install -c tallic dfply

That's the command I use to install the package from https://anaconda.org/tallic/dfply.

But when I go to use dfply, it still says the diamonds.csv data is missing.

Traceback (most recent call last): File "ACH_nested_anova.py", line 1, in import dfply File "/mnt/home/bundyjas/anaconda3/envs/ACH_environment/lib/python3.6/site-packages/dfply/init.py", line 11, in from .data import diamonds File "/mnt/home/bundyjas/anaconda3/envs/ACH_environment/lib/python3.6/site-packages/dfply/data/init.py", line 5, in diamonds = pd.read_csv(os.path.join(root, "diamonds.csv")) File "/mnt/home/bundyjas/anaconda3/envs/ACH_environment/lib/python3.6/site-packages/pandas/io/parsers.py", line 702, in parser_f return _read(filepath_or_buffer, kwds) File "/mnt/home/bundyjas/anaconda3/envs/ACH_environment/lib/python3.6/site-packages/pandas/io/parsers.py", line 429, in _read parser = TextFileReader(filepath_or_buffer, kwds) File "/mnt/home/bundyjas/anaconda3/envs/ACH_environment/lib/python3.6/site-packages/pandas/io/parsers.py", line 895, in init self._make_engine(self.engine) File "/mnt/home/bundyjas/anaconda3/envs/ACH_environment/lib/python3.6/site-packages/pandas/io/parsers.py", line 1122, in _make_engine self._engine = CParserWrapper(self.f, self.options) File "/mnt/home/bundyjas/anaconda3/envs/ACH_environment/lib/python3.6/site-packages/pandas/io/parsers.py", line 1853, in init self._reader = parsers.TextReader(src, **kwds) File "pandas/_libs/parsers.pyx", line 387, in pandas._libs.parsers.TextReader.cinit File "pandas/_libs/parsers.pyx", line 705, in pandas._libs.parsers.TextReader._setup_parser_source FileNotFoundError: [Errno 2] File b'/mnt/home/bundyjas/anaconda3/envs/ACH_environment/lib/python3.6/site-packages/dfply/data/diamonds.csv' does not exist: b'/mnt/home/bundyjas/anaconda3/envs/ACH_environment/lib/python3.6/site-packages/dfply/data/diamonds.csv'

2019-03-15 13:25:11 ⌚ gateway-03 in ~/ACH_Development/ACH_tests/ACH_quiz3/python_scripts/Analysis ○ → python ACH_nested_anova.py Traceback (most recent call last): File "ACH_nested_anova.py", line 2, in from dfply import group_by as group_by, summarize as summarize, select as select File "/mnt/home/bundyjas/anaconda3/envs/ACH_environment/lib/python3.6/site-packages/dfply/init.py", line 11, in from .data import diamonds File "/mnt/home/bundyjas/anaconda3/envs/ACH_environment/lib/python3.6/site-packages/dfply/data/init.py", line 5, in diamonds = pd.read_csv(os.path.join(root, "diamonds.csv")) File "/mnt/home/bundyjas/anaconda3/envs/ACH_environment/lib/python3.6/site-packages/pandas/io/parsers.py", line 702, in parser_f return _read(filepath_or_buffer, kwds) File "/mnt/home/bundyjas/anaconda3/envs/ACH_environment/lib/python3.6/site-packages/pandas/io/parsers.py", line 429, in _read parser = TextFileReader(filepath_or_buffer, kwds) File "/mnt/home/bundyjas/anaconda3/envs/ACH_environment/lib/python3.6/site-packages/pandas/io/parsers.py", line 895, in init self._make_engine(self.engine) File "/mnt/home/bundyjas/anaconda3/envs/ACH_environment/lib/python3.6/site-packages/pandas/io/parsers.py", line 1122, in _make_engine self._engine = CParserWrapper(self.f, self.options) File "/mnt/home/bundyjas/anaconda3/envs/ACH_environment/lib/python3.6/site-packages/pandas/io/parsers.py", line 1853, in init self._reader = parsers.TextReader(src, **kwds) File "pandas/_libs/parsers.pyx", line 387, in pandas._libs.parsers.TextReader.cinit File "pandas/_libs/parsers.pyx", line 705, in pandas._libs.parsers.TextReader._setup_parser_source FileNotFoundError: [Errno 2] File b'/mnt/home/bundyjas/anaconda3/envs/ACH_environment/lib/python3.6/site-packages/dfply/data/diamonds.csv' does not exist: b'/mnt/home/bundyjas/anaconda3/envs/ACH_environment/lib/python3.6/site-packages/dfply/data/diamonds.csv'

2019-03-15 13:25:41 ⌚ gateway-03 in ~/ACH_Development/ACH_tests/ACH_quiz3/python_scripts/Analysis ○ → pip install dfply Requirement already satisfied: dfply in /mnt/home/bundyjas/anaconda3/envs/ACH_environment/lib/python3.6/site-packages (0.3.1) Requirement already satisfied: numpy in /mnt/home/bundyjas/anaconda3/envs/ACH_environment/lib/python3.6/site-packages (from dfply) (1.16.2) Requirement already satisfied: pandas in /mnt/home/bundyjas/anaconda3/envs/ACH_environment/lib/python3.6/site-packages (from dfply) (0.24.2) Requirement already satisfied: python-dateutil>=2.5.0 in /mnt/home/bundyjas/anaconda3/envs/ACH_environment/lib/python3.6/site-packages (from pandas->dfply) (2.8.0) Requirement already satisfied: pytz>=2011k in /mnt/home/bundyjas/anaconda3/envs/ACH_environment/lib/python3.6/site-packages (from pandas->dfply) (2018.9) Requirement already satisfied: six>=1.5 in /mnt/home/bundyjas/anaconda3/envs/ACH_environment/lib/python3.6/site-packages (from python-dateutil>=2.5.0->pandas->dfply) (1.12.0)

2019-03-15 13:26:59 ⌚ gateway-03 in ~/ACH_Development/ACH_tests/ACH_quiz3/python_scripts/Analysis ○ → python ACH_nested_anova.py Traceback (most recent call last): File "ACH_nested_anova.py", line 2, in from dfply import group_by as group_by, summarize as summarize, select as select File "/mnt/home/bundyjas/anaconda3/envs/ACH_environment/lib/python3.6/site-packages/dfply/init.py", line 11, in from .data import diamonds File "/mnt/home/bundyjas/anaconda3/envs/ACH_environment/lib/python3.6/site-packages/dfply/data/init.py", line 5, in diamonds = pd.read_csv(os.path.join(root, "diamonds.csv")) File "/mnt/home/bundyjas/anaconda3/envs/ACH_environment/lib/python3.6/site-packages/pandas/io/parsers.py", line 702, in parser_f return _read(filepath_or_buffer, kwds) File "/mnt/home/bundyjas/anaconda3/envs/ACH_environment/lib/python3.6/site-packages/pandas/io/parsers.py", line 429, in _read parser = TextFileReader(filepath_or_buffer, kwds) File "/mnt/home/bundyjas/anaconda3/envs/ACH_environment/lib/python3.6/site-packages/pandas/io/parsers.py", line 895, in init self._make_engine(self.engine) File "/mnt/home/bundyjas/anaconda3/envs/ACH_environment/lib/python3.6/site-packages/pandas/io/parsers.py", line 1122, in _make_engine self._engine = CParserWrapper(self.f, self.options) File "/mnt/home/bundyjas/anaconda3/envs/ACH_environment/lib/python3.6/site-packages/pandas/io/parsers.py", line 1853, in init self._reader = parsers.TextReader(src, **kwds) File "pandas/_libs/parsers.pyx", line 387, in pandas._libs.parsers.TextReader.cinit File "pandas/_libs/parsers.pyx", line 705, in pandas._libs.parsers.TextReader._setup_parser_source FileNotFoundError: [Errno 2] File b'/mnt/home/bundyjas/anaconda3/envs/ACH_environment/lib/python3.6/site-packages/dfply/data/diamonds.csv' does not exist: b'/mnt/home/bundyjas/anaconda3/envs/ACH_environment/lib/python3.6/site-packages/dfply/data/diamonds.csv'

I can substitute the import line with any of the following and the result is still the same: -import dfply -from dfply import group_by as group_by, summarize as summarize, select as select -from dfply import *

Please help. I cannot seem to use git or pip to correct the problem. Pip tells me the package is already installed, but I get the same problem. Git is not available to me.

andrewkho commented 4 years ago

I'm not 100% sure, but I guess this issue comes from building from source. If you pip download --no-binary :all: --no-dependencies dfply you'll find the same issue, the diamonds.csv file is missing from dfply/data/ folder. However downloading the wheel pip download --no-dependencies dfply, if you inspect the wheel you'll find that the diamonds.csv file is there.

I don't know anything about conda package management, but perhaps they take the result of python setup.py sdist, which would omit the data file. According to this random SO post, a MANIFEST file should fix things.

https://stackoverflow.com/questions/7522250/how-to-include-package-data-with-setuptools-distribute

complexgenome commented 4 years ago

I ran into the same error with file missing of diamonds when dfply library was installed using conda (conda install -c tallic dfply). In order to resolve this remove library and its dependencies using conda.
Then install using pip under same conda environment.

library installed with pip install works