pypa / packaging-problems

An issue tracker for the problems in packaging

Using graft/prune, or setting exclude in find_packages() seems to have no effect #269

Open zaneselvans opened 5 years ago

zaneselvans commented 5 years ago
  1. What is your operating system and version? Ubuntu 19.04

  2. What is your Python version? 3.7.3

  3. What version of pip do you have? 19.1.1

  4. Could you describe your issue in as much detail as possible? I am attempting to create a python package for the first time, on a project that needs to include a fair amount of "data" -- i.e. non-code files -- and I feel like I am losing my mind trying to get those files included in the package.

When I try to explicitly include files from outside of the python packages using graft or include in the MANIFEST.in, those directives appear to be ignored. Conversely, when I use prune to avoid including my test directory, which is also a python package, that directive is also ignored.

Currently, my setup.py looks like:

from setuptools import setup, find_packages

setup(
    name='catalyst-cooperative.pudl',
    description='Tools for liberating public US electric utility data.',
    version='0.1.0a1',
    author='Catalyst Cooperative',
    author_email='pudl@catalyst.coop',
    maintainer='Zane A. Selvans',
    maintainer_email='zane.selvans@catalyst.coop',
    url='https://github.com/catalyst-cooperative/pudl',
    project_urls={
        "Background": "https://catalyst.coop/pudl",
        "Documentation": "https://catalyst-cooperative-pudl.readthedocs.io",
        "Source": "https://github.com/catalyst-cooperative/pudl",
        "Issue Tracker": "https://github.com/catalyst-cooperative/pudl/issues",
        "Gitter Chat": "https://gitter.im/catalyst-cooperative/pudl",
        "Slack": "https://catalystcooperative.slack.com",
    },
    license='MIT',
    keywords=[
        'electricity', 'energy', 'data', 'analysis', 'mcoe', 'climate change',
        'finance', 'eia 923', 'eia 860', 'ferc', 'form 1', 'epa ampd',
        'epa cems', 'coal', 'natural gas', ],
    python_requires='>=3.6, <4',
    install_requires=[
        'datapackage',
        'dbfread',
        'fastparquet',
        'goodtables',
        'networkx',
        'numpy',
        'pandas>=0.21',
        'pyarrow',
        'pyyaml',
        'scikit-learn>=0.20',
        'scipy',
        'sqlalchemy>=1.3',
        'tableschema',
        'timezonefinder',
    ],
    classifiers=[
        'Development Status :: 3 - Alpha',
        'Environment :: Console',
        'Intended Audience :: Science/Research',
        'Intended Audience :: Developers',
        'License :: OSI Approved :: MIT License',
        'Natural Language :: English',
        'Operating System :: OS Independent',
        'Programming Language :: Python :: 3.6',
        'Programming Language :: Python :: 3.7',
        'Topic :: Scientific/Engineering',
    ],
    packages=find_packages(),
    # package_data is data that is deployed within the python package on the
    # user's system. setuptools will get whatever is listed in MANIFEST.in
    include_package_data=True,
    # The "right way" to deploy scripts so that they work on Windows as well is
    # with entry_points and console_scripts, but that will require some
    # additional re-organization. See issue #327:
    # https://github.com/catalyst-cooperative/pudl/issues/327
    scripts=[
        'scripts/update_datastore.py',
        'scripts/ferc1_to_sqlite.py',
        'scripts/init_pudl.py',
        'scripts/epacems_to_parquet.py',
    ],
)

And my MANIFEST.in looks like:

include results/id_mapping/mapping_eia923_ferc1.xlsx
include scripts/*default.yml
prune test
global-exclude .gitignore __pycache__ *.py[cod] *checkpoint.ipynb

When I run python setup.py build I get the following:

running build
running build_py
creating build
creating build/lib
creating build/lib/test
copying test/datastore_test.py -> build/lib/test
copying test/travis_ci_test.py -> build/lib/test
copying test/__init__.py -> build/lib/test
copying test/datazipper_test.py -> build/lib/test
copying test/etl_test.py -> build/lib/test
copying test/conftest.py -> build/lib/test
creating build/lib/pudl
copying pudl/helpers.py -> build/lib/pudl
copying pudl/datastore.py -> build/lib/pudl
copying pudl/settings.py -> build/lib/pudl
copying pudl/constants.py -> build/lib/pudl
copying pudl/load.py -> build/lib/pudl
copying pudl/init.py -> build/lib/pudl
copying pudl/__init__.py -> build/lib/pudl
creating build/lib/test/validation
copying test/validation/ferc1_test.py -> build/lib/test/validation
copying test/validation/__init__.py -> build/lib/test/validation
copying test/validation/eia860_test.py -> build/lib/test/validation
copying test/validation/eia923_test.py -> build/lib/test/validation
copying test/validation/mcoe_test.py -> build/lib/test/validation
creating build/lib/pudl/models
copying pudl/models/eia860.py -> build/lib/pudl/models
copying pudl/models/ferc1.py -> build/lib/pudl/models
copying pudl/models/eia923.py -> build/lib/pudl/models
copying pudl/models/__init__.py -> build/lib/pudl/models
copying pudl/models/epacems.py -> build/lib/pudl/models
copying pudl/models/entities.py -> build/lib/pudl/models
copying pudl/models/glue.py -> build/lib/pudl/models
creating build/lib/pudl/extract
copying pudl/extract/eia860.py -> build/lib/pudl/extract
copying pudl/extract/ferc1.py -> build/lib/pudl/extract
copying pudl/extract/eia923.py -> build/lib/pudl/extract
copying pudl/extract/__init__.py -> build/lib/pudl/extract
copying pudl/extract/epacems.py -> build/lib/pudl/extract
creating build/lib/pudl/transform
copying pudl/transform/eia860.py -> build/lib/pudl/transform
copying pudl/transform/ferc1.py -> build/lib/pudl/transform
copying pudl/transform/eia923.py -> build/lib/pudl/transform
copying pudl/transform/eia.py -> build/lib/pudl/transform
copying pudl/transform/__init__.py -> build/lib/pudl/transform
copying pudl/transform/epacems.py -> build/lib/pudl/transform
creating build/lib/pudl/analysis
copying pudl/analysis/analysis.py -> build/lib/pudl/analysis
copying pudl/analysis/mcoe.py -> build/lib/pudl/analysis
copying pudl/analysis/__init__.py -> build/lib/pudl/analysis
creating build/lib/pudl/glue
copying pudl/glue/zipper.py -> build/lib/pudl/glue
copying pudl/glue/__init__.py -> build/lib/pudl/glue
creating build/lib/pudl/output
copying pudl/output/eia860.py -> build/lib/pudl/output
copying pudl/output/ferc1.py -> build/lib/pudl/output
copying pudl/output/eia923.py -> build/lib/pudl/output
copying pudl/output/export.py -> build/lib/pudl/output
copying pudl/output/__init__.py -> build/lib/pudl/output
copying pudl/output/pudltabl.py -> build/lib/pudl/output
copying pudl/output/glue.py -> build/lib/pudl/output
running egg_info
creating catalyst_cooperative.pudl.egg-info
writing catalyst_cooperative.pudl.egg-info/PKG-INFO
writing dependency_links to catalyst_cooperative.pudl.egg-info/dependency_links.txt
writing requirements to catalyst_cooperative.pudl.egg-info/requires.txt
writing top-level names to catalyst_cooperative.pudl.egg-info/top_level.txt
writing manifest file 'catalyst_cooperative.pudl.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
warning: no previously-included files matching '__pycache__' found anywhere in distribution
warning: no previously-included files matching '*.py[cod]' found anywhere in distribution
warning: no previously-included files matching '*checkpoint.ipynb' found anywhere in distribution
writing manifest file 'catalyst_cooperative.pudl.egg-info/SOURCES.txt'
creating build/lib/pudl/metadata
copying pudl/metadata/plant_info_for_additional_cems_plants.csv -> build/lib/pudl/metadata
creating build/lib/pudl/metadata/xlsx_maps
copying pudl/metadata/xlsx_maps/boiler_fuel_map_eia923.csv -> build/lib/pudl/metadata/xlsx_maps
copying pudl/metadata/xlsx_maps/boiler_generator_assn_map_eia860.csv -> build/lib/pudl/metadata/xlsx_maps
copying pudl/metadata/xlsx_maps/fuel_receipts_costs_map_eia923.csv -> build/lib/pudl/metadata/xlsx_maps
copying pudl/metadata/xlsx_maps/generation_fuel_map_eia923.csv -> build/lib/pudl/metadata/xlsx_maps
copying pudl/metadata/xlsx_maps/generator_assn_map_eia860.csv -> build/lib/pudl/metadata/xlsx_maps
copying pudl/metadata/xlsx_maps/generator_map_eia923.csv -> build/lib/pudl/metadata/xlsx_maps
copying pudl/metadata/xlsx_maps/generator_proposed_assn_map_eia860.csv -> build/lib/pudl/metadata/xlsx_maps
copying pudl/metadata/xlsx_maps/generator_retired_assn_map_eia860.csv -> build/lib/pudl/metadata/xlsx_maps
copying pudl/metadata/xlsx_maps/ownership_assn_map_eia860.csv -> build/lib/pudl/metadata/xlsx_maps
copying pudl/metadata/xlsx_maps/plant_assn_map_eia860.csv -> build/lib/pudl/metadata/xlsx_maps
copying pudl/metadata/xlsx_maps/plant_frame_map_eia923.csv -> build/lib/pudl/metadata/xlsx_maps
copying pudl/metadata/xlsx_maps/skiprows_eia860.csv -> build/lib/pudl/metadata/xlsx_maps
copying pudl/metadata/xlsx_maps/skiprows_eia923.csv -> build/lib/pudl/metadata/xlsx_maps
copying pudl/metadata/xlsx_maps/stocks_map_eia923.csv -> build/lib/pudl/metadata/xlsx_maps
copying pudl/metadata/xlsx_maps/tab_map_eia860.csv -> build/lib/pudl/metadata/xlsx_maps
copying pudl/metadata/xlsx_maps/tab_map_eia923.csv -> build/lib/pudl/metadata/xlsx_maps
copying pudl/metadata/xlsx_maps/utility_assn_map_eia860.csv -> build/lib/pudl/metadata/xlsx_maps
running build_scripts
creating build/scripts-3.7
copying and adjusting scripts/update_datastore.py -> build/scripts-3.7
copying and adjusting scripts/ferc1_to_sqlite.py -> build/scripts-3.7
copying and adjusting scripts/init_pudl.py -> build/scripts-3.7
copying and adjusting scripts/epacems_to_parquet.py -> build/scripts-3.7
changing mode of build/scripts-3.7/update_datastore.py from 644 to 755
changing mode of build/scripts-3.7/ferc1_to_sqlite.py from 644 to 755
changing mode of build/scripts-3.7/init_pudl.py from 644 to 755
changing mode of build/scripts-3.7/epacems_to_parquet.py from 644 to 755

The test directory is included, even though I tried to prune it, and neither the default settings files for the scripts nor the id_mapping spreadsheet were included. However, the global-exclude directive was respected.

If I instead point find_packages at the directory that contains the Python package I do want, using find_packages('pudl') together with package_dir={'':'pudl'}, it no longer respects include_package_data=True and does not bring in the contents of the metadata directory inside the pudl package.

If instead I give find_packages(exclude=['test']) it still includes the test/validation sub-directory, and adding 'test/validation' to the list of excluded directories has no effect.
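One likely explanation, sketched here under an assumption: find_packages() compares each exclude pattern against the full dotted package name (such as test.validation), not against a filesystem path. Under that model, 'test' matches only the top-level package and 'test/validation' matches nothing. The helper filter_packages below is a hypothetical stand-in that models the matching with fnmatchcase from the standard library; it is not the setuptools implementation.

```python
from fnmatch import fnmatchcase


def filter_packages(names, exclude=()):
    """Drop every dotted package name that matches any exclude pattern."""
    return [
        name for name in names
        if not any(fnmatchcase(name, pattern) for pattern in exclude)
    ]


# A subset of the dotted names that appear in the build log above.
discovered = ["pudl", "pudl.models", "test", "test.validation"]

# 'test' only matches the package literally named "test".
only_top = filter_packages(discovered, exclude=["test"])

# A path-style pattern never equals a dotted name, so it excludes nothing more.
with_path = filter_packages(discovered, exclude=["test", "test/validation"])

# A dotted glob also covers every subpackage of test.
with_glob = filter_packages(discovered, exclude=["test", "test.*"])

print(only_top)
print(with_path)
print(with_glob)
```

If that model holds, find_packages(exclude=['test', 'test.*']) would be the spelling to try in setup.py.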

Adding recursive-exclude test to the end of MANIFEST.in when using packages=find_packages() has no effect, and everything within the test directory is included.

I have been reading the Python Packaging Tutorial, several posts by @ionelmc, and of course the setuptools documentation.

ionelmc commented 5 years ago

Here are my notes:

PS. Jesus Christ, start using the src-layout already!

zaneselvans commented 5 years ago

Okay, thank you for the push and all of the clearly written references @ionelmc. I went ahead and bit the bullet and completely re-organized the repository to:

It generally seems to be working locally. There's some more work to disentangle our current pattern of use from the repository before I can see how it runs on Travis, which will include converting our existing scripts into modules and using console_scripts with entry_points.
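A minimal sketch of what that conversion might look like, assuming a script such as scripts/update_datastore.py becomes a module that exposes a main() function. The module path pudl.datastore_cli and the names ENTRY_POINTS and parse_entry_point are hypothetical, chosen for illustration; the "name = module:function" string is the format setuptools expects for console_scripts.

```python
# Hypothetical layout: scripts/update_datastore.py has been moved to
# src/pudl/datastore_cli.py and now defines a main() function.
ENTRY_POINTS = {
    "console_scripts": [
        "update_datastore = pudl.datastore_cli:main",
    ],
}


def parse_entry_point(spec):
    """Split 'name = module:function' into its three parts."""
    name, _, target = spec.partition("=")
    module, _, function = target.strip().partition(":")
    return name.strip(), module, function


# In setup.py this dict would replace the scripts=[...] argument:
#     setup(..., entry_points=ENTRY_POINTS)
command, module, function = parse_entry_point(ENTRY_POINTS["console_scripts"][0])
print(command, module, function)
```

On installation, setuptools generates a small launcher named after the left-hand side of each entry, which is what makes this approach work on Windows as well.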

Two things that still seem odd or broken:

ionelmc commented 5 years ago

If you use setuptools_scm, you can remove MANIFEST.in; it is no longer required. Nor can you control what goes into the sdist anymore (the sdist becomes 1:1 with the git repo).

You should have 2 sets of requirements: abstract (no pins, setup.py) and concrete (exact version pins, tox.ini, requirements.txt etc). There are tools to manage the pins (https://github.com/dephell/dephell, https://github.com/jazzband/pip-tools or pip freeze)
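The distinction can be sketched as two lists, one per role. The package names come from the setup.py earlier in this thread; the exact version pins are illustrative only and are not taken from the project. The helpers is_pinned and unpinned are hypothetical and handle just the simple name==version form.

```python
import re

# Abstract requirements: loose constraints, as carried by install_requires.
ABSTRACT = ["pandas>=0.21", "scikit-learn>=0.20", "sqlalchemy>=1.3", "numpy"]

# Concrete requirements: exact pins, as found in requirements.txt or tox.ini.
# These version numbers are illustrative assumptions.
CONCRETE = [
    "pandas==0.24.2",
    "scikit-learn==0.21.2",
    "sqlalchemy==1.3.5",
    "numpy==1.16.4",
]

# Matches only the plain "name==version" form (no extras, markers, or ranges).
_PIN = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*==[0-9][A-Za-z0-9.]*$")


def is_pinned(requirement):
    """Return True if the requirement is an exact name==version pin."""
    return _PIN.match(requirement) is not None


def unpinned(requirements):
    """List the requirements that are not exact pins."""
    return [req for req in requirements if not is_pinned(req)]


print(unpinned(ABSTRACT))
print(unpinned(CONCRETE))
```

A check like this could guard a pinned file in CI, while the abstract list stays loose so that downstream users are not forced onto specific versions.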

zaneselvans commented 5 years ago

Hmm, in the end it seemed like it was possible to use prune in the MANIFEST.in to remove the directories that are irrelevant to the distribution I'm trying to package. But maybe I am looking at the wrong thing. (I wanted to include src, docs, and test, but none of the Jupyter notebooks or specific analyses that were being done using the library and its outputs.)

DepHell looks great! Will definitely give it a try to simplify the many environments.