nchammas / flintrock

A command-line tool for launching Apache Spark clusters.
Apache License 2.0
636 stars 116 forks

Address packaging warnings related to data files included inside the `flintrock` package directory #368

Open nchammas opened 7 months ago

Installing Flintrock with `pip install -vvv -e .` shows a few warnings like this:

  /private/var/folders/_0/vqvhqvnj5wq9s4s5drcb30pc0000gn/T/pip-build-env-lm653aie/overlay/lib/python3.8/site-packages/setuptools/command/build_py.py:207: _Warning: Package 'flintrock.scripts' is absent from the `packages` configuration.
  !!

          ********************************************************************************
          ############################
          # Package would be ignored #
          ############################
          Python recognizes 'flintrock.scripts' as an importable package[^1],
          but it is absent from setuptools' `packages` configuration.

          This leads to an ambiguous overall configuration. If you want to distribute this
          package, please make sure that 'flintrock.scripts' is explicitly added
          to the `packages` configuration field.

          Alternatively, you can also rely on setuptools' discovery methods
          (for example by using `find_namespace_packages(...)`/`find_namespace:`
          instead of `find_packages(...)`/`find:`).

          You can read more about "package discovery" on setuptools documentation page:

          - https://setuptools.pypa.io/en/latest/userguide/package_discovery.html

          If you don't want 'flintrock.scripts' to be distributed and are
          already explicitly excluding 'flintrock.scripts' via
          `find_namespace_packages(...)/find_namespace` or `find_packages(...)/find`,
          you can try to use `exclude_package_data`, or `include-package-data=False` in
          combination with a more fine grained `package-data` configuration.

          You can read more about "package data files" on setuptools documentation page:

          - https://setuptools.pypa.io/en/latest/userguide/datafiles.html

          [^1]: For Python, any directory (with suitable naming) can be imported,
                even if it does not contain any `.py` files.
                On the other hand, currently there is no concept of package data
                directory, all directories are treated like packages.
          ********************************************************************************

  !!

These warnings are triggered for all the data folders we package for use during cluster launch and configuration:

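This is because Python can import any suitably named subdirectory as a package, even one with no `__init__.py` or `.py` files in it, so a data subdirectory inside `flintrock/` doubles ambiguously as a Python package and setuptools can't tell which one we mean. This is discussed extensively here: https://github.com/pypa/setuptools/issues/3340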
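To make the ambiguity concrete, here's a minimal sketch using only the standard library. The names `demo_pkg`, `scripts`, and `setup.sh` are hypothetical stand-ins, not Flintrock's actual layout. It builds a throwaway package with a data-only subdirectory and asks the import system whether that subdirectory resolves as a package.

```python
import importlib
import importlib.util
import sys
import tempfile
from pathlib import Path


def data_dir_is_importable():
    """Return (found, has_init) for a data-only subdirectory of a package."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp) / "demo_pkg"   # hypothetical stand-in for a package directory
        data_dir = pkg / "scripts"     # holds data only, no .py files
        data_dir.mkdir(parents=True)
        (pkg / "__init__.py").write_text("")
        (data_dir / "setup.sh").write_text("echo hello\n")

        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            # find_spec imports the parent package, then looks up the child.
            spec = importlib.util.find_spec("demo_pkg.scripts")
            found = spec is not None
            has_init = (data_dir / "__init__.py").exists()
        finally:
            # Undo the changes to interpreter state made for this demo.
            sys.path.remove(tmp)
            sys.modules.pop("demo_pkg", None)
    return found, has_init


found, has_init = data_dir_is_importable()
print(f"importable: {found}, has __init__.py: {has_init}")
```

The directory resolves as a namespace package even though it contains nothing but a shell script, which is the situation the footnote in the warning describes.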

And just for the record, we also package the following non-directories as data (which don't trigger any warnings):

It's not a big deal, and I don't think it will impact users for now. Setuptools may in the future be more forceful about discouraging this kind of mixing of Python packages and data.

I don't know how to fix this, so I'm just noting it here for the future. Perhaps we need to reorganize the code from a flat-layout to a src-layout and have the data files live outside the flintrock/ package directory. I'm not sure.

Currently using: