takluyver / entrypoints

Discover and load entry points from installed packages
https://entrypoints.readthedocs.io/
MIT License
74 stars 29 forks source link

iter_files_distros will fail if sys.path contains a PosixPath (AttributeError: 'PosixPath' object has no attribute 'rstrip') #43

Open afparsons opened 3 years ago

afparsons commented 3 years ago

iter_files_distros(path=None, repeated_distro='first') will fail if one of the elements in sys.path is not a string. For example, my sys.path contained a PosixPath:

['/usr/local/bin', '/usr/local/lib/python38.zip', '/usr/local/lib/python3.8', '/usr/local/lib/python3.8/lib-dynload', '/usr/local/lib/python3.8/site-packages', PosixPath('/model_infrastructure')]

PosixPath objects do not implement an rstrip() method, and so the following happens [permalink]:

if path is None:
    path = sys.path

# `path` is now a list containing some strings and in my accidental case, a PosixPath

for folder in path:
    if folder.rstrip('/\\').endswith('.egg'):

# oops

I'll defer to someone with far more Python experience than I have to decide whether this is just a silly user error (i.e. entirely my fault), or something that should be handled within iter_files_distros(...).

Maybe it seems absurd that anything other than a string would find its way into sys.path, but for newcomers to Python (i.e. me, since ~3.5) who have grown up with pathlib, it doesn't feel too out of place. In fact, here's a PR.


For search engines and passers-by:

I had encountered this stacktrace in my Django + Dagster + MLflow + Gensim + Docker project:

AttributeError: 'PosixPath' object has no attribute 'rstrip'
  File "/usr/local/lib/python3.8/site-packages/dagster/core/execution/plan/utils.py", line 42, in solid_execution_error_boundary
    yield
  File "/usr/local/lib/python3.8/site-packages/dagster/utils/__init__.py", line 382, in iterate_with_context
    next_output = next(iterator)
  File "/usr/local/lib/python3.8/site-packages/dagster/core/execution/plan/compute_generator.py", line 65, in _coerce_solid_compute_fn_to_iterator
    result = fn(context, **kwargs) if context_arg_provided else fn(**kwargs)
  File "/model_infrastructure/solids/gensim.py", line 160, in cyclic_word2vec
    experiment = mlflow.get_experiment_by_name(experiment_name)
  File "/usr/local/lib/python3.8/site-packages/mlflow/tracking/fluent.py", line 815, in get_experiment_by_name
    return MlflowClient().get_experiment_by_name(name)
  File "/usr/local/lib/python3.8/site-packages/mlflow/tracking/client.py", line 434, in get_experiment_by_name
    return self._tracking_client.get_experiment_by_name(name)
  File "/usr/local/lib/python3.8/site-packages/mlflow/tracking/_tracking_service/client.py", line 128, in get_experiment_by_name
    return self.store.get_experiment_by_name(name)
  File "/usr/local/lib/python3.8/site-packages/mlflow/store/tracking/rest_store.py", line 263, in get_experiment_by_name
    response_proto = self._call_endpoint(GetExperimentByName, req_body)
  File "/usr/local/lib/python3.8/site-packages/mlflow/store/tracking/rest_store.py", line 55, in _call_endpoint
    return call_endpoint(self.get_host_creds(), endpoint, method, json_body, response_proto)
  File "/usr/local/lib/python3.8/site-packages/mlflow/utils/rest_utils.py", line 163, in call_endpoint
    response = http_request(
  File "/usr/local/lib/python3.8/site-packages/mlflow/utils/rest_utils.py", line 47, in http_request
    from mlflow.tracking.request_header.registry import resolve_request_headers
  File "/usr/local/lib/python3.8/site-packages/mlflow/tracking/request_header/registry.py", line 39, in <module>
    _request_header_provider_registry.register_entrypoints()
  File "/usr/local/lib/python3.8/site-packages/mlflow/tracking/request_header/registry.py", line 21, in register_entrypoints
    for entrypoint in entrypoints.get_group_all("mlflow.request_header_provider"):
  File "/usr/local/lib/python3.8/site-packages/entrypoints.py", line 236, in get_group_all
    for config, distro in iter_files_distros(path=path):
  File "/usr/local/lib/python3.8/site-packages/entrypoints.py", line 128, in iter_files_distros
    if folder.rstrip('/\\').endswith('.egg'):

It took me a while to find the cause.

My BASE_DIR in Django's settings.py was a pathlib Path object:

from pathlib import Path

# Build paths inside the project like this: BASE_DIR / 'subdir'.
BASE_DIR = Path(__file__).resolve().parent.parent

For whatever reason, I had to append BASE_DIR to sys.path from within a Dagster solid:

sys.path.append(BASE_DIR)

This resulted in a PosixPath object being added to sys.path. As explained above, sys.path is expected presently to hold only strings, and so iter_files_distros(path=None) will just call rstrip(...) on each element without type consideration.

A solution is ensure that a string is appended to sys.path:

sys.path.append(str(BASE_DIR.absolute()))
Scaramir commented 1 year ago

Would be awesome, if this could be included in a pypi release, 'cause it would be the same for WindowsPath.

takluyver commented 1 year ago

For now, the docs on the sys module say that sys.path should only contain strings:

A program is free to modify this list for its own purposes. Only strings should be added to sys.path; all other data types are ignored during import.

So I think the right thing to do in entrypoints is to just ignore pathlib objects, rather than converting them to [edit: string] paths.

I should also make it clear that entry points are now supported in the Python standard library, by the importlib.metadata module, so this package is no longer all that important. But I'm happy to merge a fix if people are still using it.