python / importlib_metadata

Library to access metadata for Python packages
https://importlib-metadata.readthedocs.io
Apache License 2.0
123 stars 80 forks source link

Add direct_url.json content to distribution #429

Closed kasium closed 1 year ago

kasium commented 1 year ago

Closes #404

kasium commented 1 year ago

@jaraco Here is a small PoC how the parsing could work. Please let me know what you think since there are multiple options, e.g.

In addition, what happens with importlib.metadata. Does it get automatically updated?

jaraco commented 1 year ago

@jaraco Here is a small PoC how the parsing could work. Please let me know what you think since there are multiple options, e.g.

* using dataclasses

* don't using inner classes

* using TypedDict, etc.

Thanks for the contrib.

I see now that you're going for a lot more than to provide access to the direct_url.json but to provide some structure and context around its contents.

I'm slightly reluctant for importlib_metadata to own this behavior, as it's not obvious to me how stable the data structure is. As I review the bug again, I'm reminded that this file is defined in PEP 660, so is likely fairly stable.

But if I compare the direct_url.json parsing to other parts of importlib_metadata, where things like the Version or requirements only provide primitive values and rely on third-party packages (i.e. packaging) to interpret them, I wonder if something similar should be done for importlib_metadata. That is, maybe packaging or another third-party package should own the objects for PEP 660 usage.

What I recommend - let's get feedback from the packaging community (forum or discord) as to where this behavior should reside. If it should be here (importlib metadata), then we'll move forward with getting this proposal merged.

In addition, what happens with importlib.metadata. Does it get automatically updated?

It does. New features like this will land in the next Python release, which is currently 3.12, but the cutoff for new features is quickly approaching, so it likely won't land until 3.13 (another advantage of supplying the behavior in a third party package is speed of availability for older Pythons).

kasium commented 1 year ago

@jaraco seems like I forgot to reply. Here is the post: https://discuss.python.org/t/place-for-a-direct-url-json-parser/26266

jaraco commented 1 year ago

I was thinking instead of hard-coding the whole type hierarchy to simply reflect what's in JSON with something like this:

diff --git a/importlib_metadata/__init__.py b/importlib_metadata/__init__.py
index 857c9198..4a33b2dc 100644
--- a/importlib_metadata/__init__.py
+++ b/importlib_metadata/__init__.py
@@ -3,8 +3,10 @@ import re
 import abc
 import csv
 import sys
+import json
 import zipp
 import email
+import types
 import inspect
 import pathlib
 import operator
@@ -618,6 +620,16 @@ class Distribution(DeprecatedNonAbstract):
             space = url_req_space(section.value)
             yield section.value + space + quoted_marker(section.name)

+    @property
+    def origin(self):
+        return self._load_json('direct_url.json')
+
+    def _load_json(self, filename):
+        return pass_none(json.loads)(
+            self.read_text(filename),
+            object_hook=lambda data: types.SimpleNamespace(**data),
+        )
+

 class DistributionFinder(MetaPathFinder):
     """

This approach would generalize to any json field with much less code, but provide a similar interface (dist.origin.archive_info.hash resolves simply). It doesn't, however, provide classes for each type nor does it fill in missing values with None or defaults (editable as False).

How do you see these tradeoffs?

jaraco commented 1 year ago

Separately, would you be willing to write some tests to capture the expected use-cases? In particular, I'd like some (all?) of the fixtures to expose direct_url.json and then exercise that in the tests. These tests should run in isolation and not rely on pip or other packages to generate direct_url.json. It may be possible to re-build the sdist/wheels in tests/data from prepare/, but I'm not sure how that's done or what it would take for the built artifacts to include direct_url.json. Probably better would be to expose literal direct_data.json in the tests/fixtures.

kasium commented 1 year ago

@jaraco personally I'm a big typing fan. Maybe we could use a TypedDict here?

jaraco commented 1 year ago

@jaraco personally I'm a big typing fan. Maybe we could use a TypedDict here?

TypedDict looks encouraging, especially if it provides a declarative way to describe the hierarchy of types in a direct_url.json doc without having to imperatively construct each level of the hierarchy. That is, if you could create a Origin(TypedDict) containing all of its members and child types and then somehow construct that tree of objects using json.loads(...), I'd be very much in favor. What I'd particularly like to avoid is having to explicitly construct each attribute of each instance. I'd also like to avoid traversing the tree twice (once to construct the dicts from json and again to construct the typed dicts from the anonymous dicts); it's preferable to construct the objects in one pass.

kasium commented 1 year ago

Based on the discussion here, the feature was moved to https://github.com/pypa/packaging/issues/701