package-url / purl-spec

A minimal specification for purl aka. a package "mostly universal" URL, join the discussion at https://gitter.im/package-url/Lobby
https://github.com/package-url/purl-spec

Proposal: Solution for Purl Type Definitions #310

Open stevespringett opened 3 months ago

stevespringett commented 3 months ago

Proposal: Solution for Purl Type Definitions

Defining purl types in text within a single document has led to significant inconsistencies across implementations. Textual descriptions are ambiguous: different readers interpret them differently, which leads to varied implementations and integrations. Without a standardized, machine-readable format, errors and discrepancies are more likely when different parties try to use purl types. Additionally, as the number of purl types grows, maintaining a single document becomes increasingly unwieldy, making it harder to ensure consistency and accuracy. These issues highlight the need for a more structured and formalized method of defining purl types, such as a schema or other machine-readable format, to promote uniformity and reliability across implementations.

To address the inconsistencies arising from defining purl types in text within a single document, we propose a robust solution that transitions to a machine-readable format using JSON Schema. This approach offers several benefits, including improved consistency, maintainability, and scalability.

Solution Components

1. JSON Schema for Purl Types

By creating a JSON Schema and defining purl types in that schema, we establish a standardized, machine-readable format that eliminates ambiguity and ensures consistency across implementations. JSON Schema defines the structure and constraints for each purl type, providing clear and precise guidelines that can be uniformly interpreted by all implementations.
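To make this concrete, here is a minimal Python sketch of what a JSON Schema for type definitions and one definition file might look like. Every property name (`namespace`, `name`, `case_sensitive`), the `example` type, and the `missing_required` helper are assumptions for illustration, not the proposed schema. The helper checks only the `required` keyword by hand; a real JSON Schema validator would take its place.

```python
import json

# Hypothetical JSON Schema for a purl type definition file.
# Property names and patterns are illustrative assumptions only.
TYPE_DEFINITION_SCHEMA = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "type": "object",
    "required": ["type", "namespace", "name"],
    "properties": {
        "type": {"type": "string", "pattern": "^[a-z][a-z0-9.+-]*$"},
        "namespace": {"enum": ["required", "optional", "prohibited"]},
        "name": {
            "type": "object",
            "properties": {"case_sensitive": {"type": "boolean"}},
        },
    },
}

# Hypothetical definition for a made-up type, as it might appear in its own file.
EXAMPLE_DEFINITION = json.loads("""
{
  "type": "example",
  "namespace": "optional",
  "name": {"case_sensitive": false}
}
""")


def missing_required(definition, schema):
    """Return the required keys that are absent from the definition.

    Covers only the `required` keyword; it is a stand-in for a full
    JSON Schema validator, not a replacement for one.
    """
    return [key for key in schema["required"] if key not in definition]


print(missing_required(EXAMPLE_DEFINITION, TYPE_DEFINITION_SCHEMA))
```

The point of the sketch is only that both the rules for a type and the rules for writing a type definition become data that tools can check.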

2. Dynamic Purl Type Support

Purl implementations could, in principle, read these JSON-based purl type definitions dynamically. Any current or future purl type could then be supported without hard-coding rules for each type. This dynamic approach reduces maintenance overhead and allows new purl types to be integrated as soon as they are defined.
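A rough Python sketch of what dynamic support could look like follows. The rule vocabulary (`op`, `lowercase`, `replace`), the `name_rules` key, and the `alpha` and `beta` types are all hypothetical; the definitions are inlined as JSON text here, whereas a real implementation would load them from the published files.

```python
import json
import re

# Hypothetical definitions, inlined as JSON text for the sketch.
# The rule format is an assumption, not part of the proposal.
DEFINITIONS_JSON = """
[
  {"type": "alpha", "name_rules": [{"op": "lowercase"}]},
  {"type": "beta", "name_rules": [
      {"op": "replace", "pattern": "[-_.]+", "with": "-"},
      {"op": "lowercase"}
  ]}
]
"""


def load_registry(text):
    """Index the type definitions by their type name."""
    return {d["type"]: d for d in json.loads(text)}


def normalize_name(registry, purl_type, name):
    """Apply the declared rules for purl_type; unknown types pass through."""
    definition = registry.get(purl_type)
    if definition is None:
        return name
    for rule in definition.get("name_rules", []):
        if rule["op"] == "lowercase":
            name = name.lower()
        elif rule["op"] == "replace":
            name = re.sub(rule["pattern"], rule["with"], name)
        else:
            raise ValueError(f"unsupported rule: {rule['op']}")
    return name


registry = load_registry(DEFINITIONS_JSON)
print(normalize_name(registry, "beta", "Foo__Bar.baz"))  # foo-bar-baz
```

Nothing in `normalize_name` mentions a specific type, so adding a type means adding data rather than code. Note that rules expressed as regular expressions depend on regex dialects being compatible across implementation languages.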

3. Separation of Purl Type Definitions

Instead of maintaining all purl types in a single document, we propose separating out the purl type definitions into individual JSON files. This change eliminates the need for the current PURL-TYPES.rst file, thus removing a significant source of inconsistency and making the management of purl types more modular and scalable.

4. Directory Structure for Purl Types

Each purl type will have its own directory, containing:

5. Aggregation into a Single Distributable

While maintaining modularity with individual purl type definitions, there is a need to aggregate these definitions into a single distributable package. This approach ensures that all purl type definitions can be obtained from one source, simplifying the process for implementers. By leveraging the modularity of individual definitions, this distributable can be easily updated to include new or revised purl types without requiring prior knowledge of specific types. This ensures that users and systems always have access to the most current and comprehensive set of purl type definitions, facilitating seamless integration and support across various applications.
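The aggregation step could be as small as the following Python sketch. It assumes, purely for illustration, that each type directory holds a file named `definition.json` with a `type` key; neither the file name nor the key is specified in this proposal.

```python
import json
import tempfile
from pathlib import Path


def aggregate(types_root):
    """Collect every <type>/definition.json under types_root into one dict.

    The file name `definition.json` is an assumed convention.
    """
    bundle = {}
    for path in sorted(Path(types_root).glob("*/definition.json")):
        definition = json.loads(path.read_text(encoding="utf-8"))
        bundle[definition["type"]] = definition
    return bundle


# Build a throwaway directory layout to exercise the function.
with tempfile.TemporaryDirectory() as root:
    for name in ("alpha", "beta"):
        type_dir = Path(root) / name
        type_dir.mkdir()
        (type_dir / "definition.json").write_text(
            json.dumps({"type": name}), encoding="utf-8"
        )
    bundle = aggregate(root)

print(json.dumps(bundle, indent=2, sort_keys=True))
```

Because the function discovers directories by globbing, a newly added type is picked up without any change to the aggregation code.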

6. Streamlining Ecma TC54-TG2 Review Process

This approach will also significantly benefit Ecma TC54-TG2, the task group responsible for standardizing purl and purl types. By adopting JSON Schema and modular definitions, TC54-TG2 can streamline its review process, making it easier to evaluate and approve new purl types.

Conclusion

This structured approach ensures clarity, consistency, and ease of maintenance, facilitating the reliable implementation and extension of purl types. By transitioning to JSON Schema, supporting dynamic type inclusion, separating definitions, organizing them into directories and providing an aggregated distributable, we can significantly enhance the robustness and scalability of purl type management.

matt-phylum commented 3 months ago

Related: #38

Is this talking about a JSON schema (a schema that is JSON) or JSON Schema (a specific meta schema for specifying JSON objects)?

How will the schema express normalization rules, especially cases like pypi which replaces all runs of _.- characters with single - characters¹?
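For reference, the pypi rule mentioned above is easy to state in code, which is part of why expressing it declaratively is an interesting test case. The sketch below uses the normalization from PEP 503, which also lowercases the name; the function name is just illustrative.

```python
import re


def normalize_pypi_name(name):
    """Collapse runs of '-', '_' and '.' into a single '-' and lowercase.

    This is the name normalization given in PEP 503.
    """
    return re.sub(r"[-_.]+", "-", name).lower()


print(normalize_pypi_name("Zope.Interface"))  # zope-interface
print(normalize_pypi_name("my__pkg-.name"))   # my-pkg-name
```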

I wonder if there are two problems here. Having a machine readable schema seems like it could be used to validate that a PURL for a package type is in the expected canonical form for implementations that have access to things like compatible regular expression engines. However, I think most of the time users only care about parsing and unparsing (does not use type-specific knowledge) or canonicalizing (requires limited type-specific knowledge). It's probably sufficient to have a limited number of full validator implementations.

¹ #165 #262

stevespringett commented 3 months ago

> Is this talking about a JSON schema (a schema that is JSON) or JSON Schema (a specific meta schema for specifying JSON objects)?

I want to develop a JSON Schema that defines how to define a purl.

FYI, I'm currently developing a PoC locally and will check in a branch that has the start of the schema along with a few examples. I'll be sure to include Python in the examples. I am accounting for normalization, so we should be able to describe different types of rules for substitution, encoding, etc. I'd like to have something that's super lightweight and easy to work with. Nothing too complex.