mc-cip / spec

Index specification
MIT License

Index Specification #1

Open lazoolimc opened 4 years ago

lazoolimc commented 4 years ago

The goal here is to create a data specification to be used in an index for modded assets. This issue serves for discussion of the specifics.

Initially, I have some questions that I think might help shape things.

lazoolimc commented 4 years ago

For discussion purposes, here's a very rough example of what an index document might look like:

{
  "project_id": { // Project ID slug
    "url": "https://www.example.org/", // Any valid URL, optional
    "description": "Lorem ipsum dolor sit amet.", // Human-readable text.
    "type": "mod", // Type of project from a pre-defined and well-known set.
    "authors": [
      // List of author/maintainer information *for latest release*, non-historical.
      {
        "name": "mcdev", // Developer name or handle. Must be distinct per-person.
        "donate_url": "https://www.example.org/mcdev/donate" // Link to donation page
      }
      // ...
    ],
    "files": [
      // List of releases, where each element represents a release.
      {
        "version": "1.0.0", // Any string is valid
        "release_date": "1970-01-01T00:00:00Z", // ISO 8601 formatted datetime stamp
        "file_size": 12345, // File size in bytes
        "checksums": {
            // Map of function to hash
            "sha256": "...",
            // ...
        },
        "game_version": "1.15.4", // Minecraft game version from a pre-defined and well-known set of strings
        "download_url": "https://www.example.org/file.jar", // Direct URL to file download
        "requirements": {
          // Map where key = project ID slug, value = dependency specification
          "fabric": "0.6.1.45", // Mod loader requirement
          "other_project": "1.0.0", // Project dependency
          "third_project": ["0.3", "0.32-alpha"] // Multiple version/range requirement
          // ...
        }
      }
      // ...
    ]
  }
  // ...
}

This very roughly covers the same data that CF's API currently returns.
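As a minimal sketch of how a consumer might use the `checksums` map from the example above, the following Python checks downloaded bytes against every listed hash. The function name `verify_checksums` is hypothetical, and the sketch assumes the map keys are algorithm names that Python's `hashlib` accepts, which nothing in this thread has settled.

```python
import hashlib


def verify_checksums(data: bytes, checksums: dict[str, str]) -> bool:
    """Return True only if every listed digest matches the data.

    Assumption: keys such as "sha256" are names accepted by hashlib.new().
    """
    if not checksums:
        return False  # nothing to verify against; treat as a failure
    for algorithm, expected in checksums.items():
        digest = hashlib.new(algorithm, data).hexdigest()
        if digest != expected.lower():
            return False
    return True


# Illustrative data only; a real consumer would read the downloaded file.
payload = b"example file contents"
release = {"checksums": {"sha256": hashlib.sha256(payload).hexdigest()}}
print(verify_checksums(payload, release["checksums"]))
```

Rejecting an empty map is a design choice in this sketch, so that a release without checksums is never silently accepted.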

stairman06 commented 4 years ago

Here's my thoughts on the rough example:

Those are my thoughts. It's still pretty rough. Again, many of these are inspired from or based off my Minecraft launcher.

ghost commented 4 years ago

This first comment is meant to just address what I've seen so far and then I'll make a second comment with the prototype specification I've developed and why I made some of the choices I made.

How should versioning work? Mods rarely conform to semver, so I'm not positive that automated version comparison is possible. I'm also not sure that needs to be addressed.

I think that we should definitely have a specific version system in place. For Fabric, at least, all mods are required to use semver (at least in the internal manifests), and the whole point of this component of the project is that we're developing a universal specification for manifests, so we should be strict about defining these things. I recommend we just require semver 2.0.0 for all versioning that we use.

How should authors and contributors be modeled? If at all?

I like the way you modeled authors and contributors, but I think that we shouldn't distinguish between authors and contributors. Instead, we should have one Patreon or PayPal link per project, so users can simply support the project and not a specific author. Otherwise, if there are 4 Patreon links, for example, it's hard for an end user to know which one to support, and they might end up not supporting at all or supporting the wrong person. Also, if there's just one link, the developers and maintainers can manage how they cut the profits without us intervening (because really I don't think it's something we should manage).

How can we/should we support multiple artifacts for a given release?

I don't really think so. If multiple artifacts are required by a project, then they can simply release different versions or make them different projects. At least with the deobfuscation thing, all Fabric mods come with a ref-map file in them that is used to deobfuscate the code, so there's no need for a separate binary. As for APIs, I feel like the API should be included with the impl in most cases, and if it isn't then there should be a specific project for the API. Take Thaumcraft for example: the source was never really public, but there was a repo for the API, so the API was really seen as a separate project. But it also raises the question of why we're worrying about deobfuscated binaries and API binaries when in reality this is mostly a tool for end users (players and pack developers) and not mod developers, so nobody who uses this should ever need those two things.

What sorts of data would we need to accommodate to support:

I think we briefly discussed this but it sounds like mods for both loaders, resourcepacks, and datapacks should all be included. Someone mentioned worlds might also be useful for pack developers so that's something to consider.

Do you think we should include worlds as one of the resources we support?

How do we effectively support multiple modloaders/APIs? Minimally, it should be possible to index Forge and Fabric mods.

It really shouldn't be difficult. The Forge and Fabric configs actually look really similar, so it shouldn't be hard to define a set of mandatory fields and then optional fields depending on which loader the mod is for. After all, there are mods that use the same exact sources but work on both loaders, so that should be a testament to the ability to develop one tool for the job.

On lazoolimc's implementation of the authors field, like I said above I think we shouldn't distinguish between authors and contributors and instead have one link per project to something like paypal or patreon.

On lazoolimc's implementation of the files field, I don't think this is necessary, because the releases page of GitHub should be sufficient to get all of the information we need, and having project authors define each and every release specifically like this is not efficient whatsoever. Cargo, for example, can pull directly from git repos and doesn't need a manifest like that; we shouldn't either. The specifics should just exist within each commit version in the repo.

The next comment I make will be with the breakdown of the prototype spec I put together with some explanations of the choices I made.

ghost commented 4 years ago

Available as a gist here

Specification

All resources that are to be indexed by the MC Index must contain a mcindex.json file, formatted as JSON, that follows the given specification.

Mandatory fields

project.namespace

A string representing the unique identification name that the index uses to identify a project. Also corresponds to the in-game namespace of any custom content added by the project.

project.version

A semver 2.0.0 compatible version string. Should also correspond to a git tag of the corresponding release for the github repository. Displayed both in game and on the index website. Used to determine versions and dependencies for installing.

project.type

A string representing the type of the project. "type" can have one of four values at the moment: "fabric", which indicates a mod using the Fabric mod loader; "forge", which indicates a mod using the Forge mod loader; "resourcepack", which indicates a resourcepack; and "datapack", which indicates a datapack. Displayed both in game and on the index website. Used to determine versions and dependencies for installing.
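A rough sketch of how a tool might check the three mandatory fields follows. It assumes the fields are nested under a top-level "project" object, as in the example schema later in this comment; the function name `find_problems` and the wording of its messages are hypothetical.

```python
# The four project types listed in the spec above.
VALID_TYPES = {"fabric", "forge", "resourcepack", "datapack"}
MANDATORY_FIELDS = ("namespace", "version", "type")


def find_problems(manifest: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    project = manifest.get("project", {})
    problems = [
        f"project.{field} is missing"
        for field in MANDATORY_FIELDS
        if field not in project
    ]
    if "type" in project and project["type"] not in VALID_TYPES:
        problems.append(f"project.type {project['type']!r} is not a known type")
    return problems


print(find_problems({"project": {"namespace": "demeter", "type": "mod"}}))
```

This only checks presence and the type value; it does not check that project.version is valid semver.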

Optional fields

schemaVersion

Defines which version of the schema specification is being used to evaluate the schema. If a value is not given it defaults to 0.

project.name

A human readable name for the project. Displayed both in game and on the index website. If a value is not given it defaults to project.namespace but with each character preceded by a space capitalized.

project.description

A string describing the project. Displayed both in game and on the index website. If a value is not given it defaults to an empty string and is not displayed.

project.homepage

A string containing the URI for the homepage of the website for the project. Displayed both in game and on the index website. If a value is not given it defaults to an empty string and is not displayed.

project.sources

A string containing the URI for the git repository containing the sources for the project. Displayed both in game and on the index website. If a value is not given it defaults to an empty string and is not displayed.

project.wiki

A string containing the URI for the wiki website for the project. Displayed both in game and on the index website. If a value is not given it defaults to an empty string and is not displayed.

project.issues

A string containing the URI for the bug tracker and issues webpage for the project. Displayed both in game and on the index website. If a value is not given it defaults to an empty string and is not displayed.

project.donate

A string containing the URI for a donation page for the project. Displayed both in game and on the index website. If a value is not given it defaults to an empty string and is not displayed.

project.license

A string representing the type of license that the project is licensed under. Displayed both in game and on the index website. If a value is not given it defaults to "All Rights Reserved".

project.categories

A list of strings that represent categories that the project falls under. Displayed both in game and on the index website. If no values are listed, or no value is given it defaults to an empty list and no items are displayed.

Think of the categories similar to that of hashtags or the github repository topics.

project.authors

A list of strings for each of the names of the authors of the project. Displayed both in game and on the index website. If no values are listed, or no values are given it defaults to an empty list and no items are displayed.

project.environment

A string representing the environment that the project should be installed on. The key can have three values "client" indicating a project that should only ever be installed on the client side (resource packs, client side mods), "server" indicating a project that should only ever be installed on the server side (server side mods, datapacks), and "common" indicating projects that must be installed on both the server and the client side. Displayed both in game and on the index website. If no value is given it defaults to "common".

This value is mostly used for mods however it is not restricted to being defined only for mods.

project.contact

A list of contact objects that represent ways to contact the developers of the project. Displayed both in game and on the index website. If no values are listed, or no values are given it defaults to an empty list and no items are displayed.

Contact objects have two mandatory fields. The first is "name", which is a string that defines the name that should be displayed for the contact type both in game and on the website. The second is "link", which is a string that represents a URI for accessing the contact method. Entries such as "mailto:*" links are fully supported.

project.relationships

A list of relationship objects that represent different relationships that the project has with other projects. Used internally by the installer to manage which projects and which versions are to be loaded and to manage conflicts. If no values are listed or no values are given it defaults to an empty list and no relationships are defined.

Relationship objects have two mandatory fields and a series of optional fields.

The first mandatory field is "namespace" which is a string that defines the namespace of the project for which to define the relationship. This value must match with the namespace defined in the mcindex.json file of the other project otherwise it will not be found.

It should be noted that there are a couple reserved namespaces that trigger specific functionality that cannot be used by mods and do not follow the traditional installation system. The following is an explanation of each of the reserved namespaces and what they are used for.

"minecraft"

Is used to define the version of Minecraft that the project depends upon. Minecraft is the base game, so it is not installed as a project; it is installed before all of the other projects, by a method that has to be handled separately.

"yarn"

Is used to define the yarn mappings that ought to be used in a development environment. Yarn mappings are not installed, but rather applied to the minecraft binary unlike the traditional install process.

"fabric-loader"

Is used to define the version of the Fabric loader that the project depends upon. The Fabric loader is not installed as a project but as a modification applied to the Minecraft binary, which is a method of installation that has to be handled separately.

"forge"

Is used to define the version of Forge that the project depends upon. The Forge mod loader is not installed as a project but as a modification applied to the Minecraft binary, which is a method of installation that has to be handled separately.

The second mandatory field is "version", which is a string containing a semver 2.0.0 compatible value, optionally qualified by a comparison requirement, that represents the version of the project the relationship is defined for. This value fully supports wildcards in semver values. Comparison requirements are qualifiers that can be placed on the specific semver value. The valid comparison requirements are as follows:

major.minor.patch: No comparison requirement. The relationship is defined for the given value and only the given value.

> major.minor.patch: Greater than comparison requirement. The relationship is defined for versions greater than the given value.

< major.minor.patch: Less than comparison requirement. The relationship is defined for versions less than the given value.

>= major.minor.patch: Greater than or equal to comparison requirement. The relationship is defined for versions greater than or equal to the given value.

<= major.minor.patch: Less than or equal to comparison requirement. The relationship is defined for versions less than or equal to the given value.
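The comparison requirements above could be evaluated roughly as follows. This is a simplified sketch, not the proposed implementation: it only handles plain major.minor.patch triplets, and it deliberately ignores the wildcards, pre-release identifiers, and build metadata that semver 2.0.0 also allows. The function name `satisfies` is hypothetical.

```python
import re

_TRIPLET = r"(\d+)\.(\d+)\.(\d+)"
# Optional comparison operator followed by a major.minor.patch triplet.
_REQUIREMENT = re.compile(r"^\s*(>=|<=|>|<)?\s*" + _TRIPLET + r"\s*$")
_VERSION = re.compile(r"^\s*" + _TRIPLET + r"\s*$")


def satisfies(version: str, requirement: str) -> bool:
    """Check a concrete version against one comparison requirement."""
    req = _REQUIREMENT.match(requirement)
    ver = _VERSION.match(version)
    if req is None or ver is None:
        raise ValueError("expected a plain major.minor.patch value")
    operator = req.group(1) or "=="  # no qualifier means an exact match
    wanted = tuple(int(part) for part in req.group(2, 3, 4))
    actual = tuple(int(part) for part in ver.group(1, 2, 3))
    # Tuples compare element by element, which matches semver precedence
    # for plain numeric triplets.
    return {
        "==": actual == wanted,
        ">": actual > wanted,
        "<": actual < wanted,
        ">=": actual >= wanted,
        "<=": actual <= wanted,
    }[operator]


print(satisfies("1.2.3", ">= 1.2.0"))
```

Values that are not full triplets, such as a two-component version, raise `ValueError` in this sketch rather than being guessed at.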

The relationship objects also have the option of defining any of the following optional fields

The first is "type", which is a string that represents the type of relationship to define between the two projects. It is displayed both in game and on the index website. If a value is not given for this field, it defaults to "depends-hard". Type can be one of six values:

"depends-hard": A dependency that must be installed for the project to run; otherwise it will crash.

"depends-soft": A dependency that should be installed for the project to run; otherwise it will throw errors.

"depends-dev": A dependency that should be installed to test in a development environment.

"recommends": A dependency that is recommended but not necessary, and will not throw any errors if not installed.

"conflicts-soft": A project that conflicts with this project in a way that still allows the game to run, and either only minorly breaks game functionality or causes errors.

"conflicts-hard": A project that conflicts with this project so that the game cannot run and will instead crash.

The second is "git" which is a string representing the URI of a git repository containing the source code for a project. The installer will then look for a corresponding version tag to install.

The third is "maven" which is a string representing the URI of a maven repository containing the binaries for a project. The installer will then look for the corresponding version tag to install.

If neither "git" nor "maven" is defined, either the relationship is for a reserved namespace and the logic is handled separately, or the installer searches the mod index for the source of the project. If the index cannot independently find the sources for the project, then it fails.
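The lookup order described in the last few paragraphs might be sketched like this. The index is modeled as a plain mapping from namespace to URL, which is an assumption, as are the function name `resolve_source` and the choice to prefer "git" when both "git" and "maven" are present (the spec does not say which wins).

```python
# Reserved namespaces named in the spec above.
RESERVED = {"minecraft", "yarn", "fabric-loader", "forge"}


def resolve_source(relationship: dict, index: dict[str, str]) -> tuple[str, str]:
    """Return (kind, location) describing where a dependency comes from."""
    namespace = relationship["namespace"]
    if "git" in relationship:  # assumption: git takes priority over maven
        return ("git", relationship["git"])
    if "maven" in relationship:
        return ("maven", relationship["maven"])
    if namespace in RESERVED:
        return ("reserved", namespace)  # handled by separate install logic
    if namespace in index:
        return ("index", index[namespace])
    raise LookupError(f"no source found for {namespace!r}")


# Hypothetical index contents for illustration.
example_index = {"other_project": "https://www.example.org/other_project/mcindex.json"}
print(resolve_source({"namespace": "minecraft", "version": ">=1.16"}, example_index))
```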

Fabric specific fields

project.entrypoints

A list of entrypoint objects that represent mod entrypoints to be loaded by the loader. Used internally by fabric loader in order to function. If no values are listed or no values are given it defaults to an empty list and no entrypoints are loaded.

Entrypoint objects have two mandatory fields. The first is "path" which is a string that defines the path to the class that ought to be loaded as an entrypoint. The second is "type" which is a string that represents what environment the entrypoint ought to be loaded in. "type" can have three potential values "client" which indicates that the entrypoint implements ClientModInitializer and should only be loaded on the client side, "server" which indicates that the entrypoint implements DedicatedServerModInitializer and should only be loaded on a dedicated server, and "common" which indicates that the entrypoint implements ModInitializer and should be loaded generally on both the client and server side.

project.mixins

A list of mixin objects that represent mixins to be loaded by the mixin loader. Used internally by fabric loader in order to function. If no values are listed or no values are given it defaults to an empty list and no mixins are loaded.

Mixin objects have two mandatory fields. The first is "path" which is a string that defines the path to the class that ought to be loaded as a mixin. The second is "type" which is a string that represents what environment the mixin ought to be loaded in. "type" can have three potential values "client" which indicates that the mixin should only be loaded on the client side, "server" which indicates that the mixin should only be loaded on a dedicated server, and "common" which indicates that the mixin should be loaded on both the client and server side.

Example Schema for a Fabric mod

{
  "schemaVersion": 0,
  "project": {
    "type": "fabric",
    "name": "Demeter",
    "namespace": "demeter",
    "version": "1.0.0",
    "description": "A fabric mod api for adding custom properties to crops",
    "homepage": "",
    "sources": "",
    "wiki": "",
    "issues": "",
    "donate": "",
    "license": "MIT",
    "categories": [
      "fabric",
      "crops",
      "api"
    ],
    "authors": [
      "Wtoll"
    ],
    "environment": "common",
    "contact": [
      {
        "name": "Discord",
        "link": "https://discord.com/labdfkajbf"
      },
      {
        "name": "Email",
        "link": "mailto:demeter@gmail.com"
      }
    ],
    "entrypoints": [
      {
        "path": "com.wtoll.demeter.Demeter",
        "type": "common"
      },
      {
        "path": "com.wtoll.demeter.client.DemeterClient",
        "type": "client"
      },
      {
        "path": "com.wtoll.demeter.server.DemeterServer",
        "type": "server"
      }
    ],
    "mixins": [],
    "relationships": [
      {
        "namespace": "minecraft",
        "version": ">=1.16",
        "type": "depends-hard"
      },
      {
        "namespace": "yarn",
        "version": "1.16+build.1",
        "type": "depends-dev"
      },
      {
        "namespace": "fabric-loader",
        "version": ">=0.8.8+build.202",
        "type": "depends-hard"
      },
      {
        "namespace": "fabric",
        "git": "https://github.com/FabricMC/fabric",
        "version": ">=0.13.1+build.370-1.16",
        "type": "depends-hard"
      }
    ]
  }
}

ghost commented 4 years ago

The idea for the manifest spec above is to create a standardized manifest spec that mod loaders can adopt in place of their own manifest spec (or that we can convert from our spec to their spec on the fly) and that contains all of the existing information necessary for the projects to work. That's why it contains a lot of fields that we might not necessarily use in the index itself but that are necessary to ensure compatibility with all of the types of projects.

One question I had while writing this was whether or not forge and forge-loader are separate dependencies, because that might change things a little bit. Secondly, how does Forge determine which class is the entry point into the mod? Like, how does Forge know that this specific method in this specific class in this specific place is what it's supposed to execute to load a mod? And are those things we should potentially include in the manifest?

ghost commented 4 years ago

Maybe we turn all of the "homepage", "sources", "wiki", "issues", and "donate" fields into their own sub object like "links" or "pages" or something because that's fundamentally what they are.

We also might want an "images" link for an image gallery like curse forge.

lazoolimc commented 4 years ago

Absolutely incredible, @Wtoll! The level of detail here is tremendous.

Do you think we should include worlds as one of the resources we support?

Honestly, I wouldn't argue for that. I'd love to keep things as simple as possible, and I really haven't seen anyone express a need for worlds in any of the conversation I've seen.

Regarding versioning and semver, there was an active discussion in the Discord this morning. The consensus seemed to be that we should encourage but not enforce semver. Very few Forge mods conform to semver (or any coherent versioning), and API compatibility is much less of a concern when it comes to mods. I, personally, would encourage any and all devs to use the form of semver (i.e. release identifiers should be x.y.z triplets) without the requirements about API compatibility. That said, enforcement would add friction to adoption (in a worst-case scenario we'd end up with a Final-Fantasy-in-the-USA situation where a mod would be "version 4, release 17" according to them, and "4.3.9" according to the index).

So if we don't require a coherent versioning schema, how does an author indicate which versions of dependencies are required? I advocate for simplicity over perfection here: just use a list of concrete version numbers (e.g. ["1.0.1", "1.0.2", "1.0.3"]). This is definitely clunky, but would accommodate all use cases, I believe.
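A tiny sketch of that approach, under the assumption that a requirement is either a single version string or a list of acceptable version strings (both forms appear in the rough index example earlier in the thread). The function name `is_accepted` is hypothetical.

```python
def is_accepted(candidate: str, requirement) -> bool:
    """Exact string matching only; no ordering between versions is assumed."""
    accepted = [requirement] if isinstance(requirement, str) else list(requirement)
    return candidate in accepted


print(is_accepted("1.0.2", ["1.0.1", "1.0.2", "1.0.3"]))
```

Because versions are compared as opaque strings, this works for any versioning scheme, at the cost of authors having to list every compatible release.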

On lazoolimc's implementation of the authors field, like I said above I think we shouldn't distinguish between authors and contributors and instead have one link per project to something like paypal or patreon.

I'm sold. I'd love to get some input from actual devs on this one, but that seems very reasonable to me.

On lazoolimc's implementation of the files field, I don't think this is necessary because the releases page of GitHub should be sufficient to get all of the information we need and having project authors define each and every release specifically like this is not efficient whatsoever.

Can you expand on this? Is the suggestion that we only index the latest release of a mod? And would GitHub be a firm requirement?

lazoolimc commented 4 years ago

Another interesting thought that emerged from the Discord this morning was from a member called Luna (discussion starts here).

Their proposal was that the actual data about each release of a project (mod, datapack, etc) be hosted by the mod author alongside the distributable binary, and that the index is simply a list of URLs (plus or minus checksums) to those artifact/metadata pairs. I suggest any interested parties read it, it's very interesting!

There were some performance and security concerns, but it would certainly make the act of curating an index much simpler.

stairman06 commented 4 years ago

I don't think GitHub Releases should be required for mods - for developers who don't use Releases it may be confusing, and the spec shouldn't be tied to a specific platform. lazoolimc's files implementation is easy enough to read and edit and allows authors who don't use GitHub to submit mods.

With regards to metadata and info about a project being stored by the developer, it's an interesting idea with an unfortunate performance downside. If you wanted to, for example, search for JEI in the index, you would have to go through every URL and repository where manifest information is stored. Sending potentially thousands of requests is not optimal and is frankly unnecessary.

ghost commented 4 years ago

So the implementation has two specific dependency “types”, one for maven and one for git. Regardless of what platform either of those is hosted on, the management of where the actual binaries live is done by the platform itself and not by us. Every git repo marks release points in time with tags, and we can build those sources; maven uses its own manifest system for organizing dependencies. I don’t particularly love having a long list like that inside the manifest itself, or making it required. I propose that, for mod developers who want to use it, we have specific handling of git repositories and mavens, as the spec I proposed already implies, but then add an optional field for a link to a separate JSON file that acts as an override for manually defining the location of binaries. This would be the best of all worlds: the manifest would stay short, it would be easy for mod developers using the right platforms to manage dependencies since it would be done mostly automagically, and it doesn’t box developers in to using one platform since they always have the option of a manual override. We would just need to add another field type to the relationship object for a URL that uses the custom external JSON file format.

I like the idea that the index is basically just a list of URLs, I feel like the spec I proposed would allow for that because it offloads logic to git repos and mavens to supply the versions or the mod developers themselves if they choose to use the aforementioned override.

ghost commented 4 years ago

As for the whole search thing, the spec I proposed explicitly requires that all projects provide a namespace field. I figure the index can basically keep a list of key-value pairs, the key being the namespace and the value being a URL to the dependency, and search can be done by namespace.
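A minimal sketch of that idea, with the index held as an in-memory mapping. The URLs are placeholders and the function name `search` is hypothetical; matching is a case-insensitive substring test on the namespace, which is one possible reading of "search by namespace".

```python
def search(index: dict[str, str], query: str) -> dict[str, str]:
    """Return the namespace -> URL pairs whose namespace contains the query."""
    needle = query.lower()
    return {
        namespace: url
        for namespace, url in sorted(index.items())
        if needle in namespace.lower()
    }


# Placeholder entries; real values would point at each project's source.
example_index = {
    "demeter": "https://www.example.org/demeter",
    "jei": "https://www.example.org/jei",
    "fabric": "https://www.example.org/fabric",
}
print(search(example_index, "JE"))
```

Searching only the keys needs no network requests, but it also means descriptions and categories are not searchable unless the index stores them too.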

ghost commented 4 years ago

Can you expand on this? Is the suggestion that we only index the latest release of a mod? And would GitHub be a firm requirement?

The idea would be that each time a release is made in a git repo, it marks a tag in history for the sources of that version, so we don’t actually need to list every single version in a file: git itself records what every release looked like, and it’s really easy to just query a tag for the version value. This applies across the board for git itself, not just GitHub, so it should have pretty widespread support.

The index could just keep a link to the git repo and then query tags any time it wants to find releases instead of relying on explicitly defining every release in a file.
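For illustration, querying tags without cloning could look roughly like this, using `git ls-remote --tags`, which prints one tab-separated "hash, ref" pair per line. The function names are hypothetical, and the sketch assumes a `git` executable is available on the machine doing the query.

```python
import subprocess


def parse_ls_remote(output: str) -> list[str]:
    """Extract tag names from `git ls-remote --tags` output."""
    tags = set()
    for line in output.splitlines():
        parts = line.split("\t")
        if len(parts) != 2 or not parts[1].startswith("refs/tags/"):
            continue
        name = parts[1][len("refs/tags/"):]
        if name.endswith("^{}"):  # peeled entry for an annotated tag
            name = name[:-3]
        tags.add(name)
    return sorted(tags)  # plain string sort, not version order


def list_release_tags(repo_url: str) -> list[str]:
    """Ask the remote for its tags (requires git and network access)."""
    result = subprocess.run(
        ["git", "ls-remote", "--tags", repo_url],
        capture_output=True, text=True, check=True,
    )
    return parse_ls_remote(result.stdout)


sample = "aaa\trefs/tags/1.0.0\nbbb\trefs/tags/1.0.1\nccc\trefs/tags/1.0.1^{}\n"
print(parse_ls_remote(sample))
```

Note that a tag only identifies sources; turning a tag into an installable binary would still need a build step or a hosted artifact.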

lazoolimc commented 4 years ago

Ah, I think I see now. In my haste, I guess I missed this important note:

All resources that are to be indexed by the MC Index must contain a mcindex.json file, formatted using json, that follows the given specification.

If I'm understanding correctly, this is a separate document from an actual index. mcindex.json would be comparable to the Cargo.toml file used by the Rust/Cargo system, not necessarily the structure of the crates.io API. Perhaps, when a tool is trying to assemble a set of mods/assets, this file could be read to gather further information about the asset as well. For that purpose, this definitely makes more sense. Having this sort of information normalized would definitely be a good thing.

Does this approximately look like what we're talking about? graph

I see a few shortcomings from a system based on these documents. Assuming that an index would be a document listing these documents (correct me if I'm wrong):

  1. Would this make git (or any VCS) a firm requirement for consumers of this format? Most user-facing software currently does not have this requirement, and I think the average user wouldn't really want to install git or mercurial or whatever to play a modpack.
  2. This doesn't address security or safety directly. If a benign project were to be added to an index, it would subsequently be able to release malicious builds without involvement of that index. It could be removed, but only if someone notices.
  3. Performance (as touched on by stairman06) would be a concern. Since a consumer of this format would have to fetch these documents for each asset of interest, it would have to fetch the entire dependency tree to determine what, if any, transitive dependencies are necessary.
  4. I still don't quite grok how versioning should be handled here. Would implementors have to write handlers for individual platforms (e.g. Github, Gitlab), in the case that the developer doesn't list releases explicitly?
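To make the third point concrete, here is a rough sketch of what resolving transitive dependencies would involve if every manifest had to be fetched individually. `fetch_manifest` stands in for a network request and is hypothetical, as is the function name `collect_dependencies`; the manifests are assumed to follow the "project.relationships" layout from the proposed spec, and relationship types and reserved namespaces are ignored for simplicity.

```python
from collections import deque


def collect_dependencies(root: str, fetch_manifest) -> tuple[set[str], int]:
    """Walk the dependency tree breadth-first.

    Returns the set of transitive dependency namespaces and the number
    of manifest fetches that were needed to discover them.
    """
    seen = {root}
    queue = deque([root])
    fetches = 0
    while queue:
        namespace = queue.popleft()
        manifest = fetch_manifest(namespace)  # one request per project
        fetches += 1
        for rel in manifest.get("project", {}).get("relationships", []):
            dependency = rel["namespace"]
            if dependency not in seen:
                seen.add(dependency)
                queue.append(dependency)
    seen.discard(root)
    return seen, fetches


# In-memory stand-in for remote manifests: a -> b, c and b -> c.
manifests = {
    "a": {"project": {"relationships": [{"namespace": "b"}, {"namespace": "c"}]}},
    "b": {"project": {"relationships": [{"namespace": "c"}]}},
    "c": {"project": {}},
}
print(collect_dependencies("a", manifests.__getitem__))
```

Even with de-duplication, the number of fetches grows with the size of the dependency tree, which is the performance concern raised above.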

I love a lot about the proposal! If we can work out some of these systemic issues, I think things would be in an excellent place.

carlos-miller-466 commented 4 years ago

Would this make git (or any VCS) a firm requirement for consumers of this format? Most user-facing software currently does not have this requirement, and I think the average user wouldn't really want to install git or mercurial or whatever to play a modpack.

While I agree with that reasoning, the ability to directly channel GitHub releases to User may be the version handling solution this needs. For those who do agree to download Git (or otherwise), I think versioning is handled, and so is the ability to handle security. Though for my own clarification, does this mean the index would handle providing Projects and handling dependencies while Git (or other) will ensure that the version that User is provided matches that of the index and its required dependents?

To add to security handling: if one is to use any VCS, access to it must be public, or at least the supplied version must be. Though with a significant number of mods added each week, it would be hard to go over them all manually. GitHub is coming along in its development of security analysis, but I can't say I would trust that entirely. "Someone noticing" is not a way we should approach security; my only recommendation is a volunteer mod security oversight group. Let's say an author publishes for the first time without any prior releases: their mod is reviewed, and if it comes back as containing malicious content then they are banned from the Index. Otherwise they will not have any manual review unless GitHub Security pings it or a security member randomly checks it. Manual checking of malicious content just seems a bit grueling.

stairman06 commented 4 years ago

GitHub releases work well - but they're GitHub only and I think tying an open index to a specific platform is a bad idea. It also adds confusion and extra work for developers who either don't know how to use Git or don't wish to.

In terms of security, the biggest issue is letting authors host their own manifest file. Authors would be able to change the binary of a previously-approved asset into a malicious one, without going through the Index. While it's possible to store a hash of the manifest when it is approved, that would just be extra work as mod authors would have to update information in two places.

I believe the current plan for mod review is a volunteer moderation team, which would be a lot of manual effort but it's the best solution for now.

stairman06 commented 4 years ago

In any case, I'm glad we're having a discussion and working things out - once we get the issues worked out I think we'll have a good standard.

carlos-miller-466 commented 4 years ago

What deficits would come with limited manifest file hosting, and are there any alternatives that could be used? Being a choice-friendly standard is, of course, a great course of action, but if no other meaningful or plausible solutions exist then there may be proprietary necessities in the best interest of both users and developers.

stairman06 commented 4 years ago

You brought up a good point earlier that I didn't address:

...the index would handle providing Projects and handling dependencies while Git (or other) will ensure that the version that User is provided matches that of the index and its required dependents

Certain metadata about each version, such as the changelog, dependencies, and supported Forge/Fabric versions, will need to be stored alongside its respective binaries.

If you stored this metadata as a JSON file included in each GitHub release, you run into another security problem. Hashes for each version metadata would need to be stored to prevent unapproved changes.

If you stored the version metadata inside the asset's manifest (stored in either the index or hosted by the author), fewer network requests would be made and fewer hashes checked.

When everything is stored inside the manifest, GitHub releases are only used for hosting the binaries. It would be easier to simply store the download URL of the binary in the manifest, rather than have launchers create custom support for GitHub releases. Authors could still use Releases and simply link to each file.

stairman06 commented 4 years ago

With regards to the hosting of manifest files, they could be hosted by authors and I think it would be fine as long as checksums are used - I just think it would be better for performance and easier for authors if it was stored in the index.
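
As a rough illustration of the checksum idea, a launcher could hash whatever it downloaded (an author-hosted manifest or a binary) and compare the result against the value recorded in the index. This is only a sketch: the `verify_checksum` name and the choice of SHA-256 are assumptions, not something this thread has settled on.

```python
import hashlib
import hmac


def verify_checksum(data: bytes, expected_sha256: str) -> bool:
    """Return True if the SHA-256 of `data` matches the hex digest from the index."""
    actual = hashlib.sha256(data).hexdigest()
    # Normalise the stored digest so upper-case hex or stray whitespace
    # in the index does not cause a false mismatch.
    expected = expected_sha256.strip().lower()
    return hmac.compare_digest(actual, expected)
```

A launcher would refuse to install anything for which this returns False, which is what stops an author from silently swapping an approved file.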

carlos-miller-466 commented 4 years ago

Fully agreed, I think that solves the security front to an extent I feel comfortable with. I also think that's the answer to the other standing issues as long as a VCS <> Index solution is set in stone.

ShadowOTE commented 4 years ago

Been reading thru the WIP spec and comments to date and had a few questions, ideas, and suggestions. Note that I have no first-hand mod dev experience, and am drawing on my dev experience using package manager tools such as nuget. I definitely won't be offended if some of the below turns out not to apply or needs to be adapted to be suitable for this spec!

Existing Attributes

project.description Should this support any form of hypertext markup (ex, for embedded links/images)? Alternatively, should there be a plaintext description and an HTML description?

project.contact Rename to project.contacts for clarity.

project.sources Reading this I'm not 100% sure what the intended content is. The spec currently defines this as "containing the sources for the project". My initial assumption was that this refers to the project source code repository, in which case I would recommend renaming to project.source, project.sourcecode, or possibly project.coderepository. However, I suspect this may be intended to link to the output file(s) - if so, I would recommend renaming to something like project.distributionSources (and adding a separate attribute linking the source code repo).

project.name Needs an example demonstrating how project.namespace should translate to project.name when a name is not explicitly specified.

project.license Should this be split into project.licenseName (using existing description) and project.licenseUri (direct link to the license)?

project.relationships

Proposed Additional Optional Attributes

project.distributionSources A list of URIs + the associated package manager type (eg, NPM, Nuget, etc) where a zip containing the resource/compiled binaries/jars/etc can be downloaded from, to facilitate package manager tooling for mod/modpack developers.

project.previousVersion Specifies the previous version. Expectation is that this can be found under the same project.namespace.

project.predecessor Allows identification of a predecessor mod version (eg, "ModCraft 2" might point to "ModCraft 1"). Should point to a different project.namespace.

project.successor Allows identification of a successor mod version (eg, if "ModCraft 1" releases an update after release of "ModCraft 2", this could provide a reference to the successor mod). Should point to a different project.namespace.

project.compatibleMinecraftVersions Defines the list of Minecraft versions this package is compatible with. This could be the minimum version number and an optional maximum version number (to allow identification of breaking changes should they occur). Alternatively, this could be defined as the list of major Minecraft versions to avoid needing to update as minor releases occur.

project.resources This would be defined as a list of links and possibly descriptive text. Alternatively, could title as project.resources or project.socialMedia, or possibly consolidate under project.contact

project.checksum An MD5 checksum to use when validating the resource has not been altered/corrupted.

project.publicKey The public key corresponding to the private key used by the author when digitally signing the files.

project.digitalSignature When specified, project.checksum and project.publicKey become required. Content when decrypted with project.publicKey should match project.checksum.

Reference Data

I think we should consider adding a section defining reference data. Suggested starting items:

stairman06 commented 4 years ago

Glad that we're having a discussion and we can all share our opinions.

Projects vs Versions

As we're defining this specification I think an important distinction between Project and Version needs to be made. Should there be certain attributes that apply to the project, and certain attributes that apply to each version of said project?

Support for multiple versions of a project is definitely important. In my opinion, we should go with the method originally suggested by lazooli. Project slugs, authors, descriptions, and other non-version-specific attributes, are stored at the project level. From there, a files attribute contains info required for versions, such as dependencies, download links, supported Minecraft versions, checksums, etc. It's unnecessary to store duplicates of attributes that may not change with new project versions (descriptions, authors)

{
  "schemaVersion": 0,
  "project": {
    "name": "Example Project",
    ...
    "versions": [
       {
         "name": "1.0.0",
         "dependencies": [
           {
             "src": "mccip-index", // the index is specified in case other platforms wish to implement this standard
             "namespace": "example-api",
             "version": "1.0.0"
           }
         ],
         "files": [
           {
             "url": "https://example.com/examplemod.jar", // any url will work
             "checksum": "<checksum goes here>"
           }
         ]
       }
    ]
  }
}

This is similar to how CF currently works and it's easy for users, developers, and authors to understand.

Opinions on ShadowOTE's suggested attributes

project.description I'm not 100% sure on how this should be. Markdown (or an extension like GitHub Flavored Markdown) would work fine. It's when you use raw HTML that you run into potential issues. As a standard, it's impossible to choose what can and can't be used in raw HTML. However, as an Index I think we should only accept mods if they follow the following HTML rules:

project.relationships, project.compatibleMinecraftVersions, project.checksum Relating to my first topic, these should be moved to a version specific attribute (e.g. version.relationships, version.compatibleMinecraftVersions)

project.publicKey and project.digitalSignature These are interesting. I don't have enough knowledge into digitally signed files, but would these be per-project or per-version?

project.successor and project.predecessor Sounds like a good thing to add.

project.previousVersion If the version system I mentioned in my first topic is implemented, then this wouldn't be necessary.

project.distributionSources Not 100% sure what the purpose of this is. Could you expand or provide an example of this?

I'm loving the discussion and proposals, and I hope we get to an agreed standard.

Dthen commented 4 years ago

Maybe we turn all of the "homepage", "sources", "wiki", "issues", and "donate" fields into their own sub object like "links" or "pages" or something because that's fundamentally what they are.

We also might want an "images" link for an image gallery like CurseForge.

While we're at it, it's 2020, a videos gallery would be amazing, rather than everyone just embedding them in descriptions. I don't see why it couldn't effectively be a list of YouTube URLs.

Minenash commented 4 years ago

No <input>, <button>, or <form> tags

Why no buttons? They would be useful for linking to something like a wiki while looking good. If visuals aren't important, then something like markdown should be used instead of html.

stairman06 commented 4 years ago

I mentioned no buttons because they have little purpose without JavaScript. An <a> tag can easily be styled to look like a button without executing any JavaScript.

ShadowOTE commented 4 years ago

@stairman06 I agree the optimal solution is pulling metadata up to a higher level and having a list of project versions.

project.description I have no preference on type of markdown. HTML does introduce potential issues like XSS attacks, so it may be appropriate to use a more limited markdown format. HTML was just the first flavor that came to mind.

project.publicKey and project.digitalSignature The project.digitalSignature should be version.digitalSignature if we have multiple versions per project. Typically I would expect project.publicKey to be per-project, but an argument could be made this should be at the version level (eg, in the event the private key becomes unavailable, thus preventing additional releases). Depending how much moderation is expected, one option might be to set it at both levels, and require moderation for approval to release a new version with a different key (ie, approval of a hand-off in ownership or generation of a new key). In that scenario, submitting an update with a new public key would trigger moderators to review rather than auto-accepting; approval would update the public key at the project level, effectively transferring ownership to the holder of the new key. This could be controversial, for obvious reasons, and so it may be better to simply set at the version level.

project.relationships, project.compatibleMinecraftVersions, project.checksum I agree these should probably be set as part of a version specific attribute. Regarding project.relationships, are we defining relationships or dependencies here? If the latter, it may be more clear if we set the name to project.dependencies instead.

project.previousVersion Agreed - if we move this direction, no need for this field. However, we may want to add a version.releaseDate or version.releasedOn property to formalize version history for cases where semver 2.0 is not followed.

project.distributionSources This would be a link to one or more locations a packaged version of the project can be found, and would be version specific. Ex:

https://www.nuget.org/api/v2/package/Microsoft.EntityFrameworkCore/5.0.0-preview.6.20312.4 (https://www.nuget.org/packages/Microsoft.EntityFrameworkCore/5.0.0-preview.6.20312.4 if you'd prefer the web page associated with the package rather than downloading a random file!)

Automated tools (along the lines of nuget/npm/etc) could then be built off of this data to automate pulling down the package and unpacking in the appropriate directory.

stairman06 commented 4 years ago

project.publicKey and project.digitalSignature Version specific for the public key and signature seems like the way to go, in case authors lose the private key or it otherwise becomes unusable.

project.relationships I agree it would be clearer if it were named version.dependencies

version.releaseDate The release date would be helpful for both launchers checking for the latest version in cases where semver is not used, and for users who simply wish to see when a specific version was released.

version.distributionSources Would this simply be a list of download URLs for a specific version and be a replacement for something like downloadURL? If not, then what would be the difference between distributionSources and downloadURL?

ShadowOTE commented 4 years ago

I can't think of any differences in intent, and like downloadURL better as a name. We may want to tweak the name to downloadURLs and allow multiple sources to be defined (though the file should be identical on all specified hosts). Is there any metadata that should be attached? The only item I can think of is priority (to route traffic away from smaller hosting providers if needed). Pretty sure there isn't a need for 64 vs 32 bit specifiers for jar files, but I spend my days playing with C#, so my Java knowledge is mostly theoretical.

stairman06 commented 4 years ago

downloadURLs sounds good to me. As far as I know, there shouldn't need to be any metadata that needs to be attached, but priority is interesting:

{
  "id": "example-project",
  ...
  "versions": [
    {
      "name": "1.0.0",
      "releaseDate": "2020-06-26T00:00:00",
      "framework": "forge",
      "files": [
        {
          "name": "ExampleProject1.jar",
          "required": true,
          "downloadURLs": [
            {
              "url": "https://example.com/ExampleProject1.jar",
              "priority": "high"
            },
            {
              "url": "https://smallhostingprovider.com/ExampleProject1.jar",
              "priority": "low"
            }
          ]
        }
      ]
    }
  ]
}

The downloadURLs could also be done as a list going from highest to lowest priority

{
  "downloadURLs": [
    "https://example.com/ExampleProject1.jar",
    "https://smallhostingprovider.com/ExampleProject1.jar"
  ]
}

With my first proposal with defined priorities, it's possible for launchers to randomly select a source with high priority. With my second proposal with an array, launchers will go in order, so the first source will always be used first. I'm not sure which system is the better option, those were just two ideas I had.
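
To make the difference between the two proposals concrete, here is a minimal sketch of how a launcher might consume each one. The function names, the `is_reachable` callback, and the "medium" label are assumptions for illustration; only "high" and "low" appear in the examples above.

```python
import random


def pick_by_priority(sources, rng=random):
    """Proposal 1: choose randomly among the sources with the best priority label."""
    rank = {"high": 0, "medium": 1, "low": 2}  # assumed label set
    best = min(rank[s["priority"]] for s in sources)
    candidates = [s["url"] for s in sources if rank[s["priority"]] == best]
    return rng.choice(candidates)


def pick_in_order(urls, is_reachable):
    """Proposal 2: walk the list in order and take the first reachable URL."""
    for url in urls:
        if is_reachable(url):
            return url
    return None  # every listed source failed
```

With labelled priorities the load is spread across all "high" hosts; with the ordered list the first host absorbs all traffic until it fails.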

ShadowOTE commented 4 years ago

I like the high medium low approach. But ultimately it's up to the spec consumer to decide on a traffic routing scheme, so it's probably not worth spending too much time thinking about at this point.

immibis commented 4 years ago

My thoughts:

More speculatively, I wonder if there would be value in separating the download locations from the actual mod index. They're updated more frequently (they aren't append-only like the version metadata), different people might want different ones (imagine a site rehosting all the mods on its CDN), and there are also lots of use cases that don't need it.

stairman06 commented 4 years ago

You're right about numbered priorities, there's no need to limit it. Combining Forge and Fabric into a single version is an interesting idea. For me as a launcher dev, it would be much easier to simply check if the latest version has support for a particular modloader, rather than checking every version. I think it could be implemented clearly and easily:

{
  "id": "example-project",
  ...
  "versions": [
    {
      "name": "1.0.0",
      "releaseDate": "2020-06-26T00:00:00", 
      "files": [ // if the name "files" is unclear, it could also be named "artifacts"
        {
         "name": "ExampleProject1Fabric.jar",
         "frameworks": ["fabric"], // multiple frameworks can also be specified if both are bundled in one file
         "download": [
           {
             "url": "https://example.com/ExampleProject1Fabric.jar",
             "priority": 1
           }
         ]
        }
      ]
     }
  ]
}
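
With per-file `frameworks`, the modloader check on a single version entry could be as small as the sketch below. `files_for_framework` is a hypothetical helper name, and it assumes the manifest has already been parsed into plain dicts shaped like the example above.

```python
def files_for_framework(version, framework):
    """Return the files of one version entry that declare support for `framework`."""
    return [
        f for f in version.get("files", [])
        if framework in f.get("frameworks", [])
    ]
```

A launcher would call this on the latest version only and treat an empty list as "no build for this loader".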

Automated installation is important so installation methods should be clearly defined. They should be per-file under each project version, and there could be a predefined list of installation methods, such as modsDirectory, jarMod, and other.

"versions": [
  {
    ...
    "files": [
      {
        ...
        "installationMethod": "modsDirectory"
      }
    ]
  }
]

I also agree that there's no point in mixins, entrypoints, or other loader-specific attributes being defined in the manifest.

Your idea of separating download locations from the index is interesting. From what I understand (correct me if I'm wrong), launchers would go to a CDN and request to download a specific file, instead of going to a predefined URL in the manifest. This leads to a few questions, such as who will be operating the CDNs and how they'll be chosen. Of course, I may be misunderstanding your proposal, so correct me if I was wrong.

immibis commented 4 years ago

Regarding separate download locations: The main index only includes the file hash (perhaps several different hashes and the file size). A separate download index would contain the URLs for each file. The main index data will never go out of date but the download index could. And launchers could choose to download from their own source, for example https://my-minecraft-launcher.com.example/download/fb3180333a6e6470031e23f3de0221ce or https://my-minecraft-launcher.com.example/download/mod-identifier/version-identifier/file-identifier.jar . They could initialize their CDN from the download index, but after that, they have no use for it.
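
A hedged sketch of how that lookup could work from the launcher's side: the main index supplies only the hash, and URLs come from the launcher's own mirror first, then from the separate download index. The `resolve_urls` name and the dict shape of `download_index` (hash mapped to a list of URLs) are assumptions; the `/download/<hash>` path follows the first example URL above.

```python
def resolve_urls(file_hash, download_index, own_cdn=None):
    """Return candidate URLs for a file identified only by its hash."""
    urls = []
    if own_cdn:
        # A launcher with its own mirror can address files purely by hash.
        urls.append(f"{own_cdn.rstrip('/')}/download/{file_hash}")
    # Fall back to whatever the (possibly stale) download index lists.
    urls.extend(download_index.get(file_hash, []))
    return urls
```

Whichever URL is used, the file still has to be verified against the hash from the main index after download.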

stairman06 commented 4 years ago

Thanks for the explanation. I really like this idea and I think it should be highly encouraged. But you mentioned that "the download index could [go out of date]". While using external CDNs solves the problem of an outdated index (as they have their own local copy), an outdated index becomes a problem when there are no CDNs.

For example, I develop a launcher with an extremely small userbase. While I would love to serve files from my own CDN, it's not a feasible decision for me to spend the time, effort, and money working on a CDN that very few people are actually going to use.

CDNs utilizing the Download Index should be encouraged and I hope they get used - but the Download Index should be kept up to date and maintained for the people who are unable to use CDNs.

ShadowOTE commented 4 years ago

We've had a lot of good discussion since the initial prototype spec. Perhaps now might be a good moment to incorporate what's been discussed into a v0.2 spec so we can see how it all fits together?

stairman06 commented 4 years ago

I've gone back through the discussion and created a Gist which contains the attributes and suggestions brought up. For extra context, the example mod presented in the Gist is a mod which distributes Forge- and Fabric-supported versions inside of one JAR file.

The versions object is certainly large, so perhaps separating it into versionManifest.json, or separating each version into its own file would help.

ghost commented 4 years ago

Have we found a good system for automatically generating, managing, and releasing new versions? Honestly, as a developer myself, having to manually add new versions and enumerate each of them in a specific file is kind of a deal breaker.

I think this system looks really great from an index and pack development perspective because it has all of the information in one place, but from a developer perspective it’s kind of a mess and would be really hard to maintain.

Also, if we’re defining Forge and Fabric in the dependencies section, then why do we need to specify which loader it runs on in a separate section? I feel like it can easily be inferred.

ShadowOTE commented 4 years ago

@Wtoll I think that's an excellent question. Looking at the gist @stairman06 kindly put together, I see 3 categories of information:

  1. Project Metadata - This is data that is typically entered once, and only occasionally updated. Tooling could easily be created for client tools or incorporated as part of upload processes to register/update these fields
  2. Version Metadata - This data can be generated by tooling on the client side, or during upload by scanning files and using checksums to find matches in the index.
  3. Download paths/CDN info - The primary download link(s) can probably be populated automatically during upload. If secondary lists are available separately, those could be periodically updated by content scraping bots.

Overall, I think the real trick here is going to be setting this up so it works as a streamlined part of the author's workflow. That will likely mean developing a suite of plugins, templates, bots, and other tooling. For example (and keep in mind, I'm not a Minecraft mod dev, so hopefully I'm not too far off the mark here!):

  1. Dev installs a plugin for their IDE of choice
  2. Dev creates a new project using the plugin/template. This prompts them to fill in the project level info, and provides tooling to update it later.
  3. Dev adds a reference to another mod using package manager tooling installed as part of step 1. The tooling downloads the mod/resource and unpacks it in the appropriate folder, then registers it in their IDE.
  4. At some point, the dev chooses to publish. Ideally this would be part of the tooling added to their IDE as part of step 1. During upload, the references are validated and the dev is warned regarding any references that have missing/invalid metadata. Ideally the upload registrar would assist in correcting. The dev could also be given the chance to update metadata, and may or may not need to provide a primary link for download sources + additional sources/CDNs.
  5. After publishing, the dev is then provided an updated version of the file to import back into their project; if the IDE tooling is involved, it should import and overlay automatically on successful publish.

This is obviously very rough, and I suspect may be optimistic depending on what tools mod/resource devs are using, but the core concept is that there should be tooling that abstracts the spec into a "behind the scenes" file that is automatically built and maintained as much as possible. Simultaneously, it needs to make their lives easier in order to drive adoption. Also keep in mind that we need to be able to create tooling to scan and pregenerate these files so existing mods have a starting point, or else the chances of this standard being adopted by dev teams are very low! Ideally they'll be able to drop this into their projects and hit the ground running, instead of painstakingly building it from the ground up (particularly important for mod pack devs!).

I'll defer to others in this discussion with the relevant domain expertise regarding how all of this will work with loader specific dependencies and such. However, I suspect the tooling could be made smart enough to allow the dev to configure this as part of adding references.

stairman06 commented 4 years ago

@ShadowOTE IDE-based tooling is an interesting idea. My original idea was to create a web-based tool that allows authors to manipulate manifest and version files visually. However taking advantage of IDEs that developers already use seems like a more-integrated and seamless solution.

One of the issues that appears has to do with publishing. When an author decides they wish to publish their mod, where does this request get stored? GitHub Pull Requests seem like a simple solution, they're built in to GitHub with helpful tools like the ability to merge straight from the website and diff viewing. However if authors are required to fork the repository, commit changes, and create a PR, it could be too large of a hindrance.

With that being said, perhaps the web-based tool or IDE integration could come in handy here? It might be possible, using the GitHub API, to automatically fork the repository, commit changes, and create a PR all with the author only having to click Publish.

I'm not sure what the right answer is. The optimal solution will be extremely seamless for developers and require little effort, and the moderation team will be able to easily review and merge submissions.

Minenash commented 4 years ago

Since this issue is for the spec, unless the tooling would affect the spec itself, it should probably be in a separate issue

ShadowOTE commented 4 years ago

I like the idea of having publishing hooks integrated in with git, so that PRs could be configured to trigger a publication workflow. Not sure how easy/feasible that would be to accomplish, but that could potentially address one of the thornier areas (and also avoids having to replicate tooling across multiple IDEs!).

@Minenash agreed - this is dragging us off topic. I'll create a new issue.

ghost commented 4 years ago

@Minenash I recognize the importance of keeping things on topic but I think it’s valuable for us to at least consider the feasibility and the effort that will need to go into developing these tools before we build a theoretical spec around them.

The fact is that without mod developers who are willing to use the system there is no point in making an index to begin with, and so either tooling needs to be a major priority for us (which it doesn’t currently seem to be as it doesn’t exactly fit into the whole model of only building an indexing system and allowing everyone else to figure out how to use the index) or we need to build a better system in the index spec to begin with.

ghost commented 4 years ago

All I’m saying is that I briefly ran the current direction of the project by my team and we all kind of agreed that unless the project is also guaranteed to be developing a massive suite of developer tools (something that takes a lot of time and effort) it wouldn’t make sense for us to switch over; we’d probably end up just hosting our mods on our own website because it would be easier.

Minenash commented 4 years ago

I don't disagree, however the conversation was getting more into specifics of the tooling. Whether it's a web app or a plugin doesn't really affect the spec. Ofc without them the spec would be useless, but I think the tooling conversation should be separate. That's not to say it can't influence it. For example I'm in favor of trying to keep the spec simple to help with tooling (or to make it even feasible to do it without). If a certain thing from the tooling issue affects the spec, then it can be mentioned here.

ShadowOTE commented 4 years ago

I think it's reasonable to ask questions as to technical feasibility of proposed additions/modifications to the spec; moving beyond the question of "can this be implemented" into "how do we go about implementing" belongs in a separate discussion thread (issue #2 for now).

@lazoolimc @stairman06 - so far we've been prototyping in gists, and we have a prototype spec (outdated) and a prototype format example. I'm not seeing a way for folks other than the original author to update the gist (though I only took a brief glance, so I may have missed it). Think we're at the point we can start formalizing into .md files stored in the repo?

comp500 commented 4 years ago

As immibis mentioned in #2, would the index act on top of existing and WIP mod websites, like CurseForge, Diluv, Astronave, Fabricate, etc - and if so, can we simply defer moderation and filtering to those running these sites? Obviously though, this doesn't promote decentralisation if everyone just uses those sites through the index, and we'd have to support some form of independent hosting. We need to ensure that however we allow mods to be added, we either moderate mods individually or trust someone else to do so.

Using project.releaseDate as a fallback for version comparison sounds like a very good idea - I wonder if we could figure out how to retrofit it onto the semver version (like 1.0.0-2020.08.01.23.01 maybe?) although it may cause issues if mods don't follow semver rules properly. We could just not trust mod versions at all, and do everything based on release date - mods are unlikely to specify they don't work with a newer version, only an older version. You may also want to look into the NPM comparison format for semver of which a limited subset is implemented in Fabric, similar to what Wtoll suggested, as an array of version numbers won't work when the dependency updates. Fabric loader uses a SAT solver to determine a valid version selection; we probably won't need anything this complicated but some way of locating the latest valid version for a given game version and mod loader would be useful.
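
As a sketch of the "ignore version strings and go by release date" approach, the function below picks the newest entry that matches a game version and mod loader. The `gameVersions` and version-level `frameworks` fields are assumed names for illustration, and `releaseDate` is assumed to be an ISO 8601 timestamp like the ones in the earlier examples.

```python
from datetime import datetime


def latest_compatible(versions, game_version, loader):
    """Pick the most recently released version matching game version and loader."""
    matches = [
        v for v in versions
        if game_version in v.get("gameVersions", [])
        and loader in v.get("frameworks", [])
    ]
    if not matches:
        return None
    # Compare parsed timestamps rather than version names, so mods that
    # do not follow semver are still ordered correctly.
    return max(matches, key=lambda v: datetime.fromisoformat(v["releaseDate"]))
```

This is far simpler than a SAT solver and handles no dependency constraints at all; it only answers "what is the newest build for this setup".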

When tools consume the index data, would it be a good idea to save bandwidth by separating file metadata into a separate file? You could also implement automated methods of updating the main project metadata to point to the latest file metadata for each mod loader/game version/alpha/beta combination - CurseForge does this in their API (although they don't have 1.16 or snapshot versions in this list which is really annoying).

Similarly, another endpoint for retrieving mod files that would be useful is identification by hash. This would be incredibly useful for modpacks importing lists of files, and CurseForge already supports this with the Murmur2 hash, and uses it in the Twitch launcher. Can mod slugs be renamed? It may be useful to have a project ID to uniquely identify a mod regardless of renames. We may be better served by an actual database rather than using Git to store this metadata, for both scalability and speed, however this also poses concerns of "who will host it".
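
For the identification-by-hash idea at the start of the paragraph above, a tool could build a reverse lookup from file checksum to project once and then identify local JARs against it. This is a sketch under assumptions: it uses SHA-256 rather than the Murmur2 hash mentioned above, and the `id`, `name`, and `checksum` field names follow the earlier examples in this thread rather than any agreed spec.

```python
import hashlib


def build_hash_lookup(projects):
    """Map each file checksum to (project id, version name, file name)."""
    lookup = {}
    for project in projects:
        for version in project.get("versions", []):
            for f in version.get("files", []):
                lookup[f["checksum"]] = (project["id"], version["name"], f["name"])
    return lookup


def identify(jar_bytes, lookup):
    """Identify a local JAR by hashing it; returns None if it is not indexed."""
    return lookup.get(hashlib.sha256(jar_bytes).hexdigest())
```

Keying the result on a stable project ID rather than the slug means the lookup would keep working even if slugs are allowed to be renamed.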

Another use for multiple files for the same version that would be useful is a version of a mod that has no bundled dependencies for making modpacks - as Fabric mods can use jar-in-jar embedding to bundle their dependencies, and fabric-loader resolves these versions at runtime. We could make something that searches the index for these dependencies, and if they exist adds them to the metadata and strips them from the mod file. It would still be useful to keep the original file available though for users, so they don't need to download these dependencies.

As immibis and others have discussed before, I agree that yarn mappings (loom remaps to your dev environment names anyway) and entrypoints should not be part of the index - these are stored in fabric.mod.json already. As I have stated in #2, I think a gradle plugin would be able to submit all the necessary metadata, generated from the existing fabric.mod.json data and gradle dependencies, and anything else that is needed can be specified (build.gradle is already a rough equivalent to Cargo.toml in the Java world) in the parameters for the plugin. Making mod loaders use information from the index would be hard and not really that useful - once the JAR is in the mods folder, the metadata the mod loader needs is already there. FYI, Forge figures out where the entry points are by scanning the mod's classes for annotations.

Should [dependencies] allow specifying a checksum to facilitate validation that the related item matches the resource referenced by the author at the time of publication?

I approve of this idea - if a dependency does specify an exact mod file, it should contain a way of verifying it... although this may not be that much of an improvement if all the mod files are in the index anyway. Maybe the tooling could generate lockfiles when submitting, and redistribute them somewhere? It still depends on how distributed we want to go, and if we want to store some metadata in user-hosted mod repositories rather than the index. I recommend you read the excellent article So you want to write a package manager, although not all of it applies to this project.

stairman06 commented 4 years ago

Acting on top of existing mod websites is an interesting idea... I'm not sure on the right implementation or if it should be done at all, maybe only as a "bridge" to allow authors to get used to the index.

Saving bandwidth by separating file metadata is a good idea and can make handling of large mods with many versions easier. This leads into your next point about retrieving mods via file hash, which could be useful. This also relates to your point about using an actual database.

Git(Hub) is in many ways a good system for a project like this. Commits and commit history make changes to the index transparent, and Pull Requests provide an easy way to moderate and approve changes. I've been working on a tool that allows you to visually edit a project and submit the changes via a Pull Request, but that's more related to #2.

Storing everything in Git has its drawbacks though. For example, if a launcher wished to integrate with the Index and let users search for mods from within the launcher, how would this be implemented with Git? Without an official server of some kind, the author of the launcher could either: (1) Write their own server that makes searching and downloading from the index easy, but costs money to operate. (2) Download a copy of the index locally in the launcher, and redownload it every so often to stay up to date.

Option 1 is the more efficient solution, but for FOSS projects any financial decision has to be well justified. Option 2 works, but is inefficient.
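For reference, Option 2 could look roughly like the sketch below: the launcher keeps a local copy of the index, checks whether it is older than some refresh interval, and searches it in memory. The six-hour interval, the function names, and the sample projects are all assumptions for illustration; the actual download step is left out.

```python
import time

MAX_AGE_SECONDS = 6 * 60 * 60  # assumed refresh interval, not from the spec

def is_stale(fetched_at, now=None, max_age=MAX_AGE_SECONDS):
    """Return True if the local copy is older than max_age seconds."""
    now = time.time() if now is None else now
    return now - fetched_at > max_age

def search(index, query):
    """Case-insensitive substring search over project IDs and descriptions."""
    needle = query.lower()
    return sorted(
        project_id
        for project_id, project in index.items()
        if needle in project_id.lower()
        or needle in project.get("description", "").lower()
    )

# Hypothetical local copy of the index, already parsed from JSON.
local_index = {
    "minimap": {"description": "Adds an on-screen map."},
    "fast_furnace": {"description": "Speeds up smelting."},
    "world_mapper": {"description": "Exports the world to an image."},
}

map_results = search(local_index, "map")
smelt_results = search(local_index, "smelt")
```

The linear scan is where the inefficiency shows up: every launcher holds and rescans the whole index, which is fine for a small index but grows with every project and release added.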

Unless there's a company willing to give away servers for free (which I highly doubt) or there are users willing to financially support this project (again, doubtful), I don't think there's any easy solution.

ghost commented 4 years ago

I just saw this mod (linked below). It uses what I feel would be a good strategy for managing versions. If you look at its update strategy options, it handles versioning in several ways: some make things easy for the developer, and others are more open, such as the JSON option. I see no reason why we can't pull from GitHub releases for some mods (as an example) and from JSON for others; I think it would be good to have that flexibility. Developers who want an easy route can just specify GitHub in the manifest (or just git in general, because most git hosting websites work the same) and have the index read from GH releases. Developers who want more control can specify JSON in the manifest and write their own index file in the root of their repository.

https://gitea.thebrokenrail.com/TheBrokenRail/ModUpdater/src/branch/master/MOD_DEVELOPER.md
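A minimal sketch of how a collector might support both routes, assuming a hypothetical `update_strategy` manifest field. The GitHub branch reads the `tag_name` and `browser_download_url` fields that GitHub's "list releases" API returns; the JSON branch assumes a `files` list shaped like the draft index document. This is not how ModUpdater itself is implemented, and the payloads are passed in already fetched so the example makes no network calls.

```python
def from_github_releases(payload):
    """Normalise a GitHub 'list releases' response into release entries."""
    releases = []
    for item in payload:
        assets = item.get("assets", [])
        if not assets:
            continue  # a release with no uploaded file cannot be indexed
        releases.append({
            "version": item["tag_name"],
            # Assumption: the first asset is the mod file.
            "download_url": assets[0]["browser_download_url"],
        })
    return releases

def from_json_file(payload):
    """Read release entries from an author-maintained JSON file."""
    return [
        {"version": r["version"], "download_url": r["download_url"]}
        for r in payload["files"]
    ]

# Hypothetical strategy names a manifest could choose from.
STRATEGIES = {"github": from_github_releases, "json": from_json_file}

def collect_releases(manifest, payload):
    """Dispatch on the manifest's (hypothetical) update_strategy field."""
    strategy = manifest["update_strategy"]
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown update strategy: {strategy}")
    return STRATEGIES[strategy](payload)

github_payload = [
    {"tag_name": "v1.2.0",
     "assets": [{"browser_download_url": "https://example.org/mod-1.2.0.jar"}]},
    {"tag_name": "v1.1.0", "assets": []},
]
json_payload = {
    "files": [{"version": "1.0.0",
               "download_url": "https://example.org/mod-1.0.0.jar"}]
}

from_github = collect_releases({"update_strategy": "github"}, github_payload)
from_json = collect_releases({"update_strategy": "json"}, json_payload)
```

Both handlers return the same record shape, so the rest of the index tooling would not need to know which strategy a developer picked.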

stairman06 commented 4 years ago

Right now I think maintaining an index of mods and figuring out how to serve them shouldn't be high priority - I think we should be more focused on developing a standard that can be used by some of the new hosting services (like Diluv and Modrinth). That isn't to say using Releases is a bad idea, I just think the format should be of higher priority right now. Releases integration could be handled on the end of the hosting services, so it wouldn't be tied directly to the JSON format. This is all my opinion, of course, so feel free to debate if you think otherwise.

immibis commented 4 years ago

If we split up MCIP into sub-projects - index specification -> data collection -> distribution - then we don't really need to consider distribution efficiency until the distribution stage. The data collection stage can happily create a single file, even if it's something silly like multiple gigabytes of JSON.
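As a sketch of what the data collection stage could do under that split, the snippet below merges per-project documents into one index and serialises it as a single JSON string. The function name, the sample project IDs, and the "resourcepack" type value are hypothetical; only the overall document shape follows the draft example.

```python
import json

def merge_project_documents(documents):
    """Merge single-project documents into one index, rejecting duplicates."""
    merged = {}
    for document in documents:
        for project_id, project in document.items():
            if project_id in merged:
                raise ValueError(f"duplicate project ID: {project_id}")
            merged[project_id] = project
    return merged

# Hypothetical collected documents, one per project.
documents = [
    {"example_mod": {"type": "mod", "files": []}},
    {"example_pack": {"type": "resourcepack", "files": []}},
]

# One big JSON document; splitting or compressing it is left to distribution.
index_json = json.dumps(merge_project_documents(documents), sort_keys=True)
```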