pulsar-edit / package-backend

Pulsar Server Backend for Packages
https://api.pulsar-edit.dev
MIT License

[BUG] Publishing failing because it's trying to read package data from the default branch #205

Closed · mauricioszabo closed this issue 1 week ago

mauricioszabo commented 8 months ago

Is this Bug Present in the upstream API Server?

What is the Bug

When publishing a package, we try to read package.json from the default branch, which means that if I publish a package from a specific tag, the backend will not read package.json from that tag.

This breaks setups like monorepos (for example, https://github.com/mauricioszabo/star-ring), where there is no root package.json on the default branch, only on specific tags.
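
For illustration, a minimal sketch of the failure mode (a hypothetical standalone check, not the backend's actual code path; assumes Node 18+ with a global fetch):

```javascript
// Without a `ref`, the GitHub contents endpoint reads from the default
// branch, where star-ring has no root package.json.
const url =
  "https://api.github.com/repos/mauricioszabo/star-ring/contents/package.json";

const onDefaultBranch = await fetch(url);
console.log(onDefaultBranch.status); // 404: no package.json on the default branch

// With `ref` set to a tag, the same endpoint finds the file.
const tag = encodeURIComponent("generic-lsp@2024.02.10-04");
const onTag = await fetch(`${url}?ref=${tag}`);
console.log(onTag.status); // 200: package.json exists at the tag
```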

How to Replicate the Bug

  1. Create a package whose default branch does not include a package.json
  2. Branch from the default branch, and commit a package.json there
  3. Create a tag on that branch
  4. Try to publish the tag
  5. The publish will fail, even if all required fields are present
confused-Techie commented 5 months ago

I know this issue is older, but it is something I've been looking into.

After playing with a few different ideas on how we could do this, I think the approach below is probably the best one:

We use the contents REST API endpoint, passing a ref parameter taken from the tags endpoint's return.

Meaning, for the star-ring example, we would first need to get the tag data:

https://api.github.com/repos/mauricioszabo/star-ring/tags

```json
[
    {
        "name": "generic-lsp@2024.02.10-04",
        "zipball_url": "https://api.github.com/repos/mauricioszabo/star-ring/zipball/refs/tags/generic-lsp@2024.02.10-04",
        "tarball_url": "https://api.github.com/repos/mauricioszabo/star-ring/tarball/refs/tags/generic-lsp@2024.02.10-04",
        "commit": {
            "sha": "ef9aed5d82df0825453e9a3754d9b85023be6bdb",
            "url": "https://api.github.com/repos/mauricioszabo/star-ring/commits/ef9aed5d82df0825453e9a3754d9b85023be6bdb"
        },
        "node_id": "REF_kwDOKQyg_toAI3JlZnMvdGFncy9nZW5lcmljLWxzcEAyMDI0LjAyLjEwLTA0"
    },
    {
        "name": "generic-lsp@2023.09.08-00",
        "zipball_url": "https://api.github.com/repos/mauricioszabo/star-ring/zipball/refs/tags/generic-lsp@2023.09.08-00",
        "tarball_url": "https://api.github.com/repos/mauricioszabo/star-ring/tarball/refs/tags/generic-lsp@2023.09.08-00",
        "commit": {
            "sha": "183dbc7f52dae7357d25da8019ab890cea2130ee",
            "url": "https://api.github.com/repos/mauricioszabo/star-ring/commits/183dbc7f52dae7357d25da8019ab890cea2130ee"
        },
        "node_id": "REF_kwDOKQyg_toAI3JlZnMvdGFncy9nZW5lcmljLWxzcEAyMDIzLjA5LjA4LTAw"
    },
    {
        "name": "generic-lsp@2023.06.09-16",
        "zipball_url": "https://api.github.com/repos/mauricioszabo/star-ring/zipball/refs/tags/generic-lsp@2023.06.09-16",
        "tarball_url": "https://api.github.com/repos/mauricioszabo/star-ring/tarball/refs/tags/generic-lsp@2023.06.09-16",
        "commit": {
            "sha": "093420dc23e3d2b4e6d8d9e157acd983330072cf",
            "url": "https://api.github.com/repos/mauricioszabo/star-ring/commits/093420dc23e3d2b4e6d8d9e157acd983330072cf"
        },
        "node_id": "REF_kwDOKQyg_toAI3JlZnMvdGFncy9nZW5lcmljLWxzcEAyMDIzLjA2LjA5LTE2"
    }
]
```

From here we would then use the name value within each tag object to collect the contents:

https://api.github.com/repos/mauricioszabo/star-ring/contents/package.json?ref=generic-lsp@2024.02.10-04
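
Putting both steps together, a rough sketch of the flow (hypothetical helper, assuming Node 18+ fetch; the real implementation would go through the backend's existing GitHub wrapper with auth, pagination, and error handling):

```javascript
// Proposed two-step flow: list tags, then read package.json at each tag.
async function getPackageJsonForTags(owner, repo) {
  const api = "https://api.github.com";

  // Step 1: list the repository's tags.
  const tagsRes = await fetch(`${api}/repos/${owner}/${repo}/tags`);
  const tags = await tagsRes.json();

  // Step 2: read package.json at each tag via the contents endpoint's
  // `ref` parameter.
  const results = [];
  for (const tag of tags) {
    const ref = encodeURIComponent(tag.name);
    const res = await fetch(
      `${api}/repos/${owner}/${repo}/contents/package.json?ref=${ref}`
    );
    if (!res.ok) continue; // the tag may predate package.json entirely

    // The contents endpoint returns the file base64-encoded.
    const file = await res.json();
    const pack = JSON.parse(
      Buffer.from(file.content, "base64").toString("utf8")
    );
    results.push({ tag: tag.name, pack });
  }
  return results;
}
```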

This gets us the same return as our previous usage of the contents endpoint, but it has pretty large implications for the existing system here.

Currently, when the user attempts to publish a package, we get all the data in discrete steps:

But to make this work, so that we collect all data from the tag instead of the branch, the flow would instead need to look like:

This would actually solve the issue of us having to "fake" version data for previous versions during first-time publication, since on first publication we publish all previous versions of the package (as determined by the tags). So that would be a net benefit.

Although it does introduce a new issue: we would need to store even more tag data on each version to allow the feature detection checks to work after the fact, since feature detection also works by reading the contents of the repository. These checks happen after publication has already returned to the user, so that they don't have to wait on extra steps like this. So we would need to store the name of each tag so that we can collect its information later during feature detection.

Luckily there was some foresight when creating the DB schema: each version entry has a freeform meta column that accepts JSON. The tag name could be added there, avoiding the longer process of updating the live database schema.
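
For illustration, the stored row might look something like this (a hypothetical shape; the actual contents of the meta column would be up to the implementation):

```javascript
// Hypothetical shape of a version row, using the freeform `meta` JSON
// column to record the tag so feature detection can re-resolve it later.
const versionRow = {
  semver: "2024.2.10",
  meta: {
    tag: "generic-lsp@2024.02.10-04", // ref to pass to the contents endpoint
    sha: "ef9aed5d82df0825453e9a3754d9b85023be6bdb"
  }
};
```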


Anyone reading this may think the obvious, simpler alternative would be to download the tarball of the data and read it locally, and I am still partially considering that. Except that we would need to add dependencies to be able to read or extract the tar data, and we would then have to be very careful that all read and write behavior stays within /tmp, since the GCP App Engine containers we use are otherwise read-only.
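
A minimal sketch of that alternative, assuming the node-tar package (not currently a dependency here) and confining all filesystem access to /tmp:

```javascript
// Tarball alternative: download to /tmp, read package.json from the
// archive without extracting the whole tree to disk.
const fs = require("fs/promises");
const path = require("path");
const tar = require("tar");

async function readPackageJsonFromTarball(tarballUrl) {
  const res = await fetch(tarballUrl);
  const tmpFile = path.join("/tmp", `pkg-${Date.now()}.tar.gz`);
  await fs.writeFile(tmpFile, Buffer.from(await res.arrayBuffer()));

  let contents = null;
  // List entries and capture the top-level package.json; GitHub tarballs
  // prefix every entry with an `owner-repo-sha/` directory.
  await tar.t({
    file: tmpFile,
    onentry: (entry) => {
      if (/^[^/]+\/package\.json$/.test(entry.path)) {
        const chunks = [];
        entry.on("data", (c) => chunks.push(c));
        entry.on("end", () => {
          contents = JSON.parse(Buffer.concat(chunks).toString("utf8"));
        });
      }
    },
  });

  await fs.unlink(tmpFile); // clean up /tmp
  return contents;
}
```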


Otherwise, this is mostly a note to myself about what should be done, so it's not forgotten after some research. But if anyone has ideas, feel free to contribute; there's no need to, though.