librariesio / libraries.io

:books: The Open Source Discovery Service
https://libraries.io
GNU Affero General Public License v3.0

Stop Removing Go Pseudo Versions #3330

Closed mikeyoung85 closed 3 months ago

mikeyoung85 commented 3 months ago

This changes the version removal logic for Go so that pseudo versions which are not found upstream are no longer marked as removed. My rationale is that the upstream Go data sources are not very consistent about storing pseudo version information, but if we have seen a pseudo version previously then it should be considered valid, since it is a combination of the module version, git commit date, and git commit hash that existed at some point. More info at https://go.dev/ref/mod#pseudo-versions. I think that for a version like this to be removed and become unreachable, the git history of the project would need to have been rewritten, but feel free to correct me on these assumptions. Follow-up PRs/work will be needed to restore pseudo versions which have already been marked as removed.
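To make the intended behavior concrete, here is a minimal sketch of the idea, not the code in this PR. The constant and method names (`PSEUDO_VERSION_RE`, `pseudo_version?`, `versions_to_mark_removed`) are hypothetical, and the regular expression is an assumption modeled on the three pseudo version forms described in the Go modules reference linked above.

```ruby
# Hypothetical sketch: names below are illustrative, not the libraries.io API.
# Matches the three pseudo version shapes from the Go modules reference:
#   vX.0.0-yyyymmddhhmmss-abcdefabcdef
#   vX.Y.Z-pre.0.yyyymmddhhmmss-abcdefabcdef
#   vX.Y.(Z+1)-0.yyyymmddhhmmss-abcdefabcdef
PSEUDO_VERSION_RE =
  /\Av[0-9]+\.(0\.0-|\d+\.\d+-([^+]*\.)?0\.)\d{14}-[A-Za-z0-9]+(\+[0-9A-Za-z-]+(\.[0-9A-Za-z-]+)*)?\z/

# True when the version string looks like a Go pseudo version.
def pseudo_version?(version)
  PSEUDO_VERSION_RE.match?(version)
end

# Versions we know about that upstream no longer reports, excluding pseudo
# versions, which are left untouched instead of being marked as removed.
def versions_to_mark_removed(known_versions, upstream_versions)
  missing = known_versions - upstream_versions
  missing.reject { |version| pseudo_version?(version) }
end
```

With this shape, a tagged release that disappears upstream is still flagged, while a previously seen pseudo version that upstream no longer lists keeps its current status.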

For testing, I have added some data to the specs to check that the status is not touched for pseudo versions, and I have tested a few packages locally by running PackageManager::Go.update() on them with extra pseudo versions in my database.

wenottingham commented 3 months ago

Will this still handle the entire module itself being removed?

mikeyoung85 commented 3 months ago

Will this still handle the entire module itself being removed?

I think that happens in a separate process: https://github.com/librariesio/libraries.io/blob/bc1d969c6415e65e90919585565b2758afdd3e32/app/workers/check_status_worker.rb

mikeyoung85 commented 3 months ago

This makes sense to me, since I also have a fuzzy memory of pseudo versions being inconsistent on pkg.go.dev.

One question: we've had logic in place for a while to sync a "/v2+" module's versions onto its base module, but then the MissingVersionRemover logic was added, which marked those fake versions on the base module as "Deleted" (since they can't actually be found for the base module).

Is this a good time to stop syncing those fake versions on the base module? I'm wondering if after this PR we'd end up with base module versions for "/v2+" packages that are partly "Removed" (real versions) and partly not "Removed" (pseudo versions).

I thought that change had already been made :) I don't think this PR would negatively affect things too much. It would be possible to end up with incorrect pseudo versions on the base module for any prereleases of module versions v2 and up, but we could go back and clean those up when we remove the base module sync we are doing now. I'm not sure how common a pseudo version for a versioned module is, but it seems like something that would be possible.
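For reference, the relationship between a "/v2+" module and its base module can be sketched as below. This is only an illustration of the path convention, not the sync code in this repo. The names `MAJOR_SUFFIX_RE`, `versioned_module?`, and `base_module_path` are hypothetical, and the sketch assumes the plain "/vN" suffix form only (it ignores gopkg.in-style ".vN" paths).

```ruby
# Hypothetical sketch: names below are illustrative, not the libraries.io API.
# A trailing "/vN" with N >= 2 marks a major-version module path.
MAJOR_SUFFIX_RE = %r{/v([2-9]|[1-9][0-9]+)\z}

# True for paths such as "github.com/example/mod/v2".
def versioned_module?(module_path)
  MAJOR_SUFFIX_RE.match?(module_path)
end

# Strips the major-version suffix; returns other paths unchanged.
def base_module_path(module_path)
  module_path.sub(MAJOR_SUFFIX_RE, "")
end
```

A pseudo version synced from "github.com/example/mod/v2" onto "github.com/example/mod" would be one of the fake base module versions discussed above.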