librariesio / libraries.io

:books: The Open Source Discovery Service
https://libraries.io
GNU Affero General Public License v3.0
1.1k stars 206 forks

Ensure that we don't ingest a Go Package that is not a Go Module. #3298

Closed: tiegz closed this 5 months ago

tiegz commented 5 months ago

Some Go Packages that are not actual Go Modules are slipping into Libraries. (Reminder: a "Package" in Go lives within a "Module", and a "Module" must have a go.mod file that declares its module path at the top.) For example, this is a "Package", not a "Module": https://libraries.io/go/k8s.io%2Fkubernetes%2Fthird_party%2Fforked%2Freflect

This change returns nil while scraping the package HTML from pkg.go.dev if the page is not for a Go Module, so the package never gets ingested.
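To make the Package/Module distinction concrete, here is a minimal sketch (not the code in this PR) of reading the module path from the text of a go.mod file. The helper name `module_path_from_go_mod` is hypothetical, and it only handles the common single-line `module <path>` form, not the parenthesized block form.

```ruby
# Hypothetical helper, for illustration only: extract the module path
# declared in the text of a go.mod file. Returns nil when no single-line
# `module` directive is found.
def module_path_from_go_mod(go_mod_text)
  go_mod_text.each_line do |line|
    # Drop any trailing // comment, then surrounding whitespace.
    stripped = line.sub(%r{//.*}, "").strip
    next if stripped.empty?

    match = stripped.match(/\Amodule\s+(\S+)\z/)
    # The path may be written as a quoted string; remove the quotes.
    return match[1].delete('"') if match
  end
  nil
end

# Illustrative input, not a real go.mod:
sample = "module k8s.io/kubernetes\n\ngo 1.21\n"
puts module_path_from_go_mod(sample).inspect
```

A directory like third_party/forked/reflect has no go.mod of its own, so it is a Package inside the k8s.io/kubernetes Module rather than a Module itself.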

tiegz commented 5 months ago

@mikeyoung85 yeah, I wasn't sure if it was safe to merge mine with that one, but maybe we can just say "a Go Module must have the 'module' label and also have a go.mod file." That's probably safe.
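Roughly, the combined rule could look like the sketch below. This is a sketch under assumptions, not the actual scraper: `ScrapedGoPage`, `go_module?`, and `map_go_page` are hypothetical names, and the struct fields stand in for whatever the scraper really extracts from the pkg.go.dev HTML.

```ruby
# Hypothetical container for values already scraped from a pkg.go.dev page.
# The field names are assumptions made for this sketch.
ScrapedGoPage = Struct.new(:name, :labels, :has_go_mod, keyword_init: true)

# The combined rule: the page must carry the "module" label
# AND report a go.mod file.
def go_module?(page)
  page.labels.include?("module") && page.has_go_mod
end

# Returns a hash to ingest, or nil so the caller skips the page.
def map_go_page(page)
  return nil unless go_module?(page)

  { name: page.name }
end

# Illustrative data only:
mod = ScrapedGoPage.new(name: "k8s.io/kubernetes", labels: ["module"], has_go_mod: true)
pkg = ScrapedGoPage.new(name: "k8s.io/kubernetes/third_party/forked/reflect", labels: [], has_go_mod: true)

puts map_go_page(mod).inspect
puts map_go_page(pkg).inspect
```

Requiring both conditions means a page that only passes one check is still skipped, which errs on the side of not ingesting.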

tiegz commented 5 months ago

I've tested these changes on a sample of 1000 Go Modules from Libraries, and only 36 of them were not valid (i.e. they wouldn't get ingested after my changes), which gives me more confidence that this PR is safe.