wbond / packagecontrol.io

The Package Control website
https://packagecontrol.io

Use the repository ID to verify whether a GitHub username was squatted #140

Open FichteFoll opened 3 years ago

FichteFoll commented 3 years ago

Instead of marking packages as "needing review" after they were unavailable once, it would be more reasonable and robust to check the repository ID returned by GitHub's API, store it in the database, and flag only those packages whose ID in the latest crawl differs from the stored one.
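As a rough illustration of where such an ID could come from: GitHub's REST API returns a numeric `id` field for `GET /repos/{owner}/{repo}`. The sketch below only shows fetching and parsing that field; the helper names `parse_repo_id` and `fetch_repo_id` are made up for this example and are not part of the Package Control codebase.

```python
import json
import urllib.request

GITHUB_REPO_API = "https://api.github.com/repos/{owner}/{repo}"


def parse_repo_id(payload: str) -> int:
    """Extract the numeric repository ID from a GitHub API JSON response body."""
    return int(json.loads(payload)["id"])


def fetch_repo_id(owner: str, repo: str) -> int:
    """Ask the GitHub REST API for a repository's numeric ID (performs a network call)."""
    request = urllib.request.Request(
        GITHUB_REPO_API.format(owner=owner, repo=repo),
        headers={"Accept": "application/vnd.github+json"},
    )
    with urllib.request.urlopen(request) as response:
        return parse_repo_id(response.read().decode("utf-8"))
```

A crawler could call `fetch_repo_id("wbond", "packagecontrol.io")` and persist the result next to the package record. Unauthenticated requests are rate-limited, so a real crawl would presumably reuse whatever authenticated client it already has.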

wbond commented 3 years ago

Beyond this, the system will need to track whether a third-party domain has changed hands. It will also need to do the same sort of thing for GitLab and Bitbucket.

I'm not sure if there is an automated way to see if the domain has changed hands. Maybe whois can provide the first registration date and that can be used?

FichteFoll commented 3 years ago

Another thing to consider is a repo URL being changed deliberately. I hope this can be checked against the database, so that an ID is only compared when the URL is unchanged. Apart from custom-hosted packages, for which we'll probably still need the "was missing" check, the implementations for the three Git hosts should be very similar.
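One possible reading of this rule, written as a small pure function. The name `id_mismatch_needs_review` and its parameters are hypothetical; the sketch assumes the previous URL and ID are available from the database and that hosts without an ID report `None`.

```python
from typing import Optional


def id_mismatch_needs_review(
    old_url: str,
    old_id: Optional[int],
    new_url: str,
    new_id: Optional[int],
) -> bool:
    """Decide whether a crawl result should be flagged for review.

    Assumptions (not confirmed by the thread): URLs are compared verbatim,
    and a missing ID on either side means "cannot tell", not "suspicious".
    """
    # The URL was changed deliberately, so the two IDs are not comparable.
    if old_url != new_url:
        return False
    # First crawl, or a host that provides no ID (e.g. custom hosting):
    # another check such as "was missing" would have to cover this case.
    if old_id is None or new_id is None:
        return False
    # Same URL but a different repository behind it.
    return old_id != new_id
```

With this shape, the host-specific code for GitHub, GitLab and Bitbucket would only need to supply the ID, while the comparison itself stays shared.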

kaste commented 3 years ago

I don't think you should reach for a 100% solution here. Reducing false positives is an incremental process. Handling only GitHub already relieves most of the pressure, as it probably hosts about 90% of packages nowadays. (And github.com can't change its owner without you reading about it in the news.)

Say we just grab a uid from GitHub. In package.modify.store(values), values may now contain this uid (if and only if the provider supplies it). Within store() we already call cursor.fetchone() to decide whether to INSERT or UPDATE. On UPDATE we can now compare the old uid with the new one and reject some changes, or allow them but immediately mark the package as needs_review.
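A minimal sketch of that flow, using an in-memory SQLite database as a stand-in for the real one. The table layout, the `repo_uid` column, and the `make_db` helper are assumptions made for this example; only the `fetchone()`-then-INSERT-or-UPDATE shape and the `needs_review` flag come from the comment above. It implements the "allow the change, but mark it" variant.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create a throwaway database with a hypothetical packages table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE packages ("
        "name TEXT PRIMARY KEY, repo_uid INTEGER, needs_review INTEGER NOT NULL DEFAULT 0)"
    )
    return conn


def store(conn: sqlite3.Connection, values: dict) -> None:
    """Insert or update a package row, flagging it when the repository uid changed."""
    cursor = conn.cursor()
    cursor.execute(
        "SELECT repo_uid, needs_review FROM packages WHERE name = ?",
        (values["name"],),
    )
    row = cursor.fetchone()
    new_uid = values.get("repo_uid")  # None when the provider supplies no uid

    if row is None:
        # First time this package is seen: nothing to compare against.
        cursor.execute(
            "INSERT INTO packages (name, repo_uid, needs_review) VALUES (?, ?, 0)",
            (values["name"], new_uid),
        )
    else:
        old_uid, flagged = row
        # Only a known old uid and a known new uid that differ count as a change.
        changed = old_uid is not None and new_uid is not None and old_uid != new_uid
        cursor.execute(
            "UPDATE packages SET repo_uid = ?, needs_review = ? WHERE name = ?",
            (
                new_uid if new_uid is not None else old_uid,  # keep the old uid if none given
                int(bool(flagged) or changed),  # an existing flag is never cleared here
                values["name"],
            ),
        )
    conn.commit()
```

For example, storing `{"name": "Example", "repo_uid": 1}` twice leaves the row unflagged, while a later `{"name": "Example", "repo_uid": 2}` sets `needs_review` to 1. Rejecting the update instead would amount to skipping the UPDATE when `changed` is true.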