wbond / packagecontrol.io

The Package Control website
https://packagecontrol.io
Other
111 stars 46 forks source link

Mitigate false positive :needs_review flags #141

Open kaste opened 3 years ago

kaste commented 3 years ago

This is a follow up to the discussion in discord https://discord.com/channels/280102180189634562/280157083356233728/852608042201907261 and https://discord.com/channels/280102180189634562/280157083356233728/853247224687099904.

I noticed you mark_missing() or mark_missing_by_name() immediately on the first failure. This is open for flukes when crawling.

The information you persist in these functions is

                is_missing = TRUE,
                missing_error = %s,
                needs_review = %s

Instead of setting is_missing and needs_review immediately, I suggest a new column

    missing_since            timestamp,    

The side-effect of mark_missing on first failure is just

...
SET 
    missing_since = COALESCE(missing_since, CURRENT_TIMESTAMP),
    missing_error = %s
...

You then make a new task in tasks which you add to the crontab as well which just goes over the rows and actually sets the flags is_missing and needs_review iff missing_since is older than ... whatever we think is appropriate here. Might be 15 minutes, which should lead to 3 tries approx, or even 1 hour. Something like that:

  UPDATE
  ...
  SET
    is_missing = TRUE,
    needs_review = TRUE
  WHERE
    missing_since < CURRENT_TIMESTAMP - interval '1 hour'

This already makes the is_missing flag better for the end-user as the end-user gets the offer to install packages that are temporarily down. The user might get a "404" failure within Sublime. However, such an error is easy to understand if it means just try again. On a more likely false positive, the user will just proceed.

For needs_review, you might want to complicate matters as not every error leads to a needs_review flag. (You have the needs_review(error) function in python to decide that.)

You can introduce another column, error_indicates_need_review bool, for that and use it.

...
SET 
    missing_since = COALESCE(missing_since, CURRENT_TIMESTAMP),
    missing_error = %s,
    error_indicates_need_review = error_indicates_need_review or %s
...

and

  UPDATE
  ...
  SET
    is_missing = TRUE,
    needs_review = error_indicates_need_review
  WHERE
    missing_since < CURRENT_TIMESTAMP - interval '1 hour'

Although a rough sketch, I think this makes both the is_missing flag and needs_review flag more resilient.

Does it resolve the attack vector. Kind of like before, the attacker always races with the speed of the crawler. If the attacker is fast enough, we never mark needs_review as the crawler does not notice any downtime. This does not change with the ideas presented here.

gordio commented 1 year ago

Is it better to store (and check) created_at from https://api.github.com/repos/wbond/PackageControl.io ?