zhaofengli / attic

Multi-tenant Nix Binary Cache
https://docs.attic.rs

AccessError - weird race condition with auth token checks? #133

Closed srd424 closed 3 months ago

srd424 commented 3 months ago

I think I've driven myself mad with this...

# atticadm make-token --sub stevedstarlite --validity 1y --pull "*" --push "starlite-*"
eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJleHAiOjE3NDcwNzY2NzAsInN1YiI6InN0ZXZlZHN0YXJsaXRlIiwiaHR0cHM6Ly9qd3QuYXR0aWMucnMvdjEiOnsiY2FjaGVzIjp7IioiOnsiciI6MX0sInN0YXJsaXRlLSoiOnsidyI6MX19fX0.JfTlnboId_F4aeuyFRdUAgpK8vuhibWN9gu-L_vC6kY

$ attic login local http://aux-test.lan:8080 eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJleHAiOjE3NDcwNzY2NzAsInN1YiI6InN0ZXZlZHN0YXJsaXRlIiwiaHR0cHM6Ly9qd3QuYXR0aWMucnMvdjEiOnsiY2FjaGVzIjp7IioiOnsiciI6MX0sInN0YXJsaXRlLSoiOnsidyI6MX19fX0.JfTlnboId_F4aeuyFRdUAgpK8vuhibWN9gu-L_vC6kY
✍️ Overwriting server "local"

$ attic push local:starlite-nonfree $(which zerotier-one)
Error: AccessError: Access error: User does not have permission to complete this action.

Versus:

# atticadm make-token --sub stevedstarlite --validity 1y --pull "*" --push "starlite*"
eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJleHAiOjE3NDcwNzY4NzcsInN1YiI6InN0ZXZlZHN0YXJsaXRlIiwiaHR0cHM6Ly9qd3QuYXR0aWMucnMvdjEiOnsiY2FjaGVzIjp7IioiOnsiciI6MX0sInN0YXJsaXRlKiI6eyJ3IjoxfX19fQ.gQRzYTrMQxAjSZfG-ZsWrWXgoUoA-EJH8Z1XFVlyEVY

$ attic login local http://aux-test.lan:8080 eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJleHAiOjE3NDcwNzY4NzcsInN1YiI6InN0ZXZlZHN0YXJsaXRlIiwiaHR0cHM6Ly9qd3QuYXR0aWMucnMvdjEiOnsiY2FjaGVzIjp7IioiOnsiciI6MX0sInN0YXJsaXRlKiI6eyJ3IjoxfX19fQ.gQRzYTrMQxAjSZfG-ZsWrWXgoUoA-EJH8Z1XFVlyEVY
✍️ Overwriting server "local"

$ attic push local:starlite-nonfree $(which zerotier-one)
✅ All done! (1 already cached, 7 in upstream)

(I promise I'll rotate my signing key now!)
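For anyone following along, it can help to look at what the two tokens actually carry. Below is a minimal Python sketch that decodes the payload of the first token above without verifying its signature. The helper name `decode_jwt_payload` is made up for illustration, and reading `r` as pull and `w` as push is an assumption based on the `make-token` flags used above.

```python
import base64
import json


def decode_jwt_payload(token: str) -> dict:
    """Decode the middle (payload) segment of a JWT WITHOUT verifying it."""
    payload_b64 = token.split(".")[1]
    # JWTs drop base64url padding, so restore it before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))


# First token from the transcript above (created with --push "starlite-*").
TOKEN = (
    "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9."
    "eyJleHAiOjE3NDcwNzY2NzAsInN1YiI6InN0ZXZlZHN0YXJsaXRlIiwiaHR0cHM6Ly9qd3QuYXR0aWMucnMvdjEiOnsiY2FjaGVzIjp7IioiOnsiciI6MX0sInN0YXJsaXRlLSoiOnsidyI6MX19fX0."
    "JfTlnboId_F4aeuyFRdUAgpK8vuhibWN9gu-L_vC6kY"
)

claims = decode_jwt_payload(TOKEN)
print(json.dumps(claims["https://jwt.attic.rs/v1"], indent=2))
```

The `caches` claim holds two separate entries, one for `*` and one for `starlite-*`, each with its own permission set. The second token has the same shape, only with `starlite*` as the second pattern.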

srd424 commented 3 months ago

Actually it's much more confusing than that:

[nix-shell:~]$ attic login local http://aux-test.lan:8080 eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJleHAiOjE3NDcwNzk5MDIsInN1YiI6InN0ZXZlZHN0YXJsaXRlIiwiaHR0cHM6Ly9qd3QuYXR0aWMucnMvdjEiOnsiY2FjaGVzIjp7IioiOnsiciI6MX0sInN0YXJsaXRlKiI6eyJ3IjoxfX19fQ.WRMgYEHqkmBHvjvARTz-vmwTsNUTQYl9i1PnmKVMzRk
✍️ Overwriting server "local"

[nix-shell:~]$ attic push local:starlite-nonfree $(which zerotier-one)
Error: AccessError: Access error: User does not have permission to complete this action.

[nix-shell:~]$ attic push local:starlite-nonfree $(which zerotier-one)
✅ All done! (1 already cached, 7 in upstream)

(Nothing changed between the last two pushes)

Is there some sort of race / concurrency issue around authentication? I don't really understand Rust, but from the code it looks like auth lookups are cached - maybe an issue there?

cole-h commented 3 months ago

TL;DR: Specify all permissions for the specific caches; i.e. in your first example, specify both --push 'starlite-*' AND --pull 'starlite-*'.


I just did some playing around, and this is weird behavior in the way permissions are calculated when a token combines multiple wildcard patterns:

When checking if a given token has permissions to act on a specific cache, we check if the cache has a direct match in the token's policies, and if so exit early and use that. If not, then we move on to checking all the wildcards, and once we find one that matches, we return those policies. (Found here: https://github.com/zhaofengli/attic/blob/4dbdbee45728d8ce5788db6461aaaa89d98081f0/token/src/lib.rs#L287-L302).
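The lookup described above can be sketched roughly as follows. This is a Python approximation for illustration, not the Rust code at the linked commit: the function name `lookup_permissions` is hypothetical, and `fnmatch.fnmatchcase` stands in for whatever pattern matcher attic uses.

```python
from fnmatch import fnmatchcase


def lookup_permissions(rules: dict, cache: str):
    """Return the permission set of the first rule that applies to `cache`."""
    # 1. An exact name match wins outright.
    if cache in rules:
        return rules[cache]
    # 2. Otherwise the first matching wildcard pattern is returned and the
    #    remaining patterns are never consulted.
    for pattern, perms in rules.items():
        if fnmatchcase(cache, pattern):
            return perms
    # 3. No rule applies to this cache.
    return None


rules = {"*": {"r": 1}, "starlite-*": {"w": 1}}
print(lookup_permissions(rules, "starlite-nonfree"))
```

Python dicts iterate in insertion order, so here `*` is always tried first and the result is the pull-only set. Nothing in the function merges the two matching entries.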

However, this can return the wrong permission, depending on how the token is configured and the order in which the caches in the token are iterated over. I added some debugging prints and saw that, for my token (with * having pull-only and test-* having push-only), the lookup would sometimes see the test-* (push-only) pattern first and sometimes see the * (pull-only) pattern first.

This is easily reproduced by creating a token that has two overlapping wildcard patterns with different permissions. In your example, * overlaps with starlite*, so sometimes attic would see your read-only permissions and return those (which obviously don't allow pushing) and sometimes attic would see your write-only permissions and return those (so the push would succeed).
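To make the order dependence concrete, here is a small Python sketch that runs a first-match lookup over both possible orderings of the two rules from the second token. It models the behavior described above under stated assumptions and is not attic's code; `first_match` is a hypothetical helper and `fnmatchcase` stands in for the real pattern matcher.

```python
from fnmatch import fnmatchcase
from itertools import permutations

# The two rules carried by the token created with --pull "*" --push "starlite*".
RULES = [("*", {"r": 1}), ("starlite*", {"w": 1})]


def first_match(ordered_rules, cache):
    """Return the permissions of the first pattern that matches `cache`."""
    for pattern, perms in ordered_rules:
        if fnmatchcase(cache, pattern):
            return perms
    return {}


# Try every iteration order and record whether a push would be allowed.
outcomes = {}
for order in permutations(RULES):
    perms = first_match(order, "starlite-nonfree")
    label = " then ".join(pattern for pattern, _ in order)
    outcomes[label] = bool(perms.get("w"))

for label, can_push in outcomes.items():
    print(f"{label}: push allowed = {can_push}")
```

The same token yields a denied push when `*` is visited first and an allowed push when `starlite*` is visited first, which matches the fail-then-succeed pattern in the transcript.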

I don't know if there's a general solution to this (e.g. it might actually be desirable that, if you specify group-* as read-only and group-ci-* as write-only, the token can only push to group-ci-asdf but not group-important), so I think the way to fix this for your case is to add a pull permission for the same pattern (i.e. in your first example, specify both --push 'starlite-*' AND --pull 'starlite-*').
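One way to sanity-check a rule set is to enumerate every possible iteration order and collect the push decisions. The Python sketch below does that under the same simplified first-match model (exact-name matches omitted, `fnmatchcase` as a stand-in matcher); `push_outcomes` and the three rule sets are illustrative names, not anything from attic.

```python
from fnmatch import fnmatchcase
from itertools import permutations


def push_outcomes(rules: dict, cache: str) -> set:
    """Collect the push decision for every possible rule iteration order."""
    results = set()
    for order in permutations(rules.items()):
        # First matching pattern wins; no match means no permissions.
        perms = next((p for pat, p in order if fnmatchcase(cache, pat)), {})
        results.add(bool(perms.get("w")))
    return results


original = {"*": {"r": 1}, "starlite-*": {"w": 1}}
combined_only = {"starlite-*": {"r": 1, "w": 1}}
combined_plus_star = {"*": {"r": 1}, "starlite-*": {"r": 1, "w": 1}}

for name, rules in [
    ("original", original),
    ("combined_only", combined_only),
    ("combined_plus_star", combined_plus_star),
]:
    print(name, sorted(push_outcomes(rules, "starlite-nonfree")))
```

In this model, a single `starlite-*` rule carrying both permissions always allows the push. If a broader pull-only `*` rule is kept alongside it, the push decision still depends on which pattern is visited first, so it may be worth checking whether the broad rule is needed at all.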

cole-h commented 3 months ago

I opened https://github.com/zhaofengli/attic/pull/135 which should at least make it so you don't get the confusing, spurious auth successes (when they should all be auth failures).

srd424 commented 3 months ago

Ah, thank you, that makes sense! I was seriously starting to doubt my sanity. Had it been written in a more old-fashioned language I might have been able to pick my way through, but my middle-aged brain doesn't really grok Rust (yet?)