Open ryanslade opened 2 years ago
Heads up @jplahn @dan-mckean - the "team/repo-management" label was applied to this issue.
I've added the "path" label to the existing actor type metric: https://github.com/sourcegraph/sourcegraph/pull/30302 https://github.com/sourcegraph/sourcegraph/pull/30323
You can see the resulting breakdown of missing actors here%22,%22requestId%22:%22Q-d3a36d60-b00f-4683-b87f-2085701ff97e-0A%22%7D%5D)
I've created a spreadsheet to track where these missing actors are happening. The spreadsheet gives the app
where the request is coming from and the path
that the request is being made to where the actor is missing. We will probably have to investigate each of these individually to figure out why the actors aren't being passed through and ensure that we do pass in the actor to the request. Note that two paths in particular, /git/...
and /.internal/git/...
are truncated paths since the full paths contain repo names and increased the cardinality significantly. (You'll see a bunch of these in the hours between when I added the path label and when I truncated it)
My next step here is to pick a few of the entries in the spreadsheet and start tracking them down to see what the level of effort is and if there are common patterns.
Adding the path
label for the incoming requests metric as well https://github.com/sourcegraph/sourcegraph/pull/30365
Based on looking at the grafana dashboard%22,%22requestId%22:%22Q-0e9a9aa1-020b-4358-a3db-d0c24adc7887-0A%22%7D%5D&right=%5B%22now-3h%22,%22now%22,%22Prometheus%22,%7B%22exemplar%22:true,%22expr%22:%22sum%20by(actor_type,%20app,%20path)%20(rate(src_actors_outgoing_requests%7Bactor_type%3D%5C%22none%5C%22%7D%5B5m%5D))%22,%22requestId%22:%22Q-972e0200-e0a3-4e21-9fdc-c8b7eb3bf8f0-0A%22%7D%5D) it seems that the majority of the problematic requests are coming from sourcegraph-frontend
to gitserver
, particularly the /exec
endpoint. There are a ton of places where the frontend calls gitserver, so tracking this down is going to be kind of tricky.
@ryanslade I'd love to hear some of your thoughts on this. I think we're going to need to track down each of these occurrences that I've outlined in the spreadsheet individually, which may take some time. Hopefully we can do it in parallel with the end-to-end testing. I think based on the dashboard the main offender is requests from the frontend to the /exec
endpoint of gitserver.
I think you're right that we're going to need to dig into this on a case-by-case basis.
Let's spin out what we find into new issues so that we can track them more easily.
Do we want to keep this issue open? @mollylogue
Do we want to keep this issue open? @mollylogue
Good question. I feel fairly confident that none of these missing actors will cause issues when it comes to sub-repo permissions, but we still don't know why/where exactly this is happening. Based on the metrics there are quite a few places, but we'd probably have to look into them individually to track down/remedy the situation. Thankfully if we have sub-repo enabled and a function w/ sub-repo filtering is called, we'll get an unauthenticated error, so if any issues arise they can be surfaced that way. So far we haven't found any, though.
This is a long-winded way of saying ...probably? I might want to ask @ryanslade if there's more he thinks we should look into here.
Agree that it's not super urgent but I think we should keep it open.
I wouldn't feel comfortable just ignoring the fact that we know actors are not being passed across service boundaries in all cases. It's one of those issues that I think we just have to chip away at until the metrics drop to zero.
Task: We want to add a temporary middleware to the gRPC server or the gitserver Client and log all request with no actor. Check that there's ideally none of these requests, or that those that remain are intentional.
Follow up to https://github.com/sourcegraph/sourcegraph/issues/27918
See this comment
We need to double check why some actors are coming through as "none" so that we can confidently perform sub-repo permissions.