sourcegraph / sourcegraph-public-snapshot

Code AI platform with Code Search & Cody
https://sourcegraph.com
Other
10.1k stars 1.28k forks source link

Investigate missing actors #28532

Open ryanslade opened 2 years ago

ryanslade commented 2 years ago

Follow up to https://github.com/sourcegraph/sourcegraph/issues/27918

See this comment

We need to double check why some actors are coming through as "none" so that we can confidently perform sub-repo permissions.

github-actions[bot] commented 2 years ago

Heads up @jplahn @dan-mckean - the "team/repo-management" label was applied to this issue.

mollylogue commented 2 years ago

I've added the "path" label to the existing actor type metric: https://github.com/sourcegraph/sourcegraph/pull/30302 https://github.com/sourcegraph/sourcegraph/pull/30323

You can see the resulting breakdown of missing actors here%22,%22requestId%22:%22Q-d3a36d60-b00f-4683-b87f-2085701ff97e-0A%22%7D%5D)

I've created a spreadsheet to track where these missing actors are happening. The spreadsheet gives the app where the request is coming from and the path that the request is being made to where the actor is missing. We will probably have to investigate each of these individually to figure out why the actors aren't being passed through and ensure that we do pass in the actor to the request. Note that two paths in particular, /git/... and /.internal/git/... are truncated paths since the full paths contain repo names and increased the cardinality significantly. (You'll see a bunch of these in the hours between when I added the path label and when I truncated it)

My next step here is to pick a few of the entries in the spreadsheet and start tracking them down to see what the level of effort is and if there are common patterns.

mollylogue commented 2 years ago

Adding the path label for the incoming requests metric as well https://github.com/sourcegraph/sourcegraph/pull/30365

mollylogue commented 2 years ago

Based on looking at the grafana dashboard%22,%22requestId%22:%22Q-0e9a9aa1-020b-4358-a3db-d0c24adc7887-0A%22%7D%5D&right=%5B%22now-3h%22,%22now%22,%22Prometheus%22,%7B%22exemplar%22:true,%22expr%22:%22sum%20by(actor_type,%20app,%20path)%20(rate(src_actors_outgoing_requests%7Bactor_type%3D%5C%22none%5C%22%7D%5B5m%5D))%22,%22requestId%22:%22Q-972e0200-e0a3-4e21-9fdc-c8b7eb3bf8f0-0A%22%7D%5D) it seems that the majority of the problematic requests are coming from sourcegraph-frontend to gitserver, particularly the /exec endpoint. There are a ton of places where the frontend calls gitserver, so tracking this down is going to be kind of tricky.

mollylogue commented 2 years ago

@ryanslade I'd love to hear some of your thoughts on this. I think we're going to need to track down each of these occurrences that I've outlined in the spreadsheet individually, which may take some time. Hopefully we can do it in parallel with the end-to-end testing. I think based on the dashboard the main offender is requests from the frontend to the /exec endpoint of gitserver.

ryanslade commented 2 years ago

I think you're right that we're going to need to dig into this on a case-by-case basis.

Let's spin out what we find into new issues so that we can track them more easily.

jplahn commented 2 years ago

Do we want to keep this issue open? @mollylogue

mollylogue commented 2 years ago

Do we want to keep this issue open? @mollylogue

Good question. I feel fairly confident that none of these missing actors will cause issues when it comes to sub-repo permissions, but we still don't know why/where exactly this is happening. Based on the metrics there are quite a few places, but we'd probably have to look into them individually to track down/remedy the situation. Thankfully if we have sub-repo enabled and a function w/ sub-repo filtering is called, we'll get an unauthenticated error, so if any issues arise they can be surfaced that way. So far we haven't found any, though.

This is a long-winded way of saying ...probably? I might want to ask @ryanslade if there's more he thinks we should look into here.

ryanslade commented 2 years ago

Agree that it's not super urgent but I think we should keep it open.

I wouldn't feel comfortable just ignoring the fact that we know actors are not being passed across service boundaries in all cases. It's one of those issues that I think we just have to chip away at until the metrics drop to zero.

eseliger commented 8 months ago

Task: We want to add a temporary middleware to the gRPC server or the gitserver Client and log all request with no actor. Check that there's ideally none of these requests, or that those that remain are intentional.