sourcegraph / sourcegraph-public-snapshot

Code AI platform with Code Search & Cody
https://sourcegraph.com

Search: Map users and code host handles #46599

Open eseliger opened 1 year ago

eseliger commented 1 year ago

For Ownership Search (file:has.owner() filter and select:file.owners) we want to map users based on the following:

This should give us a slice of ResolvedOwners for the input to the search. We then need to map the ownership data from CODEOWNERS to ResolvedOwners as well, for equality checks:

  1. For ingested ownership data, we simply look up users and teams by name, and users by email. Any entries that are not found are considered invalid. No code-host-specific logic happens here.

  2. When parsing existing CODEOWNERS files in a repo, we will do the following to map each entry back to a Sourcegraph entity as a ResolvedOwner:

    • If email: Just look up the user by email. This also handles people with multiple email addresses, as long as all of them are listed in their Sourcegraph profile.
      • If not found, return a Person result without a containing user.
    • If handle: Do code-host-specific logic:
      • GitHub:
        • If the handle contains a /, it must be a team. We should then look up a team by name, where the name is the part following the /. Later, we want to enrich teams with metadata about external team handles, at which point this becomes unnecessary and we would look up teams_external_accounts instead (sort of; we are not necessarily adding this table, but you get the idea).
          • If it matches a team, we will return a Team as the result for this row, for equality comparison.
          • If no team is found, we will return a Person result with the full team name as the handle.
        • If not, it must be a user. We then try to find the user that has the same handle via the GitHub auth provider. When that provider is not set up, we cannot do any matching here and will fall back to returning a Person. Ideally, we should indicate these rows in the UI.
      • GitLab:
        • They don't enforce slashes for team names, so we'll do the following:
          • First, see if the name matches a team; if so, use that. Later, as we do for GitHub, we will want to map external team names properly.
          • If it matches a team, we will return the Team as the result for this row, for equality comparison.
          • If not, it has to be a user or a team we don't know (which in our terms means Person for now).
          • Try to look up the user by the handle via the GitLab auth provider; if there is no match, return a Person, scoped to GitLab and the host URL.
      • Bitbucket:
        • TBD, don't know enough here yet.
        • Do we need to extend our parser to support Bitbucket group aliases first?
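
To make the CODEOWNERS mapping rules above concrete, here is a minimal Go sketch. Everything in it is an assumption for illustration: the fields of ResolvedOwner, the in-memory Directory that stands in for the database and auth-provider lookups, the "github"/"gitlab" host keys, and the rule that entries starting with @ are handles while everything else is treated as an email. It is not the actual Sourcegraph implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// OwnerKind distinguishes the three result types mentioned in the issue.
type OwnerKind string

const (
	KindUser   OwnerKind = "User"
	KindTeam   OwnerKind = "Team"
	KindPerson OwnerKind = "Person" // unresolved fallback
)

// ResolvedOwner is a hypothetical shape; the issue names the type but not its fields.
type ResolvedOwner struct {
	Kind   OwnerKind
	Name   string // Sourcegraph user or team name, when resolved
	Handle string // handle from CODEOWNERS without the leading @
	Email  string
}

// Directory is a hypothetical in-memory stand-in for DB and auth-provider lookups.
type Directory struct {
	UsersByEmail  map[string]string            // email -> username
	Teams         map[string]bool              // set of known team names
	UsersByHandle map[string]map[string]string // host -> handle -> username; a missing host means the provider is not set up
}

// Resolve maps one CODEOWNERS entry to a ResolvedOwner for the given code host.
func (d Directory) Resolve(host, entry string) ResolvedOwner {
	// Assumption: anything not starting with @ is an email address.
	if !strings.HasPrefix(entry, "@") {
		if name, ok := d.UsersByEmail[entry]; ok {
			return ResolvedOwner{Kind: KindUser, Name: name, Email: entry}
		}
		return ResolvedOwner{Kind: KindPerson, Email: entry}
	}
	handle := strings.TrimPrefix(entry, "@")

	switch host {
	case "github":
		// A slash means a team; the team name is the part after the slash.
		if i := strings.Index(handle, "/"); i >= 0 {
			if team := handle[i+1:]; d.Teams[team] {
				return ResolvedOwner{Kind: KindTeam, Name: team, Handle: handle}
			}
			// Unknown team: Person carrying the full team name as the handle.
			return ResolvedOwner{Kind: KindPerson, Handle: handle}
		}
	case "gitlab":
		// No slash convention is enforced, so try a team match first.
		if d.Teams[handle] {
			return ResolvedOwner{Kind: KindTeam, Name: handle, Handle: handle}
		}
	default:
		// Other hosts (e.g. Bitbucket) are TBD in the issue; return a Person.
		return ResolvedOwner{Kind: KindPerson, Handle: handle}
	}

	// User lookup via the host's auth provider; indexing a missing host is safe in Go.
	if name, ok := d.UsersByHandle[host][handle]; ok {
		return ResolvedOwner{Kind: KindUser, Name: name, Handle: handle}
	}
	return ResolvedOwner{Kind: KindPerson, Handle: handle}
}

func main() {
	d := Directory{
		UsersByEmail:  map[string]string{"alice@example.com": "alice"},
		Teams:         map[string]bool{"search": true},
		UsersByHandle: map[string]map[string]string{"github": {"alice-gh": "alice"}},
	}
	for _, entry := range []string{"alice@example.com", "@acme/search", "@acme/unknown", "@alice-gh", "@nobody"} {
		fmt.Printf("%-20s -> %+v\n", entry, d.Resolve("github", entry))
	}
}
```

Because every branch ends in either a resolved User/Team or a Person, callers can always compare the result against the ResolvedOwners derived from the search input.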

We will want to consider:

limitedmage commented 1 year ago

We should also reconcile this with the type:diff author:$x filter so that they are consistent with each other.

eseliger commented 1 year ago

Author can also take a human-readable name ("Erik Seliger"), so it's gonna be slightly different, but I agree we should use this enriched logic in there too for nicer mapping.

eseliger commented 1 year ago

That one can be a step 2 though. This ticket, I think, we need to sort out rather quickly, as it'll change pretty drastically how ownership search works, and we might want to do that before people start using it, if possible in the remaining time.

limitedmage commented 1 year ago

Agreed, consistency with author: is nice-to-have for later.

cbart commented 1 year ago

I'd still gracefully fall back to the data in the CODEOWNERS file. Why not do that?

As a customer, if I had an existing CODEOWNERS file, enabled Own, searched by an entry in that file, and got no results, I would think that the feature does not work.
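
One way to read the fallback suggested here, sketched in Go: match a search term against the resolved Sourcegraph user first, and if that fails, compare it with the raw CODEOWNERS entry. The Owner type, its fields, and the case-insensitive, @-stripping comparison are all assumptions made for illustration, not something the thread specifies.

```go
package main

import (
	"fmt"
	"strings"
)

// Owner is a hypothetical pairing of a raw CODEOWNERS entry with the
// Sourcegraph user it resolved to, if any.
type Owner struct {
	Raw          string // entry as written in CODEOWNERS, e.g. "@alice-gh"
	ResolvedUser string // empty when no Sourcegraph user matched
}

// normalize strips a leading @ and lowercases (an assumed comparison rule).
func normalize(s string) string {
	return strings.ToLower(strings.TrimPrefix(s, "@"))
}

// Matches reports whether a search term selects this owner. It prefers the
// resolved user and falls back to the text in the CODEOWNERS file.
func (o Owner) Matches(term string) bool {
	if o.ResolvedUser != "" && normalize(term) == normalize(o.ResolvedUser) {
		return true
	}
	return normalize(term) == normalize(o.Raw)
}

func main() {
	resolved := Owner{Raw: "@alice-gh", ResolvedUser: "alice"}
	unresolved := Owner{Raw: "@bob"}
	fmt.Println(resolved.Matches("alice"), resolved.Matches("@alice-gh"), unresolved.Matches("bob"))
}
```

With this shape, a search by the literal CODEOWNERS entry still returns results even when no Sourcegraph user or team could be resolved.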