open-telemetry / opentelemetry-collector-contrib

Contrib repository for the OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0
2.86k stars 2.24k forks source link

New component: Git Provider Receiver #22028

Closed adrielp closed 5 months ago

adrielp commented 1 year ago

The purpose and use-cases of the new component

The ~GitHub Metrics~ Git Provider receiver (gitproviderreceiver) is a pull based receiver to scrape metrics from GitHub leveraging the GraphQL API. Metrics like, pr mean time, branch age, number of branches, number of repos, workflow deployments, etc, are able to provide data for engineering effectiveness within organizations, aligning with DORA

Example configuration for the component

extensions:
  basicauth/github:
    client_auth:
      username: <username> #username for github
      password: <token> #token for github

receivers:
  github:
    interval: 15s #the interval at which the scraper scrapes
    github_org: <org> #the organization name in github
    auth:
      authenticator: basicauth/github #name of the authenticator as defined in extensions:

Telemetry data types supported

Metrics

Is this a vendor-specific component?

Sponsor (optional)

No response

Additional context

Will be contributing this, and proposing other receivers shortly, as OS contributions from Liatrio, INC. We are GitHub partners, but we're not contributing this as representatives of the vendor.

We already have a working receiver with many of metrics ready to go. Ready to get PR's rolling for this component (broken into multiple small PR's per the contributing documentation).

Happy to also to work through being part of the sponsorship moving forward if it'll help the project and move along the contribution.

atoulme commented 1 year ago

@adrielp will you directly be contributing this? Are you an OpenTelemetry member? Are you looking to maintain this component going forward? Is a Github representative signing off on this work?

adrielp commented 1 year ago

Thanks for the response @atoulme!

will you directly be contributing this?

Yes, and likely a few others from Liatrio, INC. If needed it can be a single contributor from Liatrio.

Are you an OpenTelemetry member?

No, happy to become one though.

Are you looking to maintain this component going forward?

Absolutely.

Is a Github representative signing off on this work?

No

atoulme commented 1 year ago

The more the merrier. Please read up here on requirements to become a member of OpenTelemetry: https://github.com/open-telemetry/community/issues/new?assignees=&labels=area%2Fgithub-membership&projects=&template=membership.md&title=REQUEST%3A+New+membership+for+%3Cyour-GH-handle%3E

Please read here on background around codeowners and membership: https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/20868

You can initially make the first contrib without being a member, and become a codeowner afterwards once you have become a member.

atoulme commented 1 year ago

As a first step, please provide a metadata.yaml file with the list of resource attributes, metrics and attributes you would like to see supported by this receiver. You can link to a gist. This will help potential sponsors understand the scope of the effort. Please consider making the initial metric set small.

adrielp commented 1 year ago

Great, thank you for the information!

Here's the link to the gist of metrics we have currently built and would want to be part of the initial contribution.

metadata.yaml gist

adrielp commented 1 year ago

Last time I was reading through the contribution guidelines, I cam across the Community Contributing Guide which mentioned the CLA. After reading more about the CLA through the Linux Foundation, we, Liatrio, would like to sign as a Corporate Contributor presuming that this is still valid in the process of contributing.

jpkrohling commented 1 year ago

During the SIG Collector call today, @atoulme mentioned that this might include CI metrics. If that's the case, please engage with the following OTEP, which seeks to establish the semantic conventions for CI/CD: https://github.com/open-telemetry/oteps/pull/223

djaglowski commented 1 year ago

Is there an authentication mechanism that would allow read only access to the information needed for this receiver? I think we should not add the component unless there is no risk due to write access.

adrielp commented 1 year ago

All the code is read only. It pulls sets of data from GraphQL but does not write back. Additionally, in the collector config example, it's intended to leverage the basic auth method, inputting a GitHub token with read permissions explicitly defined. (In the docs we can certainly provide example on how to configure that token).

Generally speaking, if the repo is open to the world, then you won't need a token until such time you're rate limited by GitHub (which is a pretty low rate limit). The receiver in its current state will just throw errors until such time the rate limiting is reset.

However, for any private repo/org, a token is required and must be configured appropriately.

GitHub's documentation is basically what we use to accomplish this: https://docs.github.com/en/graphql/guides/forming-calls-with-graphql

adrielp commented 1 year ago

Thanks for the link to the OTEPs! Definitely have some thoughts and will engage with that PR through comments.

atoulme commented 1 year ago

I have done a cursory review of the metrics. I believe there is a bit of refinement needed around metric names. I would also chase resource attributes more if possible. For example, all metrics use the gh.org attribute, so it can be declared as a resource attribute instead.

The gh. prefix is not very descriptive. I think it's ok to use the vendor name, github, instead.

You don't need to prefix attributes, just give them short descriptive names. For example, organization.name instead of gh.org.

Do you need all those metrics, always? I note all metrics are enabled. Can you disable some? Can you remove some? For example, gh.repo.count could be removed if you could count the number of data points reporting for say gh.repo.contributors.count grouped by the same org name.

The ratio unit is not something we use afaict, when we count units of items, we tend to use a unit such as {user} for example.

Some food for thought.

Next step: I have presented your receiver at the SIG meeting. Maintainers and approvers will take a look. Someone might step up. Please feel free to meet with them on Slack, by commenting here, or at the next SIG meeting. One of them might sponsor this receiver.

adrielp commented 1 year ago

Awesome, thanks @atoulme !

Appreciate the feedback on the metadata.yaml and agree with your points. Will happily address and iterate on that initial example file.

I've joined in on slack and am in a couple of the otel channels. Am happy to huddle most any time!

adrielp commented 1 year ago

I've gone ahead and posted in the Slack Channel looking for a sponsor for the receiver. Hopefully that's the right channel, I was torn between that and the codeowners one.

adrielp commented 1 year ago

The metadata.yaml Gist has been updated to reflect some of those changes. Will be attending the SIG today at 9PT in hopes of finding a sponsor and to be visible.

MovieStoreGuy commented 1 year ago

I would suggest not having a precomputed aggregated values since it can cause misleading statistics once landed in vendor of choice. Keeping it a simple count of recorded value mean you can save the complicated on the vendor.

andrzej-stencel commented 1 year ago

Would the GitHub API be called in a recurring manner, using a collection_interval like other metrics scrapers? What is the realistic collection interval you would consider sensible @adrielp ?

adrielp commented 1 year ago

I would suggest not having a pre-computed aggregated values since it can cause misleading statistics once landed in vendor of choice. Keeping it a simple count of recorded value mean you can save the complicated on the vendor.

That's a really good call out @MovieStoreGuy of which makes a lot of sense and I agree with. Will update the Gist to remove the standard deviation & mean calculations.

Age by itself is a calculation. Would it be best to remove age as a value and add it as an attribute to a the branch / pr metrics, or something to that effect?

Would the GitHub API be called in a recurring manner, using a collection_interval like other metrics scrapers? What is the realistic collection interval you would consider sensible @adrielp ?

@astencel-sumo We have a home grown metric collector (python script) from a while ago that currently runs every 15 minutes.

Ideally we'd have better resolution on these metrics, especially when we start to include metrics on workflows etc. However, that is a bridge to cross for another days, particularly given the OTEP on CI/CD and how that relates.

GitHub does rate limit API calls so in the current github receiver prototype we don't use the rest API and instead use the GraphQL API. Because of how the REST API is structured, it takes many calls too get these metrics from. GraphQL lets you do it one go, reducing the number of calls and enabling the scraper to run more frequently.

Ideally, my "guess" at a sweet spot following the 80/20 rule would be to default the interval to 30 - 60 seconds. We currently have it running at 15. And of course, each deployment may choose a different time.

andrzej-stencel commented 1 year ago

I believe receiver like this would be useful. I'd be in favor of a "GitHub receiver" that can provide these metrics. Looking at your gist, I can see a lot of improvements to be made to the names of particular metrics, but these can be ironed out in the implementation. In the meantime, you might want to browse the specification.

To further understand the scope of this receiver, @adrielp would you mind sharing a GraphQL query that you have in mind that this receiver would execute, together with the parameters of the query that would be configurable? Possibly simply what that "home grown metric collector (python script)" is currently doing.

I wonder if it makes sense to craft a generic "GraphQL receiver" or an "HTTP receiver" that could be configured to run any query against any API. I think the Flex receiver proposal was close.

There's a recent discussion in the contrib community that I understand as "there are so many components in contrib already, we need to slow down with adding new components". I wonder if it affects this proposal, among others. πŸ€”

adrielp commented 1 year ago

@astencel-sumo thanks for the link to the spec. I've been down a myriad of docs and lost that one in the process. πŸ˜… Happy to make those adjustments as quickly as possible once the PR's are deemed ok to start flowing.

Just for context, the home grown metric collector (python script) actually cloned down repositories and iterated through the git history locally instead of relying on any given API. Due to that, it worked for multiple different SCM sources.

GraphQL queries are similar to REST API calls in that they're unique to the data structure. Having a generic receiver for GraphQL scrape queries would make it such that anyone could set anything and potentially open the door to security risks since anyone could define anything. The queries imo should be explicitly defined within the code.

They're not the prettiest things either from a readability perspective.

In iterative fashion, I would love to get the OTEL receiver for GitHub commited via a PR as we already have the structure & have it running.

We could maintain our own distribution as a vendor (the SIG conversations about vendor distros & OCB were interesting - especially since we, and our clients, really love OCB as a tool and the way things are structured here) but we'd much rather contribute to this project directly as maintainers of the receiver instead of just rolling our own distro. While it's cool to be able to roll your own distro, I find it can cause issues. Just in the example of ADOT, the ADOT collector works with XRay tracing out of the box, but the exporter for XRay in this contrib repo has issues on converting the traces. ( we're going to open an issue about this once we find out the specific difference) That makes the XRay exporter not really usable from this repo, enforcing the use of ADOT even though ADOT is roughly 6 months behind in terms of feature sets.

An example query structure for getting the branch information from a given repo is as follows:

    var query struct {
        Repository struct {
            Refs struct {
                TotalCount int
                Nodes      []struct {
                    Name   string
                    Target struct {
                        Commit struct {
                            History struct {
                                TotalCount int
                                Edges      []struct {
                                    Node struct {
                                        CommittedDate string
                                    }
                                }
                                PageInfo struct {
                                    EndCursor string
                                }
                            }
                        } `graphql:"... on Commit"`
                    }
                }
            } `graphql:"refs(refPrefix: \"refs/heads/\", first: 100)"`
        } `graphql:"repository(name: $repoName, owner: $owner)"`
    }

There's a https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/21494 in the contrib community that I understand as "there are so many components in contrib already, we need to slow down with adding new components". I wonder if it affects this proposal, among others. πŸ€”

I really hope not. Again, if necessary we could do our own distro ( I don't want to though, would much rather be closer to contributing to the community than farther away). We have multiple receivers ready to go from Team Effectiveness style metrics + security focused metrics for various tooling that we use. For example, we in short order will be opening an issue for a GitLab receiver as well for similar metrics. Want to get this one out of the way and contributed first so that I can get to the point of providing sponsorship for others to contribute those receivers to reduce overhead on the OTEL team (presuming that would be ok with y'all once at that point).

MovieStoreGuy commented 1 year ago

There's a https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/21494 in the contrib community that I understand as "there are so many components in contrib already, we need to slow down with adding new components". I wonder if it affects this proposal, among others.

My thoughts on this is no, the problems being conveyed from the issue is that we need more people to help maintain the project. We are doing our best as a collective group but it doesn't mean that there is some days that shit hits the fan because someone deleted a git tag we depended on. What it means for this PR is that we would really appreciate a gauge on how much effort you could provide to help maintain your component or see if you can gather interest from Github's engineering group to take on.

CI/CD is close to my heart as it where I started in Atlassian so I greatly appreciate pushing the conversation but there is a conflict of interest for me to sponsor this addition.

andrzej-stencel commented 1 year ago

Thanks a lot @adrielp, your comment is very informative and helpful, giving a lot of context.

I especially appreciate that you shared it's not only about GitHub - that was my thought when I first read this proposal. We could imagine other Git/SCM being sources of such "repository" metrics like GitLab, BitBucket, etc. This in turn means perhaps we should consider more generic names to the metrics - instead of github.repositories, why not git.repositories or source_control.repositories?

I'm in favor of being flexible. Let's start with what we have now (adjusting to the semantic conventions of course), mark the new component as "alpha"/"experimental", and move forward, potentially introducing breaking changes like renaming metrics etc.

adrielp commented 1 year ago

What it means for this PR is that we would really appreciate a gauge on how much effort you could provide to help maintain your component or see if you can gather interest from Github's engineering group to take on.

Good to know @MovieStoreGuy . Our intent is to continually provide support to maintain this component, and others we contribute. I'm kicking off Liatrio's contributions to OTEL, and our plan is to have multiple people from Liatrio start to contribute to OTEL. We're already working on these things internally kicking off an initiative.

This in turn means perhaps we should consider more generic names to the metrics - instead of github.repositories, why not git.repositories or source_control.repositories?

That's a really good call out @astencel-sumo . That comment stimulated an idea and some questions. Is having one receiver with multiple sub-packages for communicating to different vendors a pattern y'all find valid? Theoretically ( i haven't dug in) we could just make a gitreceiver that starts with github but adds gitlab, bitbucket, etc later on with metrics all defined as git.<> for consistency but have attributes that define the platform they're from etc.

My two concerns there are:

I'm in favor of being flexible. Let's start with what we have now (adjusting to the semantic conventions of course), mark the new component as "alpha"/"experimental", and move forward, potentially introducing breaking changes like renaming metrics etc.

awesome, sounds good. Huge fan of iteration. So am i understanding right that basically you're willing to sponsor and I can go ahead and get those pr's rolling ready to make changes, improvements, meet conventions, etc? πŸ˜„

andrzej-stencel commented 1 year ago

Is having one receiver with multiple sub-packages for communicating to different vendors a pattern y'all find valid?

I think it all depends. :sweat_smile: For example, in case of database health metrics, there are currently separate receivers for each database engine: MySQL, Oracle, PostgreSQL, etc. Part of it is probably that they all report different sets of metrics. If (if!) we were able to determine a "standard" set of metrics (I'm thinking of scoping them to "repository health metrics") that can be obtained in the same form from different Git vendors, I think it would make sense to keep those different vendor implementations in a single "Git repository health receiver".

in the case of multiple receivers, if all git metrics were defined as git.<> for all platforms would they cause conflict? They are the same metrics so the thought does make sense imo.

I don't think that would cause a conflict of any kind. The metrics reported by the various receivers would have the same names, but different attributes. I don't think it's incorrect for different receivers to produce the same metric names.

if we were to have a gitreceiver with sub packages for the various implementations of vendors, would we be able to run that singular receiver in the context of one collector but provide multiple auth mechanisms?

Do we need to authenticate via an extension? The receiver could have separate config section for each vendor:

receivers:
  gitrepohealth:
    github:
      username:
      token:
    gitlab:
      ...

Or perhaps it might be possible to configure a different auth extension for each vendor? :thinking: Would need to verify if this would work.

extensions:
  basicauth/github:
    ...
  basicauth/gitlab:
    ...
receivers:
  gitrepohealth:
    github:
      auth: basicauth/github
    gitlab:
      auth: basicauth/gitlab

So am i understanding right that basically you're willing to sponsor

Yes, I'm willing to sponsor a receiver that will retrieve "Git repository health" metrics. If there's no opposition to this, let's kick off the implementation and see where it leads us.

For the initial PR, I propose that you produce the skeleton of the receiver with minimal, if any, implementation of the actual functionality - we want to keep the PR size manageble and for this, we want to do things iteratively. :muscle: :slightly_smiling_face:

adrielp commented 1 year ago

Yes, I'm willing to sponsor a receiver that will retrieve "Git repository health" metrics. If there's no opposition to this, let's kick off the implementation and see where it leads us.

Awesome, thanks @astencel-sumo!! Looking forward to it!

Appreciate the information here. After some thinking on this, and investigation, I do believe it's possible to achieve.

I've started converting the working githubmetricsreceiver we had to a generic gitmetricsreceiver where each vendor is implemented as its own scraper, calling the same metrics builder as part of the default config. Each scraper will have the capability of defining its own interval, and httpclient settings through separate passed in extensions. I think this "should" work as it's similar in part to the hostmetricsreceiver implementation. Need to of course try it, but I'm hoping to have the skeleton up tomorrow for review.

With this pattern, the two things I'm currently thinking about are:

adrielp commented 1 year ago

Updated the component name to reflect the change from github metrics to git metrics based on the feedback.

andrzej-stencel commented 1 year ago

I'll assign this issue to you @adrielp if that's OK, because if I understand correctly, you're going to create a PR, right?

adrielp commented 1 year ago

Yup, that's correct. Thanks @astencel-sumo

github-actions[bot] commented 1 year ago

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

adrielp commented 12 months ago

This issue is still active. Development has been ongoing in the upstream liatrio-otel-collector. There have been some performance/graphql intricacies so we've been prototyping & improving very quickly there before getting the scraper integrated into contrib. The skeleton of the receiver is in however, so will be contributing the next phase of PR's soon.

github-actions[bot] commented 10 months ago

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

github-actions[bot] commented 8 months ago

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

adrielp commented 8 months ago

This is still open & having contributions come through. Receiver is in development.

github-actions[bot] commented 7 months ago

Pinging code owners for receiver/gitprovider: @adrielp @astencel-sumo. See Adding Labels via Comments if you do not have permissions to add labels yourself.

atoulme commented 5 months ago

I believe this new component has now landed, closing as complete. Please reopen otherwise.