microsoftgraph / msgraph-sdk-go

Microsoft Graph SDK for Go
https://docs.microsoft.com/en-us/graph/sdks/sdks-overview
MIT License

Issue with the size of the API surface of the models package #129

Closed breml closed 1 year ago

breml commented 2 years ago

The way this Go module, and especially the models package, is constructed leads to severe issues with the Go tooling (and linting).

We have a simple Go application where we added the functionality to send email notifications through the Microsoft Graph API using Mail.Send. So we added the necessary packages, namely:

import (
    "github.com/Azure/azure-sdk-for-go/sdk/azidentity"
    authentication "github.com/microsoft/kiota-authentication-azure-go"
    msgraphsdk "github.com/microsoftgraph/msgraph-sdk-go"
    "github.com/microsoftgraph/msgraph-sdk-go/me/sendmail"
    "github.com/microsoftgraph/msgraph-sdk-go/models"
)

for this purpose.

The final code did what was expected (sending the mail), but it also made the duration of go test jump from 7s to 8m 50s and the linting with golangci-lint jump from 36s to 7m 13s in our GitHub Actions pipeline.

We then changed the Graph API request to a "plain HTTP Post request" using the net/http package from the standard library (while still using github.com/Azure/azure-sdk-for-go/sdk/azidentity and github.com/microsoft/kiota-authentication-azure-go for the authentication) and we are back at the normal times for tests and linting.
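For illustration, a minimal sketch of such a plain net/http call, assuming azidentity/azcore v1.x and the documented POST https://graph.microsoft.com/v1.0/me/sendMail action (names and payload are illustrative, not our exact code):

package main

import (
    "bytes"
    "context"
    "fmt"
    "net/http"

    "github.com/Azure/azure-sdk-for-go/sdk/azcore/policy"
    "github.com/Azure/azure-sdk-for-go/sdk/azidentity"
)

func sendMail(ctx context.Context) error {
    // Acquire a token for Microsoft Graph via azidentity.
    cred, err := azidentity.NewDefaultAzureCredential(nil)
    if err != nil {
        return err
    }
    tok, err := cred.GetToken(ctx, policy.TokenRequestOptions{
        Scopes: []string{"https://graph.microsoft.com/.default"},
    })
    if err != nil {
        return err
    }

    // Message payload as documented for the sendMail action (illustrative recipient).
    payload := []byte(`{"message":{"subject":"hello","body":{"contentType":"Text","content":"hi"},"toRecipients":[{"emailAddress":{"address":"someone@example.com"}}]}}`)

    req, err := http.NewRequestWithContext(ctx, http.MethodPost,
        "https://graph.microsoft.com/v1.0/me/sendMail", bytes.NewReader(payload))
    if err != nil {
        return err
    }
    req.Header.Set("Authorization", "Bearer "+tok.Token)
    req.Header.Set("Content-Type", "application/json")

    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        return err
    }
    defer resp.Body.Close()
    if resp.StatusCode != http.StatusAccepted { // Graph returns 202 on success
        return fmt.Errorf("sendMail failed: %s", resp.Status)
    }
    return nil
}

func main() {
    if err := sendMail(context.Background()); err != nil {
        fmt.Println(err)
    }
}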

Additional pointers to the excessive size of the github.com/microsoftgraph/msgraph-sdk-go/models package: the official documentation refuses to display the API because it is too large, and GitHub only shows the first 1000 files when listing the contents of said package folder.

So in my opinion, the API surface of the package github.com/microsoftgraph/msgraph-sdk-go/models is way too broad and this should be refactored massively.

baywet commented 2 years ago

Hi @breml Thanks for trying the Go SDK and for the detailed feedback.

This size and tooling performance aspect is something we're aware of and working to improve. I'll share a lengthy reply about the context and our efforts here.

Context

This SDK is automatically generated from metadata using kiota. The original metadata is a CSDL populated by all the service teams that live under Microsoft Graph (one for v1, one for beta). The CSDL eventually gets converted to an OpenAPI description which is quite large (1k+ endpoints, 1.5k models....). Because of the size of the API, it would not be feasible to handcraft the SDK for the full API surface.

We've thought for a while about "slicing" the SDK into multiple submodules (one for files, one for emails, etc...) to make it easier for people to get only the things they care about. In fact this is what we've done with PowerShell. But due to the nature of a "graph" (all the models are somewhat related to one another) and the diversity of applications that get built, the slicing is never "right" for anyone (either too large, or too small, or it leads to duplication of models...).

For those reasons we've decided to ship "full SDKs" as one of our offerings. It might not be ideal for everybody, but we feel like there's a population of Go developers that "just want an SDK to build applications" (as opposed to the second option I'll describe below).

Go shortcomings

Through my explorations of Go, I've noticed a few shortcomings. At this point I'm not sure whether this is because our project/packages are not set up properly or because of limitations of Go with large projects:

  1. go build often rebuilds sub packages that haven't changed and for which dependencies have not changed.
  2. go test often rebuilds things, even when go build was run right before. Why is it not relying on some sort of cache?
  3. same issue with go lint.

To me it feels like the cost of building a large project with lots of sub-packages should only be paid "once per session" unless dependencies are updated, cache is evicted or code changes.

Please feel free to outline any optimizations in our project configuration/structure that could help with that front. And if there are ways to engage with the Go community (people building the go compiler etc...) to provide that kind of feedback, I'll be more than happy to spend time doing so. I know our project is kind of an odd duck compared to most go packages out there due to its size.

Improvements to come

We already have two improvements queued up to reduce the amount of code we generate in the first place:

  1. Move to go 1.18 and leverage generics to remove a lot of the for loops used to recast slices (a rough sketch of the idea follows below)
  2. Remove redundant nil/error checks in the generated code

Please go ahead and review those, and if you find other patterns in the generated code that are redundant/could be replaced by something smaller, don't hesitate to mention it on those issues.
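To illustrate the first item, here is a rough sketch (hypothetical helper and model names, not the actual generated code) of how one generic cast helper can replace the per-type recast loops the generator currently emits:

package main

import "fmt"

// Hypothetical, simplified model types.
type DirectoryObjectable interface{ GetId() string }

type User struct{ id string }

func (u *User) GetId() string { return u.id }

// Before generics: the generator had to emit one recast loop like this per model pair.
func castUsersToDirectoryObjects(users []*User) []DirectoryObjectable {
    out := make([]DirectoryObjectable, len(users))
    for i, u := range users {
        out[i] = u
    }
    return out
}

// With Go 1.18 generics: a single helper covers every model pair.
func CollectionCast[T, S any](items []S) []T {
    out := make([]T, len(items))
    for i, item := range items {
        out[i] = any(item).(T)
    }
    return out
}

func main() {
    users := []*User{{id: "1"}, {id: "2"}}
    objs := CollectionCast[DirectoryObjectable](users)
    fmt.Println(len(objs), objs[0].GetId(), len(castUsersToDirectoryObjects(users)))
}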

Right-sized SDKs

Lastly, we recognize that having a full SDK with all the endpoints is not going to be suitable for everybody for a variety of reasons. We're working to enable a new "right-sized self-service SDK experience" where any API user could generate an SDK that looks and feels just like this one, but with only the endpoints/models they care about for their application instead of the full API surface. We're really early in those efforts right now, but happy to get feedback on those nonetheless. Here is what the procedure would roughly look like:

  1. Create a new go project/identify an existing project.
  2. Add the kiota dependencies or msgraph-sdk-go-core (which pulls the Kiota dependencies and adds a few additional things)
  3. Go to Graph Explorer, select the resources you need (left panel, second tab, ..., "add to collection").
  4. Click on preview collection and export it as a postman collection.
  5. Use hidi with the postman collection and the full OpenAPI description I shared earlier to generate a "filtered" OpenAPI description.
  6. Use Kiota to generate your Go client for Microsoft Graph in your project.
  7. Start calling the API.

At this point we're working to document all those steps, and streamline them (maybe compressing steps 4-5). The great thing about this approach is that steps 5 through 7 can work with any OpenAPI described API you want to call, not just Microsoft Graph. Again, this last offering is still in its infancy, feel free to try it out and provide feedback at the different places.

I hope this lengthy post clarifies where we're heading; getting some more feedback from the Go community on all those aspects would really help!

breml commented 2 years ago

Hi @baywet

Thanks for your detailed feedback.

I cannot go into all the details you mentioned due to lack of time, but I still want to share some of my thoughts with you:

Finally, I would be happy to help you with the improvement of the Go SDK, but this is not something that I can do in my spare time nor as part of my current day job.

baywet commented 2 years ago

does not render the API documentation

Interesting, I might reach out to the team to understand what's going on; the documentation for the root package seems to be working properly: https://pkg.go.dev/github.com/microsoftgraph/msgraph-sdk-go#GraphServiceClient.Admin

I recommend to file an issue on

Thanks for sharing this! I want us to do our homework (improvements I outlined) before we share our issues with that audience.

the times I shared are from my Github Actions pipeline

Understood, we've noticed the behaviour in both situations (GH actions and local dev loop).

that the models package has such a huge amount of types (even though only a small fraction of it is used

Shouldn't the linter only look at your code and not dependent packages? Additionally, shouldn't the compiler treeshake/trim unused types?

It is my feeling that this is more rooted in the number of types than in some redundant code

Noted, reducing the amount of code we're generating shouldn't hurt anyway. As for types, we have 2 types per model (i.e. a User struct and a Userable interface). That's to "support upcast". E.g. when querying /groups/{id}/members the endpoint returns directory objects, which are in fact users, service principals, and other types that derive from directory object. Because the SDK automatically deserializes to the most specialized type, we've had to introduce interfaces. The technical reason is that we can't upcast structs (i.e. DirectoryObject(myUserInstance) or myUserInstancePtr.(*DirectoryObject)) because they are not the same size in memory. If we had a way to upcast structs we could divide the number of types by two, which should improve build times.
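A minimal sketch of that constraint, with hypothetical stripped-down types rather than the generated ones:

package main

import "fmt"

// Stripped-down stand-ins for the generated model types.
type DirectoryObjectable interface{ GetId() *string }

type DirectoryObject struct{ id *string }

func (d *DirectoryObject) GetId() *string { return d.id }

// User derives from DirectoryObject and adds its own fields.
type User struct {
    DirectoryObject
    mail *string
}

func main() {
    id, mail := "42", "me@contoso.example"
    u := &User{DirectoryObject{&id}, &mail}

    // DirectoryObject(*u) does not compile: the structs have different layouts,
    // so there is no struct "upcast" in Go. The interface, however, lets
    // /groups/{id}/members return the most specialized types while callers
    // still handle every element as a directory object.
    members := []DirectoryObjectable{u}
    fmt.Println(*members[0].GetId())

    // Callers that know the concrete type can still assert back down to it.
    if user, ok := members[0].(*User); ok {
        fmt.Println(*user.mail)
    }
}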

I can not remember anyone doing such an approach

How do Go devs typically handle GraphQL APIs then?

The usage of the whole API does not feel very Go-ish

Do you have specifics about patterns that don't feel idiomatic and what API we could offer instead?

I would be happy to help you with the improvement of the Go SDK, but this is not something, that I can do in my spare time nor as part of my current day job.

To be specific, I'm not asking for you to contribute (PRs), but to keep providing feedback like you have so far. This is instrumental to us correcting course before the general availability of the SDK. Thanks a lot for the feedback you've already provided!

breml commented 2 years ago

does not render the API documentation

Interesting, I might reach out to the team to understand what's going on; the documentation for the root package seems to be working properly: https://pkg.go.dev/github.com/microsoftgraph/msgraph-sdk-go#GraphServiceClient.Admin

My assumption is that there is some sort of hard limit on https://pkg.go.dev that prevents the documentation from rendering. Some quick stats:

A quick search showed me https://github.com/golang/pkgsite/blob/master/internal/godoc/dochtml/dochtml.go#L92-L95, which defines a limit of 10 MB for the resulting documentation.

Locally it is possible to render the documentation (with the official godoc tool), and the resulting file is ~24 MB. That being said, this (locally generated) documentation is also rather useless; the index alone is ~450 full-screen browser pages long.

I recommend to file an issue on

Thanks for sharing this! I want us to do our homework (improvements I outlined) before we share our issues with that audience.

As you wish. In order to just get some pointers about where to look, I would still recommend reaching out to the #tools channel on Gophers Slack. All topics around Go tooling (official, but also community-maintained tools like golangci-lint) are discussed there, and some of the Go maintainers hang out there as well and share insights.

the times I shared are from my Github Actions pipeline

Understood, we've noticed the behaviour in both situations (GH actions and local dev loop).

that the models package has such a huge amount of types (even though only a small fraction of it is used

Shouldn't the linter only look at your code and not dependent packages? Additionally, shouldn't the compiler treeshake/trim unused types?

I am not entirely sure about the internal workings of these tools, but almost all of them depend on https://pkg.go.dev/golang.org/x/tools/go/analysis for the analysis of the Go code. Additionally, gopls, the official Go language server, also depends on the above-mentioned package for all Go AST and type processing. What this means is that these packages are really widely used and maintained by the Go core team, so I expect them to be of very high quality and optimized for performance.

I do not know if, and to what extent, tree shaking or trimming is used in this process and whether these techniques could speed it up. But because Go is a statically typed language, I expect that during linting and compiling all the types must be known (not only of the code that is linted, but also of all its dependencies).

It is my feeling that this is more rooted in the number of types than in some redundant code

Noted, reducing the amount of code we're generating shouldn't hurt anyway. As for types, we have 2 types per model (i.e. a User struct and a Userable interface). That's to "support upcast". E.g. when querying /groups/{id}/members the endpoint returns directory objects, which are in fact users, service principals, and other types that derive from directory object. Because the SDK automatically deserializes to the most specialized type, we've had to introduce interfaces. The technical reason is that we can't upcast structs (i.e. DirectoryObject(myUserInstance) or myUserInstancePtr.(*DirectoryObject)) because they are not the same size in memory. If we had a way to upcast structs we could divide the number of types by two, which should improve build times.

I can not remember anyone doing such an approach

How do Go devs typically handle GraphQL APIs then?

I am not a frequent user of GraphQL APIs. I am aware that the dynamic nature of GraphQL can make it tricky with Go. One solution that I think is worth looking at is https://github.com/Khan/genqlient.

That being said, the Microsoft Graph, at least in my understanding, is not a GraphQL API but a highly interwoven REST API.

The usage of the whole API does not feel very Go-ish

Do you have specifics about patterns that don't feel idiomatic and what API we could offer instead?

This would take some more time; maybe I can follow up on this during the week.

I would be happy to help you with the improvement of the Go SDK, but this is not something, that I can do in my spare time nor as part of my current day job.

To be specific, I'm not asking for you to contribute (PRs), but to keep providing feedback like you have so far. This is instrumental to us correcting course before the general availability of the SDK. Thanks a lot for the feedback you've already provided!

mblaschke commented 2 years ago

Just tried a go get -u (via zsh with REPORTTIME enabled):

go: downloading github.com/prometheus/client_golang v1.12.2
go: downloading github.com/Azure/go-autorest/autorest v0.11.27
go: downloading github.com/Azure/azure-sdk-for-go/sdk/azidentity v1.0.0
go: downloading gopkg.in/yaml.v3 v3.0.0-20220512140231-539c8e751b99
go: downloading github.com/Azure/azure-sdk-for-go v64.1.0+incompatible
go: downloading github.com/microsoft/kiota/serialization/go/json v0.0.0-20220331211134-ada6b745f15a
go: downloading github.com/microsoftgraph/msgraph-sdk-go v0.24.0
go: downloading github.com/microsoft/kiota v0.1.3
go: downloading golang.org/x/sys v0.0.0-20220520151302-bc2c85ada10a
go: downloading golang.org/x/crypto v0.0.0-20220518034528-6f7dac969898
go: downloading github.com/spf13/cast v1.5.0
go: downloading github.com/Azure/go-autorest/autorest/adal v0.9.20
go: downloading github.com/prometheus/common v0.34.0
go: downloading github.com/Azure/azure-sdk-for-go/sdk/azcore v1.0.0
go: downloading github.com/golang-jwt/jwt/v4 v4.4.1
go: downloading github.com/microsoftgraph/msgraph-sdk-go-core v0.25.0
go: downloading github.com/yosida95/uritemplate/v3 v3.0.2
go: downloading github.com/microsoft/kiota/http/go/nethttp v0.0.0-20220331211134-ada6b745f15a
go: downloading github.com/Azure/azure-sdk-for-go/sdk/internal v1.0.0
go: downloading github.com/AzureAD/microsoft-authentication-library-for-go v0.5.0
go: downloading golang.org/x/net v0.0.0-20220520000938-2e3eb7b945c2
go: downloading github.com/microsoft/kiota-abstractions-go v0.7.0
go: downloading github.com/microsoft/kiota-serialization-json-go v0.4.0
go: downloading github.com/microsoft/kiota-serialization-text-go v0.2.0
go: downloading github.com/microsoft/kiota-http-go v0.4.0
go: downloading github.com/microsoft/kiota-serialization-text-go v0.3.0
go: downloading github.com/microsoft/kiota-http-go v0.4.1
go: downloading github.com/kr/pretty v0.3.0
go: upgraded github.com/Azure/azure-sdk-for-go v63.1.0+incompatible => v64.1.0+incompatible
go: upgraded github.com/Azure/azure-sdk-for-go/sdk/azcore v0.21.0 => v1.0.0
go: upgraded github.com/Azure/azure-sdk-for-go/sdk/azidentity v0.13.2 => v1.0.0
go: upgraded github.com/Azure/azure-sdk-for-go/sdk/internal v0.9.1 => v1.0.0
go: upgraded github.com/Azure/go-autorest/autorest v0.11.24 => v0.11.27
go: upgraded github.com/Azure/go-autorest/autorest/adal v0.9.18 => v0.9.20
go: upgraded github.com/AzureAD/microsoft-authentication-library-for-go v0.4.0 => v0.5.0
go: upgraded github.com/cjlapao/common-go v0.0.18 => v0.0.19
go: upgraded github.com/golang-jwt/jwt v3.2.1+incompatible => v3.2.2+incompatible
go: upgraded github.com/golang-jwt/jwt/v4 v4.2.0 => v4.4.1
go: added github.com/microsoft/kiota-abstractions-go v0.7.0
go: added github.com/microsoft/kiota-http-go v0.4.1
go: added github.com/microsoft/kiota-serialization-json-go v0.4.0
go: added github.com/microsoft/kiota-serialization-text-go v0.3.0
go: upgraded github.com/microsoft/kiota/abstractions/go v0.0.0-20220309144454-31e5897b295c => v0.0.0-20220331211134-ada6b745f15a
go: upgraded github.com/microsoft/kiota/authentication/go/azure v0.0.0-20220309144454-31e5897b295c => v0.0.0-20220331211134-ada6b745f15a
go: upgraded github.com/microsoft/kiota/http/go/nethttp v0.0.0-20220308162731-fb6ab0cd5ea2 => v0.0.0-20220331211134-ada6b745f15a
go: upgraded github.com/microsoft/kiota/serialization/go/json v0.0.0-20220308162731-fb6ab0cd5ea2 => v0.0.0-20220331211134-ada6b745f15a
go: upgraded github.com/microsoftgraph/msgraph-sdk-go v0.13.0 => v0.24.0
go: upgraded github.com/microsoftgraph/msgraph-sdk-go-core v0.0.14 => v0.25.0
go: upgraded github.com/pkg/browser v0.0.0-20210115035449-ce105d075bb4 => v0.0.0-20210911075715-681adbf594b8
go: upgraded github.com/prometheus/client_golang v1.12.1 => v1.12.2
go: upgraded github.com/prometheus/common v0.32.1 => v0.34.0
go: upgraded github.com/spf13/cast v1.4.1 => v1.5.0
go: upgraded github.com/yosida95/uritemplate/v3 v3.0.1 => v3.0.2
go: upgraded golang.org/x/crypto v0.0.0-20220411220226-7b82a4e95df4 => v0.0.0-20220518034528-6f7dac969898
go: upgraded golang.org/x/net v0.0.0-20220127200216-cd36cc0744dd => v0.0.0-20220520000938-2e3eb7b945c2
go: upgraded golang.org/x/sys v0.0.0-20220319134239-a9b59b0215f8 => v0.0.0-20220520151302-bc2c85ada10a
go: upgraded google.golang.org/protobuf v1.27.1 => v1.28.0
go: upgraded gopkg.in/yaml.v3 v3.0.0-20210107192922-496545a6307b => v3.0.0-20220512140231-539c8e751b99
go get -u  26.55s user 31.55s system 3% cpu 30:01.08 total

30 minutes on an 8-core Intel Core i9 with 16 GB RAM 😳

Docker multiarch builds (arm, arm64, amd64) take around 2:30:00 on GitHub and are nearly reaching the timeout limit.

mblaschke commented 2 years ago

With v0.24.0 build time reaches 4 hours or more (maximum seen: 5 hours)

baywet commented 2 years ago

Again, at this point we're really focusing on API coverage and reliability more than build time performance due to a shortage of staff. If you're watching the repo, you'll see that we get more issues about missing API segments or serialization than anything else for the time being.

That being said, I've exposed our short-terms plans on this thread (use of generics to reduce the amount of code we generate) and long-terms plans (ability to generate your own right sized, subset SDK), both of which are still on the roadmap before GA.

Now, I'm trying to understand why you're running go get -u in your CI, and why you have to run it for every architecture instead of once for all the architectures, followed by a go build. FYI, the build of the SDK takes about ~7 minutes on the GitHub Actions agent.

For more insights on our plans, you can watch the brand new Microsoft Build recordings for free: Deep dive into Microsoft Graph SDKs.

dominikh commented 2 years ago

I haven't given this a deeper look, so these are drive-by comments, but I wanted to provide some general clarifications.

  • go build often rebuilds sub packages that haven't changed and for which dependencies have not changed.
  • go test often rebuilds things, even when go build was run right before. Why is it not relying on some sort of cache?
  • same issue with go lint.

These are largely not true. go build and go test use content-addressed caches (see the contents of go env GOCACHE) that take into consideration the source code of the package, its dependencies, and the relevant build environment. If they rebuild something, then something ought to have changed.

go test often rebuilds things, even when go build was run right before

At a minimum, go test will have to rebuild the package that is being tested, to include *_test.go files in the build. Conceptually, the package and the package while being tested are two different packages, made up of different files.

same issue with go lint

I'll assume you mean go vet here. Vet does reuse work done by go build and go test, namely it uses the export data that was cached. Of course this only helps with dependencies; the actual package being vetted needs to be loaded from source to be analyzed. Vet also caches facts it has computed for APIs in dependencies.

All that is to say that if you see seemingly spurious rebuilds, it is worth investigating why these rebuilds are happening; they're not a fundamental limitation of the Go build system.

baywet commented 2 years ago

Thanks for chiming in @dominikh !

As mentioned previously I'd like for us to investigate the performance improvements first (amount of code we generate, generics...) before engaging with the Go "core" community. That being said, if you notice something that's obviously wrong in our repos, don't hesitate to point it out.

I'm often working with "replace" (local links in the dependency tree) and it seems that any change in the package will "mark it as stale" and rebuild the whole tree of dependent packages, which might have impacted my perception. (Other ecosystems have more fine-grained tracking of those things and are able to track them at a class/type level.) Would this make sense?

Also go get -u seems to trigger multiple rebuilds for each package it updates instead of updating all the packages and doing one single rebuild, would that be possible?

dominikh commented 2 years ago

I also can't reproduce the extreme build times reported in this issue.

30 minutes on a 8 core Intel Core i9 with 16 GB RAM

With v0.24.0 build time reaches 4 hours or more (maximum seen: 5 hours)

I've tested a fresh build of this repository on a 16 core 3950X and got this:

$ git clone https://github.com/microsoftgraph/msgraph-sdk-go
$ cd msgraph-sdk-go
$ time go build -v
go build -v  796.29s user 151.10s system 2172% cpu 43.614 total

And while 43.6 seconds isn't great, it's a far cry from 30 minutes, and especially from 4 hours. Even if you build for multiple GOOS x GOARCH combinations, it would take way too many combinations to reach these times.

dominikh commented 2 years ago

I'm often working with "replace" (local links to the dependency tree) and it seems that any change in the package will "mark it as stale" and rebuild the whole tree of dependent packages, which might have impacted my perception. (other ecosystems have a more fine grained tracking of those things, and are able to track these things at a class/type level). Would this make sense?

Yes, Go's caching is on a per-package level, not per-class/type. If you change anything about a package, then that invalidates the whole package, and all packages that transitively depend on it.

Also go get -u seems to trigger multiple rebuilds for each package it updates instead of updating all the packages and doing one single rebuild, would that be possible?

The compiler in general is modular, compiling one package at a time, and using cached data to reuse the work done for dependencies. There is no practical difference between building each package and building all packages.

baywet commented 2 years ago

Thanks for the additional information. In a scenario where we're building a console app which depends on this library (no replace, most basic case), should the cost of building this package only be paid once, assuming dependencies are not changing? In other words, if I set up my project (init & go get this library), shouldn't the subsequent go build commands only take a few milliseconds, because they only build app.go and retrieve everything else from the cache? The reason I'm asking is that it's not the behaviour I'm seeing locally (or that the rest of my team is seeing) and I'm worried something is not configured properly.

breml commented 2 years ago

@baywet I can confirm that I observed the same behaviour you are describing, and this worries me as well. I had a small conversation on Gophers Slack about this issue (this is why @dominikh chimed in).

In this conversation @josharian suggested filing an issue with Go to inspect this behavior in more depth, quote:

Years ago, when I last paid close attention, the go-vim types package used to dominate the compile juju benchmark. It had a massive exported surface area and was also a bottleneck in the compilation graph. The export data has been overhauled since then, but it is easy to imagine new bugs have been introduced, particularly with the IR generics work. Definitely worth filing a Go issue about (golang.org/issue/new). At least in the days when I was contributing, multi-minute build times were a top priority to fix.

Therefore, as mentioned before, I suggest getting in touch with the Go maintainers by filing an issue to

  1. confirm / rule out the Go compiler / tooling / analysis package
  2. get more hints on where to focus in regards to the planned refactoring / optimizations.
baywet commented 2 years ago

Thanks for the suggestion. From a Go community etiquette perspective, do you think it'd be OK to file that issue BEFORE we investigate the use of generics (to reduce the amount of generated code) on our end?

breml commented 2 years ago

@baywet Yes, I am pretty sure that the Go team would like to understand why we see these kinds of build times, regardless of whether the code uses generics or not.

As you might know, generics only landed a few months ago and they are not expected to be completely optimized in regards to performance. Due to the lack of generics in the past, there is a lot of automatically generated code in the Go ecosystem, so this is not something unusual. And the overwhelming part of the existing Go code is still without generics.

In fact, if this issue gets investigated by the Go team, I would even expect them to be interested in whether a refactoring towards generics improves the situation.

josharian commented 2 years ago

From a Go community etiquette perspective, do you think it'd be ok to file that issue BEFORE we investigate the use of generics

Yes!

Feel free to tell them that I sent you. :)

An excellent issue would go something like:

cmd/compile: slow compilation of a package with a very large API

We maintain a package with a very large API surface area that is very slow to compile. We have some ideas for how to streamline the package using generics, but @josharian also requested that we file an issue about the existing compilation time as well, in case there are any improvements available that might benefit others. To reproduce: insert precise reproduction instructions, including exact git remotes and hashes and exact go commands run and the timing you see for them.

dominikh commented 2 years ago

Thanks for the additional information. In a scenario where we're building a console app which depends on this library (no replace, most basic case), should the cost of building this package only be paid once, assuming dependencies are not changing? In other words, if I set up my project (init & go get this library), shouldn't the subsequent go build commands only take a few milliseconds, because they only build app.go and retrieve everything else from the cache? The reason I'm asking is that it's not the behaviour I'm seeing locally (or that the rest of my team is seeing) and I'm worried something is not configured properly.

That's roughly what I would expect, yes. There is of course still a cost to using such a large package, even if it's cached. The cached export data needs to be loaded and used, the final binary needs to be linked, etc. But one would expect these costs to be much smaller than the cost of compiling the package the first time.

So far I've only tested compiling the package itself, not compiling a dependent. It would be interesting to compare the two.

josharian commented 2 years ago

The cached export data needs to be loaded and used, the final binary needs to be linked, etc. But one would expect these costs to be much smaller than the cost of compiling the package the first time.

If the code is minimal, loading import data could in theory end up being just as costly as doing the initial compilation. Particularly if you have to load the import data N times, and compile only once.

The compiler importer has gone through roughly three generations: (1) Simple, text-based, and expensive, (2) cheaper binary but still non-lazy, and (3) binary and lazy. With the lazy importer, in theory, the compilation cost of importing a very large package ought to be proportional to the symbols you use from that package. But maybe that's not the case. (Maybe the compiler is reading the entire import data instead of mmap'ing it? Maybe the index of symbols itself is large enough to impact compilation time?) That's the sort of interesting, fixable thing one can learn from cases like this, and why it's worth filing an issue, if you can reliably reproduce and nail down what is slow (without worrying about exactly why).

baywet commented 2 years ago

Thanks everyone! I posted an issue with a clear repro; don't hesitate to jump on there and ask questions or provide additional information if you think I missed something.

mblaschke commented 2 years ago

I was only getting 30 min for a go get -u on my MacBook. I'm not using it in my pipelines. With the go get -u issue I wanted to say that it's not only a build issue but also a development issue when using this SDK.

@dominikh

I'm building a small application with azure-sdk-for-go and msgraph-sdk: https://github.com/webdevops/azure-auditor

with msgraph-sdk:

$ time go build -a
go build -a  1597.12s user 730.81s system 883% cpu 4:23.41 total

$ time make gosec
[...]
Summary:
  Gosec  : 2.11.0
  Files  : 22
  Lines  : 2966
  Nosec  : 7
  Issues : 0

make gosec  25.05s user 24.21s system 488% cpu 10.086 total

without msgraph-sdk:

$ time go build -a
go build -a  102.17s user 22.48s system 552% cpu 22.580 total

$ gosec ./...
[...]
Summary:
  Gosec  : 2.11.0
  Files  : 21
  Lines  : 2764
  Nosec  : 7
  Issues : 0

make gosec  3.47s user 3.77s system 648% cpu 1.116 total

on a MacBook Pro 16" (2021), 2.3 GHz 8-core Intel Core i9, 32 GB RAM

I get 4-5 hours on Docker multiarch builds on GitHub Actions using public agents. I'm aware that Docker multiarch builds use QEMU, which is horribly slow, but 4 hours is not something I would expect.

jackhopner commented 2 years ago

I've also noticed VSCode becoming very slow (save, autocomplete mostly) after installing the SDK. From what I can tell gopls is the culprit but my suspicion is that gopls is having a hard time searching/indexing this SDK because of the size.

Would putting all the models in a single module and splitting each namespace (users, devices, chats, drive, etc.) into its own module work as well? This would allow us to only pull the namespace(s) we need while still allowing all the models to use each other.

baywet commented 2 years ago

Hey everyone, the issue that I created over 6 weeks ago hasn't gotten any traction. Do you have any suggestions on how we could get somebody to look at it? Thanks!

breml commented 2 years ago

@baywet I agree, it is very unfortunate that there has not been any response so far on your issue. The only thing that happened was the assignment of two labels. The reasons can be manifold, e.g. the next release of Go is scheduled for August 2022, and when the issue was raised, the tasks that would go into this release might already have been decided. Also, the closer the next release is, the more the focus is on stabilization.

One thing that can be done is to mention the owners of the respective part directly in a comment on the issue to get their attention. Even though we have a clear way to reproduce the issue, it is not yet clear where the cause of the issue is. Judging from https://dev.golang.org/owners (I am not completely sure how recent this list is), I can see multiple candidates:

Maybe this boils down to mentioning rsc directly, but I don't know if this is appropriate.

Maybe @josharian or @dominikh can give more advice on how to proceed with this.

breml commented 2 years ago

Also x/pkgsite is affected, because https://pkg.go.dev refuses to render the API documentation.

gnuletik commented 2 years ago

Introducing msgraph-sdk-go in our codebase also increases memory usage at compile time, which leads to a compile: signal: killed error if not enough memory is available.

baywet commented 2 years ago

Thanks everyone! I mentioned a couple of names over there (tried to keep it limited to avoid spamming people); let's see if we get traction. @gnuletik yes, we've noticed the same thing sometimes happening with the beta package (which is larger than this one) on GitHub workflows. The interesting aspect is that it doesn't happen all the time, so I'm guessing it depends on how busy the agent is on their end.

baywet commented 2 years ago

Update: mentioning people was not well received but did catch some attention. A couple of pieces of positive news: apparently Go 1.19 will improve build times, especially around package loading time and link time, both steps that take a while with the SDK. Also, the documentation started working (I don't know exactly when); I suspect the team behind the website made some performance improvements as well: https://pkg.go.dev/github.com/microsoftgraph/msgraph-sdk-go@v0.28.0

breml commented 2 years ago

@baywet Thank you for putting yourself into the fire, and I am sorry for the negative feedback you received regarding mentioning people directly (especially because I proposed it). That being said, I really don't know what approach would have worked better. At least you now got some additional pointers.

One small correction about the documentation: for the models package, the documentation still fails, see https://pkg.go.dev/github.com/microsoftgraph/msgraph-sdk-go@v0.28.0/models. The documentation for the module itself always worked for me.

baywet commented 2 years ago

No need to apologize, as you said, at least we got some activity this way.

Thanks for clarifying. I thought the whole package was defective from a docs perspective; I'll bring this up as well.

jackhopner commented 2 years ago

@baywet Have you considered placing go.mod files at the root of each domain package? This would allow users to either add the entire SDK to their Go projects or just the domains they're interested in.

It would look something like (where the domain packages all have models as a dep):

- drive
-- go.mod
-- go.sum
- drives
-- go.mod
-- go.sum
- education
-- go.mod
-- go.sum
...
- models
-- go.mod
-- go.sum
go.mod
go.sum

Also, in regards to the compiler OOM, this is likely a race in when the GC runs. When GitHub Actions workers are busy and there's less CPU available, it will likely GC less, and so you'll have larger spikes in memory usage.

baywet commented 2 years ago

@jackhopner thanks for the suggestion. Wouldn't using the client end up pulling everything anyway, even with sub packages? https://github.com/microsoftgraph/msgraph-sdk-go/blob/main/graph_service_client.go

Could we force the garbage collector to be a bit more aggressive through GOGC? (for our own CI, I know it's not a solution for SDK users)
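For reference, GOGC is the same knob that runtime/debug.SetGCPercent exposes programmatically; a lower value makes the collector run more often, trading CPU for a lower peak heap. An illustrative snippet (not something the SDK ships):

package main

import (
    "fmt"
    "runtime/debug"
)

func main() {
    // Running a Go program (the compiler included) with GOGC=50 is equivalent
    // to it calling this at startup: the collector triggers once the heap has
    // grown 50% over the live set, instead of the default 100%.
    previous := debug.SetGCPercent(50)
    fmt.Println("previous GOGC setting:", previous)
}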

jackhopner commented 2 years ago

@jackhopner thanks for the suggestion. Wouldn't using the client end up pulling everything anyway, even with sub packages? https://github.com/microsoftgraph/msgraph-sdk-go/blob/main/graph_service_client.go

Could we force the garbage collector to be a bit more aggressive through GOGC? (for our own CI, I know it's not a solution for SDK users)

Good point, yeah, you would need to also generate a client per domain package, which I don't think is a terrible idea. I don't know the internals of the Go compiler, but this may actually help with build times as well: currently you have one client with all the domains as dependencies; if that client was split up, the dependency graph would be a lot more modular.

You could always host your own GitHub Actions runners for your own CI if you want to avoid any resource constraints. Looking at what you get currently:

https://docs.github.com/en/actions/using-github-hosted-runners/about-github-hosted-runners#supported-runners-and-hardware-resources

It seems a bit crazy that the build would use > 7 GB of RAM.

josharian commented 2 years ago

You might try -p=1 and/or -gcflags=all=-c=1. (These are flags to go build, go test, etc.) Those will slow down the build but should reduce the maximum memory usage by reducing concurrency. (The former disables process level concurrency; the latter, goroutine-level.)

baywet commented 2 years ago

Thanks! I just put together this PR on the beta package, which has been failing consistently for a couple of days now; let's see if it gives us a bit more wiggle room. https://github.com/microsoftgraph/msgraph-beta-sdk-go/pull/127

baywet commented 2 years ago

Also, @jackhopner, we're considering that as an alternative; getting different runners at Microsoft is a bit of a process for security reasons.

In terms of one client per domain, would generating your own client with only the models/API paths you care about be a better option than this full SDK? Here is an overview of the steps to go through at the moment (you can skip to 9:25 if you don't want the context and just want the steps): https://www.youtube.com/watch?v=Utm4sFJ1e_I

baywet commented 2 years ago

Unfortunately this didn't "patch" the CI. Any other flags we could be using that you know of, @josharian? https://github.com/microsoftgraph/msgraph-beta-sdk-go/runs/7339900695?check_suite_focus=true

josharian commented 2 years ago

Not many options left without recompiling the toolchain. You could try GOGC=1.

jaym commented 2 years ago

I gave the "Right-sized SDKs" instructions a try to generate a smaller SDK. Unfortunately, I ran into quite a few issues:

If it helps, I'm running

hidi transform -d https://raw.githubusercontent.com/microsoftgraph/msgraph-metadata/master/openapi/beta/openapi.yaml -c msgraph.json -o output.yaml
kiota -l go -d output.yaml -c GraphServiceClient -n msgraphbetasdkgo  -o client --ll Trace

where msgraph.json is https://gist.github.com/jaym/6cbb56b2405274bd169d76b2d4afb3f0

This was all on a Linux machine. I tried on my Windows machine; it dies with a System.OutOfMemoryException, which is expected given it only has 8 GB of RAM.

baywet commented 2 years ago

Hey @jaym, thanks for giving it a try, all of that is super valuable feedback! It's a bit strange that Kiota ended up eating so much memory, since we're using it ourselves to generate this (and the beta) SDK. Would you mind reposting that message on kiota to avoid sidetracking this thread, please?

mblaschke commented 2 years ago

I would give a +1 to @jackhopner's suggestion for packages, as it would solve all issues, but this would require a refactoring of the SDK (or at least of the main client).

FYI: there are Go design guidelines for Azure SDKs which are now used to create the new Azure SDK for Go, which is IMHO much better than the previous version: https://azure.github.io/azure-sdk/golang_introduction.html

mblaschke commented 2 years ago

My GitHub Actions builds are currently crashing the agent, so I did some analysis (using github.com/microsoftgraph/msgraph-sdk-go@v0.32.0).

When running go build -a (Go 1.18) inside Docker, I realised that the build consumes 4 GB (for one arch, checked via docker stats):

CONTAINER ID   NAME             CPU %     MEM USAGE / LIMIT     MEM %     NET I/O       BLOCK I/O        PIDS
b1b99dbe10d3   angry_lovelace   137.40%   4.121GiB / 7.667GiB   53.75%    1.09kB / 0B   69.6kB / 912MB   37

So I guess the memory usage is currently causing the GitHub agent to be killed by the OOM killer.

Using go build -a -p=1 -gcflags=all=-c=1, the memory usage only dropped a little bit:

CONTAINER ID   NAME               CPU %     MEM USAGE / LIMIT     MEM %     NET I/O       BLOCK I/O    PIDS
685d945c724d   gallant_poincare   101.71%   3.668GiB / 7.667GiB   47.83%    1.02kB / 0B   0B / 512MB   37

When building apps without msgraph-sdk, they use around 400 MB:

CONTAINER ID   NAME             CPU %     MEM USAGE / LIMIT     MEM %     NET I/O     BLOCK I/O       PIDS
5885f9bedd20   jovial_elgamal   410.89%   332.4MiB / 7.667GiB   4.23%     876B / 0B   860kB / 598kB   49

Similar results with Go 1.19 (with go build -a -p=1 -gcflags=all=-c=1, Go 1.19 used 4 GB).

This is becoming a huge issue (blocker) when building multi-arch Docker images, as each arch build consumes ~4 GB at the same time. golangci-lint was also killed, so I guess that was the OOM killer as well (not running in Docker, no other parallel task was running).

jackhopner commented 2 years ago

It's probably worth noting that in the past couple of months over 100k additional lines have been added to this repo (https://github.com/microsoftgraph/msgraph-sdk-go/compare/v0.24.0...main).

I'm currently in a position (and I'm sure a lot of others are now) where we can't update to the latest version (I'm on v0.24.0) because it would bring my development cycle to a stop. If a major bug or security issue were found in the SDK, it would be painful for people using the SDK on older versions.

Just to list out the ideas we've come up with in this thread, so that hopefully we can start working towards one of them:

mblaschke commented 2 years ago

Also, go get -u now took 51 minutes on my Apple MacBook 16" M1 Max (32 GB RAM), even though there are no updates available 🤔

I guess I have to roll back to version 0.2x.

baywet commented 2 years ago

@mblaschke do you notice any difference between Go 1.18 and 1.19? (same version of the SDK, same machine...)

I've also noticed that go get -u (no package specified) takes a very long time, it seems to be going through every package and triggering multiple compilations. My workaround so far has been to specify the package name I want to upgrade.

@jackhopner Thanks for the recap. I'm keeping an up-to-date list of things that need to be done before GA here; this includes the generics work, reduced code generation, and a new one, selective imports (if possible).

As for the significant growth in size, we're fixing bugs in the conversion library to make sure we're generating the right number of paths in the OpenAPI description. Long story short, OData has a lot of conventions (which imply that this or that path is supported) and annotations on those conventions (which need to be explicitly set to disable a specific operation/path). We went through a first phase of implementing the missing conventions to make sure all the paths are present. We're still missing some annotations support (to make sure we don't produce too many paths). We're expecting the number of paths to go down when those changes ship. An example is this. Additionally, we're trying to reduce the duplication of types throughout the description, like with that.

The need for interfaces came from the need to support "downcast"; after talking about it with Go maintainers, it seems to be the appropriate way to handle this.

Getters and setters are used both to expose the fields through the interfaces and also for future evolutions of the SDK (backing store, dirty tracking, etc...).

In case of a critical bug, our policy is to roll forward, meaning older versions won't get the fix. And we understand that as the size increases, it will be ever more painful for people to upgrade.

Lastly, we're still trying to figure out a solution to our staffing shortage; this is why the Go SDK hasn't seen much progress over the last few months. The suggestions and discussions here are helping a great deal to make sure we explore all solutions and prioritize things properly when we get some bandwidth. Thanks a lot!

mblaschke commented 2 years ago

Differences between Go 1.18 and 1.19? Unfortunately I didn't find any differences regarding build time and memory usage; nothing worth mentioning. In tests the memory usage was higher when using -gcflags=all=-c=1 with Go 1.19 than with Go 1.18.

I'm using go get -u to update all dependencies, and so far that has worked fine for all packages out there, including azure-sdk-for-go and kubernetes-sdk. As a developer I would like to avoid a workflow where I have to check every package manually, as some applications have 50 dependencies or more 🤔

I've tried to integrate msgraph-sdk into my company's applications, but that didn't work due to all of these issues. Many company notebooks are not as powerful as my private MacBook Pro M1 Max, so that might be a blocker for all these developers out there. As a (very ugly) workaround I've switched to the Azure CLI (executing az ad ... inside the Go code) to fetch lists of groups/applications from Azure AD, if anyone needs a workaround for such tasks.

For my FOSS applications I'm currently trying to integrate swap on the GitHub runners as there is only 7 GB RAM available (but 14 GB SSD):

      - name: Set Swap Space
        uses: pierotofy/set-swap-space@master
        with:
          swap-size-gb: 10

or in the worst case switch to self-hosted runners (e.g. Hetzner cloud instances: 10 to 20 euros per month) and enable swap on these instances as well. Or Microsoft could increase the specs of the GitHub runners 😉

As for the staffing shortage, I'm trying to raise this topic via my company with our Azure technical consultant, so let's hope that helps 🤞

baywet commented 2 years ago

Hey everyone, thanks for your patience on the matter. I do have news to share.

First off, we've been able to re-prioritize things and free up @rkodev from some of his other commitments so he can help here. Please keep being patient, he still needs to ramp up on the SDK.

Second, Ronald has already made contributions to help improve the situation by tackling this issue: https://github.com/microsoftgraph/msgraph-sdk-go-core/issues/26. We didn't notice any major improvement in our CI, but we were wondering what your experience with that change is (introduced with 0.34.0, released earlier today).

We do have a lot scheduled over the next few months, both in terms of performance improvement investigations and in terms of feature work, and we'll keep you posted.

baywet commented 2 years ago

Hey everyone, we're having a design discussion that could halve the number of methods in the fluent API, thus reducing the size. If you're interested in providing your input, please jump to #176.

baywet commented 2 years ago

Hi everyone, thanks for your patience with the size/performance issue. We just released v0.37.0 (and we're working to release the equivalent for beta as we speak). The main enhancement that version provides is support for passing a context (#176), but while doing so, we also reviewed why we needed to provide overloads, and the conclusion was that we didn't.

This effectively allows us to reduce the number of execution methods on the fluent API by 50%, and we're seeing about 20% build time improvements.
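A rough sketch of the shape of that change (hypothetical names, not the exact generated signatures):

package sketch

import "context"

// Hypothetical stand-ins for generated request configuration and model types.
type RequestConfiguration struct{}
type MessageBodyable interface{}

type sendMailRequestBuilder struct{}

// Before: one execution method per overload, emitted for every request builder.
func (b *sendMailRequestBuilder) PostLegacy(body MessageBodyable) error {
    return nil
}

func (b *sendMailRequestBuilder) PostWithConfigurationLegacy(body MessageBodyable, cfg *RequestConfiguration) error {
    return nil
}

// After: a single context-aware method; passing a nil configuration means "use defaults".
func (b *sendMailRequestBuilder) Post(ctx context.Context, body MessageBodyable, cfg *RequestConfiguration) error {
    return nil
}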

While this is not a breakthrough, it's a step in the right direction, and we have other avenues to investigate to keep improving this number.

We'd be happy to hear about your experiences upgrading to that new version and what build time performance improvements you notice. Thanks @rkodev for all the hard work!

mblaschke commented 2 years ago

The situation improved a lot and is now acceptable, thanks for the work. I can continue to use public GitHub runners again 👍

baywet commented 1 year ago

Hi everyone, we've just released v0.41.0, which takes advantage of generics and significantly reduces the amount of code being generated (see #284). As always, we're eager to see how your experience has changed and how much of an improvement you're measuring.

On top of that we have another change which will most likely become available by next week and reduces the number of methods we're emitting.

If the performance is still not good enough at that point, we'll explore:

Thanks for the patience and continuous feedback!