microsoftgraph / msgraph-sdk-ruby

Microsoft Graph Ruby client library for v1 APIs
https://graph.microsoft.com
MIT License

SDK requires slow down app boot considerably #144

Open louim opened 11 months ago

louim commented 11 months ago

The current generated SDK loads so many files that it slows down our app boot by 3x. I ran a ballpark estimate with find lib -type f | wc -l, which got me 20839 files 🤯 . This is only the load time for the app. I started to benchmark after I noticed that our app specs took an abnormally long time to run. I'm benchmarking with Hyperfine.

Here's a benchmark of the rails runner, to limit the timing to app load. Here's a before run:

git checkout <commit before adding the gem>
bundle install
bin/rails tmp:clear
bin/spring stop

hyperfine --warmup 3 'DISABLE_SPRING=1 bin/rails runner true'

Benchmark 1: DISABLE_SPRING=1 bin/rails runner true
  Time (mean ± σ):      3.857 s ±  0.082 s    [User: 1.576 s, System: 1.292 s]
  Range (min … max):    3.724 s …  3.946 s    10 runs

Here's an after run, showing a 3x increase in load times:

git checkout <commit after adding the gem>
bundle install
bin/rails tmp:clear
bin/spring stop
hyperfine --warmup 3 'DISABLE_SPRING=1 bin/rails runner true'

Benchmark 1: DISABLE_SPRING=1 bin/rails runner true
  Time (mean ± σ):     11.139 s ±  0.124 s    [User: 4.025 s, System: 4.633 s]
  Range (min … max):   10.979 s … 11.360 s    10 runs
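
As a sanity check on the 3x figure, the ratio of the two means can be computed directly. This is only a minimal sketch using the two hyperfine means reported above; the variable names are illustrative.

```ruby
# Mean wall-clock times reported by hyperfine above, in seconds.
before_mean = 3.857
after_mean  = 11.139

# Relative slowdown and absolute boot time added by the gem.
slowdown = after_mean / before_mean
added    = after_mean - before_mean

puts format("slowdown: %.2fx, added boot time: %.3f s", slowdown, added)
# prints: slowdown: 2.89x, added boot time: 7.282 s
```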

baywet commented 11 months ago

Thanks for starting this topic.

Setting the context

Let's acknowledge that Microsoft Graph is one of the largest REST APIs in the world, with over 5000 endpoints and 20k operations, growing every week. Due to the nature of the modeling approach, slicing this package into functional packages is impossible without major trade-offs (duplication, boundaries that are not practical for consumers, etc.). Optimizing the SDK is a process that we go through with every language before we GA, and the help of the community is crucial in that process.

Because the Ruby SDK is not staffed at this time, we're not going to be able to invest significant time in that investigation, but we'll be more than happy to keep gathering feedback on the issue until we can spend significant time on it. Also whenever we get people to work on the SDK, they are more likely to work on the feature gap and other blocking issues of this language first before they focus on optimizations.

I hope the lengthy reply will provide visibility over the current situation and our plans to address that aspect.

Alternatives

Using kiota

This SDK is generated by kiota, and you could generate your own SDK instead of using this pre-packaged one. The main benefit of this approach is that you can select only the endpoints/operations you care about for your application, significantly reducing the amount of code being generated and indirectly improving the performance of dev tooling, production runtime, etc.

Warming up the application

Some runtimes support "warming up the application" before sending actual traffic to an instance, so the user is not impacted by getting a fresh instance. This may be worth considering if the issue is only present at runtime.

Areas to investigate

Number of files and folders

Investigating performance for other languages made us realize that some runtimes/dev tooling are really sensitive to the number of files and folders (for the same amount of source code), and just the loading/linking process is highly impacted. Kiota supports grouping files together, or even flattening the directory structure if we need to do so. Investigating via POCs whether this would be beneficial would be really helpful.
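
One way to start such a POC, independent of kiota, is to time how long Ruby takes to require the same class definitions split across many files versus grouped into one. The sketch below is only an illustration under assumptions: write_classes is a hypothetical helper, the class bodies are empty (unlike real generated models), and the count of 2000 is arbitrary. Results will vary by machine, filesystem, and Ruby version, so no particular outcome is implied.

```ruby
require "benchmark"
require "tmpdir"

# Hypothetical helper: writes `count` tiny class definitions, either as one
# file per class or all in a single file, and returns the paths to require.
def write_classes(dir, prefix, count, single_file:)
  sources = Array.new(count) { |i| "class #{prefix}#{i}; end\n" }
  if single_file
    path = File.join(dir, "#{prefix.downcase}_all.rb")
    File.write(path, sources.join)
    [path]
  else
    sources.each_with_index.map do |src, i|
      path = File.join(dir, "#{prefix.downcase}_#{i}.rb")
      File.write(path, src)
      path
    end
  end
end

count = 2_000 # arbitrary; far smaller than the real SDK
timings = {}

Dir.mktmpdir do |dir|
  many = write_classes(dir, "Many", count, single_file: false)
  one  = write_classes(dir, "One", count, single_file: true)

  # Same number of classes in both cases; only the file layout differs.
  timings[:many_files] = Benchmark.realtime { many.each { |f| require f } }
  timings[:one_file]   = Benchmark.realtime { one.each { |f| require f } }
end

timings.each { |label, secs| puts format("%-10s %.4f s", label, secs) }
```

Running it a few times (or under a tool like Hyperfine) gives a rough idea of how much of the load cost comes from file count alone on a given setup.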

Amount of code

The less code needed to perform the same operation, generally the better the performance. Ruby already benefits from learnings of the other languages (like moving redundant properties to parent types). But don't hesitate to outline additional opportunities if you see redundant code.

Kind of code

Some code expressions can be really expensive for the compiler, e.g. we discovered that using generics in Go highly impacts build time performance. This is where people more specialized in the language should be able to provide pointers by looking at the present structure.

j15e commented 11 months ago

Thanks for the detailed context and suggestions! 👍

In our production environment, this is not too much of an issue. A Ruby on Rails app preloads all classes during boot, and they are not reloaded afterwards. There is an added cost to each deploy (+5 to 6 seconds), but this isn't too bad.

Where it really hurts is in our development environment, where classes and dependencies are reloaded more often and between test runs. This is more problematic because it slows down every developer all the time.

We did try to generate a smaller SDK using Kiota (with --include-path), but since we need the users and groups endpoints, it still ended up including ~7000 files (instead of 20,000) because there are a lot of associations to those two resources. Is there a way to also limit generated models, not just endpoints? We also had an issue where the basic GET requests for groups and users were not included with the filter, for some reason we could not figure out.

From our quick investigation, we noted the following areas of improvement for the Ruby generator:

louim commented 11 months ago

Hey! Thanks for the detailed answer. We investigated building our own SDK using Kiota, but we encountered a few blockers. For the record, we were using the docker package to build endpoints for the users and groups operations. Based on our understanding of the SDK, we were running the following to try to get a custom SDK.

docker run -v ${PWD}/lib:/app/output mcr.microsoft.com/openapi/kiota \
generate --language ruby --namespace-name MicrosoftGraph  --class-name graph_base_service_client --openapi \
https://raw.githubusercontent.com/microsoftgraph/msgraph-metadata/master/openapi/v1.0/openapi.yaml \
--include-path /users/** --include-path /groups/** --clean-output

Here are some of the blockers we encountered:

baywet commented 11 months ago

Thanks for the additional information everyone.

For the missing operations, try with

--include-path /users/** --include-path /groups/** --include-path /users --include-path /groups

You can include only a single operation (if you don't need others) by doing

--include-path /groups#GET

You don't have to use glob patterns, you can simply specify the list of paths if you know them in advance. Users and Groups have a lot of endpoints under them, and the glob patterns you're currently specifying are probably bringing much more than you actually need.
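
As a rough illustration of listing paths explicitly, the command from earlier in the thread could be assembled from a plain list. This is only a sketch: kiota_command and OPENAPI_URL are hypothetical names, the flags and image are copied from the command posted above, and the two include paths are just the ones mentioned in this comment, not a recommendation of what your app needs.

```ruby
require "shellwords"

# OpenAPI description used in the command posted earlier in the thread.
OPENAPI_URL = "https://raw.githubusercontent.com/microsoftgraph/" \
              "msgraph-metadata/master/openapi/v1.0/openapi.yaml"

# Hypothetical helper: builds the docker/kiota argument list with one
# --include-path flag per explicitly listed path (no glob patterns).
def kiota_command(include_paths, output_dir: "lib")
  args = [
    "docker", "run", "-v", "#{File.expand_path(output_dir)}:/app/output",
    "mcr.microsoft.com/openapi/kiota",
    "generate", "--language", "ruby",
    "--namespace-name", "MicrosoftGraph",
    "--class-name", "graph_base_service_client",
    "--openapi", OPENAPI_URL
  ]
  include_paths.each { |path| args.push("--include-path", path) }
  args << "--clean-output"
end

# Paths taken from this comment; replace with the ones your app needs.
cmd = kiota_command(["/users", "/groups#GET"])

# Print a shell-escaped version for inspection; nothing is executed here.
puts Shellwords.join(cmd)
```

Passing the array to system(*cmd) would run it without going through a shell, which avoids any quoting concerns around # or * in the paths.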

As for slicing the models themselves, it's not supported. Our investments are going into making sure the metadata/OAS descriptions are as accurate as possible, which is currently a challenge (more details in the Go SDK post I initially shared). Long story short, some relationships between entities are imposed by OData conventions but won't ever contain any data due to service implementation. The relationships need to be severed, but only for the path segmentation or the model projection. And OData doesn't support that notion today...

@j15e Can you create two issues in Kiota please? One for the repeated code, one for the empty files.

Most of the files you're listing out here are handcrafted to pass more context (which version, etc.) from the service library (this SDK) to the lower layers of the core SDK or kiota implementations. I don't believe you actually need them (you could simply change the name of the class while generating).

I hope this helps, keep the feedback coming!