googleapis / google-api-php-client-services

http://googleapis.github.io/google-api-php-client-services/
Apache License 2.0

Split packages for the sake of bandwidth, disk space and the environment #595

Open Tofandel opened 2 years ago

Tofandel commented 2 years ago

This package is the biggest Composer package I have ever seen: 40 MB, or 73 MB on disk (because of the many small files).

By itself it accounts for 50% of the size of all my dependencies, and yet I only use it to call 4 API endpoints...

This single package uses more than 20 GB of our pipeline artifact storage and easily 1 GB of bandwidth per month, which is insane in terms of carbon footprint considering it's a popular package made by a company as big as Google. In comparison, https://github.com/fzaninotto/Faker was shut down by its author over environmental concerns for much less.

I think it's time to be responsible and split this package into multiple smaller packages that can be installed independently based on what users need (e.g. maybe I'll install a combination of Analytics and YouTube, but I don't want the hundreds of other subfolders that currently come with it).

Related #222

It is possible to use a monorepo with a Private Packagist, as documented here: https://blog.packagist.com/installing-composer-packages-from-monorepos/ Another option is to use a registry other than Packagist.
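On the consumer side, pointing Composer at an additional registry is a small change. A rough sketch, assuming a hypothetical registry URL (packages.example.com) and a hypothetical per-service package name (google/apiclient-service-youtube); neither exists today, and only the `repositories` key itself is standard Composer configuration:

```json
{
    "repositories": [
        {
            "type": "composer",
            "url": "https://packages.example.com"
        }
    ],
    "require": {
        "google/apiclient-service-youtube": "^1.0"
    }
}
```

Composer would consult the extra registry in addition to packagist.org, so the rest of a project's dependencies would resolve as before.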

If this is not ideal, either because of the need to add the registry to composer.json (which isn't that hard) or because of the high asking price of a Private Packagist (a bigger issue), I'd recommend a free alternative that automatically creates multiple GitHub projects: https://packagist.org/packages/symplify/monorepo-builder

Alternatively, I'd consider just separating the really big folders into their own packages, in this order:

  1. Compute
  2. ShoppingContent
  3. DialogFlow
  4. Dfareporting
  5. DisplayVideo
  6. Apigee
  7. Vision
  8. Sheets & Documents
  9. Youtube

And keep all the rest in one package. This would reduce the size of the main package by 25%, which isn't that much but is already 20 MB on disk.

itsAnuga commented 2 years ago

Might wanna read up on how to do a limited install.

https://github.com/googleapis/google-api-php-client#cleaning-up-unused-services

bshaffer commented 2 years ago

The best solution we have now is, as @itsAnuga mentioned, cleaning up unused services after they're installed. This does not save you network bandwidth, but it at least saves you disk space. I apologize for the inconvenience. We would split this across multiple Composer packages, but unfortunately Composer requires a dedicated git repository for each package, which would add up to 243 individual repos, and we simply do not have the ability to maintain that many. And as you said, Private Packagist, or maintaining our own Packagist server, is a bit much for a repository that has (unfortunately) no staffing resources.
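For reference, the cleanup mentioned above is configured in the consuming project's composer.json. A minimal sketch, following the "Cleaning up unused services" section of the google-api-php-client README linked earlier; the version constraint and the two service names (Drive, YouTube) are only illustrative, so list the services you actually call:

```json
{
    "require": {
        "google/apiclient": "^2.15.0"
    },
    "scripts": {
        "pre-autoload-dump": "Google\\Task\\Composer::cleanup"
    },
    "extra": {
        "google/apiclient-services": [
            "Drive",
            "YouTube"
        ]
    }
}
```

With this in place, services not named under `extra` are deleted from vendor/google/apiclient-services on install or update. As far as I recall from the same README, adding a service back later requires removing that vendor directory and running composer update again, since the cleanup only deletes files.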

I'll keep this open as a feature request because it would be fun to try and set up our own server. But this may be a "be careful what you wish for" scenario, because if that service ends up being unreliable, it will cause more pain than any benefit to our customers.

Thanks for filing this issue and for the detailed description!

fredericgboutin-yapla commented 2 years ago

I would like to point out that the optimize-autoloader Composer directive seems heavily impacted, performance-wise.

Reference: https://getcomposer.org/doc/articles/autoloader-optimization.md#optimization-level-1-class-map-generation

So when you run composer install, Composer has to scan all of those 16k classes and generate an entry for each of them for autoloading at run time (see autoload_classmap.php). Those entries total 3.1 MB.

Thank God the PHP associative array is fast at runtime, but gosh, this Google API PHP Client Services package is resource-hungry.

The comment at https://github.com/googleapis/google-api-php-client-services/issues/595#issuecomment-1030651691 seems to point in the right direction, but the directive clearly makes you install the whole package and then clean up the folder: two costly and independent operations.

bshaffer commented 1 year ago

Because the cleanup script runs before the autoloader is generated (e.g. on the pre-autoload-dump event), the optimized classmap will not contain the removed services. So as long as you do the limited install, you should be fine here.

dkarlovi commented 1 year ago

@bshaffer wouldn't you be able to use the same repo split strategy used by Symfony and many other repos? It has proved to work quite well and allows you to keep working in exactly the same way as you do now.

Splitting the repo is done by existing tools which have proved to be quite robust over the years.

Tofandel commented 1 year ago

A lot of solutions were already laid out originally; this would fall under:

"If this is not ideal because of the need to add the registry to the composer.json (which isn't that hard) or the high asking price of a private packagist (a bigger issue, in which case I'd recommend a free alternative where it automatically creates multiple github projects https://packagist.org/packages/symplify/monorepo-builder)"

This should be easier to set up. It will create a few hundred read-only GitHub repositories though, so they should maybe live under a new organization (like googleapis-readonly) just for those.

I just think the Packagist packages will need to be registered one by one on the first publish, so it will be a lot of monkey work, but after that it's all automated.

dkarlovi commented 1 year ago

Packagist has an API which can be used to register the packages on creation, so it doesn't need to be done manually; AFAIK Symfony's release process is fully automated. The packages can indeed be kept under a new GitHub org and still use the same Packagist namespace.

Looking at the code, splitting the repo would need some preparatory work but it would only need to be done once and then the packages could just be added/removed automagically.