Open rndquu opened 1 month ago
We should deal with priorities because I think the most important part is to keep commands running, otherwise users wouldn't understand why their commands don't do anything.
Second, this could be mitigated by having the crawling plugins (like daemon-merging and daemon-disqualifier) run only once a day, because I think they react to all these events and run quite a number of times, and they crawl whole orgs so they require quite a lot of API calls.
We should deal with priorities because I think the most important part is to keep commands running
When the GitHub App installation token is rate limited, the kernel stops executing any command in the chain, so prioritizing plugins in the chain doesn't prevent rate limiting.
Second, this could be mitigated by having the crawling plugins run only once a day
Yes, right now daemon-disqualifier runs pretty often.
To sum up, there are actually 3 issues:
1. The kernel becomes unresponsive when it exceeds GitHub API rate limits because of too many incoming webhook events (see the ubiquity:listeners events).
2. Third party plugin developers may actually "grief" ubiquity-os users by consuming all available rate limits (because 3rd party plugin devs have access to the ubiquity-os installation token). Not sure what's the solution here.

I've temporarily removed the time estimate because the spec is not clear.
As far as I understand from the docs, a GitHub App installation token's rate limit cannot increase beyond 12,500 requests per hour per repository (or organization?).
@gentlementlegen What if we restrict plugins' GitHub API usage this way:
- Add a new config parameter to the kernel like PLUGINS_GITHUB_API_AVAILABLE_QUOTA=10, which would mean that a single plugin in the chain cannot consume more than 10% of the available GitHub API quota per hour
- Before plugin execution in the chain, the kernel checks the available limits
- After plugin execution, the kernel checks the available limits again. If the quota was reduced by more than 10%, then: a) an email notification is sent to the plugin developer (fetched from manifest.json), b) plugin execution is suspended for some time (like 30 mins)

This way plugin developers will have to optimize their plugins' GitHub usage. This solves the issue of GitHub API "griefing".
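For illustration, a minimal TypeScript sketch of what that kernel-side check could look like, assuming the hypothetical PLUGINS_GITHUB_API_AVAILABLE_QUOTA variable above and an installation-bound Octokit; notifyDeveloper and suspendPlugin are invented stubs, not existing kernel code:

```typescript
import { Octokit } from "@octokit/rest";

// Hypothetical guard around a single plugin invocation in the chain.
// PLUGINS_GITHUB_API_AVAILABLE_QUOTA=10 would mean a plugin may consume at most
// 10% of the hourly quota still available to this installation token.
const QUOTA_PERCENT = Number(process.env.PLUGINS_GITHUB_API_AVAILABLE_QUOTA ?? "10");

async function getRemaining(octokit: Octokit): Promise<number> {
  // GET /rate_limit itself is documented as not counting against the quota.
  const { data } = await octokit.rest.rateLimit.get();
  return data.resources.core.remaining;
}

async function notifyDeveloper(plugin: string, consumed: number, allowed: number): Promise<void> {
  // Stub: in practice this would email the address listed in the plugin's manifest.json.
  console.warn(`${plugin} consumed ${consumed} requests, allowed ${allowed}`);
}

async function suspendPlugin(plugin: string, ms: number): Promise<void> {
  // Stub: the kernel would persist a "suspended until" timestamp for this plugin.
  console.warn(`${plugin} suspended for ${ms / 60000} minutes`);
}

async function runPluginWithQuota(octokit: Octokit, plugin: { name: string; run: () => Promise<void> }): Promise<void> {
  const before = await getRemaining(octokit);
  const allowed = Math.floor((before * QUOTA_PERCENT) / 100);

  await plugin.run();

  const after = await getRemaining(octokit);
  const consumed = before - after;

  if (consumed > allowed) {
    await notifyDeveloper(plugin.name, consumed, allowed);
    await suspendPlugin(plugin.name, 30 * 60 * 1000); // e.g. 30 minutes
  }
}
```

One caveat with the before/after diff: if several plugins run concurrently against the same installation token, each measurement also picks up the others' usage, so the kernel would likely have to serialize the chain or attribute usage some other way.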
My storage layer app needs to be installed by each org and gives us a new access token to work with that is bound to each individual org, and each org can be tracked via installs easily. If we create a storage repo instead of using the config repo as the storage location:
- we don't expose the sensitive config repo to the storage layer app, which everyone is going to be able to access
- we can use the token which is bound to the org instead of us all sharing the kernel's key
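For reference, this is roughly what a per-partner token looks like with an installation-bound Octokit (a sketch using the octokit package's App helper; the env var names are placeholders for whatever the storage app's credentials end up being called):

```typescript
import { App } from "octokit";

// Each org that installs the storage app gets its own installation, and each
// installation authenticates with its own token (and its own rate limit bucket).
const app = new App({
  appId: process.env.STORAGE_APP_ID!, // placeholder names, not the real config keys
  privateKey: process.env.STORAGE_APP_PRIVATE_KEY!,
});

// Returns an Octokit authenticated as that specific installation, so API usage
// is attributed to the partner's install rather than a shared/central token.
async function getPartnerOctokit(installationId: number) {
  return app.getInstallationOctokit(installationId);
}
```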
I don't have much context regarding the storage feature, so it's not really clear:
1. How the storage app is different from the ubiquity-os app and why partners need to install both.
2. Why we can't use the .ubiquity-os folder for storage.
3. How the storage app's rate limits differ from the ubiquity-os one; as far as I understand they're the same entities (i.e. conform to the same 12.5k GitHub API rate limit).
https://github.com/ubq-testing/telegram--bot/blob/gh-storage/src/adapters/github/storage-layer.ts
1. The storage app will be responsible for handling all of that partner's storage needs, giving it its own rate limit, because once this is implemented across all plugins the rate consumption is going to be big due to the nature of using GitHub as a DB. The Telegram bot used the kernel's key to dispatch its own workflows, but that was deemed insecure. We needed a token for each partner, one that has auth to read/write from inside plugins, not the kernel. An installation Octokit solves this and makes identifying storage locations easier.
2, 3. If we use .ubiquity-os we expose the config. If that's not a problem then we don't need another repo and can use it.
They'd share the same rate limits as you state, but they'd be bound individually to the partner as opposed to all eventual partners using a centralized token.
After plugin execution, the kernel checks the available limits again. If the quota was reduced by more than 10%, then: a) an email notification is sent to the plugin developer (fetched from manifest.json), b) plugin execution is suspended for some time (like 30 mins)
Maybe I'm mistaken, but isn't it possible that with an abundance of partners all using the same token, plugin devs could optimize to respect this threshold but still have the rate limit consumed to nil because of the size of the partner network? There are 3 or 4 partners right now, all ours and not super active, unlike what onboarding a couple of huge blue chip partners might look like.
It's just a suggestion, obviously there are overlaps that would need to be ironed out, but my core suggestion is for plugins to use a token bound to the partner via a GitHub App's installs, which would require exposing the app_private_key; this way I believe we obtain a partner-bound token.
Maybe I'm mistaken, but isn't it possible that with an abundance of partners all using the same token, plugin devs could optimize to respect this threshold but still have the rate limit consumed to nil because of the size of the partner network?
Yes, this approach prevents partners from abusing GitHub API rate limits, but it doesn't solve the scalability issue when there are simply too many partners and thus too many incoming webhook events. We need some kind of GitHub API cache or queue system, whichever is simpler/faster to implement and is enough for "scalability v1".
I updated my comment just as you posted, sorry.
It's just a suggestion, obviously there are overlaps that would need to be ironed out, but my core suggestion is for plugins to use a token bound to the partner via a GitHub App's installs, which would require exposing the app_private_key; this way I believe we obtain a partner-bound token.
Plugins would use the partner-bound token and the kernel could also use it per-org, I believe (@whilefoo @gentlementlegen rfc?), which would help, but you're right, it's still very likely to max out with things as-is.
@gentlementlegen What if we restrict plugins' github API usage this way:
Add a new config parameter to the kernel like PLUGINS_GITHUB_API_AVAILABLE_QUOTA=10, which would mean that a single plugin in the chain cannot consume more than 10% of the available GitHub API quota per hour. Before plugin execution in the chain, the kernel checks the available limits. After plugin execution, the kernel checks the available limits again. If the quota was reduced by more than 10%, then: a) an email notification is sent to the plugin developer (fetched from manifest.json), b) plugin execution is suspended for some time (like 30 mins). This way plugin developers will have to optimize their plugins' GitHub usage. This solves the issue of GitHub API "griefing".
It would be nice if somehow these quotas would count against their app / repo instead, no idea if this can be achieved. Otherwise yes, that could work; I don't grasp the complexity of such a change.
It would be nice if somehow these quotas would count against their app / repo instead, no idea if this can be achieved.
I feel like I need a primer on the kernel. I actually thought that the kernel uses the ubiquity-os app's installation token and passes it to all of the plugins. So I thought that on the kernel side we could simply call this API method to check the available limits.
2. Third party plugin developers may actually "grief" ubiquity-os users by consuming all available rate limits (because 3rd party plugin devs have access to the ubiquity-os installation token). Not sure what's the solution here.
I don't think there's incentive to grief because the rate limits are bound to a specific org and don't interrupt other orgs. The partner can just disable the plugin if it's using all of their rate limits
I feel like I need a primer on the kernel. I actually thought that the kernel uses the ubiquity-os app's installation token and passes it to all of the plugins.
That's correct, the kernel uses org-bound installation tokens and passes them to the plugins.
It's just a suggestion, obviously there are overlaps that would need to be ironed out, but my core suggestion is for plugins to use a token bound to the partner via a GitHub App's installs, which would require exposing the app_private_key; this way I believe we obtain a partner-bound token.
The tokens passed to plugins are already bound to the partner/org so I don't quite understand how the storage solution would help to solve this
- if the rate limit is not exceeded, proceed to the next item in the queue (with the cached config received at the time of webhook creation)
If the rate limit is hit then we can't even download the config though, unless there is a separate rate limit for that.
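One way to soften that specific problem (a sketch, not existing kernel code): keep the last fetched config per repo and revalidate it with a conditional (ETag) request, falling back to the cached copy when GitHub answers 304 Not Modified or when the request fails because of rate limiting. GitHub's docs have indicated that a 304 on a conditional request is not counted against the primary rate limit, which is worth re-verifying; at minimum the cached copy keeps the chain running while the limit is exhausted.

```typescript
import { Octokit } from "@octokit/rest";
import { RequestError } from "@octokit/request-error";

type CachedConfig = { etag: string; content: unknown };
const configCache = new Map<string, CachedConfig>(); // key: `${owner}/${repo}`

// `path` is a placeholder for wherever the org/repo config actually lives.
async function getConfig(octokit: Octokit, owner: string, repo: string, path: string) {
  const key = `${owner}/${repo}`;
  const cached = configCache.get(key);
  try {
    const res = await octokit.request("GET /repos/{owner}/{repo}/contents/{path}", {
      owner,
      repo,
      path,
      headers: cached ? { "if-none-match": cached.etag } : {},
    });
    const entry = { etag: res.headers.etag ?? "", content: res.data };
    configCache.set(key, entry);
    return entry.content;
  } catch (error) {
    if (cached && error instanceof RequestError && [304, 403, 429].includes(error.status)) {
      // 304: unchanged; 403/429: (likely) rate limited. Serve the cached copy either way.
      return cached.content;
    }
    throw error;
  }
}
```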
The tokens passed to plugins are already bound to the partner/org so I don't quite understand how the storage solution would help to solve this
Oh, I didn't realise that was the case already, my mistake.
See how each plugin has conditions that will cause it to "skip" or stop execution; some plugins have these checks right at the entry point and the check itself is very simple.
If we can expose these conditions from the manifest of plugins and have the kernel perform it first then we'd save a lot of false starts for actions at least.
E.g. command-ask currently spawns a workflow for every single comment that appears across all partner installs, whether it begins with "@ubiquityos" or not.
Surely it's possible to reduce the boolean logic down to something that the kernel can consume. In command-ask the condition, most simply, is "does the payload body include an env var", all of which the kernel passes down. A fix for my example is to include a commands entry in the manifest, which does for slash commands exactly what I'm suggesting, but I'm proposing we expand it so we can tell the kernel to compare arbitrary env:payload vars and deny firing requests based on that.
This is less of a global solve and more of a potential optimization.
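To make that concrete, here is a hypothetical shape for such a manifest entry and the kernel-side check; skipUnless and its fields are invented for illustration, with the existing commands entry as the precedent being extended:

```typescript
// Hypothetical manifest extension: a cheap precondition the kernel can evaluate
// from the webhook payload alone, before spawning a worker or workflow.
interface PluginManifest {
  name: string;
  commands?: Record<string, unknown>;
  // Invented field: "only dispatch when this payload field matches".
  skipUnless?: { payloadPath: string; startsWith?: string; equals?: string };
}

// Resolve a dotted path like "comment.body" against the payload.
function getByPath(obj: unknown, path: string): unknown {
  return path.split(".").reduce<any>((acc, key) => (acc == null ? undefined : acc[key]), obj);
}

// Kernel-side check: returns false when the plugin should not even be dispatched.
function shouldDispatch(manifest: PluginManifest, payload: Record<string, unknown>): boolean {
  const rule = manifest.skipUnless;
  if (!rule) return true;
  const value = getByPath(payload, rule.payloadPath);
  if (typeof value !== "string") return false;
  if (rule.startsWith !== undefined && !value.startsWith(rule.startsWith)) return false;
  if (rule.equals !== undefined && value !== rule.equals) return false;
  return true;
}

// Example: command-ask would only fire on comments that start with "@ubiquityos".
const commandAskManifest: PluginManifest = {
  name: "command-ask",
  skipUnless: { payloadPath: "comment.body", startsWith: "@ubiquityos" },
};

// shouldDispatch(commandAskManifest, { comment: { body: "@ubiquityos summarize this" } }) -> true
// shouldDispatch(commandAskManifest, { comment: { body: "unrelated comment" } })          -> false
```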
There might be cases when the kernel receives too many webhook events, for example when too many organizations have installed the ubiquity-os app. When the kernel receives too many webhooks it exceeds GitHub API rate limits, which causes the kernel to become unresponsive and throw the
Rate limit hit with \"GET /repos/{owner}/{repo}/contents/{path}\", retrying in 1469 seconds
error. There should not be any downtimes in the kernel's work.
As a part of this issue we should implement a webhook queue:
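A minimal sketch of how such a queue could behave (in-memory and simplified: a real deployment would need durable storage so queued webhooks survive restarts, and MIN_REMAINING is an invented safety margin, not a documented threshold):

```typescript
import { Octokit } from "@octokit/rest";

interface QueuedWebhook {
  event: string;
  payload: Record<string, unknown>;
  config: unknown; // config cached at the time the webhook was received
}

const queue: QueuedWebhook[] = [];
const MIN_REMAINING = 100; // stop draining the queue when fewer requests remain

function enqueue(item: QueuedWebhook) {
  queue.push(item);
}

async function drainQueue(octokit: Octokit, handle: (item: QueuedWebhook) => Promise<void>) {
  while (queue.length > 0) {
    const { data } = await octokit.rest.rateLimit.get();
    const { remaining, reset } = data.resources.core;

    if (remaining <= MIN_REMAINING) {
      // Rate limit (nearly) exhausted: wait for the window to reset instead of
      // letting the kernel fail and become unresponsive.
      const waitMs = Math.max(0, reset * 1000 - Date.now());
      await new Promise((resolve) => setTimeout(resolve, waitMs));
      continue;
    }

    const item = queue.shift();
    if (!item) break;
    await handle(item); // run the plugin chain using the cached config
  }
}
```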