jakirkham opened 6 years ago
I think that part of this is wrapped up in https://github.com/regro/cf-scripts/issues/53, since it is difficult to know exactly what kind of stress to expect on the system without some ballpark numbers. The whole bootstrap of the system also exaggerates some things (since we max out the CI time on every run of `03`).
Currently we are clearing about 20 feedstocks per run with `03` and all the feedstocks with the others (although we'll fall behind when we hit 5000 feedstocks on `01`).
I'm not opposed to moving this to a webservice, but the notification wrangling could be hard.
We also could do this in steps/build a hybrid system:
`00` can be removed with a hook on staged (or via the feed, https://github.com/regro/cf-scripts/issues/38).
`01` can be removed with a hook on all the feedstocks when PRs are merged (or via the feed, https://github.com/regro/cf-scripts/issues/38).
`02` might be the most difficult to remove, since listening for new releases from PyPI, CRAN, and GitHub may be difficult.
`03` I'm less certain about, since I don't know what triggers it (I guess whatever triggers `02`). If we can find a trigger for `02`, then `02` and `03` could be merged.
Honestly our experience doing this at conda-forge has taught us that the system ends up being less stressed when converted from batch to a webservice. If you think about it a bit, this actually makes sense. The reason being updates in a web service don't all come at once in a big batch (this could be re-renderings, updating Circle SSH keys, or package updates). Instead they are sprinkled throughout the days at various times. The result ends up being things stay pretty light and the system handles events right away, which makes the whole thing more maintainable.
Agreed that handling the detection of updates has been and remains challenging. PyPI lacks the right kind of notification (https://github.com/pypa/warehouse/issues/1683). Same story with R. Both provide index-wide feeds (Python, R), which we could parse. Not sure what we do with everything else. Maybe piggyback on Arch Linux? For the cases where we have feeds, we could have a process that filters these for us and triggers the update PRs. Presumably this would live on Heroku, though it could live elsewhere.
Just to outline this a bit, it sounds like we would want the webservice to handle these events. Am I missing any?
Given how package indexes seem to handle these problems, our web service would need to be designed around processing these feeds. Namely it would check feed notifications against a listing of packages. Periodically a new package could be added, in which case we would need to check its version independently and then add it to the list. Removal would be relatively straightforward. In some ways, it might not be worth processing feedstock updates (possibly removals), as this could easily be checked when the feedstock's package comes up again.
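The feed-checking loop described above can be sketched roughly like this. The `TrackedPackages` class and the shape of the feed entries are hypothetical, invented for illustration rather than taken from the bot's actual data structures:

```python
# Sketch of the feed-filtering idea: check incoming feed entries against a
# listing of tracked packages and surface only real version bumps.
# TrackedPackages and the (name, version) entry shape are hypothetical.

class TrackedPackages:
    def __init__(self):
        self.versions = {}  # package name -> last version we issued a PR for

    def add(self, name, version):
        # Periodically a new package is added; we check its version
        # independently once, then track it via the feed.
        self.versions[name] = version

    def remove(self, name):
        # Removal is relatively straightforward.
        self.versions.pop(name, None)

    def filter_feed(self, entries):
        """Yield (name, version) pairs that should trigger an update PR."""
        for name, version in entries:
            known = self.versions.get(name)
            if known is not None and version != known:
                self.versions[name] = version
                yield name, version

tracked = TrackedPackages()
tracked.add("numpy", "1.15.0")
tracked.add("pandas", "0.23.0")

feed = [("numpy", "1.15.1"), ("scipy", "1.1.0"), ("pandas", "0.23.0")]
updates = list(tracked.filter_feed(feed))
print(updates)  # only the numpy bump survives; scipy is not tracked yet
```

Untracked packages fall through silently here, matching the idea that new packages get picked up on a periodic pass rather than from the feed.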
Thoughts?
@isuruf, might be interested in this. 😉
I have an idea about using Libraries.io, Github and IFTTT to make this a webservice. Will look into it once I have some time.
I think this is now available for action. The graph is stored in a JSON format and so can be written to by pretty much anything. We could provide a webservice with the bot's credentials (or provision a new bot) so it could update the versions in the graph. Each package (that is not a stub or archived) should have a `new_version` key that represents what the bot thinks the newest upstream version is.
If external things write to the graph we could then kick off github actions that then cause PRs to be issued.
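As a rough illustration of an external writer updating the graph, assuming a simple node layout; the schema here is a guess, not the bot's actual graph format:

```python
import json

# Sketch of an external process updating the `new_version` key in the
# JSON-serialized graph. The per-node layout ("archived" flag, flat dict of
# packages) is assumed for illustration.

graph = {
    "numpy": {"archived": False, "new_version": "1.15.0"},
    "old-pkg": {"archived": True, "new_version": "0.1"},
}

def set_new_version(graph, name, version):
    """Record the newest upstream version for a package, if eligible."""
    node = graph.get(name)
    # Stubs and archived packages should not get version updates.
    if node is None or node.get("archived"):
        return False
    node["new_version"] = version
    return True

set_new_version(graph, "numpy", "1.16.0")
set_new_version(graph, "old-pkg", "0.2")  # ignored: archived

# Because the graph is plain JSON, any writer can round-trip it.
blob = json.dumps(graph)
```

Anything that can produce this JSON could then kick off the GitHub Actions that issue the PRs.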
@CJ-Wright @beckermr Is there a way to clarify this somehow? I read some of the issues regarding the web services, but some of them jump straight into a migration or a closed PR.
Sorry, what do you want clarified?
What are the webservices? I just want to understand and to help.
Sorry, no problem, I didn't know what you were asking. Conda-forge has a bunch of web services; these are tasks/jobs/things that are triggered by some action on the web. For instance, if there were something that published that a new version was available, we wouldn't need to scrape the web for it. Similarly, rather than updating all the feedstocks in the graph every run, we could just update the ones that have changed.
Uhm... I kind of get it. But just to be sure, when you say webservices, are you referring to services like Azure, CircleCI, and others? Or is it something else, like a server request? (Also, thanks for the reply.)
The code for the existing webservices (which run things like the team and token updating) is located here if you want to take a look.
My understanding (@beckermr might be able to provide more insight here, since he has contributed considerably to our webservices) is that we set up a server (usually a Heroku instance) that listens for updates from webpages and then acts accordingly.
Humm, now I understand what was being said. My understanding of the web services was quite different. Thanks.
"rather than updating all the feedstocks in the graph every run we could just update the ones that have changed."It's an exceptional idea isn't it ? Is there a way for me to help ? I saw the items list above, but its still vague.
So the essential idea of this issue is to refactor the bot into a distributed system that responds to events.
Imagine we are running a migration and package A depends on package B. When the PR for package B is merged/closed, we could detect this event by listening to a webhook. When we see that, we could look at the graph and queue up the PR for package A. We could then have a cron-ish job read from the queue and try to issue the migration.
This would be a big refactor of how the bot works and is pretty out of scope right now.
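A toy sketch of that event flow, with invented names and payload shapes (the real webhook payload and graph structure would look different):

```python
import queue

# Toy sketch of the event-driven flow: a webhook for a merged/closed PR on
# package B leads us to consult the graph and queue migration work for its
# dependents, which a cron-ish job later drains.

# Inverted dependency graph: dependents[Y] lists packages waiting on Y.
dependents = {"B": ["A"], "C": []}

work_queue = queue.Queue()

def handle_webhook(payload):
    # Called by the server when the PR-closed event arrives.
    # The payload shape here is invented, not GitHub's actual schema.
    if payload.get("action") == "closed":
        pkg = payload["package"]
        for dep in dependents.get(pkg, []):
            work_queue.put(dep)  # queue up the migration PR for the dependent

def cron_worker():
    """Drain the queue and (in reality) try to issue each migration PR."""
    issued = []
    while not work_queue.empty():
        issued.append(work_queue.get())
    return issued

handle_webhook({"action": "closed", "package": "B"})
print(cron_worker())  # -> ['A']
```

The point of the queue is decoupling: the webhook handler stays fast, and the slow PR-issuing work happens out of band.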
Oh, ok... thanks Cj and Matt thanks for the comments, now I have an idea of it.
To be fair, we could have some things done by webservice; for instance, marking a PR as merged/closed might be possible now. I think the main issue there is that the GH repo for the graph is rather large and might not fit inside the server. (This was part of the initial reasoning to move to something like Dynamo, which we should really put inside a milestone: all the things that need a distributed-database-like thing.)
What was the reason the DynamoDB idea was dropped?
We weren't able to implement it in a way that was cost-effective, and other issues were more pressing.
Uhm, and isn't there any other platform we could try? I think it could be a great improvement to reduce the burden on the CI clients.
If you can find another provider then go for it.
Don’t spend money without asking
Ok, I will definitely not do that, but it's good advice, thanks.
@beckermr What about MongoDB?
Mongo could work, although you need to host it somewhere.
Yup, actually I was wondering about the cloud mode of it, but I was not sure about the amount of data we will need (as the cloud tier is limited to 5 GB).
I think the first move there is figuring out how little of the PR JSON we can get away with.
Or maybe there are some classes of PRs we could get rid of. I was wondering about doing the "track opened and closed PRs" work first to reduce the number of PRs hosted, and then migrating the result to a table in some NoSQL server. (We could also put this list behind a web service, which would let us avoid hitting any API limits.) I could also be missing something.
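One hypothetical way to slim the stored PR JSON down to a handful of fields; the field choice here is illustrative, not a claim about which fields the bot actually needs:

```python
# Sketch of "how little of the PR JSON we can get away with": keep only the
# fields consulted later and drop everything else before storing.
# The PR_KEEP whitelist is a hypothetical choice, not the bot's real one.

PR_KEEP = ("id", "number", "state", "merged_at", "head")

def slim_pr_json(pr):
    """Return a copy of a PR payload containing only whitelisted fields."""
    return {k: pr[k] for k in PR_KEEP if k in pr}

full = {
    "id": 1,
    "number": 42,
    "state": "closed",
    "merged_at": "2020-01-01T00:00:00Z",
    "head": {"ref": "bump-1.0"},
    "body": "a very long description ...",  # bulky field we never consult
    "labels": [],
}

slim = slim_pr_json(full)
```

Measuring the size of `slim` versus `full` across the stored PRs would give a quick estimate of how far a 5 GB budget stretches.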
It’s certainly reasonable to start out with a cron job for these sorts of things. Also, while we resolve some technical debt, the cron job is very helpful. That said, we have generally found in conda-forge that cron jobs inevitably struggle to scale.
To solve this problem, we have ultimately moved all of them to web services that use webhooks. This allows them to deal with notifications as they come in and respond by doing some task. This approach seems well suited for updates. However, it will require some thought into how we can get notifications from package indexes, GitHub, etc. We expect this will iron out any issues related to load.