Set up multiple RTD projects for latency checks

agjohnson commented 2 years ago

Instead of using conda to trigger building on build-large, it might be better to have two separate time-test projects on RTD. Both would have the queue set on the project level, one pinned to the build-default queue, one pinned to the build-large queue.

This would avoid paging us on unrelated errors with conda.

Note: the background here is that an unrelated failure with Conda causes our latency checks to fail and I was paged for this. If testing Conda isn't important here, using separate projects seems like a great approach.

Changes:

[ ] Set up community latency check project
[ ] Set up commercial latency check project
[ ] Configure community Cloudwatch to use multiple projects for latency monitors/alarms?
[ ] Configure commercial Cloudwatch to use multiple projects for latency monitors/alarms?

agjohnson commented 2 years ago

For now, to get resolve this issue with pagerduty, I've repointed the project to my fork and disabled conda for the version. This is effectively not testing build-large for latency.

To revert my change, change the project repo back to this repo.

However, there might still be build errors with conda giving a 403 response on conda env create.

https://admin.ops.readthedocs.org:10000/admin/projects/project/487639/change/?_changelist_filters=q%3Dtime-test

agjohnson commented 2 years ago

@humitos noted a temporary fix here for the Conda URL. That would at least get us back to testing build:large latency at least. Multiple projects or some other hack to enforce build:large queue could be a secondary goal -- at least we don't have room for non-deterministic behavior that way.

humitos commented 2 years ago

I re-pointed the Read the Docs project to this repository and created #11. That PR fixes the Conda problem and start using build-large again to build. This should fix the problem that we had.

Note that this fix does not create one Read the Docs project per queue as we talked. There is pros and cons in both approaches. The current one, also "test" that the task router works 😄

humitos commented 2 years ago

I think we are OK with the current solution. I don't think there is too much value at this point to split it in multiple projects. I'm de-prioritizing this work and removing it from my sprint. We can come back if we think it's strictly required, tho.

ericholscher commented 1 year ago

@humitos is this what you fixed this week with it being busted on .com?

humitos commented 1 year ago

@ericholscher Well, I solve the problem we had with the current implementation.

However, in this issue Anthony is suggesting to change the implementation to have multiple time-test projects on the same platform. So, we can say they are different things.

ericholscher commented 1 year ago

Oh yea, I see, so a latency-check-build-default and latency-check-build-large project for example, and mark them explicitly 👍

agjohnson commented 1 year ago

Aye, the bug I hit originally was Conda failed randomly (I think a URL changed) and our build-large latency check broke as a side effect. It took some scrambling to figure out what the configuration was meant to be.

I re-pointed the Read the Docs project to this repository and created https://github.com/readthedocs/time-test/pull/11.

I agree this is close enough for now though. It should avoid some of the failures we've seen at Conda, and if this does pop back up, perhaps that's the point we can think about splitting up our projects.

humitos commented 3 months ago

We don't build-large anymore, so this issue is a non-issue at this point. I'm closing it, but feel free to re-open.

readthedocs / time-test

Set up multiple RTD projects for latency checks #9