openfoodfoundation / ofn-install

Ansible scripts for provisioning and deploying Open Food Network

Change to a Datadog paid plan #367

Closed sauloperez closed 5 years ago

sauloperez commented 5 years ago

As explained in depth in https://community.openfoodnetwork.org/t/making-operations-a-first-class-citizen/1601, we need to start paying for Datadog to get data retention longer than a day plus valuable integrations such as Postgres and Delayed Job, which are essential for OFN at this point.

sauloperez commented 5 years ago

We're currently monitoring UK, FR and Katuma, but I'd also like to add other smaller instances, balancing ease of management (all instances using the same services) against pricing. As for AU, it was agreed in https://github.com/openfoodfoundation/ofn-install/issues/273 that it'll stick to Wormly instead of Datadog. IMO instances that don't use ofn-install fall off the list.

Screenshot from 2019-04-03 14-57-41

If we go up to 5 hosts that is 90$ per month, whereas if we stick to 3 hosts: UK, FR, Katuma that is 54$. Both amounts are totally affordable IMO.

If we paid annually we would save 3$ per host per month. Is it worth the cost of being locked in with Datadog for a year? Chances are that we won't move away from it within a year anyway. Another option is to pay as we go, but that feels really scary to me. I guess it's more of an accounting/management question.
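As a rough sketch of the arithmetic above: both quotes imply the same price of 18$ per host per month (90$ / 5 hosts, 54$ / 3 hosts), and annual billing would take 3$ off that. The function and constant names are only illustrative; the figures are the ones quoted in this comment.

```python
# Figures taken from this comment; names are hypothetical.
MONTHLY_PRICE_PER_HOST = 18   # USD per host per month (90$ / 5 hosts = 54$ / 3 hosts)
ANNUAL_SAVING_PER_HOST = 3    # USD saved per host per month when billed annually


def monthly_cost(hosts, annual_billing=False):
    """Return the monthly bill in USD for the given number of monitored hosts."""
    price = MONTHLY_PRICE_PER_HOST
    if annual_billing:
        price -= ANNUAL_SAVING_PER_HOST
    return hosts * price


for hosts in (3, 5):
    print(hosts, "hosts:",
          monthly_cost(hosts), "USD billed monthly,",
          monthly_cost(hosts, annual_billing=True), "USD billed annually")
```

With these numbers, the annual commitment would save 9$ per month for three hosts and 15$ per month for five.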

As we did with Bitwarden, I'd like this to be paid by Katuma as a contribution to the global pot.

thoughts @myriamboure @Matt-Yorkley @mkllnk @luisramos0 ?

enricostano commented 5 years ago

What about data retention with these prices?

luisramos0 commented 5 years ago

You know I'd put this money right away into metal... you can almost double all those 5 servers' capacity with this money #performance

Anyway, it will be nice to have this data!!!

sauloperez commented 5 years ago

I know @luisramos0, but what's the value of such metal if we have no idea what is going on under the hood? It's like having a powerful car without any idea where to drive it.

sauloperez commented 5 years ago

> What about data retention with these prices?

That comes with 15-month data retention.

What I see now is that we might need to pay for the Application Performance Monitoring product to get the Delayed Job integration. I just sent an email to customer support to get a clear answer. If that were the case, it'd be an extra 31$ per host per month. IMO we could live without it as long as we do get the PostgreSQL integration.

luisramos0 commented 5 years ago

we currently have a 30-second page load time on the shops list and the map, and the backoffice for super admins on the main instances is a total joke. it's a tricky discussion, I understand you see other priorities, that's why I only shared what I'd do: buy metal. imo we are not doing correct infrastructure capacity planning.

sauloperez commented 5 years ago

> I understand you see other priorities

Not at all. We do have the same ones; it's just that we see different ways to reach the same goal. I think I've shared this screenshot several times, but how will more metal on top of what is already underused change the situation? See UK's production stats below.

Screenshot from 2019-04-03 15-59-15

With the current specs they experience regular downtime, not to mention the 1h 40min outage OFF had with the same specs, for which I wrote a postmortem.

I would like to see how throwing metal at this could change the situation, but I don't. That would be ideal, as we could focus our efforts on other things.

luisramos0 commented 5 years ago

uptime and performance are two different priorities.

I have seen metal help with uptime even when the data is looking normal as those images show (load average of 3 for 4 cpus is something, it's not like the server is sleeping).

but it can also be a bug.

in the past I have used monitoring data to detect problems, very rarely to fix problems.

sauloperez commented 5 years ago

> uptime and performance are two different priorities.

Absolutely. While I stopped working on the second to stick to the priorities until we have the gathering, I don't think we can afford not to care about the former.

> in the past I have used monitoring data to detect problems, very rarely to fix problems.

And that is all I want.

sauloperez commented 5 years ago

Let's not forget this is also needed to have decent data retention to allow us to spot problems caused by the v2 roll out as explained in https://community.openfoodnetwork.org/t/making-operations-a-first-class-citizen/1601

enricostano commented 5 years ago

> That comes with 15-month data retention.

15 days?

sauloperez commented 5 years ago

Screenshot_20190403-181427

Matt-Yorkley commented 5 years ago

First impressions:

mkllnk commented 5 years ago

We pay $36 for Wormly in Australia. It would increase if we added a lot more hosts for monitoring but probably not as much. They don't bill by host, they bill by how many metrics you are monitoring. Anyway, my conclusion is that those prices are comparable and there is not much in it.

If we talk about an additional $31 for APM, that's very expensive. That would be $155 extra for the five hosts.

Can we select for which hosts we go paid and for which ones we don't or do we then need two accounts?

It's probably enough to get data retention for two or three big instances and maybe APM for one instance. The application behaves pretty much the same on each host. That makes it affordable and should give us all the data we need.
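To put rough numbers on these options, here is a minimal sketch. It assumes the 18$ per host per month implied by the 90$ quote for five hosts, the still unconfirmed 31$ per host for APM, and that both can be chosen per host, which is exactly the open question above. All names are hypothetical.

```python
# Assumed figures from this thread; not confirmed Datadog pricing.
BASE_PER_HOST = 18  # USD per host per month for the paid plan (90$ / 5 hosts)
APM_PER_HOST = 31   # USD per host per month for APM, pending confirmation


def monthly_total(paid_hosts, apm_hosts=0):
    """Monthly bill in USD: paid_hosts on the paid plan, apm_hosts of them with APM."""
    if apm_hosts > paid_hosts:
        raise ValueError("APM hosts cannot exceed hosts on the paid plan")
    return paid_hosts * BASE_PER_HOST + apm_hosts * APM_PER_HOST


print(monthly_total(5, apm_hosts=5))  # everything everywhere: 90 + 155 = 245
print(monthly_total(3, apm_hosts=1))  # three big instances, APM on one: 54 + 31 = 85
```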

sauloperez commented 5 years ago

That's another option we could explore in the future. I would keep things agile for now and stick to the paid plan to get longer retention plus the PostgreSQL integration. That would be a lot already, since we have a basic APM with Skylight.

To answer you, @Matt-Yorkley:

> do we need 15-month data retention?

We don't, but that's what they offer.

RachL commented 5 years ago

@sauloperez I think in France we are ok to go forward with this :) Let's do it!

sauloperez commented 5 years ago

Done! Say hello to a wealth of data and more to come!

Screenshot 2019-04-19 at 10 17 50

👆 France production in the last 3 months. I did my best to get some sort of discount but no luck. We'll see as we pay for more hosts in the future 🤞