openzipkin-attic / zipkin-azure

Reporters and collectors for use in Azure
Apache License 2.0

CosmosDB / DocDB storage support #30

Open clehene opened 7 years ago

clehene commented 7 years ago

Creating this as a child of https://github.com/openzipkin/zipkin-azure/issues/8. @praveenbarli has something working.

Some things to consider, and ideally discuss, before an implementation: the Zipkin backend model and queries. There's no formal specification, but @adriancole has mentioned

As CosmosDB has multiple APIs (key-value, document, and graph), it would be interesting to know which makes the most sense for this backend, and to have a discussion on the model. Ideally we'd make sure it's cost-efficient at good performance. See pricing: https://azure.microsoft.com/en-us/pricing/details/cosmos-db/
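
As a concrete strawman for that model discussion (the shape and names below are assumptions for illustration, not a settled design): under the document (SQL) API, one Zipkin v1 span per document might look like this, with traceId as a candidate partition key:

```java
import java.util.List;
import zipkin.Annotation;        // Zipkin v1 model types, for illustration
import zipkin.BinaryAnnotation;

// Strawman only: one Zipkin v1 span per Cosmos DB document.
public class SpanDocument {
  public String id;        // span id, doubling as the Cosmos document id
  public String traceId;   // candidate partition key: keeps a trace together
  public String parentId;  // null for root spans
  public String name;
  public Long timestamp;   // epoch micros; what time-range queries filter on
  public Long duration;    // micros
  public List<Annotation> annotations;
  public List<BinaryAnnotation> binaryAnnotations;
}
```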

Perhaps data retention could also be discussed? What's typically used?

Note that if Event Hubs is used as a queue before data lands in storage, it will impose limits on throughput (EH is limited to 20K rps). Since Cosmos DB can handle far more than that, perhaps it would make sense to push directly to storage without going through EH?
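
For what pushing directly could look like, here is a minimal sketch against the DocumentDB Java SDK (com.microsoft.azure.documentdb); the endpoint, key, and collection link are placeholders, and a real reporter would also need batching, retries, and error handling:

```java
import com.microsoft.azure.documentdb.ConnectionPolicy;
import com.microsoft.azure.documentdb.ConsistencyLevel;
import com.microsoft.azure.documentdb.Document;
import com.microsoft.azure.documentdb.DocumentClient;
import com.microsoft.azure.documentdb.DocumentClientException;

public class DirectCosmosWriter {
  // Placeholder endpoint and key; "dbs/zipkin/colls/spans" is an illustrative
  // database/collection link, not anything this repo defines.
  private final DocumentClient client = new DocumentClient(
      "https://<account>.documents.azure.com:443/", "<master-key>",
      ConnectionPolicy.GetDefault(), ConsistencyLevel.Session);

  /** Writes one span, already serialized to Zipkin v1 JSON, as one document. */
  public void write(String spanJson) throws DocumentClientException {
    client.createDocument(
        "dbs/zipkin/colls/spans",
        new Document(spanJson),
        null,   // default request options
        false); // let Cosmos generate an id if the JSON has none
  }
}
```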

aliostad commented 7 years ago

Hi, sorry, I've been away. Would you mind sharing where you saw 20K RPS for Event Hubs? AFAIK Microsoft never really defined an official limit, although I have heard stories of scale issues.

Update

Huh, found it - it's in the small print here: https://azure.microsoft.com/en-gb/pricing/details/event-hubs/

When we first used EH there was no such thing.

Update 2

I contacted Azure friends. With the Dedicated tier you can go up to 2 million events per second:

Scalable between 1 and 8 capacity units (CU) – providing up to 2 million ingress events per second.

From https://docs.microsoft.com/en-us/azure/event-hubs/event-hubs-dedicated-overview

clehene commented 7 years ago

@aliostad glad to see there's a new tier for Event Hubs. I think it would be worth documenting these limits/options for Zipkin. Unfortunately I'm not sure how to keep them up to date, as there seems to be no standard way (e.g. an API) to get the units, limits, and pricing.

Also, this is an interesting detail about Dedicated pricing:

Dedicated Event Hubs is only available to EA customers and is sold at a retail price of $733.13 per day for 1 Capacity Unit. https://azure.microsoft.com/en-us/pricing/details/event-hubs/

It seems to be a big jump, to over $250K/year ($733.13/day × 365 ≈ $268K).

aliostad commented 7 years ago

@clehene hi. Running at that level will not be cheaper with DocumentDB. I have done a PoC on extreme load, and frankly the cost will be higher, since it does more. Here are some details:

The test we were doing used 4KB docs. Storing each one used ~70 RU, so 200K RU/s will cost £108K a year while giving you only enough throughput to store a mere 3K events per second (assuming 4KB each)!
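
Spelling that arithmetic out, using only the figures above (the ~70 RU per 4 KB write is a PoC measurement, not an official number):

```latex
\frac{200{,}000\ \mathrm{RU/s}}{\approx 70\ \mathrm{RU\ per\ 4\,KB\ write}}
  \approx 2{,}857\ \mathrm{writes/s} \approx 3\mathrm{K}\ \mathrm{events/s}
```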

Also, on the read side you will start having problems with one-by-one reads and deletes, and you don't get the checkpointing that comes free with Event Hubs.

So in short, Cosmos DB - once it has table support - will be good, but right now it only works if you use it with Azure Search (there is an option to store docs and have Azure Search index them). I can work on this if enough people are interested, but it seems we already have a working version?

codefromthecrypt commented 7 years ago

Side note: I would guess that using DocDB for storage/query is a different topic from whether Event Hubs is used as a transport, right? Does it make sense to continue the transport discussion here on this issue, or would it be clearer as a separate issue?

clehene commented 7 years ago

@adriancole the main reason for discussing EH here is the original question of whether it's worth having it in front of Cosmos DB. My assumption is/was that it may be more expensive, with less availability and throughput, and that we might implement Cosmos DB as both storage and transport at the same time.

That said, @aliostad's points about Cosmos DB are valid and relevant to the Cosmos DB discussion, and it would be worth expanding on a few topics, such as: what would the ideal data model for Zipkin data in Cosmos DB be, from a size and query-capability perspective? This may be useful for determining size/cost mappings: https://docs.microsoft.com/en-us/azure/cosmos-db/request-units
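
To make the query side of that concrete, here is a hedged sketch of getTrace() with traceId as the partition key, reusing the strawman collection link from above (again against the DocumentDB Java SDK; all names are assumptions):

```java
import java.util.List;
import com.microsoft.azure.documentdb.Document;
import com.microsoft.azure.documentdb.DocumentClient;
import com.microsoft.azure.documentdb.FeedOptions;
import com.microsoft.azure.documentdb.PartitionKey;
import com.microsoft.azure.documentdb.SqlParameter;
import com.microsoft.azure.documentdb.SqlParameterCollection;
import com.microsoft.azure.documentdb.SqlQuerySpec;

class TraceReader {
  static List<Document> getTrace(DocumentClient client, String traceId) {
    // With traceId as the partition key this is a single-partition query,
    // the cheap case in RU terms.
    FeedOptions options = new FeedOptions();
    options.setPartitionKey(new PartitionKey(traceId));

    SqlQuerySpec query = new SqlQuerySpec(
        "SELECT * FROM spans s WHERE s.traceId = @traceId",
        new SqlParameterCollection(new SqlParameter("@traceId", traceId)));

    return client.queryDocuments("dbs/zipkin/colls/spans", query, options)
        .getQueryIterable().toList();
  }
}
```

The single-partition path is the cheap one; Zipkin's other queries (by service name, span name, annotation, time range) are cross-partition and would be the expensive part to model, which is where the Azure Search option @aliostad mentioned could come in.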

codefromthecrypt commented 7 years ago

@clehene OK, I think I understand. Yeah, for example there are folks who use storage directly instead of having a separate transport (I've heard of this with both Elasticsearch and Cassandra, although it isn't common practice). There are impacts on how you'd design the data model if you think people will be doing this, and yeah, there'd be no way for Zipkin to prevent people from skipping a separate transport if they wanted to.