stackabletech / stackablectl

Commandline tool to interact with a Stackable Data Platform

[Merged by Bors] - Improve water-level demo #126

Closed sbernauer closed 2 years ago

sbernauer commented 2 years ago

Description

Run with stackablectl --additional-demos-file demos/demos-v1.yaml --additional-stacks-file stacks/stacks-v1.yaml demo install nifi-kafka-druid-water-level-data

Tested the demo with 2,500,000,000 records.

Hi all, here is a short summary of my observations on the water-level demo:

NiFi uses the content-repo PVC but keeps it at ~50% usage => Should be fine forever. Actions:

Kafka uses a PVC (currently 15 GB) => Should work fine for ~1 week. Actions:

Druid uses S3 for deep storage (S3 has 15 GB). However, it currently also caches everything locally on the historical, because we set druid.segmentCache.locations=[{"path"\:"/stackable/var/druid/segment-cache","maxSize"\:"300g"}] (hardcoded in https://github.com/stackabletech/druid-operator/blob/45525033f5f3f52e0997a9b4d79ebe9090e9e0a0/deploy/config-spec/properties.yaml#L725). This does not really affect the demo, as 100,000,000 records (let's call it ~1 week of data) take up ~400 MB.

I think the main problem with the demo is that queries take > 5 minutes to complete and Superset shows timeouts. The historical pod suspiciously uses exactly one CPU core, and the queries are really slow for a "big data" system IMHO. This could be because either Druid only uses a single core, or because we don't set any resources (yet!) and the node has no more cores available. I'm going to research that. Actions:
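To make the Druid storage estimate easier to re-check, here is a rough back-of-the-envelope sketch in Python. It only uses the figures quoted in this description (~100,000,000 records per week at ~400 MB, a 15 GB S3 bucket, 2,500,000,000 records tested). The linear scaling of segment size with record count and the function names are assumptions for illustration, not something measured in the demo.

```python
# Back-of-the-envelope sizing from the figures in this description.
# Assumption: Druid segment size grows linearly with the number of records.
MB_PER_GB = 1000

RECORDS_PER_WEEK = 100_000_000  # "~1 week" of data, as stated above
DRUID_MB_PER_WEEK = 400         # observed size of that data in Druid


def druid_segment_size_mb(records: int) -> float:
    """Estimated Druid segment size in MB for a given record count (hypothetical helper)."""
    return records * DRUID_MB_PER_WEEK / RECORDS_PER_WEEK


def weeks_until_full(capacity_gb: float) -> float:
    """Estimated number of weeks until a store of the given size fills up (hypothetical helper)."""
    return capacity_gb * MB_PER_GB / DRUID_MB_PER_WEEK


if __name__ == "__main__":
    tested_records = 2_500_000_000
    size_gb = druid_segment_size_mb(tested_records) / MB_PER_GB
    print(f"{tested_records:,} records ~ {size_gb:.1f} GB of segments")
    print(f"15 GB of S3 deep storage lasts ~ {weeks_until_full(15):.1f} weeks")
```

Under these assumptions the tested 2,500,000,000 records come to roughly 10 GB, which fits both the 15 GB S3 bucket and the 300g segment cache, so storage is unlikely to explain the slow queries.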

Review Checklist

Once the review is done, comment bors r+ (or bors merge) to merge.

sbernauer commented 2 years ago

bors r+

bors[bot] commented 2 years ago

Pull request successfully merged into main.

Build succeeded: