Run with:
`stackablectl --additional-demos-file demos/demos-v1.yaml --additional-stacks-file stacks/stacks-v1.yaml demo install nifi-kafka-druid-water-level-data`
Tested the demo with 2,500,000,000 records.
Hi all, here is a short summary of my observations from the water-level demo:
NiFi uses the content-repo PVC but keeps it at ~50% usage => should be fine indefinitely
Actions:
Increase the content-repo PVC from 5 GB to 10 GB, better safe than sorry. I was able to crash it by using large queues and stalled processors (see the sketch below).
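As a sketch, the resize could look like this in the demo's NifiCluster manifest; the storage field names (`contentRepo`, `capacity`) are assumptions based on how Stackable operators generally expose PVC sizing:

```yaml
# Hypothetical excerpt from the demo's NifiCluster resource.
nodes:
  roleGroups:
    default:
      config:
        resources:
          storage:
            contentRepo:
              capacity: 10Gi  # previously 5Gi; headroom for large queues
```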
Kafka uses a PVC (currently 15 GB) => should work fine for ~1 week
Actions:
Look into retention settings (low priority, as it should work for ~1 week) so that the demo can run indefinitely (see the sketch below)
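A minimal sketch of what that could look like via `configOverrides` on the KafkaCluster brokers; the property names are standard Kafka broker settings, but the values are assumptions that would need tuning to the demo's actual ingest rate:

```yaml
# Hypothetical excerpt from the demo's KafkaCluster resource.
brokers:
  configOverrides:
    server.properties:
      # Keep at most one week of data (assumed value).
      log.retention.hours: "168"
      # Additionally cap each partition's log size so the 15 GB PVC
      # cannot fill up even if the ingest rate grows (assumed value).
      log.retention.bytes: "1073741824"  # 1 GiB per partition
```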
Druid uses S3 for deep storage (the S3 bucket has 15 GB). But currently it also caches everything locally on the historical, because we set `druid.segmentCache.locations=[{"path"\:"/stackable/var/druid/segment-cache","maxSize"\:"300g"}]` (hardcoded in https://github.com/stackabletech/druid-operator/blob/45525033f5f3f52e0997a9b4d79ebe9090e9e0a0/deploy/config-spec/properties.yaml#L725)
This does not really affect the demo, as 100,000,000 records (call it ~1 week of data) amount to ~400 MB; the full 2,500,000,000-record test set therefore comes to roughly 25 × 400 MB ≈ 10 GB, which still fits into the 15 GB of deep storage.
I think the main problem with the demo is that queries take > 5 minutes to complete and Superset shows timeouts.
The historical pod suspiciously uses exactly one CPU core, and the queries are really slow for a "big data" system IMHO.
This could be either because Druid is only using a single core, or because we don't set any resources (yet!) and the node does not have more cores available. Going to research that.
Actions:
In the meantime, configure an override in the demo: `druid.segmentCache.locations=[{"path"\:"/stackable/var/druid/segment-cache","maxSize"\:"3g","freeSpacePercent":"5.0"}]` (see the sketch after this list)
Research slow query performance
Have a look at the queries the Superset Dashboard executes and optimize them
Maybe we should bump the druid-operator version in the demo (e.g. create a release 22.09-druid, which is basically 22.09 with a newer druid-operator version). That way we get stable resources.
Enable Druid auto-compaction to reduce the number of segments
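For the segment-cache override above, a minimal sketch of how it could be applied, assuming the DruidCluster CRD accepts `configOverrides` on the `runtime.properties` file for the historicals, as other Stackable operators do:

```yaml
# Hypothetical excerpt from the demo's DruidCluster resource.
historicals:
  configOverrides:
    runtime.properties:
      # Shrink the local segment cache from the hardcoded 300g default to 3g,
      # which is still plenty for the ~400 MB of weekly demo data.
      druid.segmentCache.locations: '[{"path":"/stackable/var/druid/segment-cache","maxSize":"3g","freeSpacePercent":"5.0"}]'
```

Note that the backslash-escaped colons from the operator's properties.yaml are plain colons here, since the value is quoted as a YAML string.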
Review Checklist
[ ] Code contains useful comments
[ ] (Integration-)Test cases added (or not applicable)
[ ] Documentation added (or not applicable)
[ ] Changelog updated (or not applicable)
[ ] Cargo.toml only contains references to git tags (not specific commits or branches)
Once the review is done, comment `bors r+` (or `bors merge`) to merge. Further information