marcelmay / hadoop-hdfs-fsimage-exporter

Exports Hadoop HDFS content statistics to Prometheus
Apache License 2.0

[Request] Health Check for Last FSImage Loading #222

Closed spearkkk closed 2 weeks ago

spearkkk commented 2 weeks ago

Thank you for the nice fsimage exporter. I'm trying to use it, and I found that when it loads a huge fsimage with too little heap memory, the loader is killed by an OOM error. When that happens, there is only an error log entry and no other way to be notified.

So, could you consider adding a health check interface, or some other way to alert developers to this state?

marcelmay commented 2 weeks ago

Hi @spearkkk !

You can also use Prometheus to monitor the fsimage exporter's own health:

1) Monitor and alert on whether the service is up in Prometheus: up{job="fsimage-exporter",...} == 0
2) Monitor and alert on exporter memory (e.g. like this) and give the exporter JVM a sufficiently large heap.

If you run the exporter containerized on e.g. k8s, use a liveness probe on the exposed port.

spearkkk commented 2 weeks ago

@marcelmay First of all, thank you for the guidance. I tried health-checking / and /metrics while the app could not load the fsimage because of the heap memory issue, but the application still responds normally to requests for / and /metrics. That is, the Java application doesn't exit on the heap memory issue, because FsImageLoader handles the exception by only logging it, and only that thread is killed. Could you look into the OOM kill in FsImageLoader?
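The behaviour described above can be reproduced in isolation. The following is a minimal sketch, not the exporter's actual code: the class name WorkerOomDemo and the thread name are hypothetical, and the OutOfMemoryError is thrown by hand rather than caused by real heap exhaustion. It shows that when a background thread dies with an error that is only logged, the rest of the JVM keeps running, which is why / and /metrics can still answer.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical demo class; it does not use any exporter code.
class WorkerOomDemo {

    /**
     * Starts a worker that fails with a simulated OutOfMemoryError and
     * returns true if the calling thread is still running afterwards.
     */
    static boolean jvmSurvivesWorkerFailure() {
        AtomicBoolean workerFailed = new AtomicBoolean(false);

        Thread loader = new Thread(() -> {
            // Assumption: thrown manually to stand in for a real heap exhaustion.
            throw new OutOfMemoryError("simulated: Java heap space");
        }, "fsimage-loader-demo");

        // Mimics "handles the error by only logging it".
        loader.setUncaughtExceptionHandler((thread, error) -> {
            workerFailed.set(true);
            System.err.println("Thread " + thread.getName() + " died: " + error);
        });

        loader.start();
        try {
            loader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }

        // Reaching this point means only the worker thread ended.
        return workerFailed.get() && !loader.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("JVM still running after worker failure: "
                + jvmSurvivesWorkerFailure());
    }
}
```

Because the error here is thrown manually, JVM flags that react to real out-of-memory conditions are not expected to change the outcome of this demo.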

marcelmay commented 2 weeks ago

What about configuring the JVM with the additional flag -XX:+ExitOnOutOfMemoryError via JAVA_OPTS?
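One way to confirm that the flag actually reached the JVM is to read it back at runtime. This is a minimal sketch that assumes a HotSpot-based JVM recent enough to know the flag; the class name ExitOnOomCheck is hypothetical and not part of the exporter.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper class; assumes a HotSpot JVM.
class ExitOnOomCheck {

    /** Returns the current value of ExitOnOutOfMemoryError as "true" or "false". */
    static String exitOnOomValue() {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // getVMOption throws IllegalArgumentException if the JVM does not know the flag.
        return bean.getVMOption("ExitOnOutOfMemoryError").getValue();
    }

    public static void main(String[] args) {
        System.out.println("ExitOnOutOfMemoryError = " + exitOnOomValue());
    }
}
```

Started with -XX:+ExitOnOutOfMemoryError, the reported value should be true; without the flag it should be false, which is the HotSpot default.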

spearkkk commented 2 weeks ago

@marcelmay Thank you!! I really appreciate it. You saved me a lot of time.