Use case or motivation behind the feature request
Currently, users cannot see the progress of a query, which is frustrating. Now that the datasource uses the Spark 3 APIs, it is possible to provide metric information about the datasource's progress.
Please create at least the following metrics, aggregated into JSON format:
Driver:
Current archive offset
Kafka offset
Task:
Number of records processed
Number of bytes processed
Bytes per second
Records per second
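As a rough illustration of the aggregation requested above, the sketch below combines per-task counters with the driver offsets into one JSON document. All names here (`TaskMetrics`, `aggregate`, and the JSON keys) are hypothetical, since the schema is not yet defined, and offsets are assumed to be single integers for simplicity. It does not use the Spark metric APIs and only shows the shape of the data.

```python
import json
from dataclasses import dataclass


@dataclass
class TaskMetrics:
    """Per-task counters (hypothetical shape, not the datasource's actual API)."""
    record_count: int
    byte_count: int
    elapsed_seconds: float


def aggregate(archive_offset: int, kafka_offset: int, tasks: list) -> str:
    """Combine driver offsets and task counters into one JSON string."""
    total_records = sum(t.record_count for t in tasks)
    total_bytes = sum(t.byte_count for t in tasks)
    # Assumption: tasks run in parallel, so the longest task approximates
    # the wall-clock time used for the per-second rates.
    elapsed = max((t.elapsed_seconds for t in tasks), default=0.0)
    report = {
        "driver": {
            "archiveOffset": archive_offset,
            "kafkaOffset": kafka_offset,
        },
        "task": {
            "recordsProcessed": total_records,
            "bytesProcessed": total_bytes,
            "recordsPerSecond": total_records / elapsed if elapsed > 0 else 0.0,
            "bytesPerSecond": total_bytes / elapsed if elapsed > 0 else 0.0,
        },
    }
    return json.dumps(report, sort_keys=True)
```

Whether rates should be computed per task or over the whole query is an open design choice; this sketch uses the whole-query view.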
Please consider implementing pre-created (hourly/automatic) buckets within the driver for the earliest-latest span, and binning the data processed in the tasks into these buckets.
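The bucketing idea could look roughly like the following sketch, which covers only the hourly case. The function names and the bucket contents (`records`, `bytes`) are assumptions made for illustration; in the real datasource the buckets would be created in the driver and filled by the tasks.

```python
from datetime import datetime, timedelta


def create_hourly_buckets(earliest: datetime, latest: datetime) -> dict:
    """Driver side: one zeroed bucket per hour covering the earliest-latest span."""
    start = earliest.replace(minute=0, second=0, microsecond=0)
    buckets = {}
    while start <= latest:
        buckets[start] = {"records": 0, "bytes": 0}
        start += timedelta(hours=1)
    return buckets


def bin_record(buckets: dict, timestamp: datetime, size_bytes: int) -> bool:
    """Task side: count one record in the bucket for its hour.

    Returns False when the timestamp falls outside the pre-created span.
    """
    key = timestamp.replace(minute=0, second=0, microsecond=0)
    bucket = buckets.get(key)
    if bucket is None:
        return False
    bucket["records"] += 1
    bucket["bytes"] += size_bytes
    return True
```

Creating the buckets up front keeps the set of keys fixed, so per-task results can be merged by summing matching buckets.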
Please define the JSON schema once initial development is done.
Description
Implement datasource metrics
Related issues
https://github.com/teragrep/ajs_01/issues/70 depends on this
Additional context
See the example at #74 and close when implemented.