teragrep / pth_06

Teragrep Datasource for Apache Spark
GNU Affero General Public License v3.0

Implement datasource metrics #75

Open · kortemik opened 2 weeks ago

kortemik commented 2 weeks ago

Description

Implement datasource metrics.

Use case or motivation behind the feature request

Currently, users cannot see the progress of a query, which is frustrating. This needs to be fixed. Now that the datasource uses Spark 3 APIs, it is possible to provide metric information about the datasource's progress.
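For reference, Spark 3.2 added custom DataSource V2 metrics in `org.apache.spark.sql.connector.metric`: a reader reports `CustomTaskMetric` values (a `name()` and a `long value()`), and the driver combines the per-task values through `CustomMetric.aggregateTaskMetrics(long[])`, which returns a display string. The issue does not say whether pth_06 will take this route. The plain-Java sketch below only mirrors that task/driver split so it runs without Spark; `TaskCounter`, `SummingMetric` and the `rowsProcessed` metric name are hypothetical.

```java
import java.util.Arrays;

// Hypothetical task-side metric: one named counter reported by one task.
// Mirrors the name()/value() shape of Spark's CustomTaskMetric without depending on Spark.
final class TaskCounter {
    private final String name;
    private final long value;

    TaskCounter(String name, long value) {
        this.name = name;
        this.value = value;
    }

    String name() {
        return name;
    }

    long value() {
        return value;
    }
}

// Hypothetical driver-side metric: combines the values that all tasks reported
// for one metric name, in the style of CustomMetric.aggregateTaskMetrics(long[]).
final class SummingMetric {
    private final String name;

    SummingMetric(String name) {
        this.name = name;
    }

    String name() {
        return name;
    }

    // Sum the per-task values and return the total as the display string.
    String aggregateTaskMetrics(long[] taskMetrics) {
        return Long.toString(Arrays.stream(taskMetrics).sum());
    }

    // Pick the counters with this metric's name from every task, then aggregate them.
    String aggregate(TaskCounter... reported) {
        long[] values = Arrays.stream(reported)
                .filter(c -> c.name().equals(name))
                .mapToLong(TaskCounter::value)
                .toArray();
        return aggregateTaskMetrics(values);
    }
}
```

For example, `new SummingMetric("rowsProcessed").aggregate(new TaskCounter("rowsProcessed", 120), new TaskCounter("rowsProcessed", 80))` returns `"200"`; counters with other names are ignored.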

Please create at least the following metrics, aggregated into JSON format:

Driver:

Task:

Please consider implementing pre-created (hourly/automatic) buckets within the driver that cover the earliest-latest span, and binning the data processed in the tasks into these buckets.
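The suggested bucketing can be sketched as follows, assuming timestamps are epoch seconds and hourly buckets are aligned to the top of the hour. The driver would create the empty layout for the earliest-latest span, each task would fill its own copy, and the driver would merge the copies. `TimeBuckets` and its methods are hypothetical names, not existing pth_06 classes, and moving the copies between tasks and the driver is left out.

```java
// Hypothetical sketch: fixed-width time buckets covering [earliest, latest),
// aligned to multiples of the bucket width (the top of the hour for 3600 s).
final class TimeBuckets {
    static final long HOUR_SECONDS = 3600L;

    private final long earliest;  // inclusive span start, epoch seconds
    private final long latest;    // exclusive span end, epoch seconds
    private final long start;     // aligned start of the first bucket
    private final long width;     // bucket width in seconds
    private final long[] counts;  // one counter per bucket

    // Driver side: pre-create the empty buckets for the query's earliest-latest span.
    TimeBuckets(long earliest, long latest, long width) {
        if (latest <= earliest || width <= 0) {
            throw new IllegalArgumentException("need earliest < latest and width > 0");
        }
        this.earliest = earliest;
        this.latest = latest;
        this.width = width;
        this.start = Math.floorDiv(earliest, width) * width;
        long end = Math.floorDiv(latest - 1, width) * width + width; // aligned, exclusive
        this.counts = new long[Math.toIntExact((end - start) / width)];
    }

    // Task side: bin one processed record; records outside [earliest, latest) are skipped.
    boolean add(long epochSeconds) {
        if (epochSeconds < earliest || epochSeconds >= latest) {
            return false;
        }
        counts[Math.toIntExact((epochSeconds - start) / width)]++;
        return true;
    }

    // Driver side: fold one task's counts into this instance (layouts must match).
    void merge(TimeBuckets other) {
        if (other.start != start || other.width != width || other.counts.length != counts.length) {
            throw new IllegalArgumentException("bucket layouts differ");
        }
        for (int i = 0; i < counts.length; i++) {
            counts[i] += other.counts[i];
        }
    }

    int size() {
        return counts.length;
    }

    long bucketStart(int i) {
        return start + i * width;
    }

    long count(int i) {
        return counts[i];
    }
}
```

A fixed `long[]` per task keeps the per-record cost to one division and one array increment, and merging is element-wise addition, so the order in which task results reach the driver does not affect the totals.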

Please define a JSON schema once initial development is done.
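Until the schema is defined, one possible shape for the aggregated document is sketched below: the span, the bucket width, one object per bucket, and a total. All field names and the `MetricsJson` class are placeholders, not an agreed schema, and the JSON is built by hand only to keep the example free of dependencies; a real implementation would more likely use a JSON library.

```java
// Hypothetical sketch of one possible metrics document; field names are placeholders.
final class MetricsJson {
    // Render the span, bucket width, per-bucket counts and their total as one JSON object.
    static String render(long earliest, long latest, long bucketWidth,
                         long[] bucketStarts, long[] counts) {
        if (bucketStarts.length != counts.length) {
            throw new IllegalArgumentException("bucketStarts and counts differ in length");
        }
        StringBuilder sb = new StringBuilder();
        sb.append("{\"earliest\":").append(earliest)
          .append(",\"latest\":").append(latest)
          .append(",\"bucketWidth\":").append(bucketWidth)
          .append(",\"buckets\":[");
        long total = 0;
        for (int i = 0; i < counts.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append("{\"start\":").append(bucketStarts[i])
              .append(",\"count\":").append(counts[i])
              .append('}');
            total += counts[i];
        }
        sb.append("],\"total\":").append(total).append('}');
        return sb.toString();
    }
}
```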

Related issues

https://github.com/teragrep/ajs_01/issues/70 depends on this.

Additional context

See the example at #74 and close when implemented.

kortemik commented 2 weeks ago

This feature replaces "metricsLogger" in DPLParserCatalystContext in pth_10.