Closed anskarl closed 4 years ago
Commit 7c0f737 implements the aforementioned changes. An outline of the changes is given below:
- `DruidResponse` is now a sealed trait with two implementations: `DruidResponseTimeseriesImpl` (for timeseries, group-by, top-n and select queries) and `DruidResponseScanImpl` (only for scan queries).
- `DruidResponseTimeseriesImpl` contains a list of `DruidResult`, which can be mapped to user-defined case classes.
- `DruidResponseScanImpl` contains a list of `DruidScanResults`, each one having a list of individual `DruidScanResult`, which can be mapped to user-defined case classes.
- `DruidResponseSearch` does not extend `DruidResponse`, but provides a similar API. The reason is that the results of search queries have a specific format that does not depend on the schema of the datasource, so there is no reason to map them to user-defined case classes. For the same reason, the query functions of the `DruidQuery` trait have been moved to a separate sealed trait, `DruidQueryFunctions`, which is not extended by `SearchQuery`.
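As a rough illustration of the hierarchy outlined above, the following Scala sketch uses simplified stand-ins for the listed types. It is not Scruid's actual code: rows are modelled as plain `Map[String, String]` values instead of JSON, the `as` and `list` signatures take an explicit decoder function (an assumption made to keep the sketch dependency-free), and `PageViews` with its decoder is a hypothetical user-defined case class.

```scala
// Simplified stand-ins for the types named in the outline; NOT Scruid's real code.
object ResponseSketch {
  // Assumption: a row is a plain Map here; the real library works on JSON.
  type Row = Map[String, String]

  // One time-stamped row of a timeseries, group-by, top-n or select response.
  final case class DruidResult(timestamp: String, result: Row) {
    // Map the row to a user-defined type via a caller-supplied decoder.
    def as[T](decode: Row => T): T = decode(result)
  }

  // One individual event of a scan response.
  final case class DruidScanResult(event: Row) {
    def as[T](decode: Row => T): T = decode(event)
  }

  // A batch of scan events, identified by segment rather than by timestamp.
  final case class DruidScanResults(segmentId: String, events: List[DruidScanResult])

  sealed trait DruidResponse {
    def list[T](decode: Row => T): List[T]
  }

  final case class DruidResponseTimeseriesImpl(results: List[DruidResult])
      extends DruidResponse {
    def list[T](decode: Row => T): List[T] = results.map(_.as(decode))
  }

  final case class DruidResponseScanImpl(results: List[DruidScanResults])
      extends DruidResponse {
    // Flatten the per-segment batches into a single list of decoded events.
    def list[T](decode: Row => T): List[T] =
      results.flatMap(_.events.map(_.as(decode)))
  }

  // Hypothetical user-defined case class and its decoder.
  final case class PageViews(page: String, views: Long)
  def decodePageViews(row: Row): PageViews =
    PageViews(row("page"), row("views").toLong)
}
```

Because the trait is sealed, a pattern match over `DruidResponse` is checked for exhaustiveness by the compiler, which is one plausible motivation for this design.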
Scruid at the moment supports aggregation queries (timeseries, group-by and top-n). It would also be useful to extend the library to support Select, Scan and Search queries.
While it is straightforward to implement such queries in Scruid, the format of the resulting data is different and cannot be handled by the current implementation.
Specifically, the resulting data for timeseries and group-by queries is an array of JSON structures, each composed of a timestamp and a result that is itself a JSON structure.
The format of top-n queries is slightly different: each time-stamped row contains a result that is an array of JSON structures.
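The two shapes described above could be modelled along the following lines. This is only a sketch under stated assumptions: `Row`, `ResultBody`, `Single`, `Many` and `TimestampedResult` are hypothetical names rather than Scruid types, and rows are plain maps instead of JSON.

```scala
// Hypothetical model of the two aggregation result shapes; not Scruid code.
object AggregationShapes {
  // Assumption: a JSON structure is represented as a plain Map.
  type Row = Map[String, String]

  // `result` is either a single JSON structure or an array of them.
  sealed trait ResultBody
  final case class Single(row: Row) extends ResultBody       // timeseries, group-by
  final case class Many(rows: List[Row]) extends ResultBody  // top-n

  final case class TimestampedResult(timestamp: String, result: ResultBody) {
    // Normalise both shapes to a list of rows so callers can treat them alike.
    def rows: List[Row] = result match {
      case Single(row) => List(row)
      case Many(rs)    => rs
    }
  }
}
```

Normalising to a list is one way a single result class can cover both cases, whether `result` is an array or not.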
The resulting data (array of JSON structures) of any aggregation query (timeseries, group-by and top-n) is handled by the class `ing.wbaa.druid.DruidResponse`, and each `result` (array or not) is represented by the class `ing.wbaa.druid.DruidResult`.

Select queries return raw Druid rows and support pagination. The format of the resulting data is close to that of the aggregation queries: an array of JSON objects, each with a timestamp and a result that is a JSON structure.
The only difference is that the `result` structure contains an array of events, so it requires a different implementation of `ing.wbaa.druid.DruidResponse`.

Scan queries do not support pagination like Select queries, but are more efficient and return rows in streaming mode. Compared to aggregation queries, the result does not contain a timestamp but a `segmentId`. The timestamp, however, can be retrieved from the inner event structures. Furthermore, scan queries can return data in a different format (compacted list) and also have a legacy mode that affects whether the time column is returned as `timestamp` or as `__time`; for details see the official documentation.

Search queries return dimension values that match the search specification. The format is close to that of top-n queries: a timestamp field and a result that is an array of JSON structures.
The main difference here is that the JSON structures in `result` are always composed of the same fields, that is, `dimension`, its `value` and the corresponding `count`. So the issue here is that the `list[T]` and `series[T]` functions of `ing.wbaa.druid.DruidResponse` can only be applied to a class having those three particular fields. I think, however, that for practical reasons it is better to have `list` and `series` functions without type parameters that return some predefined class with those fields.

With respect to the aforementioned issues, in order to support Select, Scan and Search queries, `ing.wbaa.druid.DruidResponse` and `ing.wbaa.druid.DruidResult` have to be adapted, and minor changes have to be applied to `ing.wbaa.druid.client.DruidClient` and `ing.wbaa.druid.client.DruidResponseHandler`.
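To make the search-query proposal above more concrete, here is a minimal sketch of `list` and `series` without type parameters. The names `DruidSearchResult`, `DruidSearchResults` and `DruidResponseSearch` are chosen for illustration, and timestamps are kept as plain strings for simplicity; none of this is taken from the library's actual code.

```scala
// Illustrative sketch of a non-generic search response; names are assumptions.
object SearchSketch {
  // The fixed shape of every entry in a search query `result` array.
  final case class DruidSearchResult(dimension: String, value: String, count: Long)

  // One time-stamped row of a search response.
  final case class DruidSearchResults(timestamp: String, result: List[DruidSearchResult])

  final case class DruidResponseSearch(results: List[DruidSearchResults]) {
    // No type parameter: the element type is always DruidSearchResult.
    def list: List[DruidSearchResult] = results.flatMap(_.result)

    // Entries grouped by the timestamp of the row they came from.
    def series: Map[String, List[DruidSearchResult]] =
      results.groupBy(_.timestamp).map { case (ts, rs) => ts -> rs.flatMap(_.result) }
  }
}
```

With a predefined result class, callers do not need to declare their own case class or decoder just to read `dimension`, `value` and `count`.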