wm9947 opened this issue 5 years ago
@wm9947 This is an internal formatting issue. As a workaround, would you like to change the time format as shown below?
"%Y-%m-%dT%H:%M:%SZ" -> "%Y-%m-%d %H:%M:%S"
yyyy-MM-ddTHH:mm:ssZ -> yyyy-MM-dd HH:mm:ss
@wm9947 Alternatively, you can change the time format in the API request as follows:
yyyy-MM-ddTHH:mm:ssZ -> yyyy-MM-dd'T'HH:mm:ssZ
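For reference, a minimal Python sketch (assuming the generator uses the standard `datetime` module) that produces both the original ISO-style timestamp and the space-separated workaround format:

```python
from datetime import datetime, timezone

now = datetime.now(timezone.utc)

# ISO-8601 style from the original request ("yyyy-MM-dd'T'HH:mm:ssZ")
iso_style = now.strftime("%Y-%m-%dT%H:%M:%SZ")

# Space-separated workaround format ("yyyy-MM-dd HH:mm:ss")
plain_style = now.strftime("%Y-%m-%d %H:%M:%S")

print(iso_style)
print(plain_style)
```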
Your solutions work well, and I can now check the datasource detail. However...
As shown in the image, the timestamp is not displayed, and the timestamp column in the downloaded CSV file shows "undefined" (as text).
Many thanks,
@wm9947 Hmm.. Can I see the data from the generator?
@kyungtaak
{"category": "30", "timestamp": "2019-05-08T05:34:35Z", "value_02": 150, "value_03": 225, "value_01": 75}
{"category": "10", "timestamp": "2019-05-08T05:34:36Z", "value_02": 94, "value_03": 141, "value_01": 47}
{"category": "20", "timestamp": "2019-05-08T05:34:36Z", "value_02": -192, "value_03": -288, "value_01": -96}
{"category": "30", "timestamp": "2019-05-08T05:34:36Z", "value_02": 154, "value_03": 231, "value_01": 77}
{"category": "10", "timestamp": "2019-05-08T05:34:37Z", "value_02": 76, "value_03": 114, "value_01": 38}
{"category": "20", "timestamp": "2019-05-08T05:34:37Z", "value_02": -196, "value_03": -294, "value_01": -98}
{"category": "30", "timestamp": "2019-05-08T05:34:37Z", "value_02": 158, "value_03": 237, "value_01": 79}
{"category": "10", "timestamp": "2019-05-08T05:34:38Z", "value_02": 56, "value_03": 84, "value_01": 28}
{"category": "20", "timestamp": "2019-05-08T05:34:38Z", "value_02": -196, "value_03": -294, "value_01": -98}
{"category": "30", "timestamp": "2019-05-08T05:34:38Z", "value_02": 162, "value_03": 243, "value_01": 81}
{"category": "10", "timestamp": "2019-05-08T05:34:39Z", "value_02": 36, "value_03": 54, "value_01": 18}
{"category": "20", "timestamp": "2019-05-08T05:34:39Z", "value_02": -198, "value_03": -297, "value_01": -99}
{"category": "30", "timestamp": "2019-05-08T05:34:39Z", "value_02": 166, "value_03": 249, "value_01": 83}
This is what I generate from my Python code.
@wm9947 The column name "timestamp" is internally reserved, so there seems to be a problem with the handling of reserved words. :( Can you change the name "timestamp" to something else in the generator and in the API request body?
The handling of reserved words will be tracked as a separate issue.
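As a sketch of that workaround, the generator can rename the key before sending each record (the replacement name `event_time` is only an example, not something the API requires):

```python
record = {"category": "30", "timestamp": "2019-05-08T05:34:35Z",
          "value_01": 75, "value_02": 150, "value_03": 225}

# Rename the reserved "timestamp" key to a non-reserved one.
record["event_time"] = record.pop("timestamp")

print(record)
```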
I tried to use other generated data, such as...
string, timestamp, double (Measure), string, string
When I ingest these 5 columns, of which only one is of Measure type and the others are Dimensions, each row is mostly the same and the same data can be ingested repeatedly.
But in this case, the Measure rows are automatically merged. For example, the measure column contains the row number (1, 2, 3, 4, 5, ...) and the other columns contain the same data; the ingested result in Metatron Discovery shows that the Measure column has been summed.
If I change the column from Measure to Dimension, the rows stay separate even when I ingest the same data.
Sorry for not providing a capture file. If you cannot understand what I mean, please let me know.
Many thanks,
@wm9947 This is an issue with an option called rollup.
The concept of "rollup" comes from Druid, which can summarize raw data at ingestion time using its roll-up option. A roll-up is a first-level aggregation over a selected set of columns that reduces the size of the stored segments. We also use the roll-up option to improve the performance of some query operations.
However, if the data in each row is meaningful on its own, you can set the rollup option to false and then ingest. In fact, most usage falls into this case, so we changed the default to false.
In the API, you can set it as follows.
...
"ingestion": {
"type": "realtime",
...,
"rollup": true
}
...
Regarding "rollup", It would be better to check the contents of the link below.
@kyungtaak Thanks for your kindness :) I just missed setting the rollup option.
Additionally, I got some information about types when I create a datasource via a JSON POST:
DataType: WKT, BOOLEAN, NUMBER, UNKNOWN, TEXT, DECIMAL, STRUCT, TIMESTAMP, ARRAY, FLOAT, INTEGER, STRING, MAP, DOUBLE, LONG
logicalType: POSTAL_CODE, GEO_POINT, HTTP_CODE, LNG, SEX, BOOLEAN, NUMBER, GEO_POLYGON, DISTRICT, URL, DOUBLE, LNT, NIN, STRUCT, TIMESTAMP, ARRAY, PHONE_NUMBER, MAP_KEY, INTEGER, GEO_LINE, MAP_VALUE, STRING, IP_V4, EMAIL, CREDIT_CARD
Do you have any documentation for these formats?
I tried to ingest WKT point data, but I don't know which format is valid, e.g. 10.111,20,111 or POINT(10.111 20,111) or (10.111 20,111) or (10.111, 20,111)
Many Thanks,
@wm9947 First, a description of the WKT representation can be found here: https://en.wikipedia.org/wiki/Well-known_text_representation_of_geometry That is, "POINT(10.111 20.111)" is correct WKT.
You can set the data type and logical type for geometry-related columns as below.
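A minimal sketch of producing a WKT point string without any extra library (note that WKT separates the two coordinates with a space, not a comma, and that plain WKT is `x y` order, so check whether your pipeline expects longitude or latitude first):

```python
def to_wkt_point(x: float, y: float) -> str:
    # WKT coordinates are space-separated inside the parentheses.
    return f"POINT ({x} {y})"

print(to_wkt_point(10.111, 20.111))  # POINT (10.111 20.111)
```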
{
...,
"fields": [
...,
{
"name": "point_location",
"type": "STRING",
"logicalType": "GEO_POINT", // WKT : POINT
"role": "DIMENSION",
"seq": 1
},
// or
{
"name": "line_location",
"type": "STRING",
"logicalType": "GEO_LINE", // WKT : LINESTRING, MULTILINESTRING
"role": "DIMENSION",
"seq": 1
},
// or
{
"name": "polygon_location",
"type": "STRING",
"logicalType": "GEO_POLYGON", // WKT : MULTILINESTRING, POLYGON, MULTIPOLYGON
"role": "DIMENSION",
"seq": 1
}
]
}
※ More detailed data type related information will be described in the api document.
@kyungtaak I use
{
"name": "Poin",
"type": "STRING",
"logicalType": "GEO_POINT",
"role": "DIMENSION",
"seq": 4
},
My code generates the string as below:
'Poin': 'POINT (33.1000000000000001 100.2020000000000001)'
It is generated by:
cur_result['Poin'] = wkt.dumps({'type': 'Point', 'coordinates': [la, lo ] } );
But the column data is missing.
Many thanks,
@wm9947 I'm sorry for the late reply. There is one thing I missed: you must add the "format" property as shown below.
{
"name": "GeoPoint",
"type": "STRING",
"logicalType": "GEO_POINT",
"role": "DIMENSION",
"seq": 1,
"format": {
"type": "geo_point"
}
}
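Putting that together, the field definition could be built and serialized in the generator like this (a sketch only; the surrounding request body is omitted, and the exact payload shape should be taken from the API document):

```python
import json

# Field definition for a geo-point column; "GeoPoint" is just an example name.
geo_field = {
    "name": "GeoPoint",
    "type": "STRING",
    "logicalType": "GEO_POINT",
    "role": "DIMENSION",
    "seq": 1,
    "format": {"type": "geo_point"},  # the property that was missing above
}

print(json.dumps(geo_field, indent=2))
```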
@kyungtaak I have the same issue. Data format example:
{"code":"LA","coords":"POINT (51.42205894062455 13.747762979057024)","country":"Lao People’s Democratic Republic","lat":51.42205894062455,"lng":13.747762979057024,"point":"POINT (18.0 105.0)"}
{"code":"MD","coords":"POINT (47.666765962573955 28.766420312548796)","country":"Republic of Moldova","lat":47.666765962573955,"lng":28.766420312548796,"point":"POINT (47.25 28.58333)"}
{"code":"KH","coords":"POINT (48.5273681970357 10.11163636642112)","country":"Kingdom of Cambodia","lat":48.5273681970357,"lng":10.11163636642112,"point":"POINT (13.0 105.0)"}
{"code":"CZ","coords":"POINT (49.52874343867825 15.736932294356984)","country":"Czechia","lat":49.52874343867825,"lng":15.736932294356984,"point":"POINT (49.75 15.0)"}
{"code":"PL","coords":"POINT (50.583994227408205 22.263452104234823)","country":"Republic of Poland","lat":50.583994227408205,"lng":22.263452104234823,"point":"POINT (52.0 20.0)"}
Ingestion without "format": { "type": "geo_point" }
gives:
"event_time","lat","lng","coords","country","code","point"
"2020-10-19T09:53:31+0000","47.35356695410582","22.172929617427158","undefined","Hungary","HU","POINT ( )"
"2020-10-19T09:53:31+0000","46.277530657010686","20.106731262627182","undefined","Hungary","HU","POINT ( )"
"2020-10-19T09:53:31+0000","47.6556328533538","11.093962729708942","undefined","Principality of Liechtenstein","LI","POINT ( )"
"2020-10-19T09:53:31+0000","48.477095480336445","13.16176565938561","undefined","Republic of Austria","AT","POINT ( )"
"2020-10-19T09:53:31+0000","48.24618466276453","14.400907426432358","undefined","Republic of Austria","AT","POINT ( )"
"2020-10-19T09:53:31+0000","51.31381653688641","26.551735998073084","undefined","Republic of Belarus","BY","POINT ( )"
"2020-10-19T09:53:31+0000","49.85165022604589","27.74231184335249","undefined","Republic of Moldova","MD","POINT ( )"
"2020-10-19T09:53:31+0000","47.91721363600633","27.319068781645235","undefined","Republic of Moldova","MD","POINT ( )"
"2020-10-19T09:53:31+0000","50.55432969445604","22.70753745067759","undefined","Republic of Poland","PL","POINT ( )"
"2020-10-19T09:53:31+0000","50.91003915720814","21.90844270496809","undefined","Republic of Poland","PL","POINT ( )"
Ingestion with a "format": { "type": "geo_point" }
fails with:
2020-10-19 12:36:41.306 ERROR [127.0.0.1-admin] [http-nio-8180-exec-6] a.m.d.c.exception.RestExceptionHandler : [API:/api/datasources] GB0001 null: NullPointerException:
app.metatron.discovery.common.exception.UnknownServerException
at app.metatron.discovery.common.exception.RestExceptionHandler.handleMiscFailures(RestExceptionHandler.java:96)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
Any additional ideas?
I used this post for real-time ingestion: https://metatron.app/2018/08/02/visualize-real-time-data-with-metatron-discovery/
My Python code is slightly different from what you provided.
I successfully ingest data into Metatron Discovery and can build a real-time dashboard as you describe. But when I try to view the ingested data on the datasource page, the error message above is shown.
Could you check it?
Many thanks,