Modified tag historian API for discussion

pvallone commented 5 years ago

Just some quick modifications removing some of the routes we discussed, and adding continuation tokens to the query route.

I don't plan on submitting this; just putting this up here for discussion and information sharing.

I know SLC has a route for inserting values, but the historian works behind the scenes on SLS, so I wouldn't want that to be a public route which we document in swagger.

Removing routes which reference individual values by ID
Removing the ability to annotate values with properties
Removing skip/totalCount in favor of continuation tokens

gergo-papp commented 5 years ago

These are just some ideas/comments:

We could make the insert route a cloud-specific one. And we could even add possible limitations in the swagger document too.
In the future we could add explanation about decimation and document how we actually do it (random, first, min-max... or some other weird smart decimation). Or even add an option for selecting teh kind of decimation (e.g. choose between fast or accurate decimation)

pvallone commented 5 years ago

We could make the insert route a cloud-specific one. And we could even add possible limitations in the swagger document too.

We could make it cloud only. Do you folks expect users to call that route? I would have thought you would expect folks to push updates to the tag service's existing routes for updating tag values, and then have your service automatically call the insert-values route on behalf of your users if the modified tag(s) have retention enabled. That's sort of how the existing SLS historian works.

In the future we could add explanation about decimation and document how we actually do it (random, first, min-max... or some other weird smart decimation). Or even add an option for selecting teh kind of decimation (e.g. choose between fast or accurate decimation)

Right, in my email I had suggested we provide a "decimationStrategy" input on the query route. The options could be "RANDOM_SAMPLING" and "FULL_SCAN". "FULL_SCAN" would be the "smart" approach we'd discussed, where we return the entry/exit/min/max for each chunk of data that we have in the window. I had suggested we document that the service will consider the requested decimation strategy a suggestion, and fall back to random sampling if there's too much data to decimate with full scan decimation in the user's requested window. "RANDOM_SAMPLING" could be just plain old random sampling as you folks are doing now, or it could sample the buckets and return entry/exit/min/max from each sampled bucket. This would make it a little more accurate at no meaningful expense to performance (especially if entry/exit/min/max is precalculated on the bucket level). Michael and Fred had thought that this sounded like a good idea. I'd be curious to get @tschmittni's take on this.

tschmittni commented 5 years ago

We could make it cloud only. Do you folks expect users to call that route?

No, this route is not supposed to be called by users. We certainly could provide it because it is protected just as any other route. But this basically makes changes/optimizations a lot harder. And I don't really see a use-case for this route. Customers are supposed to call update-tag-value on the tag-service instead.

about decimationStrategy

I really like this feature, but I don't think we should add any decimation strategy which we cannot support in a performant way. This will just cause a lot of problems. We should just add APIs which can be implemented to respond in the ms range and which scale with the number of data points.

I outlined here https://github.com/ni-kismet/helium-taghistorianservices/pull/171 a couple of options to support min/max/avg/etc... (FULL_SCAN) decimation strategies in a way which allows us to implement it in a performant and extensible way.

My suggestion is to perform decimated history queries based on pre-computed values instead of querying across all the data and compute on-the-fly or calculating values during insert. We just run a background job to calculate the values for our supported decimation strategies and store them so we can highly optimize our read queries.

This has the following advantages:

Insert performance is not affected and we can still ingest huge amounts of data
Decimated queries are highly optimized and can return data within milliseconds
We can support different strategies (and not limit ourselves to the decimations which our database supports)

Tradeoffs:

Background job needs to be implemented (which we anyhow already have for deleting old values)
We either have to disallow inserting values in the past or we need to keep track which documents have been modified

_FULLSCAN

Maybe we can come up with a different name for the FULL_SCAN decimation strategy? :)

pvallone commented 5 years ago

No, this route is not supposed to be called by users.

Then it's probably best to keep it out of the swagger document, yeah. If we don't document it, we can break it.

We can support different strategies (and not limit ourselves to the decimations which our database supports)

That's a big win over doing all of the decimation in the schema at insert time as I had suggested.

Decimated queries are highly optimized and can return data within milliseconds

I like the idea of doing decimated queries against precomputed values so that all reads can complete in milliseconds. When I was prototyping on SLS, the fact that the decimated query API lets you specify an arbitrary decimation "number" made it hard to efficiently do queries, even against precomputed values. What were you planning on precomputing to efficiently handle decimated queries? We can precompute entry/exit/min/max at the minute, hour, day, week, month, year, etc. level, but it still might take a while to break that into the appropriate number of chunks based on the specified decimation value that the user provided. I guess we can wait and see what decimated queries against precomputed values look like in SLC with the background thread approach, and how they perform, but based on what I was seeing when I was prototyping on SLS I think we're still going to have multi-second decimated query times over windows saturated with lots of data even if we precompute aggregates.

If we instead changed the "decimation" query parameter from an integer to an enum with values like MINUTES, HOURS, DAYS, WEEKS, MONTHS, then we could handle queries much more efficiently. InsightCM's tag historian used to do this (and they might still). That would be a more substantial API change though.

We either have to disallow inserting values in the past or we need to keep track which documents have been modified

I think we need to support inserting old values, as Mark is going to want folks to be able to migrate from old NI products like Citadel to the tag historian. We would also need to insert old values when migrating folks from the SLS historian to the SLC historian.

Maybe we can come up with a different name for the FULL_SCAN decimation strategy? :)

Gergo had been calling it "accurate" decimation, and I kind of like that.

tschmittni commented 5 years ago

If we instead changed the "decimation" query parameter from an integer to an enum with values like MINUTES, HOURS, DAYS, WEEKS, MONTHS, then we could handle queries much more efficiently.

couldn't we do this implicitly behind the scenes? basically depending on range size between start and end time (or the first and last data point) we could calculate whether we pick our monthly, daily or hourly pre-computed decimated values...

I think we need to support inserting old values, as Mark is going to want folks to be able to migrate from old NI products like Citadel to the tag historian.

We could provide specific migration APIs for that...

pvallone commented 5 years ago

If we instead changed the "decimation" query parameter from an integer to an enum with values like MINUTES, HOURS, DAYS, WEEKS, MONTHS, then we could handle queries much more efficiently.

couldn't we do this implicitly behind the scenes? basically depending on range size between start and end time (or the first and last data point) we could calculate whether we pick our monthly, daily or hourly pre-computed decimated values...

Yes, but it might need to be smarter than just how much time is within the window. If a user only wrote one value per hour for a year, and specified a decimation value of "1000" and a window size of half a year, then we probably wouldn't want to decimate at the day level, for instance. When I was prototyping on SLS, I used a heuristic based on the approximate number of distinct seconds which we had data stored for within the window. I kept track of the number of distinct seconds which had data on each hour document using a "count" field. To decide how to decimate the data, I would ask Mongo to sum up the "count" field of all of the documents within the window. Based on the value of the "count" sum, I would decide whether or not to decimate at the minute or hour level, with the idea being that if there was too much second-level data in the window, I would fall back to hour-level decimation. We could fall back to day level decimation etc. in that way too.

My main point was that, even if we're precomputing aggregates at the minute, hour, day, week, etc. level, we're still going to have to break the precomputed aggregates for the window into "decimation" number of chunks, and calculate the entry/exit/min/max value for each chunk service-side. I suspect we'll still have situations where queries can take multiple seconds to complete in this manner due to this overhead and the fact that we can't precompute all possible queries for all possible "decimation" values. That's what I was seeing when I prototyped this on SLS, at least. We can mitigate this by caching decimated queries. Maybe it's best to wait and see how things perform in Elixir.

We could provide specific migration APIs for that...

That would be fine with me. As long as there's a way for customers to get old data into the historian, and for me to migrate SLS historian data in to the SLC historian, then we're good.

tschmittni commented 5 years ago

We could fall back to day level decimation etc. in that way too.

Yeah, good point. I would say we just select the granularity (e.g. HOUR or DAY) based on the size of the range between first and last data point (within users' start/end range). Then we perform the decimation and if we don't have enough data points, we just fall back to a more granular level.

calculate the entry/exit/min/max value for each chunk service-side.

If we can efficiently query for the pre-decimated data points, I don't think calculating server-side min/max values is the actual problem. This should be rather fast.

pvallone commented 5 years ago

Then we perform the decimation and if we don't have enough data points, we just fall back to a more granular level.

Sounds reasonable.

I don't think calculating server-side min/max values is the actual problem. This should be rather fast.

Yeah, now that I think about it, most of my query time was spent querying and projecting the decimated values out of Mongo, not calculating the chunk min/max etc. service-side.

pvallone commented 5 years ago

@tschmittni I'll modify the swagger doc with what I think we proposed at the spec review:

query-history (undecimated) takes one path and no longer supports CSV exports
query-decimated-history continues to support multiple paths and no longer supports CSV exports
A new CSV download route is created which does not support continuation tokens. This route supports querying for multiple tags, and it streams a single CSV file to the client.

pvallone commented 5 years ago

I modified the swagger doc with changes from the spec review.

@tschmittni @spanglerco: I haven't modified the doc to break out undecimated and decimated queries in the web socket API. Do we want to split them out? That would sort of explode the web socket API, which is already pretty dense. We would need a lot of new types and actions:

HistoricalTagDecimatedQuery
HistoricalTagUndecimatedQuery
Web Socket Decimated Query Request
Web Socket Undecimated Query Request
Web Socket Async Decimated Query Request
Web Socket Async Undecimated Query Request

If we kept with the web socket API's current pattern of echoing back the query objects, we would also need:

Web Socket Decimated Query Response
Web Socket Undecimated Query Response
Web Socket Async Undecimated Query Response
Web Socket Async Decimated Query Response
HistoricalTagAsyncDecimatedQueryJob
HistoricalTagAsyncUndecimatedQueryJob

tschmittni commented 5 years ago

Do we want to split them out?

I would vote for keeping them in-sync... otherwise it might get really confusing

tschmittni commented 5 years ago

I'll modify the swagger doc with what I think we proposed at the spec review:

thank you!

pvallone commented 5 years ago

I would vote for keeping them in-sync... otherwise it might get really confusing

I split the async web socket query routes out. Paul's feeling was that nobody is using the synchronous web socket query API, so I removed those routes. I wasn't sure if we wanted the async undecimated query route to support querying for multiple tags or not. Since it's async, it seems like it could, so it currently does in my branch.

pvallone commented 5 years ago

@tschmittni Mark and Fred think that we should have separate swagger docs for the cloud and server historians, at least until both services have the same API. We'd like to check this swagger doc in (I renamed the file to nitaghistorian-cloud.yml) and pull it into api.systemlinkcloud.com. Could you approve this PR if you're satisfied with the API?

ni / systemlink-OpenAPI-documents

Modified tag historian API for discussion #53