varchar-io / nebula

A distributed block-based data storage and compute engine
https://nebula.bz
Apache License 2.0

Custom time query #182

Closed caoash closed 2 years ago

caoash commented 2 years ago

https://github.com/varchar-io/nebula/issues/180

The git history and code are messy. I tested timeline queries bucketed by day and by week, since both can be done with constant bucket sizes. One issue that arises: the table below the timeline shows a negative value for the initial time point, because the start of a day or week may fall before beginTime.
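To make the negative value concrete, here is a minimal sketch of constant-size bucketing. It assumes timestamps are Unix seconds in UTC and that buckets are aligned to the epoch; `bucketStart` and `relativeOffset` are hypothetical names for illustration, not functions from the Nebula code base.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not Nebula's API): floor a Unix timestamp in seconds
// to the start of its fixed-size bucket, with buckets aligned to the epoch.
inline int64_t bucketStart(int64_t ts, int64_t bucketSeconds) {
    assert(bucketSeconds > 0);
    int64_t r = ts % bucketSeconds;
    if (r < 0) r += bucketSeconds;  // keep flooring correct for negative timestamps
    return ts - r;
}

// Time point reported for a row: its bucket start relative to beginTime.
// This is negative whenever the bucket starts before beginTime.
inline int64_t relativeOffset(int64_t ts, int64_t beginTime, int64_t bucketSeconds) {
    return bucketStart(ts, bucketSeconds) - beginTime;
}
```

With daily buckets (86400 seconds) and a beginTime of 10:00 on some day, a row at beginTime lands in the bucket that started at 00:00, so its offset is -36000 seconds. One caveat of this sketch: epoch-aligned 7-day buckets start on Thursdays, because 1970-01-01 was a Thursday, so a real weekly bucket would probably need an extra shift.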

Implementing month/year buckets (which must account for leap years and months of 29, 30, or 31 days) will likely require a UDF. I'm looking into it but am still a bit confused about how to do this. Any feedback on formatting or code structure is appreciated.
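As a rough illustration of what the calendar-aware rounding inside such a UDF could look like, the sketch below floors a timestamp to the start of its month or year. It assumes Unix seconds in UTC and the proleptic Gregorian calendar; all names are hypothetical and it does not use Nebula's actual UDF interface. The day-count conversions follow Howard Hinnant's well-known `days_from_civil` / `civil_from_days` formulas.

```cpp
#include <cassert>
#include <cstdint>

constexpr int64_t kSecondsPerDay = 86400;

struct CivilDate {
    int64_t year;
    unsigned month;  // 1..12
    unsigned day;    // 1..31
};

// Floor division, so timestamps before 1970 map to the correct day.
inline int64_t floorDiv(int64_t a, int64_t b) {
    const int64_t q = a / b;
    return (a % b != 0 && ((a < 0) != (b < 0))) ? q - 1 : q;
}

// Days since 1970-01-01 for a Gregorian date (leap years handled by the era math).
inline int64_t daysFromCivil(int64_t y, unsigned m, unsigned d) {
    assert(m >= 1 && m <= 12 && d >= 1 && d <= 31);
    y -= (m <= 2) ? 1 : 0;  // treat Jan/Feb as the end of the previous year
    const int64_t era = (y >= 0 ? y : y - 399) / 400;
    const unsigned yoe = static_cast<unsigned>(y - era * 400);
    const unsigned doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1;
    const unsigned doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    return era * 146097 + static_cast<int64_t>(doe) - 719468;
}

// Inverse of daysFromCivil: days since 1970-01-01 to year/month/day.
inline CivilDate civilFromDays(int64_t z) {
    z += 719468;
    const int64_t era = (z >= 0 ? z : z - 146096) / 146097;
    const unsigned doe = static_cast<unsigned>(z - era * 146097);
    const unsigned yoe = (doe - doe / 1460 + doe / 36524 - doe / 146096) / 365;
    const unsigned doy = doe - (365 * yoe + yoe / 4 - yoe / 100);
    const unsigned mp = (5 * doy + 2) / 153;
    const unsigned d = doy - (153 * mp + 2) / 5 + 1;
    const unsigned m = mp < 10 ? mp + 3 : mp - 9;
    const int64_t y = static_cast<int64_t>(yoe) + era * 400 + (m <= 2 ? 1 : 0);
    return CivilDate{y, m, d};
}

// Start of the calendar month containing ts (UTC).
inline int64_t monthFloor(int64_t ts) {
    const CivilDate c = civilFromDays(floorDiv(ts, kSecondsPerDay));
    return daysFromCivil(c.year, c.month, 1) * kSecondsPerDay;
}

// Start of the calendar year containing ts (UTC).
inline int64_t yearFloor(int64_t ts) {
    const CivilDate c = civilFromDays(floorDiv(ts, kSecondsPerDay));
    return daysFromCivil(c.year, 1, 1) * kSecondsPerDay;
}
```

For instance, `monthFloor` applied to any timestamp on 2020-02-29 returns the timestamp of 2020-02-01 00:00:00 UTC, with no special-casing of leap years in the caller.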

caoash commented 2 years ago

Fixed the history with a rebase.

caoash commented 2 years ago

I implemented the UDF that I want to apply to the ColumnExpression in QueryHandler.cpp, but I still can't figure out how to

caoash commented 2 years ago

How do you want to handle the table showing negative time points?

This happens because rounding sometimes produces a time point before beginTime. For example, take a query with beginTime 3/14 that buckets by month: the first time point should be 3/1.

Ideas I have are:

  1. Make the first time point 0 no matter what, though that leads to unintuitive bucket sizes. For example, with monthly buckets, the first date will be the query's beginTime (say 2/12) and the second one will be correctly rounded (3/1).
  2. Make a change on the frontend that shifts every time point by the minimum time point, so it looks better in the UI while still rounding correctly.
  3. Change the query so that beginTime always matches the timeline boundary. For example, for a monthly query, beginTime is rounded to 3/1 rather than 3/14.
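
Option 1 can be sketched as a clamp applied when converting aligned bucket starts into time points relative to beginTime. This is only an illustration under the assumption that time points are reported as second offsets from beginTime; `timePoints` is a hypothetical name, not an existing Nebula function.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical sketch of option 1: report each bucket as an offset from
// beginTime, clamping at 0 so a bucket that starts before beginTime
// is shown at the query's beginTime instead of at a negative value.
inline std::vector<int64_t> timePoints(const std::vector<int64_t>& alignedStarts,
                                       int64_t beginTime) {
    assert(std::is_sorted(alignedStarts.begin(), alignedStarts.end()));
    std::vector<int64_t> out;
    out.reserve(alignedStarts.size());
    for (int64_t start : alignedStarts) {
        out.push_back(std::max<int64_t>(0, start - beginTime));
    }
    return out;
}
```

With a beginTime of 2/12 and monthly buckets starting 2/1 and 3/1 (in a non-leap year), the first point becomes 0 and the second is 17 days after beginTime, which is the uneven first bucket mentioned in option 1.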

shawncao commented 2 years ago

This is a good problem. Thanks for the explanation, @caoash.

I like option 1 in your proposal. Here are my thoughts:

Nebula, as a backend engine/database, should not care how the frontend uses it. It simply has to behave consistently and let clients know what that behavior is. For this reason, we can rule out option 2.

For a similar reason, changing the client's query is "risky", because the client may not expect it. In fact, in many situations different clients expect different results. The best way to avoid this kind of confusion is to not change anything: as long as the contract is simple enough, the client side can handle all the different situations.

So for this case, I think keeping the first time point at 0 is the simplest answer across all queries, regardless of which time pattern they run with.

caoash commented 2 years ago

I think rebase was successful