usc-isi-i2 / dig-etl-engine

Download DIG to run on your laptop or server.
http://usc-isi-i2.github.io/dig/
MIT License

Col-Oriented Timeseries metadata cannot be correctly specified #184

Open YiDomiChen opened 6 years ago

YiDomiChen commented 6 years ago

While specifying the metadata of column-oriented time-series data like this:

    "TimeSeriesRegions": [
        {
            "orientation": "col",
            "cols": "[G:G]",
            "locs": "[2:25]",
            "metadata": [
                { "name": "location", "loc": "A" }
            ],
            "times": { "locs": "[F]" }
        }
    ],

the output does not show the correct metadata for each time series. Currently, the metadata for a column-oriented time series can only be a row in the same column.
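The limitation can be illustrated with a minimal Python sketch. The toy rows, function names, and zero-based indices are hypothetical and are not taken from dig-etl-engine. The sketch also assumes that the metadata location ends up pointing at the header row of the value column, which would explain the "Value" result reported later in this thread but is not confirmed by it.

```python
# Toy rows standing in for the attached CSV (hypothetical values).
# Column 0 plays the role of column A (location), column 2 of column G (values).
rows = [
    ["LOCATION", "TIME", "Value"],
    ["PRT", "2017-01", "3.9"],
    ["PRT", "2017-02", "4.0"],
    ["ESP", "2017-01", "1.5"],
]


def same_column_metadata(rows, value_col, metadata_row):
    """Rule stated in the thread: metadata is a cell in the same column as the values."""
    return rows[metadata_row][value_col]


def per_row_metadata(rows, metadata_col, data_rows):
    """Requested behaviour: read the metadata for each observation from another column."""
    return [rows[r][metadata_col] for r in data_rows]


# Assumption: the metadata row resolves to the header row (index 0).
header_cell = same_column_metadata(rows, value_col=2, metadata_row=0)
locations = per_row_metadata(rows, metadata_col=0, data_rows=range(1, 4))
```

Under these assumptions, `header_cell` is the column header rather than a location code, while `locations` holds one location per observation, which is what the annotation above was trying to express.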

puuj commented 6 years ago

Long-term_Interest_Rates.csv.txt test-annotation.json.txt test-rawdata.csv.txt

Here's the full context of this issue. I think we need to have a plan for adding functionality like this, since it could quickly make the code very complicated if we must be able to specify arbitrary relational data...

I think your options are either specifying a different TimeSeriesRegion for each set of metadata (a pain, but will work) or propose a new annotation specification and iterate with Amandeep to make sure it makes sense. --Jay

From: Yi Chen chen310@usc.edu Thank you for your advice. But in the entire dataset the value of metadata such as location and frequency actually changes. Attached is the entire dataset. I wonder if there are other solutions. Thank you very much.

On Thu, Jan 11, 2018 at 10:54 AM, Jay Pujara jpujara@isi.edu wrote: Time series metadata for a column can only be a row in the same column (currently). You're trying to get values from other adjacent columns. In this case since those values never change, you can use the global metadata to specify them.

On Thu, Jan 11, 2018 at 10:45 AM, Yi Chen chen310@usc.edu wrote: I've run into a problem while dealing with col-oriented time-series data. I tried to specify the metadata of each time series, but the output just shows the wrong value.

Attached are the original test CSV file, the annotation JSON file, and the output. As you can see, the value of "location" should be "PRT" instead of "Value". I'd appreciate it if you could figure out the problem. Thank you very much.

puuj commented 6 years ago

So just to clarify the major issue here: right now metadata are specified at the time series level - which means all observations share the same metadata about the measurements. This change, if done naively, will mean that we will have metadata at the observation level, which will be far, far more verbose. If we try to be intelligent, we can try and aggregate observations by metadata and somehow express the minimum number of time series, but that can become fairly onerous as well. So before blindly implementing something, we should think through what we really want/need, and if there's a way the annotator can help us sidestep the issue.
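The "aggregate observations by metadata" idea could be sketched roughly as follows. This is only an illustration under assumptions: the observation layout (dicts with `time`, `value`, and metadata fields), the output shape, and the function name `group_by_metadata` are hypothetical and do not reflect the engine's actual data model.

```python
from collections import defaultdict


def group_by_metadata(observations, metadata_keys):
    """Collapse observation-level metadata into one series per distinct metadata tuple."""
    series = defaultdict(list)
    for obs in observations:
        # Observations with identical metadata values share one time series.
        key = tuple(obs[k] for k in metadata_keys)
        series[key].append((obs["time"], obs["value"]))
    return [
        {"metadata": dict(zip(metadata_keys, key)), "ts": sorted(points)}
        for key, points in series.items()
    ]


# Hypothetical observations, each carrying its own metadata.
observations = [
    {"time": "2017-02", "value": 4.0, "location": "PRT", "frequency": "M"},
    {"time": "2017-01", "value": 3.9, "location": "PRT", "frequency": "M"},
    {"time": "2017-01", "value": 1.5, "location": "ESP", "frequency": "M"},
]
grouped = group_by_metadata(observations, ["location", "frequency"])
```

Here three observations collapse into two time series, one per distinct (location, frequency) pair. The cost mentioned above shows up when the metadata fields vary a lot: in the worst case every observation becomes its own series.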

majidghgol commented 6 years ago

@puuj I have a similar problem for issue #175 (earthquakes). I am not quite sure what our definition of a time series is. As far as I understand, any event that has been time-stamped is now interpreted as a time series. It might be better to restrict time series to events recorded at a fixed time interval (e.g. barrels of oil sold each month, or the stock price of a company each month). I think this would rule out the cases where we try to fit inherently relational data that happens to be time-stamped (e.g. the info about earthquakes around the world) into the time series annotation.
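A restriction like this could be checked with something along the following lines. It is a minimal sketch assuming monthly granularity; the function names and the sample dates are hypothetical, and nothing here is part of the existing annotation spec.

```python
from datetime import date


def month_index(d):
    """Map a date to a running month count so that consecutive months differ by 1."""
    return d.year * 12 + d.month


def has_fixed_monthly_interval(dates):
    """Return True if the dates are spaced by one constant, non-zero number of months."""
    idx = sorted(month_index(d) for d in dates)
    if len(idx) < 2:
        return False
    steps = {later - earlier for earlier, later in zip(idx, idx[1:])}
    return len(steps) == 1 and 0 not in steps


# Hypothetical examples: monthly sales figures vs. irregularly timed events.
monthly_sales = [date(2017, 11, 1), date(2017, 12, 1), date(2018, 1, 1)]
event_dates = [date(2017, 1, 3), date(2017, 1, 19), date(2017, 6, 2)]

regular = has_fixed_monthly_interval(monthly_sales)
irregular = has_fixed_monthly_interval(event_dates)
```

With these inputs the monthly figures pass the check and the event dates do not, which matches the distinction drawn above between regularly sampled measurements and time-stamped relational records.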