sdmx-twg / vtl

This repository is used for maintaining the SDMX-VTL specification

Proposal for time series functions #366

Closed capacma closed 1 year ago

capacma commented 7 years ago

Issue Description

Proposal of time series functions for VTL 2.0

Proposed Solution

Proposal: VTL 2.0 RM time series functions.docx

capacma commented 7 years ago

List of time series functions

fill_time_series
Updated to the new style of the reference manual.

frequency
frequency returns the frequency of the time period dimension of a data point. It is used in data validation to check that, e.g., monthly data are received for a monthly data collection:

check ( frequency ( ds ) = "M" )

This is a new function with respect to VTL 1.1.

string_to_time_period
string_to_time_period has been described in #245 to allow custom time period formats. This is a new function with respect to VTL 1.1.

timeshift
timeshift has been updated. It works with components of type time_period. The format used by timeshift is the same as the one defined for the VTL 1.1 timeshift (year-month-day).

Time aggregate functions
Updated to the new style of the reference manual.
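
To make the frequency check above more concrete, here is a rough Python sketch of the idea, not the proposed VTL operator itself. The period formats it recognises (yyyy, yyyy-Qn, yyyy-mm, yyyy-mm-dd) and the helper name check_frequency are assumptions for illustration only; the actual time_period representation is defined in the attached proposal.

```python
import re

# Assumed (hypothetical) textual formats for time periods, keyed by frequency code.
_PATTERNS = [
    ("A", re.compile(r"\d{4}")),                                         # yyyy
    ("Q", re.compile(r"\d{4}-Q[1-4]")),                                  # yyyy-Qn
    ("M", re.compile(r"\d{4}-(0[1-9]|1[0-2])")),                         # yyyy-mm
    ("D", re.compile(r"\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])")),   # yyyy-mm-dd
]


def frequency(period: str) -> str:
    """Return the frequency code of a single time period string."""
    for code, pattern in _PATTERNS:
        if pattern.fullmatch(period):
            return code
    raise ValueError(f"unrecognised time period: {period!r}")


def check_frequency(periods, expected: str) -> bool:
    """Hypothetical helper: True if every period has the expected frequency."""
    return all(frequency(p) == expected for p in periods)


monthly_ok = check_frequency(["2000-01", "2000-02", "2000-03"], "M")
monthly_mixed = check_frequency(["2000-01", "2000-Q1"], "M")
```

With these assumed formats, a monthly collection that accidentally contains a quarterly period fails the check, which is the kind of validation the frequency function is meant to support.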

capacma commented 7 years ago

It is proposed to remove the two operators flow_to_stock and stock_to_flow described in VTL 1.1, because they are redundant (note: they were not present in VTL 1.0). If a particular application needs them, two equivalent VTL operators can be defined as follows:

define function flow_to_stock ( ds as dataset { identifier id date } ) is sum ( ds ) over ( order by id data points between unbounded preceding and current data point ) ;

define function stock_to_flow ( ds as dataset { identifier id date } ) is ds - lag ( ds , 1 ) over ( order by id ) ;
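
As an illustration of what the two definitions above compute, here is a small Python sketch on a plain list of values assumed to be already ordered by the time identifier. It only mirrors the arithmetic (running sum and first difference). Representing the missing lag of the first data point as None is an assumption, not something stated in the proposal.

```python
from itertools import accumulate
from typing import List, Optional


def flow_to_stock(flows: List[int]) -> List[int]:
    """Running total: sum of all flows from the first point up to the current one."""
    return list(accumulate(flows))


def stock_to_flow(stocks: List[int]) -> List[Optional[int]]:
    """Difference between each stock and the previous one.

    The first point has no predecessor, so its flow is None
    (assumed here to mirror a null result from lag).
    """
    return [
        None if i == 0 else stocks[i] - stocks[i - 1]
        for i in range(len(stocks))
    ]


flows = [10, 5, -3, 8]
stocks = flow_to_stock(flows)      # cumulative sums of the flows
recovered = stock_to_flow(stocks)  # flows again, except for the first point
```

Under this assumption the round trip recovers every flow except the first one, because the first stock has nothing to be differenced against.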

capacma commented 7 years ago

It is important to note that the time series operators use the VTL time_period data type instead of the VTL date data type, for the following reasons:

• A value of the VTL type date contains a date and a time (see the doc "data types guide"), e.g. a valid value could be "2000-01-01 12:00:00". The date type is suitable for storing a timestamp and should be used for this purpose, but it is not suitable for storing the time period information of a time series. For example, suppose that the user defines a dataset containing daily data, with only one dimension D (just to make it simple). It is clear that the dataset should contain one observation per day to ensure that there are no duplicated data. This simple constraint cannot be checked if the data type of D is date, because the dataset can contain several observations for the same day.

• The date type does not contain information on the frequency of the time period; it is only a timestamp.

Note that it is possible to store a date (without the time information) in a time_period and maintain the same format yyyy-mm-dd that one would use for storing a date. This is the proposed solution in the general case, to be compatible with several approaches.
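
The duplicate-observation argument above can be illustrated with a short Python sketch. It is only an analogy: Python datetime values stand in for the VTL date type and yyyy-mm-dd strings stand in for a daily time_period. The helper duplicated_keys is hypothetical.

```python
from collections import Counter
from datetime import datetime


def duplicated_keys(keys):
    """Return, in sorted order, the keys that occur more than once."""
    return sorted(k for k, n in Counter(keys).items() if n > 1)


# Three observations; the first two fall on the same day at different times.
timestamps = [
    datetime(2000, 1, 1, 9, 0, 0),
    datetime(2000, 1, 1, 12, 0, 0),
    datetime(2000, 1, 2, 12, 0, 0),
]

# Keyed by timestamp, every key is distinct, so a uniqueness check finds nothing.
dup_as_timestamp = duplicated_keys(timestamps)

# Keyed by daily period (yyyy-mm-dd), the two observations for the same day collide.
daily_periods = [ts.strftime("%Y-%m-%d") for ts in timestamps]
dup_as_period = duplicated_keys(daily_periods)
```

Only the period-keyed version reports 2000-01-01 as duplicated, which is the constraint a daily dataset needs to enforce.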

bellomarini commented 7 years ago

@capacma some notes for today's discussion.

capacma commented 6 years ago

Latest version (28 Nov 2017): VTL 2.0 RM time series functions MC 28.11.2017.docx

capacma commented 6 years ago

Type system (section of the User Manual)
Document by Vincenzo: VTL2.0 - User Manual - The VTL Data types.docx
Comments by Maurizio: VTL2.0 - User Manual - The VTL Data types MC.docx

linardian commented 1 year ago

Refers to an old version of the documentation.