openbudgets / DAM

OBEU Data Analysis and Mining repository

Expected endpoints definition #3

Closed larjohn closed 7 years ago

larjohn commented 7 years ago

I was able to run the staging_indigo branch and get to the API root endpoint.

Let's now define the interface in detail. We take it as given that the input for the algorithms will be either aggregate or fact responses from an OpenSpending-compatible API.

For indigo the following endpoints are needed:

Algorithms applicable to a specific dataset

This endpoint returns the algorithms that are applicable to a specific dataset. To determine whether an algorithm can be applied to a dataset, you might need to access the dataset's metadata first. This is done using the OS-compatible API, specifically the cubes & model endpoints.

The result is a JSON list with the algorithm names and descriptions.
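As a minimal sketch of the DAM side, assuming a Flask server (the route path follows the /cubes/<dataset>/algo convention agreed later in this thread; the registry dict is hypothetical):

    from flask import Flask, jsonify

    app = Flask(__name__)

    # Hypothetical registry; in practice applicability would be decided from
    # the dataset's metadata (the cubes & model endpoints of the OS API).
    ALGORITHMS = {
        'time_series': 'Time series analysis and prediction',
    }

    @app.route('/cubes/<dataset>/algo', methods=['GET'])
    def applicable_algorithms(dataset):
        # Simplest case: every algorithm is applicable to every dataset.
        return jsonify([
            {'name': name, 'description': desc}
            for name, desc in ALGORITHMS.items()
        ])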

Algorithm details, inputs and outputs

This endpoint returns a detailed list of an algorithm's inputs and outputs. One of the inputs is the data source, which should preferably be a link to a facts or aggregate result from the OS-compatible API. Other inputs could include parameters and their metadata. The following is an excerpt from the indigo app, which currently hard-codes such metadata:

  dummyTimeSeries(): Algorithm {
    let timeSeriesAlgorithm = new Algorithm();
    timeSeriesAlgorithm.title = 'Time Series';
    timeSeriesAlgorithm.name = 'time_series';

    let raw_data_input = new Input();
    raw_data_input.cardinality = "1";
    raw_data_input.type = InputTypes.BABBAGE_AGGREGATE_URI;
    raw_data_input.name = 'json_data';
    raw_data_input.title = 'Data coming from an aggregation';
    raw_data_input.guess = false;
    raw_data_input.required = true;

    let time_dimension_input = new Input();
    time_dimension_input.cardinality = "1";
    time_dimension_input.type = InputTypes.ATTRIBUTE_REF;
    time_dimension_input.name = 'time';
    time_dimension_input.title = "Time dimension";
    time_dimension_input.guess = true;
    time_dimension_input.required = true;

    let amount_aggregate_input = new Input();
    amount_aggregate_input.cardinality = "1";
    amount_aggregate_input.type = InputTypes.AGGREGATE_REF;
    amount_aggregate_input.name = "amount";
    amount_aggregate_input.title = "Amount aggregate";
    amount_aggregate_input.guess = true;
    amount_aggregate_input.required = true;

    let prediction_steps_input = new Input();
    prediction_steps_input.cardinality = "1";
    prediction_steps_input.type = InputTypes.PARAMETER;
    prediction_steps_input.name = "prediction_steps";
    prediction_steps_input.title = "Prediction Steps";
    prediction_steps_input.data_type = "number";
    prediction_steps_input.default_value = 4;
    prediction_steps_input.guess = false;
    prediction_steps_input.required = false;

    timeSeriesAlgorithm.inputs.set(raw_data_input.name, raw_data_input);
    timeSeriesAlgorithm.inputs.set(time_dimension_input.name, time_dimension_input);
    timeSeriesAlgorithm.inputs.set(amount_aggregate_input.name, amount_aggregate_input);
    timeSeriesAlgorithm.inputs.set(prediction_steps_input.name, prediction_steps_input);

    let json_output = new Output();
    json_output.name = "output";
    json_output.cardinality = 1;
    json_output.type = OutputTypes.TABLE;

    timeSeriesAlgorithm.outputs.set(json_output.name, json_output);

    timeSeriesAlgorithm.method = RequestMethod.Post;
    timeSeriesAlgorithm.endpoint = new URL(environment.DAMUrl + '/library/TimeSeries.OBeu/R/open_spending.ts');
    timeSeriesAlgorithm.prompt = 'Select an aggregate, a time-related drilldown and the prediction steps parameter from the left and click on the execute button on top right.';

    return timeSeriesAlgorithm;

  }

We can follow the same pattern to define metadata for each algorithm's inputs and outputs. I wonder if we could also include some kind of restrictions and input validation, transparently carried inside the JSON response of this endpoint.
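One possible shape for such validation metadata, piggy-backing on the input descriptions (the constraints block and its field names are purely illustrative, not an agreed schema):

    # Hypothetical validation block attached to an input's metadata;
    # the client could enforce these constraints before calling DAM.
    prediction_steps_meta = {
        "name": "prediction_steps",
        "type": "PARAMETER",
        "data_type": "number",
        "default_value": 4,
        "constraints": {
            "min": 1,          # at least one step ahead
            "max": 50,         # keep forecasts within a sane horizon
            "integer": True,   # fractional steps make no sense
        },
    }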

Execution of algorithm

This endpoint is responsible for the actual execution of the algorithm. It should accept the input parameters as GET parameters and forward them to the algorithm instance, wherever it is running. If an algorithm cannot handle the API URL as a data source, DAM could download the data itself and forward the data instead. Caching is recommended along the whole chain, from indigo to the algorithm.
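A minimal sketch of such an execution endpoint, assuming Flask plus the requests library (the route path, the BACKENDS mapping, and the forwarding scheme are illustrative):

    import requests
    from flask import Flask, jsonify, request

    app = Flask(__name__)

    # Hypothetical mapping from algorithm name to the service running it.
    BACKENDS = {
        'time_series': 'http://dam-host/library/TimeSeries.OBeu/R/open_spending.ts',
    }

    @app.route('/cubes/<dataset>/algo/<algorithm>/execute', methods=['GET'])
    def execute_algorithm(dataset, algorithm):
        # Forward the caller's GET parameters verbatim to the backend;
        # responses could be cached here to spare repeated executions.
        backend = BACKENDS[algorithm]
        response = requests.post(backend, data=request.args.to_dict())
        return jsonify(response.json())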

HimmelStein commented 7 years ago

@larjohn "This is done using the OS-compatible API, and specifically, the cubes & model endpoints." If it is done, please send me the link. otherwise, i would suggest that we create an endpoint like /cubes/<name>/<algorithm>, which returns a json structure about <algorithm> for dataset <name> and description of <algorithm>

HimmelStein commented 7 years ago

@larjohn for the algorithm-details endpoint, we create endpoints of the form /cubes/<name>/algo/<algo_name>

For the example above, /cubes/<name>/algo/dummyTimeSeries/ shall return a JSON structure as follows:

    {
        "algorithm": {
            "title": "Time Series",
            "name": "time_series",
            "instance": "timeSeriesAlgorithm",
            "method": "POST",
            "endpoint": ["DAMUrl", "/library/TimeSeries.OBeu/R/open_spending.ts"],
            "prompt": "Select an aggregate, a time-related drilldown and the prediction steps parameter from the left and click on the execute button on top right."
        },
        "input": {
            "raw_data": {
                "cardinality": "1",
                "type": "BABBAGE_AGGREGATE_URI",
                "name": "json_data",
                "title": "Data coming from an aggregation",
                "guess": false,
                "required": true
            },
            "time_dimension": {
                "cardinality": "1",
                "type": "ATTRIBUTE_REF",
                "name": "time",
                "title": "Time dimension",
                "guess": true,
                "required": true
            },
            "amount_aggregate": {
                "cardinality": "1",
                "type": "AGGREGATE_REF",
                "name": "amount",
                "title": "Amount aggregate",
                "guess": true,
                "required": true
            },
            "prediction_steps": {
                "cardinality": "1",
                "type": "PARAMETER",
                "name": "prediction_steps",
                "title": "Prediction Steps",
                "data_type": "number",
                "default_value": 4,
                "guess": false,
                "required": false
            }
        },
        "output": {
            "name": "output",
            "instance": "json_output",
            "cardinality": "1",
            "type": "TABLE"
        }
    }

HimmelStein commented 7 years ago

@larjohn please check the expected JSON output above.

HimmelStein commented 7 years ago

Algorithms applicable to a specific dataset

    @app.route('/cubes/algo/<algorithm>', methods=['GET'])
    @app.route('/cubes/<dataset>/<algorithm>', methods=['GET'])
    @app.route('/cubes/<dataset>/algo', methods=['GET'])
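For illustration, these routes could be exercised like this (a sketch assuming a local DAM instance on port 5000, with the dataset and algorithm names taken from later in this thread):

    import requests

    base = 'http://localhost:5000'

    # Details of one algorithm, independent of any dataset.
    print(requests.get(base + '/cubes/algo/dummyTimeSeries').json())

    # One algorithm in the context of a specific dataset.
    print(requests.get(base + '/cubes/budget-katerini-revenue-2016__235c7/dummyTimeSeries').json())

    # All algorithms applicable to a dataset.
    print(requests.get(base + '/cubes/budget-katerini-revenue-2016__235c7/algo').json())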

HimmelStein commented 7 years ago

Algorithm details, inputs and outputs

    @app.route('/cubes/algo/<algorithm>', methods=['GET'])

The route above is extended to return detailed input/output information for an algorithm.

HimmelStein commented 7 years ago

Function details are implemented in the preprocessing_dm module at https://github.com/openbudgets/preprocessing_dm

larjohn commented 7 years ago

"This is done using the OS-compatible API, and specifically, the cubes & model endpoints."

"This" refers to accessing the properties of the cube.

Example:

http://ws307.math.auth.gr/rudolf/public/api/v3/cubes/bonn-budget-2016__6cf09/model

Then you have to determine the applicability. The simplest case is that all algorithms are applicable to all datasets.

Otherwise, the provided output seems to be correct. Do you have a live instance? The link in the previous post does not seem to work.

HimmelStein commented 7 years ago

See https://github.com/openbudgets/preprocessing_dm. I implemented your dummyTimeSeries example there.

HimmelStein commented 7 years ago

Currently, meta information about the functions is stored in the following two JSON files:

https://github.com/openbudgets/preprocessing_dm/blob/master/preprocessing_dm/algo4data.json
https://github.com/openbudgets/preprocessing_dm/blob/master/preprocessing_dm/algoIO.json
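A guess at the shapes of these two files, inferred from the endpoint behaviour in this thread and sketched here as Python literals; the authoritative schemas are the linked files themselves:

    # algo4data.json (assumed shape): which algorithms apply to which dataset.
    algo4data = {
        "budget-katerini-revenue-2016__235c7": ["dummyTimeSeries"],
    }

    # algoIO.json (assumed shape): per-algorithm input/output metadata, in the
    # shape of the /cubes/<name>/algo/dummyTimeSeries response shown earlier.
    algoIO = {
        "dummyTimeSeries": {
            "input": {"raw_data": {"type": "BABBAGE_AGGREGATE_URI"}},
            "output": {"name": "output", "type": "TABLE"},
        },
    }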

larjohn commented 7 years ago

Is this the correct path to access the endpoint?

/cubes/algo/dummyTimeSeries

HimmelStein commented 7 years ago

Yes.

larjohn commented 7 years ago

Currently,

/cubes/budget-katerini-revenue-2016__235c7/algo

returns an empty array.

How can I get the algorithms for a specific dataset? For now, it should contain all algorithms and later we can sort it out.

HimmelStein commented 7 years ago

You need to update the content of these two files:

https://github.com/openbudgets/preprocessing_dm/blob/master/preprocessing_dm/algo4data.json
https://github.com/openbudgets/preprocessing_dm/blob/master/preprocessing_dm/algoIO.json

HimmelStein commented 7 years ago

@larjohn I updated https://github.com/openbudgets/preprocessing_dm/blob/master/preprocessing_dm/algo4data.json. Please run make update_pdm in your DAM directory as follows:

(env) DAM tdong$ make update_pdm

Then visit http://localhost:5000/cubes/budget-katerini-revenue-2016__235c7/algo in the browser and you will see:

    {
        "algos": [
            "dummyTimeSeries"
        ]
    }

HimmelStein commented 7 years ago

@larjohn two new endpoints for data mining:

/outlier_detection/LOF/sample
/outlier_detection/LOF/real

/outlier_detection/LOF/sample does not need any input; the server will use the file /Data/Kilkis_neu.csv as input and produce a CSV file at /static/ouput/Result_top25.csv.

/outlier_detection/LOF/real shall accept one or more Turtle files as input. Its parameters and the handling of multiple input files are described at /cubes/algo/outlierDetection_LOF
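A sketch of how these might be called, assuming the same local instance (the HTTP method and form-field name for the real endpoint are guesses; the authoritative parameter list is at /cubes/algo/outlierDetection_LOF):

    import requests

    base = 'http://localhost:5000'

    # Sample run: no input needed; the server reads /Data/Kilkis_neu.csv and
    # writes its result to /static/ouput/Result_top25.csv.
    print(requests.get(base + '/outlier_detection/LOF/sample').status_code)

    # Real run: a hypothetical upload of one Turtle file as multipart form
    # data; the method and field name are assumptions, not the actual API.
    with open('budget.ttl', 'rb') as ttl:
        r = requests.post(base + '/outlier_detection/LOF/real', files={'file': ttl})
        print(r.status_code)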

larjohn commented 7 years ago

Let's make it a bit easier to integrate for the time being. Can you just allow the time series algorithm to work with any dataset?

Also, a friendly name should appear in the algos array.

HimmelStein commented 7 years ago

@larjohn Updated. You need to pull DAM and run make update_pdm.