Closed larjohn closed 7 years ago
@larjohn "This is done using the OS-compatible API, and specifically, the cubes & model endpoints." If it is done, please send me the link; otherwise, I would suggest that we create an endpoint like /cubes/<name>/<algorithm>, which returns a JSON structure describing <algorithm> for the dataset <name>, including a description of <algorithm>.
@larjohn For the detailed algorithm endpoint, we create endpoints in the format /cubes/<name>/algo/<algo_name>. For the example above, /cubes/<name>/algo/dummyTimeSeries/ shall return a JSON structure as follows:
```json
{
  "algorithm": {
    "title": "Time Series",
    "name": "time_series",
    "instance": "timeSeriesAlgorithm",
    "method": "POST",
    "endpoint": ["DAMUrl", "/library/TimeSeries.OBeu/R/open_spending.ts"],
    "prompt": "Select an aggregate, a time-related drilldown and the prediction steps parameter from the left and click on the execute button on top right."
  },
  "input": {
    "raw_data": {
      "cardinality": "1",
      "type": "BABBAGE_AGGREGATE_URI",
      "name": "json_data",
      "title": "Data coming from an aggregation",
      "guess": false,
      "required": true
    },
    "time_dimension": {
      "cardinality": "1",
      "type": "ATTRIBUTE_REF",
      "name": "time",
      "title": "Time dimension",
      "guess": true,
      "required": true
    },
    "amount_aggregate": {
      "cardinality": "1",
      "type": "AGGREGATE_REF",
      "name": "amount",
      "title": "Amount aggregate",
      "guess": true,
      "required": true
    },
    "prediction_steps": {
      "cardinality": "1",
      "type": "PARAMETER",
      "name": "prediction_steps",
      "title": "Prediction Steps",
      "data_type": "number",
      "default_value": 4,
      "guess": false,
      "required": false
    }
  },
  "output": {
    "name": "output",
    "instance": "json_output",
    "cardinality": "1",
    "type": "TABLE"
  }
}
```
@larjohn please check the expected json output above.
```python
@app.route('/cubes/algo/<algorithm>', methods=['GET'])
@app.route('/cubes/<dataset>/<algorithm>', methods=['GET'])
@app.route('/cubes/<dataset>/algo', methods=['GET'])
```
These routes are extended to return detailed input/output information for an algorithm.
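A minimal sketch of how such a details endpoint could serve its response, assuming the per-algorithm metadata is kept in a dict keyed by algorithm name (the `ALGO_IO` stand-in and `algorithm_details` helper are hypothetical; the real lookup lives in the preprocessing_dm module):

```python
import json

# Hypothetical in-memory stand-in for algoIO.json; the real file in the
# preprocessing_dm repository may use a different structure.
ALGO_IO = {
    "dummyTimeSeries": {
        "algorithm": {"title": "Time Series", "name": "time_series"},
        "output": {"name": "output", "cardinality": "1", "type": "TABLE"},
    }
}

def algorithm_details(algo_name):
    """Look up the detail structure for one algorithm, or None if unknown.

    A Flask view for /cubes/<dataset>/algo/<algo_name> would call this,
    jsonify the result, and return 404 when it is None.
    """
    return ALGO_IO.get(algo_name)

print(json.dumps(algorithm_details("dummyTimeSeries"), indent=2))
```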
Function details are implemented in the preprocessing_dm module at https://github.com/openbudgets/preprocessing_dm
This is done using the OS-compatible API, and specifically, the cubes & model endpoints.
"This" refers to accessing the properties of the cube.
Example:
http://ws307.math.auth.gr/rudolf/public/api/v3/cubes/bonn-budget-2016__6cf09/model
Then you have to determine the applicability. The simplest case is that all algorithms are applicable to all datasets.
Otherwise, the provided output seems to be correct. Do you have a live instance? The link in the previous post does not seem to work.
This link: https://github.com/openbudgets/preprocessing_dm. I implemented your dummyTimeSeries example.
Currently, the meta-information of the functions is stored in the following two JSON files:
https://github.com/openbudgets/preprocessing_dm/blob/master/preprocessing_dm/algo4data.json
https://github.com/openbudgets/preprocessing_dm/blob/master/preprocessing_dm/algoIO.json
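One way the two files could be wired together (a sketch; the file shapes shown here are assumptions, not the actual contents of algo4data.json and algoIO.json): algo4data.json maps datasets to applicable algorithm names, and every name it mentions should have a matching entry in algoIO.json:

```python
# Assumed shapes of the two metadata files.
ALGO4DATA = {"budget-katerini-revenue-2016__235c7": ["dummyTimeSeries"]}
ALGO_IO = {"dummyTimeSeries": {"algorithm": {"name": "time_series"}}}

def algos_for_dataset(dataset):
    """Response body for /cubes/<dataset>/algo."""
    return {"algos": ALGO4DATA.get(dataset, [])}

def missing_io_entries():
    """Names referenced by algo4data.json but absent from algoIO.json;
    a non-empty result means the two files are out of sync."""
    referenced = {a for algos in ALGO4DATA.values() for a in algos}
    return sorted(referenced - ALGO_IO.keys())
```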
Is this the correct path to access the endpoint?
/cubes/algo/dummyTimeSeries
yes
Currently,
cubes/budget-katerini-revenue-2016__235c7/algo
returns an empty array.
How can I get the algorithms for a specific dataset? For now, it should contain all algorithms and later we can sort it out.
We need to update the content of these two files:
https://github.com/openbudgets/preprocessing_dm/blob/master/preprocessing_dm/algo4data.json
https://github.com/openbudgets/preprocessing_dm/blob/master/preprocessing_dm/algoIO.json
@larjohn
I updated https://github.com/openbudgets/preprocessing_dm/blob/master/preprocessing_dm/algo4data.json
please run make update_pdm
in your DAM directory as follows.
(env) DAM tdong$ make update_pdm
visit http://localhost:5000/cubes/budget-katerini-revenue-2016__235c7/algo
in the browser, you will see
```json
{
  "algos": [
    "dummyTimeSeries"
  ]
}
```
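For reference, a small client-side helper that builds and fetches that URL (`fetch_algos` assumes a DAM instance running on localhost:5000, as in the make update_pdm setup above; the helper names are hypothetical):

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

BASE = "http://localhost:5000"  # assumed local DAM instance

def algo_list_url(dataset, base=BASE):
    """URL of the /cubes/<dataset>/algo endpoint for a dataset."""
    return "{}/cubes/{}/algo".format(base, quote(dataset, safe=""))

def fetch_algos(dataset, base=BASE):
    """GET the endpoint and return the list of algorithm names."""
    with urlopen(algo_list_url(dataset, base)) as resp:
        return json.load(resp)["algos"]
```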
@larjohn Two new endpoints for data mining:
/outlier_detection/LOF/sample
/outlier_detection/LOF/real
/outlier_detection/LOF/sample does not need any input; the server will use the file /Data/Kilkis_neu.csv as input and produce a CSV file at /static/ouput/Result_top25.csv.
/outlier_detection/LOF/real shall accept one or more Turtle files as input. Parameters and the handling of multiple input files are described in /cubes/algo/outlierDetection_LOF.
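A sketch of how a client might call /outlier_detection/LOF/real with a Turtle file. The text/turtle content type and the single-file request body are assumptions for illustration; the authoritative parameter description is whatever /cubes/algo/outlierDetection_LOF returns:

```python
from urllib.request import Request, urlopen

def build_lof_request(turtle_bytes, base="http://localhost:5000"):
    """Build a POST request carrying one Turtle file as the body.

    Content type and body layout are assumptions; check the
    /cubes/algo/outlierDetection_LOF description for the real contract.
    """
    return Request(
        base + "/outlier_detection/LOF/real",
        data=turtle_bytes,
        headers={"Content-Type": "text/turtle"},
        method="POST",
    )

def run_lof(turtle_path, base="http://localhost:5000"):
    """Send one Turtle file to the endpoint (needs a running DAM instance)."""
    with open(turtle_path, "rb") as f:
        req = build_lof_request(f.read(), base)
    with urlopen(req) as resp:
        return resp.read()
```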
Let's make it a bit easier to integrate for the time being. Can you just allow the time series algorithm to work with any dataset?
Also in the algos array, a friendly name should also appear.
@larjohn Updated. You need to pull DAM and run make update_pdm.
I was able to run the staging_indigo branch and get to the API root endpoint.
Let's now define the interface in detail. We take for granted that the input for the algorithms should be either aggregate or fact responses from an OpenSpending-compatible API.
For indigo the following endpoints are needed:
Algorithms applicable to a specific dataset
This endpoint returns the algorithms that are applicable for a specific dataset. In order to determine whether an algorithm can be applied onto a dataset, you might need to access the dataset's metadata first. This is done using the OS-compatible API, and specifically, the cubes & model endpoints.
The result is a JSON list with the algorithm names and descriptions.
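That list could look like the following (a hypothetical illustration of the proposed shape; the description text is invented, not taken from the actual metadata files):

```python
import json

# Hypothetical response body for the applicability endpoint: each entry
# carries a machine name plus human-friendly title and description.
applicable_algorithms = {
    "algos": [
        {
            "name": "dummyTimeSeries",
            "title": "Time Series",
            "description": "Forecast an aggregate along a time drilldown.",
        }
    ]
}

print(json.dumps(applicable_algorithms, indent=2))
```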
Algorithm details, inputs and outputs
This endpoint returns a detailed list of inputs and outputs of an algorithm. One of the inputs is the data source, which preferably is a link to a facts/aggregate result from the OS-compatible API. Other inputs could include parameters and their metadata. The following is an excerpt from the indigo app that currently hard-codes such metadata:
We can follow the same pattern in order to define metadata for each algorithm's inputs and outputs. I wonder if we could also include some kind of restrictions and input validation that is transparently transferred inside the JSON response of this endpoint.
Execution of algorithm
This endpoint is responsible for the actual execution of the algorithm. It should accept the input parameters as GET parameters and then forward them to the algorithm instance, depending on where it is running. If an algorithm can't handle the API URL as a data source, DAM could download the data itself and send these instead. Of course caching is recommended all the way from indigo to the algorithm.
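The forwarding step can be sketched like this (the backend mapping and all names are hypothetical; caching and the download-the-data fallback are omitted):

```python
from urllib.parse import urlencode

# Hypothetical mapping from algorithm name to its backend instance URL;
# in practice this would come from the algorithm metadata ("endpoint" field).
ALGO_BACKENDS = {
    "time_series": "http://dam.example/library/TimeSeries.OBeu/R/open_spending.ts",
}

def forward_url(algorithm, params):
    """Build the backend URL, passing the caller's GET parameters through
    unchanged so the algorithm instance sees exactly what indigo sent."""
    return ALGO_BACKENDS[algorithm] + "?" + urlencode(params)
```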