openbudgets / DAM

OBEU Data Analysis and Mining repository

caching data-mining results #14

Closed HimmelStein closed 6 years ago

HimmelStein commented 6 years ago

Data-mining algorithms may run into timeouts. Because the datasets on the server are stable and rarely change, we would like to cache the results of the data-mining algorithms on our server.

Please send me the data-mining results (as a JSON file) of the algorithm you developed, applied to the dataset on the server. The JSON file name should be the dataset name (or keywords sufficient for a one-to-one mapping to a file name from Rudolf).

@vojir please inform the relevant UEP colleagues about this issue.

larjohn commented 6 years ago

The data mining results should not be pre-calculated. Instead, they should be calculated on demand and then stored in the cache.

Additionally, if you think each dataset has very common use cases (you can monitor the requests to decide that), you can create a script that runs when the Docker image is started and queues the requests for each dataset.
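A minimal sketch of the "calculate on demand, then cache" approach described above, plus a warm-up helper that could be run at container start. The names `cached_result` and `warm_cache`, the one-JSON-file-per-result layout, and the file naming are assumptions made for illustration; they are not taken from the DAM code base.

```python
import json
from pathlib import Path
from typing import Callable, Iterable, Tuple


def cached_result(cache_dir: Path, algorithm: str, dataset: str,
                  compute: Callable[[], dict]) -> dict:
    """Return the cached result if present; otherwise compute and store it."""
    # Hypothetical layout: one JSON file per (algorithm, dataset) pair.
    path = cache_dir / f"{algorithm}__{dataset}.json"
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    result = compute()  # the expensive data-mining call runs only on a miss
    cache_dir.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(result), encoding="utf-8")
    return result


def warm_cache(cache_dir: Path,
               requests: Iterable[Tuple[str, str, Callable[[], dict]]]) -> None:
    """Run a list of common requests once, e.g. from a start-up script."""
    for algorithm, dataset, compute in requests:
        cached_result(cache_dir, algorithm, dataset, compute)
```

With this shape, the first request for a dataset pays the full computation cost and later requests read the stored JSON. Invalidation when a dataset changes is not handled here.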

wk0206 commented 6 years ago

@larjohn we have a problem when caching the results: part of the link is uncertain, for example http://localhost:5000/outlier_detection/LOF?BABBAGE_FACT_URI=http://apps.openbudgets.eu/api/3/cubes/bonn-budget-2019__40559/facts

Is the "__40559" fixed for the dataset "Bonn budget 2019", or does it change over time? (In my test, it changed.) If this number is random, does it at least follow a fixed format of two underscores plus 5 characters?

You can find this pattern in many sample links; another example is "__ceed0".

larjohn commented 6 years ago

@wk0206 the name is constructed as follows:

  1. You have the initial dataset URI, e.g. http://data.openbudgets.eu/resource/dataset/armenia-test-18-09-2017

  2. Only the last part of the URI is kept: armenia-test-18-09-2017

  3. The whole URI is md5'd: bfab7aed4f732b7b737d754614d0b8de

  4. Only the first 5 characters of the result of step 3 are kept: bfab7

  5. The results from steps 2 and 4 are joined: armenia-test-18-09-2017__bfab7
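The five steps above can be sketched in Python as follows. The function name `cube_name` is hypothetical, and UTF-8 encoding of the URI before hashing is an assumption; the thread does not say how the server encodes the string. The sketch does not hard-code the digest quoted in step 3.

```python
import hashlib


def cube_name(dataset_uri: str) -> str:
    """Derive the cube name from a dataset URI (steps 1 to 5 above)."""
    # Step 2: keep only the last path segment of the URI.
    last_part = dataset_uri.rsplit("/", 1)[-1]
    # Step 3: MD5 of the whole URI (UTF-8 encoding is an assumption).
    digest = hashlib.md5(dataset_uri.encode("utf-8")).hexdigest()
    # Steps 4 and 5: join the last part and the first 5 hex characters.
    return f"{last_part}__{digest[:5]}"


uri = "http://data.openbudgets.eu/resource/dataset/armenia-test-18-09-2017"
print(cube_name(uri))
```

Because the suffix depends only on the dataset URI, the same URI always yields the same name, so the name can serve as a stable cache key as long as the dataset URI itself does not change.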

wk0206 commented 6 years ago

@larjohn I have finished caching for LOF and FQR in ae136ce0348c9aaef2335b595cdacbb2f8650280. Please test it and confirm it is correct, and then I will apply this method to the other mining algorithms.

HimmelStein commented 6 years ago

well done!