HimmelStein closed this issue 6 years ago
The data mining results should not be pre-calculated. Instead, they should be calculated on demand and then stored in the cache.
Additionally, if you think there are very common use cases for each dataset (you can monitor the requests to decide that), you can create a script that runs when the docker image is started and queues the requests for each dataset.
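A minimal sketch of the "calculate on demand, then cache" idea together with the start-up warm-up, assuming an in-memory dictionary as the cache. The names `get_result`, `warm_up` and the `compute` callables are hypothetical and only stand in for the real mining endpoints.

```python
from typing import Any, Callable, Dict, Iterable, Tuple

# Hypothetical in-memory cache keyed by (algorithm name, facts URI).
_cache: Dict[Tuple[str, str], Any] = {}


def get_result(algorithm: str, facts_uri: str,
               compute: Callable[[str], Any]) -> Any:
    """Return the cached result, computing and storing it on first request."""
    key = (algorithm, facts_uri)
    if key not in _cache:
        # Cache miss: run the (possibly slow) mining algorithm once.
        _cache[key] = compute(facts_uri)
    return _cache[key]


def warm_up(requests: Iterable[Tuple[str, str]],
            compute_by_algorithm: Dict[str, Callable[[str], Any]]) -> None:
    """Pre-fill the cache for common requests, e.g. when the container starts."""
    for algorithm, facts_uri in requests:
        get_result(algorithm, facts_uri, compute_by_algorithm[algorithm])
```

A persistent store (files or a key-value database) could replace the dictionary if results have to survive a restart; that choice is left open here.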
@larjohn we have a problem when caching the results: part of the link is not fixed, for example http://localhost:5000/outlier_detection/LOF?BABBAGE_FACT_URI=http://apps.openbudgets.eu/api/3/cubes/bonn-budget-2019__40559/facts
Is the "__40559" fixed for the dataset "Bonn budget 2019", or does it change from time to time? (In my test, it changed.) If this number is random, does it at least follow a fixed format of two underscores plus 5 characters?
You can find this pattern in many sample links; another example is "__ceed0".
@wk0206 the name is constructed as follows:
1. You have the initial dataset URI, e.g. http://data.openbudgets.eu/resource/dataset/armenia-test-18-09-2017
2. Only the last part of the URI is kept: armenia-test-18-09-2017
3. The whole URI is md5'd: bfab7aed4f732b7b737d754614d0b8de
4. Only the first 5 characters of the result of step 3 are kept: bfab7
5. The results from steps 2 and 4 are joined with two underscores: armenia-test-18-09-2017__bfab7
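The construction described above can be sketched in Python as follows. This is an illustration, not the actual server code: the function names are hypothetical, and it assumes the URI is hashed as UTF-8 bytes and that the hex digest is what gets truncated.

```python
import hashlib
from typing import Tuple


def cube_name(dataset_uri: str) -> str:
    """Build '<last URI segment>__<first 5 hex chars of md5(URI)>'."""
    # Keep only the last path segment of the dataset URI.
    slug = dataset_uri.rsplit("/", 1)[-1]
    # Hash the whole URI (assumption: encoded as UTF-8, hex digest).
    digest = hashlib.md5(dataset_uri.encode("utf-8")).hexdigest()
    # Join the segment and the first 5 characters of the hash.
    return f"{slug}__{digest[:5]}"


def split_cube_name(name: str) -> Tuple[str, str]:
    """Split a cube name back into (dataset segment, 5-character suffix)."""
    slug, _, suffix = name.rpartition("__")
    return slug, suffix


uri = "http://data.openbudgets.eu/resource/dataset/armenia-test-18-09-2017"
print(cube_name(uri))
```

Under this scheme the suffix depends only on the dataset URI, so it stays the same as long as the URI does. A cache could therefore be keyed either on the full cube name or on the part before the double underscore.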
@larjohn I have finished LOF and FQR caching in ae136ce0348c9aaef2335b595cdacbb2f8650280. Please test it and confirm it is correct; then I will apply this method to the other mining algorithms.
well done!
Data-mining algorithms may encounter timeout issues. As the datasets on the server are stable and do not change very often, we would like to cache the results of the data-mining algorithms on our server.
Please send me the data-mining results (as a JSON file) of the algorithm you developed (applied to the datasets on the server). The JSON file name shall be the dataset name (or keywords that are sufficient for a one-to-one mapping to a file name from Rudolf).
@vojir please inform the relevant UEP colleagues about this issue.
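One possible way to store and look up such files, sketched under the assumption that each result is JSON-serialisable and that the dataset name is safe to use as a file name. The helper names and the `cache` directory are hypothetical.

```python
import json
from pathlib import Path
from typing import Any, Optional


def save_result(dataset_name: str, result: Any, out_dir: str = "cache") -> Path:
    """Write a mining result to '<out_dir>/<dataset_name>.json'."""
    directory = Path(out_dir)
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"{dataset_name}.json"
    with path.open("w", encoding="utf-8") as fh:
        json.dump(result, fh, ensure_ascii=False, indent=2)
    return path


def load_result(dataset_name: str, out_dir: str = "cache") -> Optional[Any]:
    """Return the stored result for a dataset, or None if no file exists."""
    path = Path(out_dir) / f"{dataset_name}.json"
    if not path.exists():
        return None
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)
```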