openbudgets / platform

Tracking issues related to the work around the OpenBudgets.eu platform (WP4).
GNU General Public License v3.0

Outlier detection FQR works, but with the same outlier score, dots are too small #65

Open HimmelStein opened 7 years ago

HimmelStein commented 7 years ago

This link was used: http://apps.openbudgets.eu/cube/analytics/bonn-budget-2019__40559/outlier_detection/FQR?BABBAGE_FACT_URI=http%3A%2F%2Fapps.openbudgets.eu%2Fapi%2F3%2Fcubes%2Fbonn-budget-2019__40559%2Ffacts%3F&coloringAttribute=businessArea.prefLabel&groupingAttribute=functionalClassification.prefLabel

jaroslav-kuchar commented 7 years ago

For the frequent-pattern-based outlier detection algorithm (FQR?), it is important to set the parameters properly. If possible, try decreasing the minimum support parameter. I am not sure whether any parameter can be changed in the UI.

HimmelStein commented 7 years ago

@jaroslav-kuchar Can you click the link above and check the result? Meanwhile, try choosing a Bonn dataset, selecting parameters on the left side, and viewing the visualization result on the right side.

jaroslav-kuchar commented 7 years ago

When I click on the link above, I only see the following message: "Error: We are sorry, the analysis process did not finish in timely manner". I also tried to start the analysis from scratch, but I do not see any way to change any parameter.

HimmelStein commented 7 years ago

The timeout error also occurs with other data mining tools, such as LOF.

larjohn commented 7 years ago

The timeout issue is related to the size of the dataset. Most probably, the data mining algorithms are not fed correctly, or they take too much time to finish the task. This started happening recently because I removed the pagesize=30 default value, which meant that only 30 lines of data were fed to the algorithms. Now all data lines are sent by default. At this moment, the indigo version is probably the previous one, and it works, but we have to resolve this first, before moving on: https://github.com/openbudgets/integration/issues/14
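To make the pagesize point concrete, here is a minimal sketch of building a facts request that is limited to 30 lines again. The base URL is the one decoded from the link at the top of this issue; the `page` parameter and the helper name `facts_page_url` are assumptions for illustration, not something confirmed in this thread.

```python
from urllib.parse import urlencode

# Facts endpoint decoded from the BABBAGE_FACT_URI in the link above.
FACTS_URL = "http://apps.openbudgets.eu/api/3/cubes/bonn-budget-2019__40559/facts"


def facts_page_url(page, pagesize=30):
    """Build a facts URL that requests one limited page of data lines.

    `pagesize=30` mirrors the default that was removed; `page` is an
    assumed parameter name for selecting which page to fetch.
    """
    query = urlencode({"page": page, "pagesize": pagesize})
    return f"{FACTS_URL}?{query}"


print(facts_page_url(1))
```

Feeding the algorithms one limited page at a time would keep each request small, at the cost of the analysis seeing only part of the dataset.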

Regarding the dots issue, the sizes reflect the real relation in the data: 173 million versus 100 thousand looks like that. Moreover, if I am not mistaken, I used the algorithm from Pierro, which, I think, uses square-root normalization for the size of the circles. Would you suggest log or something else? Automatically selected?
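To compare the two options, here is a rough sketch of square-root versus log scaling applied to the two amounts mentioned above. It is not the code used in the visualization; the function names and the `max_radius` and `min_radius` pixel values are assumptions chosen for illustration.

```python
import math


def radius_sqrt(value, max_value, max_radius=40.0):
    """Square-root scaling: circle area stays proportional to the value."""
    return max_radius * math.sqrt(value / max_value)


def radius_log(value, max_value, max_radius=40.0, min_radius=2.0):
    """Log scaling: compresses large ratios so small amounts stay visible."""
    scale = math.log1p(value) / math.log1p(max_value)
    return min_radius + (max_radius - min_radius) * scale


amounts = [173_000_000, 100_000]  # the two magnitudes discussed above
largest = max(amounts)
for amount in amounts:
    print(
        amount,
        round(radius_sqrt(amount, largest), 2),
        round(radius_log(amount, largest), 2),
    )
```

With square-root scaling, the 100 thousand dot ends up below one pixel in radius when the 173 million dot is 40 pixels, which matches the "dots are too small" report. Log scaling keeps it clearly visible, but circle areas are then no longer proportional to the amounts, so the legend would need to say so.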