nasaharvest / dora

Domain-agnostic Outlier Ranking Algorithms (DORA) - SMD cross-divisional use case demonstration of AI/ML
MIT License

Plot histogram of outlier scores #55

Open hannah-rae opened 3 years ago

hannah-rae commented 3 years ago

To help decide what threshold to use when reviewing outliers, it would be useful to see the distribution of outlier scores for each algorithm. This module would generate a histogram of those scores.
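A minimal sketch of such a module follows. It assumes each algorithm's scores are available as a 1-D array-like keyed by algorithm name. The function names `score_histogram` and `plot_score_histogram` and the algorithm names in the example are hypothetical and are not DORA's API. The sketch uses `numpy.histogram` for the counts and matplotlib's `Axes.hist` to draw one panel per algorithm:

```python
import numpy as np


def score_histogram(scores, bins=10):
    """Return (counts, bin_edges) for a 1-D array of outlier scores.

    Hypothetical helper, not part of DORA.
    """
    scores = np.asarray(scores, dtype=float).ravel()
    return np.histogram(scores, bins=bins)


def plot_score_histogram(scores_by_alg, bins=10, out_path="outlier_scores.png"):
    """Save one histogram panel per algorithm to out_path.

    scores_by_alg: dict mapping a (hypothetical) algorithm name to its scores.
    """
    import matplotlib
    matplotlib.use("Agg")  # render straight to a file; no display needed
    import matplotlib.pyplot as plt

    n = len(scores_by_alg)
    fig, axes = plt.subplots(n, 1, figsize=(6, 3 * n), squeeze=False)
    for ax, (name, scores) in zip(axes[:, 0], scores_by_alg.items()):
        ax.hist(np.asarray(scores, dtype=float), bins=bins)
        ax.set_title(name)
        ax.set_xlabel("Outlier score")
        ax.set_ylabel("Count")
    fig.tight_layout()
    fig.savefig(out_path)
    plt.close(fig)
    return out_path
```

For example, `score_histogram([0.1, 0.2, 0.2, 0.9], bins=4)` returns counts `[3, 0, 0, 1]` over four equal-width bins spanning 0.1 to 0.9. A plotting call would look like `plot_score_histogram({"iforest": [...], "lof": [...]})`.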

As a side note, #24 specifies the number of outliers to record in the subset. It could also be useful to specify a score threshold instead. However, a fixed threshold would be algorithm-dependent, because each algorithm produces scores on its own scale. The exception would be a threshold derived from the data, e.g., 2 standard deviations from the mean or similar.
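One way to make the threshold data-based rather than algorithm-specific is to flag scores more than k standard deviations above that algorithm's mean score. The minimal sketch below assumes that higher scores mean more anomalous. It uses the population standard deviation (`np.std` with its default `ddof=0`). The names `std_threshold` and `select_outliers` are hypothetical:

```python
import numpy as np


def std_threshold(scores, k=2.0):
    """Data-based threshold: mean + k * (population) standard deviation.

    Assumes higher scores are more anomalous, which may not hold for every
    algorithm. Hypothetical helper, not part of DORA.
    """
    scores = np.asarray(scores, dtype=float)
    return scores.mean() + k * scores.std()


def select_outliers(scores, k=2.0):
    """Return indices of scores strictly above the threshold, highest first."""
    scores = np.asarray(scores, dtype=float)
    thresh = std_threshold(scores, k)
    idx = np.flatnonzero(scores > thresh)
    # Sort the selected indices by descending score
    return idx[np.argsort(-scores[idx])]
```

For scores `[1]*9 + [10]`, the mean is 1.9 and the standard deviation is 2.7. With k = 2, the threshold is therefore 7.3, and only index 9 is selected. Because the threshold is recomputed per algorithm, it adapts to each algorithm's score scale. One caveat is that a single extreme score also inflates the standard deviation it is compared against.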

hannah-rae commented 3 years ago

@vinr515 I think we can close this now, but it would be good for a few others to try out the functionality first to make sure it works across the use cases.

hannah-rae commented 2 years ago

@vinr515 I tried running this, and it works great with no parameters, but I get the following error when I use the bins parameter: TypeError: _run() got multiple values for argument 'bins'
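In Python, this error usually means the same parameter is filled twice in one call, once positionally and once as a keyword. One common cause is a wrapper that passes a default `bins` positionally and also forwards the user's `**kwargs`. The minimal reproduction below uses hypothetical functions (`run_buggy`, `run_fixed`, and a stand-in `_run`). It does not claim to show DORA's actual code:

```python
def _run(scores, bins=10, out_path="hist.png"):
    """Hypothetical stand-in for the module's _run; reports what it received."""
    return {"n_scores": len(scores), "bins": bins, "out_path": out_path}


def run_buggy(scores, **kwargs):
    # bins is supplied positionally here AND again if the caller passes
    # bins=..., so Python raises:
    # TypeError: _run() got multiple values for argument 'bins'
    return _run(scores, 10, **kwargs)


def run_fixed(scores, **kwargs):
    # Only fill in the default when the caller did not supply bins,
    # and pass everything by keyword.
    kwargs.setdefault("bins", 10)
    return _run(scores, **kwargs)
```

With this pattern, `run_buggy(scores)` works, but `run_buggy(scores, bins=5)` fails, which matches the reported behavior. Passing the default through `kwargs.setdefault` avoids the conflict.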