wi2trier / cbrkit

Customizable Case-Based Reasoning (CBR) toolkit for Python with a built-in API and CLI.
https://wi2trier.github.io/cbrkit/
MIT License

Unexpected results when using cbrkit.sim.numbers.linear function #166

Open gjimenezUCM opened 1 month ago

gjimenezUCM commented 1 month ago

The cbrkit.sim.numbers.linear function does not work properly when the min parameter is provided. Example:

simFunction = cbrkit.sim.attribute_value(
    attributes={
        "year": cbrkit.sim.numbers.linear(min=1950, max=2000)
        ...
})

query = {"year": 1980, ...}

In this case, every result has a similarity value 'year': 1.0

Solution: Check https://github.com/wi2trier/cbrkit/blob/e5f749c1b52eee8b2568ebf5f0bcefbd9e58dfb7/cbrkit/sim/numbers.py#L28 because the distance between two values can lie outside of the min-max range. In the previous case, the distance is always less than 1950.
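
To make the report concrete, here is a minimal sketch of a linear similarity that applies min/max to the distance |x - y| rather than to the values themselves. The function name linear_on_distance, the parameter names, and the exact boundary handling are assumptions for illustration only; this is not the code from cbrkit/sim/numbers.py.

```python
from collections.abc import Callable


def linear_on_distance(
    max_dist: float, min_dist: float = 0.0
) -> Callable[[float, float], float]:
    """Hypothetical stand-in: min_dist/max_dist bound the distance, not the values."""

    def wrapped_func(x: float, y: float) -> float:
        dist = abs(x - y)
        if dist < min_dist:
            # Any distance below the minimum counts as a perfect match.
            return 1.0
        if dist > max_dist:
            # Any distance above the maximum counts as no match.
            return 0.0
        # Linear decrease between the two distance bounds.
        return (max_dist - dist) / (max_dist - min_dist)

    return wrapped_func


# Passing year values as distance bounds, as in the report above.
year_sim = linear_on_distance(min_dist=1950, max_dist=2000)
# The sample case years are made up; their distances to 1980 are at most 30.
sims = [year_sim(1980, case_year) for case_year in (1950, 1965, 1980, 2000)]
```

Under these assumptions every distance is far below 1950, so each comparison falls into the first branch, which would explain the constant 'year': 1.0.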

mirkolenz commented 1 month ago

Thank you for the bug report! I am currently on vacation and unable to check the function. However, we modeled the function after the one available in ProCAKE (https://procake.pages.gitlab.rlp.net/procake-wiki/sim/numeric/#linear) and if I remember correctly the interval is computed w.r.t. the distance value.

It would help me a lot if you could check whether the function behaves identically to the one available in ProCAKE.

If that is the case and the results are surprising for your use case, it may make sense to add an additional function with a different behavior.

Thanks again and best wishes!

mirkolenz commented 4 weeks ago

I finally had the time to dig into this and found that the function cbrkit.sim.numbers.linear is working as I described: the min/max values refer to the difference between two values, so it is behaving as it should. However, I still think that another similarity function that is more suitable for your use case should be added to cbrkit. It would basically use the following logic:

def interval(start: float, end: float) -> SimPairFunc[Number, float]:
    def wrapped_func(x: Number, y: Number) -> float:
        if x < start or x > end or y < start or y > end:
            return 0.0

        return 1.0 - abs(x - y) / (end - start)

    return wrapped_func

I'm still unsure about the naming for this function. Do you think cbrkit.sim.numbers.interval makes sense or does another term better describe its logic? I'd like to have a clear distinction between the existing linear function and the new one.
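
For readers who want to try the proposed logic without cbrkit installed, the following standalone sketch mirrors the snippet above. As an assumption, SimPairFunc[Number, float] is replaced by a plain Callable and Number by float; the query and case years are made up for illustration.

```python
from collections.abc import Callable


def interval(start: float, end: float) -> Callable[[float, float], float]:
    def wrapped_func(x: float, y: float) -> float:
        # Values outside [start, end] are treated as not comparable.
        if x < start or x > end or y < start or y > end:
            return 0.0

        # Similarity decreases linearly with the distance, scaled by the interval width.
        return 1.0 - abs(x - y) / (end - start)

    return wrapped_func


year_interval = interval(1950, 2000)
same_year = year_interval(1980, 1980)     # identical values
ten_apart = year_interval(1980, 1970)     # distance 10 on an interval of width 50
out_of_range = year_interval(1980, 2010)  # 2010 lies outside the interval
```

Here min and max describe the value range itself, so a query year of 1980 yields graded similarities instead of a constant 1.0.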

gjimenezUCM commented 4 weeks ago

Thanks for your time, Mirko. Exactly, cbrkit.sim.numbers.linear is working as described in ProCAKE. Maybe the documentation in CBRKit should include the plot from ProCAKE, and it could be enhanced by indicating that the function works on a distance interval. I suggest modifying the parameter documentation with something like "Minimum/maximum distance between values".

The proposed interval function is exactly what I was expecting from the linear function. The name interval makes sense to me, but the same name is employed in other CBR frameworks like jColibri for functions that work like linear. I think that the name is important, but it can be clarified with detailed documentation.

mirkolenz commented 3 weeks ago

I just published a new release of CBRkit which has the function I outlined earlier: cbrkit.sim.numbers.linear_interval. I tried to make a clear distinction between both linear functions in the docstrings. Please let me know if it works as intended 😄

https://github.com/wi2trier/cbrkit/releases/tag/v0.13.0