imoscovitz / wittgenstein

Ruleset covering algorithms for transparent machine learning
MIT License

Problem with scientific notation #16

Open Arzik1987 opened 2 years ago

Arzik1987 commented 2 years ago

Once a small value in scientific notation defines a bin boundary, the discretization fails. One possible solution would be to replace

if split_idx is not None and not split_idx:
    split_idx = i
# Found a - after the split, and it's not the minus of a negative number
elif i > split_idx + 1:
    return None

in the function _str_to_floor_ceil with

if value[i-1] != 'e' and not split_idx:
    split_idx = i

david-chapela commented 2 years ago

Hi,

I ran into the same issue: when one member of an interval is in scientific notation, the function _str_to_floor_ceil() returns None and it crashes. An example is "2.17e-14-100".

I think @Arzik1987's solution will fail when the scientific notation is in the second number: the second condition (not split_idx) would not match, so execution would fall into the return None clause.

The '-' needs to be skipped whenever it comes right after an 'e'. So line 190 also needs to skip the case in which the previous char is an 'e':

if char == "-" and i != 0 and value[i-1] != 'e':
    if split_idx is not None and not split_idx:
        split_idx = i
    # Found a - after the split, and it's not the minus of a negative number
    elif i > split_idx + 1:
        return None
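
For anyone who wants to check the behaviour outside the library, here is a minimal, self-contained sketch of the idea. It is not the library's actual code: the function name str_to_floor_ceil_sketch, the skip_exponent_sign flag, the surrounding loop, and the float() parsing at the end are assumptions made for illustration. Only the conditions on '-' are taken from the snippets in this thread.

```python
def str_to_floor_ceil_sketch(value, skip_exponent_sign=False):
    """Split a 'floor-ceil' string on its separating dash.

    Hypothetical stand-in for _str_to_floor_ceil; returns a (floor, ceil)
    tuple of floats, or None if the pattern is invalid.
    """
    split_idx = 0
    for i, char in enumerate(value):
        # A dash at position 0 is the minus sign of the first number
        if char == "-" and i != 0:
            # Proposed fix: a dash right after 'e' is an exponent sign
            if skip_exponent_sign and value[i - 1] == "e":
                continue
            if split_idx is not None and not split_idx:
                split_idx = i
            # Found a - after the split, and it's not the minus of a negative number
            elif i > split_idx + 1:
                return None
    if not split_idx:
        return None  # no separating dash was found
    floor, ceil = value[:split_idx], value[split_idx + 1:]
    try:
        # Assumption: both halves must parse as floats
        return float(floor), float(ceil)
    except ValueError:
        return None


# Without the extra check, the exponent's dash is taken as the separator
print(str_to_floor_ceil_sketch("2.17e-14-100"))
# With the check, the exponent's dash is skipped
print(str_to_floor_ceil_sketch("2.17e-14-100", skip_exponent_sign=True))
# Scientific notation in both numbers
print(str_to_floor_ceil_sketch("1e-5-2.5e-3", skip_exponent_sign=True))
```

With skip_exponent_sign=False the first call returns None, which matches the failure described above. With the flag enabled, the dash after 'e' is ignored in either number, so the interval splits on the real separator. This sketch only checks for a lowercase 'e'.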

david-chapela commented 2 years ago

Hi,

I opened a pull request with the solution to this problem; you can find the fixed code there.