Add Scientific notation ("e" notation) support for unit parsing

berdanA commented 6 days ago

What is the underlying problem you're trying to solve?

In the units.parser (or specifically in the extractNumAndUnit function in topdown/parse_bytes.go), scientific notation is not currently supported for parsing numeric values with units. This limitation impacts functionality when using OPA Gatekeeper as the policy engine in Kubernetes clusters, where certain resource specifications permit the use of scientific notation. To maintain flexibility and improve user experience, I would like to propose a solution that adds support for scientific notation without altering existing functionality.

Describe the ideal solution

Enhance the extractNumAndUnit function to detect and handle scientific notation by checking for the presence of "e" or "E" followed by a numeric sequence. This allows values in scientific notation (e.g., "1e6") to be parsed as valid numbers. The proposed code modification below demonstrates this approach.

This change would not change or inhibit existing functionality as far as I see.

// extractNumAndUnit separates the numeric part from the unit part of a string.
func extractNumAndUnit(s string) (string, string) {
    isNum := func(r rune) bool {
        return unicode.IsDigit(r) || r == '.' || r == 'e'
    }

    firstNonNumIdx := -1
    for idx, r := range s {
        // Detect if we encounter a character that is not a number
        if !isNum(r) {
            firstNonNumIdx = idx
            break
        }
    }

    // If the whole string is a number, return it with no unit
    if firstNonNumIdx == -1 { // only digits and '.'
        return s, ""
    }

    if firstNonNumIdx == 0 { // starts with non-digit
        return "", s
    }

    // Now handle the scientific notation
    if firstNonNumIdx > 0 && (s[firstNonNumIdx] == 'E' || s[firstNonNumIdx] == 'e') {
        // Check if 'E' or 'e' is preceded by a number
        // Include 'E' or 'e' in the number if it is followed by digits
        if firstNonNumIdx < len(s)-1 && unicode.IsDigit(rune(s[firstNonNumIdx+1])) {
            return s[:firstNonNumIdx+2], s[firstNonNumIdx+2:] // include the exponent in the number
        }
    }

    // Regular case: return the number and the rest
    return s[:firstNonNumIdx], s[firstNonNumIdx:]
}

anderseknert commented 6 days ago

Sounds like a reasonable addition to me 👍

ashutosh-narkar commented 6 days ago

Feel free to contribute @berdanA if you'd like.

berdanA commented 5 days ago

I will publish a pull request hopefully shortly, I want to test this thoroughly

berdanA commented 5 days ago

Alright, I opened a pull request with all changes and tests. I hope I did everything correctly.

https://github.com/open-policy-agent/opa/pull/7147

open-policy-agent / opa

Add Scientific notation ("e" notation) support for unit parsing #7142

What is the underlying problem you're trying to solve?

Describe the ideal solution