pradosoft / prado

Prado - Component Framework for PHP
Other
186 stars 72 forks source link

TPropertyValue::ensureArray needs better "(...)" handling without eval #852

Open belisoful opened 1 year ago

belisoful commented 1 year ago

When TPropertyValue was written, it was PHP4/5. Things have changed. Standard PHP arrays are not "array()" but "[]".

So I propose that TPropertyValue::ensureArray be tweaked for the [newer] versions (7 & 8) of PHP format of array "[...]"

    public static function ensureArray($value): array
    {
        if (is_string($value)) {
            $value = trim($value);
            $len = strlen($value);
            if ($len >= 2 && $value[0] == '(' && $value[$len - 1] == ')') {
                return eval('return array' . $value . ';');
            } if ($len >= 2 && $value[0] == '[' && $value[$len - 1] == ']') {
                return eval('return ' . $value . ';');
            } else {
                return $len > 0 ? [$value] : [];
            }
        } else {
            return (array) $value;
        }
    }

Also, because Arrays are created by eval(), Using this function to create an array from user input could be a security risk. While that is not a typical use case, it is possible.

To reduce the security footprint of this function, I propose checking the input string for "Prado::" and not performing the eval if it's there. This is a data scrubbing function, not an "execute system routines" function.

An array string of "(Prado::getApplication()->getModule-('aModule')->flushDatabase())" might do something bad, so limiting the array eval to not access "Prado::" may be wise. For security purposes, I propose the following:

    public static function ensureArray($value): array
    {
        if (is_string($value)) {
            $value = trim($value);
            $len = strlen($value);
            if ($len >= 2 && $value[0] == '(' && $value[$len - 1] == ')' && stripos($value, "Prado::") === false) {
                return eval('return array' . $value . ';');
            } if ($len >= 2 && $value[0] == '[' && $value[$len - 1] == ']' && stripos($value, "Prado::") === false) {
                return eval('return ' . $value . ';');
            } else {
                return $len > 0 ? [$value] : [];
            }
        } else {
            return (array) $value;
        }
    }

Unit tests are needed, phpdoc would need updated as well. TPropertyValue has no unit tests. I have them written. But not for this, not yet.

ctrlaltca commented 1 year ago

TPropertyValue::ensureArray() was originally meant to parse simple lists of items in templates. Using it to parse data received from users is a guaranteed remote code execution. Any kind of workaround applied to eval() can be easily defeated by obscuring the cose, eg. if $value contains "new object()", "unserialize(whatever)", etc... Replacing eval() with something else is really a nice improvement. The first alternative that i can think of is str_getcsv(). It won't handle complicate cases (eg. nested arrays or keyed arrays), but i guess we don't want to handle those anyway. Right now the function is used only in a few places inside Prado, but unfortunately some of them seems to possibly need complicate cases, eg. TMemcache::setOptions(). I'd drop happily support for these cases if needed.

belisoful commented 1 year ago

We do need some kind of nested array processing. "[[1, 2, 1], [2, 4, 2], [1, 2, 1]]" because I have an image convolution filter for a new publishing image processor behavior. The new TAsset class (and publishing) is bonkers awesome. I'm doing the test cases for it now. The new File publishing system is going to be a new key core feature. We can add our own watermarks and copyright to images on publish. ensureArray is used by the image convolution filter. That's why this method is getting some attention and love.

(Also, I'm working on file meta data... eg. scrubbing image metadata, adding our own meta data (like copyright and updated publish times!), and selecting which images to manipulate on publish based on meta data.)

I agree that ensureArray for user input is an edge case. I don't like the eval. Yes, exactly, there are 1001 ways to compromise ensureArray.

Also, we have to put quotes in the array to get it to work properly with strings. eg "('item1', 'item2', 'item3')" as opposed to "(item1, item2, item3, my "text" 'in' a long string, 'my "text",\' in a string', "my 'text',\"in\" a string", (1, 2 , 3))".

My thought is that originally, when PRADO and TPropertyValue was created, PHP allowed constants as strings and so it was "OK" to do "(item1, item2, item3)". This, of course, is an error today.

belisoful commented 1 year ago

I am having success with this Regular Expression in validating an array:

\((?P<inner>(?P<comma>(?P>rv),)*(?P<rv>\s*(?P<value>(?P<recur>(?R))|(?P<const>"(?:[^"]|(?<=\\)")*?"|'(?:[^']|(?<=\\)')*?'|[^'"\(\)\s][^\(\),]*?))?\s*))\)

php: $regex = '/\((?P<inner>(?P<comma>(?P>rv),)*(?P<rv>\s*(?P<value>(?P<recur>(?R))|(?P<const>"(?:[^"]|(?<=\\\\)")*?"|\'(?:[^\']|(?<=\\\\)\')*?\'|[^\'"\(\)\s][^\(\),]*?))?\s*))\)/m';

it properly recurses the bracket, eg. '((((),()),()),())'. It handles numbers and strings (without quotes then the commas become terminators), strings with single or double quotes.

String that start with ' or " must terminate reciprocally. But strings with commas as terminators (aka, do not start with ' or ") may have the ' or " as the second or beyond characters without having to be balanced (in any way).

strings with commas as terminators cannot use the ( or ) characters for being the array syntax. To have a ( or ) in a string, single or double quotes around the string must be used. Without this, infinite loops happen.

"()", "(2)", "( 2 )", "(2, 3)", "(abc, xyz)", "(abc, 'efg', "xyz")" ('abc,"\'()', "xyz,'\"()")

errors: (11, string with()out quotes, xyz) ('abc' 'xyz')

parsing and validating is the next step, recursive parsing makes sense. "true" to true, integers to int, numbers to float. in typing comma separated strings.

will this work? https://regex101.com/r/wJY9Wn/1

belisoful commented 1 year ago

I'd like to add that 0x00ada123 should also be supported by the array processor. One of the image filters is an array of pixel colors that would be expressed as "(0x00FFFFFF, 0x7F000000, 0x3f808080)"

belisoful commented 1 year ago

The convolution filter has an array within array structure, so eg "((1, 2 , 1), (2, 4, 2), (1, 2, 1))". Which is not supported currently. it must be this to currently work: "([1,2,1], [2,4,2], [1,2,1])"