joomla / joomla-cms

Home of the Joomla! Content Management System
https://www.joomla.org
GNU General Public License v2.0
4.74k stars 3.64k forks source link

JInput::getFloat() (still) does not interpret ".x" values correctly #29779

Open JTBlum opened 4 years ago

JTBlum commented 4 years ago

Steps to reproduce the issue

$in = JApplicationCms::getInstance()->input; die($in->getFloat('testFloat',0.0));

Navigate to the page and add the $_GET parameter testFloat=.25

Expected result

Blank page except for the text ".25"

Actual result

Blank page except for the text "25"

System information (as much as possible)

Joomla! Version Joomla! 3.6.5 through Joomla! 3.9.19 Stable [ Amani ] 2-June-2020 15:00 GMT

Additional comments

This has been reported before and was incorrectly closed in pull request 15736. Please review the actual content of the regular expression used in the PHP documentation, specifically the following:

Formally as of PHP 7.4.0 (previously, underscores have not been allowed):

LNUM          [0-9]+(_[0-9]+)*
DNUM          ([0-9]*(_[0-9]+)*[\.]{LNUM}) | ({LNUM}[\.][0-9]*(_[0-9]+)*)
EXPONENT_DNUM (({LNUM} | {DNUM}) [eE][+-]? {LNUM})

The following quote was used from an earlier version of PHP, which also shows that the change is in line with native PHP, but was used as justification to close it:

Formally:

LNUM          [0-9]+
DNUM          ([0-9]*[\.]{LNUM}) | ({LNUM}[\.][0-9]*)
EXPONENT_DNUM [+-]?(({LNUM} | {DNUM}) [eE][+-]? {LNUM})

If you didn't catch it, the left side of the DNUM regex allows for a floating point number to start with a dot in both cases.

Quy commented 4 years ago

Please test PR #29780

SharkyKZ commented 4 years ago

Reopening as #29780 was closed.

JTBlum commented 4 years ago

Given the discussion on #29780, I'll go into more detail here and propose another change to the RegEx as well as some changes to the unit tests.

I could go on, but all of this hinges on one assumption that could be mistaken: I don't think that strings containing floating point values in the middle of junk data should come back as anything but junk, but the many of the tests in this suite and one other of five sets disagree.

I originally proposed the following RegEx because I was thinking in terms of things that were already parsed as tokens:

$pattern = '/[-+]?[0-9]*(\.[0-9]+)?([eE][-+]?[0-9]+)?/';
                       ^

Knowing that we need to take into account potential junk data surrounding legitimate values I would say the change is still a valid one, but that the details of the unit tests above might in some cases return different results since they would now show decimal values instead of integers in some cases. If, on the other hand, numbers surrounded by junk data are not supposed to be parsed at all I would propose the following:

$pattern = '/^[-+]?[0-9]*(\.[0-9]+)?([eE][-+]?[0-9]+)?$/';
             ^          ^                             ^

Thank you for your patience.

Fedik commented 4 years ago

what about explode the string by dot and check each part separately? (as int?)

$string = 'abc-12.456abc';
$f =  new Joomla\CMS\Filter\InputFilter;

$p = explode('.', $string , 2);
$r1 = $f->clean($p[0], 'int');
$r2 = $f->clean($p[1], 'int');
$r = floatval($r1 . '.' . $r2);

var_dump($r);

easy :smiley: except for $string = 'abc-.456abc'; :smile:

JTBlum commented 4 years ago

That doesn't take into account plus/minus signs, exponent notation, or integers that we want to interpret as floating-point values using the filter methods.