Open tyler92 opened 1 week ago
Yeah, well - it's either that or we silently accept garbage, as we used to. If you have an improvement proposal, please send a PR.
bool DateTimeFormat::isValid(const std::string& dateTime)
{
    static const RegularExpression regs[] = {
        RegularExpression(DateTimeFormat::ISO8601_REGEX),
        RegularExpression(DateTimeFormat::RFC822_REGEX),
        RegularExpression(DateTimeFormat::RFC1123_REGEX),
        RegularExpression(DateTimeFormat::HTTP_REGEX),
        RegularExpression(DateTimeFormat::RFC850_REGEX),
        RegularExpression(DateTimeFormat::RFC1036_REGEX),
        RegularExpression(DateTimeFormat::ASCTIME_REGEX),
        RegularExpression(DateTimeFormat::SORTABLE_REGEX)
    };
    for (const auto& f : regs)
    {
        if (f.match(dateTime)) return true;
    }
    return false;
}
Creating the RegularExpressions once in isValid takes it from over 12s to 0.16s for me.
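To show the same effect outside of Poco, here is a minimal sketch that uses std::regex as a stand-in for Poco::RegularExpression. The names (kPatterns, isValidRecompiling, isValidCached, checkVariantsAgree) and the two simplified patterns are made up for this example and are not the Poco regexes; the only point is the difference between compiling on every call and compiling once.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical, heavily simplified patterns (NOT the ones Poco uses).
static const char* const kPatterns[] = {
    R"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z)",
    R"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})"
};

// Slow variant: every call compiles every pattern again.
bool isValidRecompiling(const std::string& s)
{
    for (const char* p : kPatterns)
    {
        const std::regex re(p);
        if (std::regex_match(s, re)) return true;
    }
    return false;
}

// Fast variant: patterns are compiled once, on the first call.
// Initialisation of function-local statics is thread-safe since C++11.
bool isValidCached(const std::string& s)
{
    static const std::regex regs[] = {
        std::regex(kPatterns[0]),
        std::regex(kPatterns[1])
    };
    for (const auto& re : regs)
    {
        if (std::regex_match(s, re)) return true;
    }
    return false;
}

// Both variants must give the same answer; only the cost differs.
void checkVariantsAgree()
{
    assert(isValidRecompiling("2024-01-02T03:04:05Z") == isValidCached("2024-01-02T03:04:05Z"));
    assert(isValidRecompiling("garbage") == isValidCached("garbage"));
}
```

Calling either function in a tight loop should make the compile cost visible; the actual numbers will depend on the regex engine and the patterns, so I am not quoting any here.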
Why are all regexes checked, and not just one? E.g. for the following code
const auto format = Poco::DateTimeFormat::ISO8601_FRAC_FORMAT;
auto parsed = Poco::DateTimeParser::parse(format, text, tzd);
I would expect that only ISO8601_REGEX will be checked.
And why is user input not checked against these regexes if DateTimeFormat::hasFormat(fmt) is false?
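A rough sketch of what I mean by checking only the regex that belongs to the requested format. This is not how Poco is implemented: std::regex stands in for Poco::RegularExpression, and the format strings, patterns and names (DEMO_ISO_FORMAT, DEMO_SORTABLE_FORMAT, findRegexFor, checkAgainstFormat, FormatCheck) are hypothetical. An unknown format is reported as its own outcome so the caller has to decide explicitly what to do with it.

```cpp
#include <cassert>
#include <map>
#include <regex>
#include <string>

// Hypothetical format strings, for illustration only.
static const char* const DEMO_ISO_FORMAT = "%Y-%m-%dT%H:%M:%SZ";
static const char* const DEMO_SORTABLE_FORMAT = "%Y-%m-%d %H:%M:%S";

enum class FormatCheck { Valid, Invalid, UnknownFormat };

// Returns the one regex registered for fmt, or nullptr if fmt is unknown.
// The table is built once, on the first call.
const std::regex* findRegexFor(const std::string& fmt)
{
    static const std::map<std::string, std::regex> table = {
        { DEMO_ISO_FORMAT, std::regex(R"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z)") },
        { DEMO_SORTABLE_FORMAT, std::regex(R"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})") }
    };
    const auto it = table.find(fmt);
    return it == table.end() ? nullptr : &it->second;
}

// Validates text against the single regex that belongs to fmt.
FormatCheck checkAgainstFormat(const std::string& fmt, const std::string& text)
{
    const std::regex* re = findRegexFor(fmt);
    if (!re) return FormatCheck::UnknownFormat;
    return std::regex_match(text, *re) ? FormatCheck::Valid : FormatCheck::Invalid;
}

// Small self-check showing the three outcomes.
void demoFormatCheck()
{
    assert(checkAgainstFormat(DEMO_ISO_FORMAT, "2024-01-02T03:04:05Z") == FormatCheck::Valid);
    assert(checkAgainstFormat(DEMO_ISO_FORMAT, "2024-01-02 03:04:05") == FormatCheck::Invalid);
    assert(checkAgainstFormat("%d/%m/%Y", "02/01/2024") == FormatCheck::UnknownFormat);
}
```

With a lookup like this, a text in the sortable layout is rejected when the ISO format was requested, instead of passing because some other regex happens to match.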
About handling garbage input. Let's consider the following code:
Points for discussion:
- SKIP_JUNK accepts garbage input
- PARSE_NUMBER_N doesn't report an error if the input doesn't contain a number at all

I hope the parser body can be improved so that it is more strict and regular expressions can be avoided.
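As an illustration of the second point, a stricter number-parsing step could look roughly like the sketch below. It is a hypothetical helper (parseNumberN and demoParseNumberN are names invented here), not the existing PARSE_NUMBER_N macro and not a proposed patch: it reads at most maxDigits digits and fails if it finds none.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical strict helper: reads 1..maxDigits decimal digits starting at it.
// Returns false (leaving value and it untouched) if there is no digit at all.
bool parseNumberN(std::string::const_iterator& it,
                  const std::string::const_iterator& end,
                  int maxDigits,
                  int& value)
{
    assert(maxDigits > 0);
    int result = 0;
    int digits = 0;
    while (digits < maxDigits && it != end &&
           std::isdigit(static_cast<unsigned char>(*it)))
    {
        result = result * 10 + (*it - '0');
        ++it;
        ++digits;
    }
    if (digits == 0) return false; // no number present: report it to the caller
    value = result;
    return true;
}

// Small self-check: a year parses, a missing month is reported as an error.
void demoParseNumberN()
{
    const std::string text = "2024-xx";
    auto it = text.cbegin();
    int year = 0;
    int month = 0;
    assert(parseNumberN(it, text.cend(), 4, year) && year == 2024);
    ++it; // skip the '-' separator
    assert(!parseNumberN(it, text.cend(), 2, month));
}
```

A parser built from steps like this can reject input such as "2024-xx" by itself, without a regex pre-check.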
@andrewauclair can you look at this issue and propose improvements?
Consider the following code:
Execution time (Release configuration):
With the format Poco::DateTimeFormat::ISO8601_FRAC_FORMAT + " " (with an additional space), execution time becomes better for 1.13.3.

I guess the reason is the following commit: https://github.com/pocoproject/poco/commit/4f1cf683079cc3b00be96259fc42f5c291ab4c77 (https://github.com/pocoproject/poco/pull/4330).
Up to 8 regular expressions are compiled on each call
https://github.com/pocoproject/poco/blob/4f1cf683079cc3b00be96259fc42f5c291ab4c77/Foundation/src/DateTimeFormat.cpp#L153-L156
if the provided format is known:
https://github.com/pocoproject/poco/blob/4f1cf683079cc3b00be96259fc42f5c291ab4c77/Foundation/src/DateTimeParser.cpp#L47