Q: What steps will reproduce the problem?
Try emitting any scalar that ends in a colon ':'. For example:
// Test A
YAML::Emitter emitter;
emitter << "a:";
std::cout << emitter.c_str();
// Test B
YAML::Emitter emitter;
emitter << YAML::BeginMap
<< YAML::Key << "apple"
<< YAML::Value << ":"
<< YAML::Key << "banana"
<< YAML::Value << ":"
<< YAML::EndMap;
std::cout << emitter.c_str();
Q: What is the expected output? What do you see instead?
You should expect the scalar in Test A and the values in Test B to be quoted:
--- "a:"
---
apple: ":"
banana: ":"
Instead, the colons don't get quoted:
--- a:
---
apple: :
banana: :
which then causes the Parser to treat the colon values as map key-value
delimiters and throw YAML::ParserException().
Q: What version of the product are you using? On what operating system?
0.2.5 on Debian Lenny
Q: Please provide any additional information below.
We debugged this and found the following root cause and fix. Please let us
know whether the fix is correct:
In IsValidPlainScalar() in emitterutils.cpp, the disallowed regex includes
Exp::EndScalar(), which is supposed to match any string ending in ":" so that
the function returns false (and the string gets quoted).
bool IsValidPlainScalar(const std::string& str, bool inFlow, bool allowOnlyAscii) {
    ...
    // then check until something is disallowed
    const RegEx& disallowed = (inFlow ? Exp::EndScalarInFlow() : Exp::EndScalar())
        || (Exp::BlankOrBreak() + Exp::Comment())
        || Exp::NotPrintable()
        || Exp::Utf8_ByteOrderMark()
        || Exp::Break()
        || Exp::Tab();
    StringCharSource buffer(str.c_str(), str.size());
    while(buffer) {
        if(disallowed.Matches(buffer))
            return false;
        if(allowOnlyAscii && (0x7F < static_cast<unsigned char>(buffer[0])))
            return false;
        ++buffer;
    }
    return true;
}
Specifically, EndScalar() is constructed to match any string ending in
":<space>" or just ":":
inline const RegEx& EndScalar() {
    static const RegEx e = RegEx(':') + (BlankOrBreak() || RegEx());
    return e;
}
The problem is that the left side of the RegEx matches the colon correctly,
but the right side, which is a REGEX_OR container, fails the
RegEx::IsValidSource() check because the source is already at the end of the string:
template<>
inline bool RegEx::IsValidSource<StringCharSource>(const StringCharSource& source) const
{
    return source || m_op == REGEX_EMPTY;
}
In other words, after the emitter reads the colon, the match fails because the
engine thinks the input is no longer valid, even though it is "valid"
enough to match the "empty" (i.e. end-of-string) regex.
The fix for this is to change IsValidSource() to the following:
template<>
inline bool RegEx::IsValidSource<StringCharSource>(const StringCharSource& source) const
{
    switch(m_op) {
        case REGEX_MATCH:
        case REGEX_RANGE:
            return source;
        default:
            return true;
    }
}
This makes sure that the source (input) is always valid for operator regexes
(OR/AND/NOT/SEQ), and that a source is invalid only if it is at the end of the
string and the regex is trying to match an actual character.
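To make the failure mode easier to see in isolation, here is a minimal standalone sketch of the matching logic described above. It is a toy model, not yaml-cpp code: the names ToyOp, ToyRx, ToyIsValidSource, ToyMatch, ToyEndScalar and NeedsQuoting are all hypothetical, the blank class is reduced to a single space, and the `patched` flag simply switches between the old validity check and the proposed one.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy model only: hypothetical stand-ins, not yaml-cpp's RegEx classes.
enum ToyOp { TOY_EMPTY, TOY_MATCH, TOY_OR, TOY_SEQ };

struct ToyRx {
    ToyOp op;
    char ch;                  // used by TOY_MATCH only
    std::vector<ToyRx> kids;  // used by TOY_OR and TOY_SEQ only
};

// Validity check performed before matching at position pos.
// patched == false models the reported check, patched == true the proposed one.
bool ToyIsValidSource(const ToyRx& rx, const std::string& s, std::size_t pos, bool patched) {
    const bool hasInput = pos < s.size();
    if (!patched)
        return hasInput || rx.op == TOY_EMPTY;
    return rx.op == TOY_MATCH ? hasInput : true;
}

// Returns the number of characters matched at pos, or -1 on failure.
int ToyMatch(const ToyRx& rx, const std::string& s, std::size_t pos, bool patched) {
    if (!ToyIsValidSource(rx, s, pos, patched))
        return -1;
    switch (rx.op) {
        case TOY_EMPTY:
            return pos >= s.size() ? 0 : -1;  // succeeds only at end of input
        case TOY_MATCH:
            assert(pos < s.size());           // guaranteed by the validity check
            return s[pos] == rx.ch ? 1 : -1;
        case TOY_OR:
            for (const ToyRx& kid : rx.kids) {
                const int n = ToyMatch(kid, s, pos, patched);
                if (n >= 0) return n;         // first alternative that matches wins
            }
            return -1;
        case TOY_SEQ: {
            int total = 0;
            for (const ToyRx& kid : rx.kids) {
                const int n = ToyMatch(kid, s, pos + total, patched);
                if (n < 0) return -1;         // every part of the sequence must match
                total += n;
            }
            return total;
        }
    }
    return -1;
}

// Simplified analogue of EndScalar(): ':' followed by a space or end of input.
ToyRx ToyEndScalar() {
    const ToyRx colon{TOY_MATCH, ':', {}};
    const ToyRx blank{TOY_MATCH, ' ', {}};
    const ToyRx empty{TOY_EMPTY, '\0', {}};
    const ToyRx blankOrEnd{TOY_OR, '\0', {blank, empty}};
    return ToyRx{TOY_SEQ, '\0', {colon, blankOrEnd}};
}

// Scans the string the way the loop in IsValidPlainScalar() does and reports
// whether the disallowed pattern matches at any position.
bool NeedsQuoting(const std::string& s, bool patched) {
    const ToyRx disallowed = ToyEndScalar();
    for (std::size_t pos = 0; pos < s.size(); ++pos)
        if (ToyMatch(disallowed, s, pos, patched) >= 0)
            return true;
    return false;
}
```

In this model, NeedsQuoting("a:", false) returns false because the OR node is rejected at end of input before its "empty" alternative is tried, while NeedsQuoting("a:", true) returns true. A string such as "a: b" is flagged in both modes, since the colon is followed by real input.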
Original issue reported on code.google.com by atoms...@gmail.com on 2 Dec 2010 at 10:55