Closed Jojo-Schmitz closed 1 month ago
I wonder if a more general approach is needed, such as letting readInt always call .simplified. @miiizen what do you think?
You mean like this?
int String::toInt(bool* ok, int base) const
{
    ByteArray ba = simplified().toUtf8();
    return static_cast<int>(toInt_helper(ba.constChar(), ok, base));
}
I was actually thinking of something in XmlStreamReader::readInt. Note that AsciiStringView::toInt is used here, rather than String::toInt.
Making the change directly at the String/AsciiStringView level may or may not be a good idea; I don't have a strong opinion about that.
Also, it may be better to use String::trimmed instead of simplified; trimmed only removes leading and trailing whitespace, while simplified also collapses whitespace mid-string. But if there were any mid-string whitespace, toInt would fail anyway, so there is no reason to simplify that mid-string whitespace first.
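To illustrate the trimmed-versus-simplified point, here is a minimal standalone sketch. It does not use the MuseScore String class: trimmedCopy, simplifiedCopy and parseInt are hypothetical stand-ins written against the C++17 standard library, and the strict whole-string parse is an assumption about how toInt behaves, not a copy of toInt_helper.

```cpp
#include <charconv>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <system_error>

// Hypothetical stand-ins for String::trimmed / String::simplified, written
// against std::string so that the sketch compiles on its own.
inline bool isSpace(char c)
{
    return c == ' ' || c == '\t' || c == '\n' || c == '\r';
}

// Removes leading and trailing whitespace only.
inline std::string trimmedCopy(std::string_view s)
{
    std::size_t begin = 0;
    std::size_t end = s.size();
    while (begin < end && isSpace(s[begin])) {
        ++begin;
    }
    while (end > begin && isSpace(s[end - 1])) {
        --end;
    }
    return std::string(s.substr(begin, end - begin));
}

// Trims, and additionally collapses every internal whitespace run to one space.
inline std::string simplifiedCopy(std::string_view s)
{
    std::string out;
    bool pendingSpace = false;
    for (char c : s) {
        if (isSpace(c)) {
            pendingSpace = !out.empty(); // leading whitespace is dropped
        } else {
            if (pendingSpace) {
                out.push_back(' ');
            }
            out.push_back(c);
            pendingSpace = false;
        }
    }
    return out; // a trailing run never gets flushed, so it is dropped too
}

// Assumed strict parse: succeeds only if the whole input is a base-10 integer.
inline std::optional<int> parseInt(std::string_view s)
{
    if (s.empty()) {
        return std::nullopt;
    }
    int value = 0;
    const char* first = s.data();
    const char* last = first + s.size();
    const auto result = std::from_chars(first, last, value);
    if (result.ec != std::errc() || result.ptr != last) {
        return std::nullopt;
    }
    return value;
}
```

Under these assumptions, both helpers rescue an input such as "\n  12  ", while an input such as "1 2" is rejected after either of them, so the extra work done by the simplified variant buys nothing for number parsing.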
I'd prefer to use simplified() or maybe trimmed() in just those 2 locations in importmxmlpass2.cpp.
But I think we can't assume that those are the only places where we will ever encounter leading/trailing whitespace. The XML specification says that whitespace around numbers is allowed and should be ignored by the parser. More precisely,
- whiteSpace constraint: https://www.w3.org/TR/xmlschema-2/#rf-whiteSpace
- decimal type, stating that it has the whiteSpace constraint applied: https://www.w3.org/TR/xmlschema-2/#decimal
- decimal type from the XML schema standard: https://www.w3.org/2021/06/musicxml40/musicxml-reference/data-types/xsd-decimal/
- semitones type in MusicXML is based on decimal: https://www.w3.org/2021/06/musicxml40/musicxml-reference/data-types/semitones/
As per https://www.w3.org/2021/06/musicxml40/musicxml-reference/data-types/semitones/ that makes 5 occurrences in importmxmlpass2.cpp. Still pretty manageable, to be done individually there.
There are many more toInt() calls, though, not listed there. I wouldn't like to "disturb" those.
But of course it does not only apply to the semitones type, but to basically all cases where a number has to be parsed from an XML tag's content. Hence my suggestion to make the change in XmlStreamReader::readInt.
It'd also be readText().toInt()
oops
Yes, we should probably do this in XmlStreamReader - my only hesitation is that this code is more often used to read MuseScore files, which really shouldn't need this sanitisation. However, I doubt this will have a huge impact on performance.
As that uses AsciiStringView, this isn't really trivial.
I struggled to find a more general solution without incurring more overheads than necessary in the xml reader. I'd still prefer we fix this in XmlStreamReader, but I'm happy with this solution for the moment.
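For reference, one way a reader-level fix could avoid extra allocations is to narrow a non-owning view before parsing. The sketch below is only an illustration under stated assumptions: std::string_view stands in for AsciiStringView, and trimView and readIntFromText are hypothetical names, not existing XmlStreamReader or AsciiStringView members.

```cpp
#include <charconv>
#include <cstddef>
#include <string_view>
#include <system_error>

// Hypothetical helper: narrows a non-owning view to its non-whitespace core.
// No characters are copied; only the bounds of the view move.
inline std::string_view trimView(std::string_view s)
{
    constexpr std::string_view whitespace = " \t\r\n";
    const std::size_t first = s.find_first_not_of(whitespace);
    if (first == std::string_view::npos) {
        return {}; // empty or whitespace-only input
    }
    const std::size_t last = s.find_last_not_of(whitespace);
    return s.substr(first, last - first + 1);
}

// Hypothetical reader-level parse, mirroring a toInt(bool* ok) style interface.
// Returns 0 and sets *ok to false when the trimmed text is not an integer.
inline int readIntFromText(std::string_view text, bool* ok = nullptr)
{
    const std::string_view core = trimView(text);
    bool success = false;
    int value = 0;
    if (!core.empty()) {
        const char* begin = core.data();
        const char* end = begin + core.size();
        const auto result = std::from_chars(begin, end, value);
        // Require that the entire trimmed text was consumed.
        success = result.ec == std::errc() && result.ptr == end;
    }
    if (ok) {
        *ok = success;
    }
    return success ? value : 0;
}
```

The trimming here costs two linear scans over the element text and no heap allocation, which is the kind of overhead the reader would pay on every integer it reads, including from MuseScore's own files.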
Rebased to fix alleged but non-existent merge conflicts
Follow up to #24915
Resolves: #24865 and https://musescore.org/en/node/368990#comment-1260723