Closed dreasgrech closed 2 years ago
Thanks for the report.
I have been aware of this issue for some time. It is currently not implemented correctly. I am reluctant to solve this problem correctly.
In order to resolve DTD placed by URI on external servers, etc., it needs to communicate with them using HttpClient or similar. I am providing this library as a simple xml parser and do not want to add extra library dependencies if at all possible.
It is possible to fix the error by skipping the DTD definition, but it does not solve the underlying problem. (Especially if the DTD contains definitions of ENTITY, i.e., character aliases, the xml will not be interpreted correctly.)
So is there currently a way to programmatically skip the DTD definition when parsing and specifying it manually instead, or at least determine if it's there and skip it and not throw a FormatException
?
Maybe some form of "header pre-parsing" before actually digging down to the real parsing to determine that there is an external DTD?
Because it would be a real shame to have it just fail and throw an exception whilst parsing a valid XML file.
Because it would be a real shame to have it just fail and throw an exception whilst parsing a valid XML file.
Yes, I think so, too. The library should provide a way to skip it so that the exception is not thrown in some way.
Currently, FormatException
is always thrown when an external DTD is present.
This would require modifying the library.
Can you suggest a temporary workaround modification I can make to the XmlParser.cs
file to somehow skip parsing the DOCTYPE altogether please? Maybe a change to the method TryParseDocType()
?
Something like SkipUntil((byte)'>', data, ref i);
if the xml file is using an external DTD.
It would be fantastic if this can somehow be done because it would still make this library usable for me.
I wrote this temporary workaround which should be enough until an official fix is released.
This is my new code that reads the xml file:
bool tryExternalDocType = false;
XmlObject xml = null;
try
{
XmlParser.containsExternalDoctype = false;
xml = XmlParser.ParseFile(filePath);
}
catch (FormatException)
{
tryExternalDocType = true;
}
if (tryExternalDocType)
{
XmlParser.containsExternalDoctype = true;
xml = XmlParser.ParseFile(filePath);
}
Added a public field in XmlParser.cs
and using it in the TryParseDocType
method.
public static bool containsExternalDocType;
private static bool TryParseDocType(RawString data, ref int i, bool hasNode, OptionalNodeList optional, ref RawStringTable entities)
{
.
.
.
if(SkipEmpty(data, ref i) == false) { throw NewFormatException(); }
/** New code: *********/
if (containsExternalDocType)
{
SkipUntil((byte)'>', data, ref i);
return true;
}
/***********************/
var nameStart = i;
.
.
.
}
That code is not good and cannot be added to the library as is, even temporarily.
However, I will provide some alternative implementation. I'm working hard on it now.
It was not intended to be added to the library but as a temporary workaround for my needs as I can now handle both files with inline doctypes but also skip the external doctypes, without having to change the entire library. It currently seems to be producing the results I was looking for.
Looking forward to the alternative implementation.
I am currently trying to parse this xml file:
A
FormatException
is being thrown when parsing the DOCTYPE element.Tested on U8XmlParser v1.5.0