leethomason / tinyxml2

TinyXML2 is a simple, small, efficient, C++ XML parser that can be easily integrated into other programs.
zlib License
5.07k stars 1.83k forks source link

Row/column of current element / error? #26

Closed MortenMacFly closed 7 years ago

MortenMacFly commented 12 years ago

In TinyXML1 it was possible to query the row/column of the currrent element (e.g. in case of an error). This was very nice to show a descriptive error message so the user was able to pinpoint the error easily. Even it was possible to open an editor and jumping to that element.

I see that in TinyXML2 the Row() and Column() methods are gone. Will the return or is there an alternative / better way to obtain that information?

Note that the error must not be an XML error, but may come from a user-defined validation. So the element might be XML correct, but the content may be wrong for the application.

leethomason commented 12 years ago

The tricky bit is that TinyXML-2 doesn't parse the row/column on the initial parse pass. This is done for performance reasons. In the case of an error, the code could flush the parsing and figure out the current row/col, but it's a reasonably involved change. I haven't really investigated the value of row/col tracking vs. code complexity is a good tradeoff, but I obviously have concerns it isn't.

MortenMacFly commented 12 years ago

Thank you for the fast (!) answer.

I see your point, but I think its a feature I am really missing. In one of my projects I use this very intensively. The XML files processed by the tool based on TinyXML(2) are often with errors, e.g. that an enumeration is not fulfilled. I was able to pinpoint the exact location of that error easily, so users of the tool did know exactly where the loading went wrong. Now, all I can tell is that an element is wrong. But if that element occurs several hundred times in a XML file then this is of no help. ;-)

Maybe I am the only one, but seeing this features has gone is sad. :-(

leethomason commented 12 years ago

Good feedback; no promises on how/when it can be done, but knowing it makes a difference certainly ups the priority. I'll take a look and see how it might be done on the current system.

MortenMacFly commented 12 years ago

Sounds good to me... nice to see that you understand my use-case. Just take your time, I'll be patiently watching GIT. ;-)

And nevertheless: Keep up the great work, I am using this library since ages as is really fulfils all my needs and is... erm... just tiny. ;-)

avkumar commented 12 years ago

Hey Leethomason!! I would be glad to take up this bug ... Thanks in advance...

leethomason commented 12 years ago

avkumar - I'd be happy to see a pull request to address this, certainly! Just be careful to not cause the strings to be parsed/null terminated until an error occurs.

MortenMacFly commented 12 years ago

I stumbled across this again - @avkumar: Did you tried already (did you actually pull?!)?

reFX-Mike commented 12 years ago

@MortenMacFly: why don't you stick to tinyXML-1 then?

MortenMacFly commented 12 years ago

Well I do. Here, I wanted to point out that IMHO this is an important feature that has gone. Nothing else.

leethomason commented 12 years ago

It's a valid request, certainly, just tricky to implement. I also think there are a lot of advantages of using TinyXML-2 over -1.

reFX-Mike commented 12 years ago

If you want to add this, could you make sure to make it optional? Like a pre-processer #define to avoid unnecessary code for people who don't need it? Makes the compiled code faster and shorter.

I would even like something like that to leave out the printXML functionality, because in our case we only need to read and parse XML, not be able to modify or write it. Should I open another issue for that?

leethomason commented 12 years ago

I appreciate the desire to minimize the code, but optional features are intentionally avoided. Each #if through the code doubles the testing space. (Although OS/platform #if states are unavoidable.) That quickly becomes a problem: bugs can creep into one path but not the other, and everyone who shares/discusses the code has to clarify which path they are using. Bug inconsistency on Windows vs. Android vs. *Nix has already bit me a few times on TinyXML-2, which only has the platform #if, and the feature #if is an issue that plagues TinyXML-1.

reFX-Mike commented 12 years ago

Fair enough. Let's hope it doesn't add too much bloat...

DavidEGrayson commented 9 years ago

I think that error reporting is a very important thing, and unfortunately this library is getting it wrong. There might be an easy solution though:

TinyXML2's XMLDocument class currently has GetErrorStr1 and GetErrorStr2, which always seem to return either NULL or pointers to a location into its internal _charBuffer. It should be easy to add methods like GetErrorOffset1 and GetErrorOffset2 that return the offset of each of those strings inside the char buffer. In C++, subtracting two pointers to the same array gives you an offset.

The offset isn't as useful as a row and column number, but good text editors do provide ways to seek to an offset in a file. Additionally, TinyXML2 could provide a method that takes a offset and gives you the row and column number.

In fact, for any part of the XML document, TinyXML2 should provide ways to get a const char * pointer to it, along with a byte offset, a character offset, and row and column numbers. This would help the user to create their own error messages for higher-level errors that are not detected by TinyXML2 (e.g. elements being out of order).

If GetErrorStr1 and GetErrorStr2 are always guaranteed to be NULL or point to a place inside the char buffer, then adding this feature causes no performance penalty. If they sometimes point elsewhere, there would be a small performance penalty because you would need to add a bool that records whether they are pointing into the _charBuffer or not; I believe if they are not pointing into the char buffer, the C++ standard won't guarantee the results from pointer arithmetic on them.

kezenator commented 7 years ago

I've added line number (row) support in a fork: https://github.com/kezenator/tinyxml2

Errors report a line number, and every parsed node and attribute also has the line number stored. Also added additional tests.

Also created pull request #503

kezenator commented 7 years ago

The approach I tool was to pass a current line number through the ParseDeep and associated functions. SkipWhitespace and ParseText have been updated to increment the line when the see a '\n'. Also, Identify has to roll-back the line number when it detects text content starting with whitespace.

The dream.xml parse test changed from about 1.2ms to about 1.4ms on my PC (i7-6770HQ + VisualStudio 2015 Community Update 3)

MortenMacFly commented 7 years ago

If I may quote leethomason:

The way TinyXML-2 does parsing makes this substantially more difficult to implement;

I'd say with this minimal overhead I personally see more advantages than disadvantages. Hopefully leethomason agrees and merges this into the main repo. Than you very much, kezenator, very nice work...

leethomason commented 7 years ago

Was fixed some time ago; failed to close issue.