nickbabcock / Pdoxcl2Sharp

A Paradox Interactive general file parser
MIT License
39 stars 13 forks source link

ReadString Does Not Properly Handle Partially-Quoted Strings. #27

Closed Measter closed 10 years ago

Measter commented 10 years ago

By "partially-quoted" I mean the following format: "Wahibre Ibiau"_Ibiau. This format appears to only be found in a list of names, and always contains an _. What the parser ends up doing is reading it as two strings: "Wahibre Ibiau", _Ibiau'.

An example of which is seen in this file

My proposed fix is to return the entire string with the quotes removed, as the underscore will always be there.

nickbabcock commented 10 years ago

Wow, that is bizarre! I've never seen that before. Does it have any significance in game (does it render differently?)

I don't think that the solution will be that drastic of a change. Did you want to implement it? Else, I'll have it done in a couple of days.

Also, it looks like I don't have a test for it (not your concern -- just a side remark), but the reason why ReadString reads quote to quote is because it is possible to name entities like the following:

name="Mark
  *&@#==={(})!#

hi"

and have it be completely valid.

Measter commented 10 years ago

In terms of significance, it's two parts of a name definition. A name in CK2 can contain a space, in which case it must be quoted. A name can also be part of a group of related culture-specific names, think English "Paul" and Portuguese "Paulo". The bit after the underscore is the group name.

For the solution, what I was thinking could work is when you come to the closing quote, peek the next character to see if it's a space or not. If it's not a space, just do another ReadString() and combine the two parts.

I could have a go at implementing it tomorrow; I don't think it should be too difficult.

nickbabcock commented 10 years ago

I think there are some subtleties (and it's frustrating when I find I don't have tests for them -- I'll have to add them)

But I believe something like the following is valid (I'll check when time permits):

player="VEN"date="1492.1.1"

So I think one approach would be to check for an underscore explicitly with ReadNext and if it exists, read until ReadNext IsSpace. If the underscore wasn't there and IsSpace(currentChar) is true -- don't do anything as space is going to be chucked anyways. This will be the 99% use case, so we make it fast. Else add currentChar to the peek queue.

Measter commented 10 years ago

I've fixed the issue in commit 7f7841c. If you're satisfied with the code and tests in this commit and the one in comment issue, I'll create a pull request.

nickbabcock commented 10 years ago

Yup, go ahead and create a pull request. I'll have it in the main codebase this evening. Thanks for your help!