xoreos / xoreos-tools

Tools to help the development of xoreos
https://xoreos.org/
GNU General Public License v3.0

GFF2XML -> XML2GFF (invalid base64 length) #67

Closed (drake127 closed this issue 3 years ago)

drake127 commented 3 years ago

Hi,

It looks like xml2gff doesn't work for XMLs that contain longer base64 data: it fails with "Invalid length for a base64-encoded string".

I checked the string itself and it's correct; it just contains whitespace from the XML formatting (and newlines).

DrMcCoy commented 3 years ago

Can you give me an example file to check against?

drake127 commented 3 years ago

Sure, here you are. The only thing I did differently was to use --encoding 0=cp-1250, but I don't think it should matter in this case.

It fails in countLength(UString) with this string (note the newline and the leading spaces):


      RHJvYm7saprtIHphaGFsZW7hIHBvc3RhdmEgdiBwcm9zdP1jaCBsZXNu7WNoIJph
      dGVjaCBzIG1l6GVtIHUgcGFzdS4KCg==

Attachment: serialized.zip
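To see why a strict length check trips on this string, here is a small C++ sketch. The helpers hasValidBase64Length and stripWhitespace are hypothetical stand-ins, not the actual xoreos countLength code, and they assume the check simply requires the character count to be a multiple of 4 (base64 is written in whole 4-character groups). Counted raw, the string above is 109 characters (6 spaces + 64 + newline + 6 spaces + 32), which is not a multiple of 4. With the whitespace removed it is 96, which is.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical stand-in for a strict length check: base64 text consists of
// whole 4-character groups, so its length must be a multiple of 4.
bool hasValidBase64Length(const std::string &s) {
	return s.size() % 4 == 0;
}

// Returns a copy of the input with all whitespace (spaces, tabs, newlines) removed.
std::string stripWhitespace(const std::string &s) {
	std::string out;
	out.reserve(s.size());
	for (unsigned char c : s)
		if (!std::isspace(c))
			out.push_back(static_cast<char>(c));
	return out;
}
```

Feeding the indented two-line string straight into hasValidBase64Length fails, while hasValidBase64Length(stripWhitespace(raw)) succeeds.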

DrMcCoy commented 3 years ago

Thanks, I'll have a look at it later this evening.

drake127 commented 3 years ago

FYI: I modified decodeBase64 to strip whitespace characters, and it works now.
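A minimal sketch of that kind of fix in C++, assuming the standard base64 alphabet with '=' padding. decodeBase64Lenient is a hypothetical name; this is neither the reporter's patch nor the real xoreos decodeBase64, just an illustration of skipping whitespace before the length is validated.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Maps one base64 alphabet character to its 6-bit value, or -1 if invalid.
static int base64Value(char c) {
	if (c >= 'A' && c <= 'Z') return c - 'A';
	if (c >= 'a' && c <= 'z') return c - 'a' + 26;
	if (c >= '0' && c <= '9') return c - '0' + 52;
	if (c == '+') return 62;
	if (c == '/') return 63;
	return -1;
}

// Hypothetical whitespace-tolerant decoder (not the actual xoreos code).
std::vector<std::uint8_t> decodeBase64Lenient(const std::string &text) {
	// Drop spaces, tabs and newlines left over from XML indentation.
	std::string clean;
	clean.reserve(text.size());
	for (unsigned char c : text)
		if (!std::isspace(c))
			clean.push_back(static_cast<char>(c));

	// Only now check that the data consists of whole 4-character groups.
	if (clean.size() % 4 != 0)
		throw std::runtime_error("Invalid length for a base64-encoded string");

	std::vector<std::uint8_t> out;
	out.reserve(clean.size() / 4 * 3);
	for (std::size_t i = 0; i < clean.size(); i += 4) {
		const bool lastGroup = (i + 4 == clean.size());
		std::uint32_t group = 0;
		int padding = 0;
		for (std::size_t j = 0; j < 4; j++) {
			const char c = clean[i + j];
			int value = 0;
			if (c == '=') {
				// Padding may only fill the last one or two slots of the final group.
				if (!lastGroup || j < 2)
					throw std::runtime_error("Misplaced base64 padding");
				padding++;
			} else {
				value = base64Value(c);
				if (value < 0 || padding > 0)
					throw std::runtime_error("Invalid base64 data");
			}
			group = (group << 6) | static_cast<std::uint32_t>(value);
		}
		// Each group holds 24 bits: emit 3 bytes, minus one per padding character.
		out.push_back(static_cast<std::uint8_t>((group >> 16) & 0xFF));
		if (padding < 2)
			out.push_back(static_cast<std::uint8_t>((group >> 8) & 0xFF));
		if (padding < 1)
			out.push_back(static_cast<std::uint8_t>(group & 0xFF));
	}
	return out;
}
```

With this, the indented string from the report decodes to 70 bytes (24 groups of 3, minus 2 for the "==" padding), ending in two newline characters, while input that is still not a multiple of 4 after stripping is rejected with the same error message as before.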

Just one unrelated question, though: the resulting GFF is 5 KB smaller than the original. When I convert it back to XML, the result is the same. Is that expected? The header looks entirely different.

DrMcCoy commented 3 years ago

Ah, yes, I found the issue. It should be fixed with c6315f75a8a366beb898718a098aa29cc22f70ac. Thanks for reporting! :)

Yeah, the file being shorter is okay. Our GFF code tries harder to consolidate identical string data (*). For example, the original file contains the string "STAMINA_MAX" multiple times, while the GFF produced by xml2gff contains it just once, and all fields using that string reference this single instance.

That also explains the differences in the header: the values there are offsets to the different sections in the GFF, one of them being the string table and another the external field data table. We're creating files that are logically identical, i.e. that contain exactly the same information, not files that are byte-for-byte identical.

(*) Technically, it just throws extended field data values into a map, and duplicates get consolidated that way.
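As a rough illustration of that footnote, here is a C++ sketch of map-based consolidation. FieldDataTable is hypothetical and is not the xoreos GFF writer. It ignores real-format details such as length prefixes and encodings and only shows how a map from value to offset makes repeated values share one stored copy, which is why the written file can be smaller than the original while holding the same information.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical sketch of map-based consolidation (not the actual xoreos code).
// Each distinct value is stored once in a data blob; repeats reuse its offset.
// A real GFF writer would also encode each value's length; omitted here.
class FieldDataTable {
public:
	// Returns the offset of `value` inside the blob, appending it only if new.
	std::uint32_t add(const std::string &value) {
		auto it = _offsets.find(value);
		if (it != _offsets.end())
			return it->second; // Duplicate: reference the existing copy.

		const std::uint32_t offset = static_cast<std::uint32_t>(_blob.size());
		_blob += value;
		_offsets.emplace(value, offset);
		return offset;
	}

	const std::string &blob() const { return _blob; }

private:
	std::map<std::string, std::uint32_t> _offsets; // value -> offset in _blob
	std::string _blob;                             // concatenated unique values
};
```

Adding "STAMINA_MAX" three times returns the same offset each time and grows the blob only once, so every field that uses the string points at a single instance.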