pyscience-projects / pyevtk

PyEVTK (Python Export VTK) exports data to binary VTK files for visualization/analysis with packages like Paraview, VisIt, and Mayavi.
Other
63 stars 15 forks source link

Invalid XML-files #19

Open MuellerSeb opened 4 years ago

MuellerSeb commented 4 years ago

pyevtk stores the data associated with the given mesh in append data, that is given in a raw binary format. Setting of raw: https://github.com/pyscience-projects/pyevtk/blob/488306a93015c3c82688e0961f485972e71644ad/pyevtk/vtk.py#L636 Encoding as binary: https://github.com/pyscience-projects/pyevtk/blob/488306a93015c3c82688e0961f485972e71644ad/pyevtk/evtk.py#L98

This is producing invalid XML files and other software is may unwilling to read the files (e.g. meshio see: https://github.com/nschloe/meshio/issues/786)

We should use base64 encoded data, where the binary data is encoded with ascii-characters as suggested by VTK. Some references:

When using appended data with base64, we have to keep in mind, that the offsets are changing, since they address the character in the base64 encoded string and not the binary offset like with raw data: https://github.com/pyscience-projects/pyevtk/blob/488306a93015c3c82688e0961f485972e71644ad/pyevtk/vtk.py#L514

meshio already implemented writing routines using base64 encoding. We should have a look there: https://github.com/nschloe/meshio/blob/d9c05ae688858b5166630874f3ec875a43b8fd37/meshio/vtu/_vtu.py#L482

renefritze commented 4 years ago

I've never even considered that the output files might not be valid XML since all my (only VTK standard following apparently) consumers happily accept them. I also didn't know that meshio doesn't fully support the VTK standard then, nor that Kitware managed to define a standard that uses xml and isn't fully compliant. Great stuff :roll_eyes:

So as far I understood, the only upside of using 'appended-raw' is that the implementation is less complex. Is that correct?

And aside from the implementation being trickier, there's no downside to using 'appended-binary', right?

MuellerSeb commented 4 years ago

You are right. In the end, it shouldn't be that hard to rewrite the appending routines. But one has to dig through all that serialization and base64 stuff...

MuellerSeb commented 4 years ago

My main aim would be to produce files, that meshio can read.

renefritze commented 4 years ago

I cannot really justify pushing time into helping with the implementation, but if you can and want to, I could invest a little time in making the test setup better. I've already pushed a quick & dirty check to the generated examples into the xml_checks branch.

MuellerSeb commented 4 years ago

Another idea: If we drop the use of appended and write the dataArray directly at their position of occurrence in the file, we don't have to estimate the offset in the appended data.

Only downside: the file isn't that readable anymore as before, where the data-metainfo was stored at the top of the file and the actual data was appended.

@renefritze , @xylar: would you have any problems with that approach?

xylar commented 4 years ago

I think having the file be readable has saved me some debugging trouble in the past but I don't have strong feelings about it.

Regarding base64 encoding, won't it by definition have to take up 4 times as much space, since you're only allowed to use 64 of the 256 values of a given byte? If so, I would be opposed to just switching to base64 as opposed to making that the default option but continuing to support the existing functionality even if it is not valid XML (since it works great in ParaVeiw).

renefritze commented 4 years ago

I agree with @xylar in that we should be careful to not degrade current use cases. Making the output mode configurable, defaulting to current mode, would be perfectly fine with me though.

While not space-efficient in itself, the base64 mode would also allow adding inline gzipping of the data, IIRC. That would be a real long-term benefit for us I think.