Loading a VTP file using three.js loaders

dvenkatsagar commented 8 years ago

Hey there,

I have a VTP file that is generated using vtk. It is binary encoded and I am trying to find a way to load the data in that file using three.js.

Is there support for the various VTK file formats in the three.js VTKLoader?

With regards, Sagar DV

WestLangley commented 8 years ago

Related #7909

dvenkatsagar commented 8 years ago

I have seen that, but the files he uses are (I think) in the legacy VTK format. I'm looking for support for the newer XML-based VTK file formats (.vti, .vtp, .vtu, etc.).

mrdoob commented 8 years ago

Seems like we don't support the new formats yet.

dvenkatsagar commented 8 years ago

What I understood is that the new formats are basically XML files, and I think that with a little modification of the VTKLoader it might be possible to get the data out of them. I will try to find a way to do it, but if anyone is already on the task, then thank you for your hard work :)

dvenkatsagar commented 8 years ago

OK, I made some progress on VTP support. I modified VTKLoader.js a little and was able to extract the Points, Strips, CellData, etc. The XML is in this format:

<VTKFile type="PolyData" version="0.1" byte_order="LittleEndian" header_type="UInt32" compressor="vtkZLibDataCompressor">
  <PolyData>
    <Piece NumberOfPoints="505013" NumberOfVerts="0" NumberOfLines="0" NumberOfStrips="0" NumberOfPolys="1009574">
      <PointData></PointData>
      <CellData Scalars="Scalars_">
        <DataArray type="Float32" Name="Scalars_" format="binary || ascii" RangeMin="1" RangeMax="1">
          <!-- Can be ascii or binary. Holds the entire data set as one array. -->
        </DataArray>
      </CellData>
      <Points>
        <DataArray type="Float32" Name="Points" NumberOfComponents="3" format="binary || ascii" RangeMin="287.58520477" RangeMax="541.01039731">
        </DataArray>
      </Points>
      <Verts>
        <DataArray type="Int64" Name="connectivity" format="binary" RangeMin="1e+299" RangeMax="-1e+299">AAAAAACAAAAAAAAA</DataArray>
        <DataArray type="Int64" Name="offsets" format="binary" RangeMin="1e+299" RangeMax="-1e+299">AAAAAACAAAAAAAAA</DataArray>
      </Verts>
      <Lines>
        <DataArray type="Int64" Name="connectivity" format="binary" RangeMin="1e+299" RangeMax="-1e+299">AAAAAACAAAAAAAAA</DataArray>
        <DataArray type="Int64" Name="offsets" format="binary" RangeMin="1e+299" RangeMax="-1e+299">AAAAAACAAAAAAAAA</DataArray>
      </Lines>
      <Strips>
        <DataArray type="Int64" Name="connectivity" format="binary" RangeMin="1e+299" RangeMax="-1e+299">AAAAAACAAAAAAAAA</DataArray>
        <DataArray type="Int64" Name="offsets" format="binary" RangeMin="1e+299" RangeMax="-1e+299">AAAAAACAAAAAAAAA</DataArray>
      </Strips>
      <Polys>
        <DataArray type="Int64" Name="connectivity" format="binary" RangeMin="0" RangeMax="505012">
        </DataArray>
        <DataArray type="Int64" Name="offsets" format="binary" RangeMin="3" RangeMax="3028722">
        </DataArray>
      </Polys>
    </Piece>
  </PolyData>
</VTKFile>

Now I am stuck at converting the binary-encoded data into the correct format (the ASCII form of the same data was not a problem). The binary data looks like this:

<DataArray type="Float32" Name="Scalars_" format="binary || ascii" RangeMin="1" RangeMax="1">
  fAAAAACAAACYHgAAOgAAADoAAAA6AAAAOgAAADoAAAA6AAAAOgAAADoAAAA6AAAAOgAAADoAAAA6AAAAOgAAADoAAAA6AAAAOgA .........
</DataArray>

If anyone can help me out with converting this, then I can proceed to creating the geometry. I'll open a pull request after figuring this out. :)

Another thing: according to the VTK docs, the binary data is written with base64 encoding, but when I try to decode the string I get an error stating that it's not a base64 string. I also think the data is compressed with zlib, so I would need to figure out how to decompress it as well.

MasterJames commented 8 years ago

Does this link help? If not please elaborate.

http://www.earthmodels.org/software/vtk-and-paraview/vtk-file-formats

There is a demo file. Your sample does not look binary; maybe it's more like an octet encoding, as used in email?

dvenkatsagar commented 8 years ago

Here is the link to the file that I am using: link. (10 MB, an entire skull generated from a set of DICOM images.) If you'd like, you can experiment with it. I did look at the link you posted, and I think it might help in solving the problem.

MasterJames commented 8 years ago

Right, base64 it must be. This appears somewhat helpful: https://mathema.tician.de/what-they-dont-tell-you-about-vtk-xml-binary-formats/ So atob() for decoding, maybe? There's a lot written on that; here's something more to help: http://stackoverflow.com/questions/246801/how-can-you-encode-a-string-to-base64-in-javascript/247261#247261

dvenkatsagar commented 8 years ago

Yes, it does, thank you. So, to my understanding, let's say we take sample data like this:

AQAAAACAAABsAAAAIQAAAA==eJw7e+bMvrNImIGhYf9ZNDEgtgOK2+Oj0TE2cQBdUkBa

then the blob header would be :

AQAAAACAAABsAAAAIQAAAA==

The compression header + compressed data would be the rest?
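For reference, a minimal sketch of decoding that blob header. It assumes the layout described on the VTK XML formats wiki: number of blocks, uncompressed block size, uncompressed size of the last block, then one compressed size per block, each stored as a little-endian UInt32 (matching header_type="UInt32"). The helper names are hypothetical, and atob() is assumed to be available (browsers and recent Node.js versions).

```javascript
// Decode a base64 string into a Uint8Array using the built-in atob().
function base64ToBytes(b64) {
  var binary = atob(b64);
  var bytes = new Uint8Array(binary.length);
  for (var i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return bytes;
}

// Read every group of 4 bytes as one little-endian UInt32.
function readUint32LE(bytes) {
  var view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  var words = [];
  for (var offset = 0; offset + 4 <= bytes.byteLength; offset += 4) {
    words.push(view.getUint32(offset, true)); // true = little-endian
  }
  return words;
}

var header = readUint32LE(base64ToBytes('AQAAAACAAABsAAAAIQAAAA=='));
// header[0]: number of blocks
// header[1]: uncompressed block size
// header[2]: uncompressed size of the last block
// header[3]: compressed size of block 0
console.log(header); // [1, 32768, 108, 33]
```

Under that reading, the last value (33 bytes) agrees with the 44 base64 characters that follow the header, since 44 characters without padding decode to 33 bytes. That suggests the rest of the string is only the compressed data, and the header values are not repeated inside it.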

MasterJames commented 8 years ago

More or less. Where did this file come from? I would make a super simple smallest example maybe a 16x16 checkerboard texture or whatever it is and then you can see exactly what's what.

MasterJames commented 8 years ago

Okay, I found it: http://www.vtk.org/ (source: https://gitlab.kitware.com/vtk/vtk, docs: http://www.vtk.org/documentation/). Looking closer, you may not get a header, as the XML tells you what it is. You may have to allocate some memory, type cast it, or massage something else. Here is an online Base64 encoder/decoder: https://www.base64decode.org/

byte_order="LittleEndian": Base64 is probably already implied, so the decode would sort that out by default. header_type="UInt32": since I'm not seeing a length, the decoder probably spits it out, so just assign the output of atob() to the correct type and it should match your non-binary small simple test example file.

compressor="vtkZLibDataCompressor"

Right so Compressor is here http://www.vtk.org/doc/nightly/html/classvtkZLibDataCompressor.html

This function would uncompress it, but I'm not seeing a JS version: size_t vtkZLibDataCompressor::UncompressBuffer

Try getting atob to spit out the simple example and compare original looking for headers, littleEndian sign issues etc.

Maybe you'll get a series of zeros and ones?

Okay, so Scalars are just floats, which JS should like. No wait, JS numbers are 64-bit. type="Float32" Name="Scalars_" (right, okay) format="binary" (0s and 1s? in groups of 32 with one for the sign) RangeMin="1" RangeMax="1". I guess the range means nothing important here.

Oh boy... https://blog.mozilla.org/javascript/2013/11/07/efficient-float32-arithmetic-in-javascript/ This suggests to me that you need to roll your own, Babel up a Math.fround, or wait for it in ES6!? No, there is a polyfill there: https://blog.mozilla.org/javascript/2013/11/07/efficient-float32-arithmetic-in-javascript/#polyfill

The bottom of this should help too. http://www.vtk.org/Wiki/VTK_XML_Formats

I also found a js zlib decompression tool here... https://github.com/nodeca/pako from http://stackoverflow.com/questions/4507316/zlib-decompression-client-side

Anyway it would be nice to have just a simple box.vtp to work with initially.

dvenkatsagar commented 8 years ago

@MasterJames The file is from the same link that you have given, the compressed binary vtu file.

I also went through the same links as you and came to one question: how do you determine the size of the compression header?

Here are the links to a simple cube file:
binary, no compression: link
plain ASCII: link
binary, compressed: link

MasterJames commented 8 years ago

Okay great, that helps to understand when it's working etc. I've noted in their source that, like the name suggests, it's just a zlib call to uncompress(): http://fossies.org/linux/VTK/IO/Core/vtkZLibDataCompressor.cxx

I guess this JS lib has a simple way to convert? https://github.com/imaya/zlib.js var deflate = new Zlib.Deflate(plain); I suspect that would be applied to the bit between the double-equals: ==eJxjYGjYzwAHRLHtSWMTbe6gtAMA+HsX6Q==

I'm reviewing this for something about the AQAAAA... type header stuff? http://www.cacr.caltech.edu/~slombey/asci/vtk/vtk_formats.simple.html

It says headers are terminated with a newline, which suggests it's been omitted and the other part is part of the data to be compressed (one way to find out: Deflate). This zlib only spews 32767 zeros?!? from eJxjYGjYzwAHRLHtSWMTbe6gtAMA+HsX6Q

Somewhere it suggests there's a current version 3.0 and these are 0.1. Is that something to worry about on the side of your generator for the simple cube samples?

The min and max ranges in TCoords seem wrong, or normalized, or something? They should be -1.5 to 1.5, no? Name="TCoords" RangeMin="0.70710678119" RangeMax="2.1213203436". Same with Points? 1.732?

Is there a way to get version 3 output? I suspect that with these confusing min/max values something's not right.

dvenkatsagar commented 8 years ago

@MasterJames Not sure why, but the example is as simple as this (I actually used the Node.js port of VTK):

var vtk = require('vtk');

var cube = new vtk.vtkCubeSource();
cube.SetBounds(-1,1,-1,1,-1,1);
var writer = new vtk.vtkXMLPolyDataWriter();
writer.SetInputData(cube.GetOutput());
writer.SetCompressorTypeToNone();
//writer.SetDataModeToAscii();
writer.SetDataModeToBinary();
writer.SetFileName("test_no_compression.vtp");
writer.Write();

I'm using VTK 6.3.0, and when I write the file out, it gives me the VTP version as 0.1. I think that is the default for the newer XML formats.

And thank you for taking the time to help me here.

Update: I also checked with the Python version (in case there was a bug in the Node.js port), but it gives me the same output. The ranges are not -1.5 to 1.5 for some reason.

MasterJames commented 8 years ago

Oh okay, so you've used this to create the files? https://www.npmjs.com/package/vtk When trying to npm install vtk I get "ERR! OMG There is no Visual C++ compiler installed. Install Visual C++ Build Toolset or Visual Studio." I'm likely not following through with its needs, sorry. Still, it's not the end of my attempts to help.

MasterJames commented 8 years ago

This explains the version 0.1 part I suppose. http://www.vtk.org/Wiki/VTK_XML_Formats

dvenkatsagar commented 8 years ago

@MasterJames You would need CMake and Visual Studio too. I will check this out later; for now, I'll try to complete the ASCII versions first. :)

MasterJames commented 8 years ago

I was (easily) mistaken about Deflate, thinking it uncompresses. The negating prefix (de = un) is misleading because the nouns are opposites, or whatever. I now think Inflate should unpack it, so I put this into the Node.js REPL...

var dcmp = new zlib.Inflate("AQAAAACAAAAgAQAAGQAAAA==eJxjYGjYzwAHRLHtSWMTbe6gtAMA+HsX6Q==")
Error: unsupported compression method

The Deflate increased the size and returned zeros, so I'm not sure that it makes any sense to me. I tried with only the double-quoted part as well, with the same result. So I thought maybe it is not the right tool? Or a buggy JS version maybe?

It's this page again, http://www.vtk.org/Wiki/VTK_XML_Formats, that in theory tells us at the bottom all we need to know about how it is compressed. What is defined in the file as header_type (along with its LittleEndian byte order) is, I believe, the size used for reading the first value, which is the number of blocks. Then I guess you'd confirm or change the block size, which is the second value, but really it's the c-size-i values (i is a number) that say where the real data blocks are. From what I've gathered, my best guess is that the AQAAAA stuff is that header, and after the first double-equals is the data part. Still, it seems that's provided in the XML itself (if not, they should fall back to defaults), and, as noted before, the ranges still make no sense for the simple sample data stored inside. I think a character is going to be 8 bits = 1 byte, so the first 4 represent the number of blocks? Anyway, it says that it's each of those blocks that is then zlib compressed.
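A hedged sketch of how the header could be separated from the data without relying on the "==" padding. It assumes the header is base64-encoded on its own, ahead of the compressed blocks, and consists of 3 + numBlocks words of header_type. The function name and the returned field names are hypothetical, and atob() is assumed to be available.

```javascript
// Split a compressed DataArray payload into its base64 header and base64 data.
// wordBytes is 4 for header_type="UInt32" (assumed default here) or 8 for "UInt64".
function splitCompressedPayload(text, wordBytes) {
  wordBytes = wordBytes || 4;
  text = text.trim();

  // Decode just enough characters to cover the first header word.
  var firstChars = Math.ceil(wordBytes / 3) * 4;
  var first = atob(text.slice(0, firstChars));

  // Number of blocks, read little-endian from the low 4 bytes of the first word.
  var numBlocks = 0;
  for (var i = 3; i >= 0; i--) {
    numBlocks = numBlocks * 256 + first.charCodeAt(i);
  }

  // Header = [numBlocks, blockSize, lastBlockSize, compressedSize_0 ... compressedSize_(n-1)]
  var headerBytes = (3 + numBlocks) * wordBytes;
  // Base64 encodes 3 bytes as 4 characters and pads the final group.
  var headerChars = Math.ceil(headerBytes / 3) * 4;

  return {
    numBlocks: numBlocks,
    headerBase64: text.slice(0, headerChars),
    dataBase64: text.slice(headerChars)
  };
}

var parts = splitCompressedPayload(
  'AQAAAACAAAAgAQAAGQAAAA==eJxjYGjYzwAHRLHtSWMTbe6gtAMA+HsX6Q=='
);
console.log(parts.numBlocks, parts.headerBase64, parts.dataBase64);
```

With one block and UInt32 words the header is 16 bytes, which is why it happens to end in "==" in these samples. With a different number of blocks the padding would differ or vanish, so computing the length seems safer than searching for the padding characters.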

MasterJames commented 8 years ago

If A is 65, that's half of 128, so if it's signed, those are zeros. LittleEndian says AQAA is actually AAAQ, so it's 000 and the Q is the offset from A: A = 65 and Q = 81, so 81 - 65 = 16 = 0x10 = 10000 (binary). Anyway, I guess that means there are 16 blocks? That doesn't sound right, does it? And AACA is 0x200, which is 512, so that sounds like a good block size to me. Must be 16 blocks if I did that the same way. The little g is 103 - 65 = 38 = 0x26 -> 0x2600, so it ends up being 9728 for p-size? Now it makes no sense again? There are, however, 34 plus the 4 for double-equals, yes, 38 characters encoded. I'll keep trying to understand the format, check how these other header numbers check out, etc.
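For comparison, a small sketch of how standard base64 maps characters to values. Each character stands for a 6-bit number given by its position in the base64 alphabet, not by its ASCII code minus 65 (the two only coincide for uppercase letters), and four characters together yield three bytes. The helper name decodeQuantum is made up for this illustration.

```javascript
var ALPHABET = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

// Decode one 4-character base64 group (without padding) into its 3 bytes.
function decodeQuantum(group) {
  var bits = 0;
  for (var i = 0; i < 4; i++) {
    // Shift in 6 bits per character, most significant group first.
    bits = (bits << 6) | ALPHABET.indexOf(group.charAt(i));
  }
  return [(bits >> 16) & 0xff, (bits >> 8) & 0xff, bits & 0xff];
}

console.log(ALPHABET.indexOf('A'), ALPHABET.indexOf('Q'), ALPHABET.indexOf('g')); // 0 16 32
console.log(decodeQuantum('AQAA')); // [1, 0, 0]
console.log(decodeQuantum('AACA')); // [0, 0, 128]
```

Read this way, "AQAAAACA" gives the bytes 01 00 00 00 00 80. As little-endian UInt32 values that is 1 for the first word, and the second word starts with 00 80, which becomes 0x8000 = 32768 once its remaining two zero bytes are decoded.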

dvenkatsagar commented 8 years ago

@MasterJames The thing is, you do not inflate the entire string: the data is only zlib compressed and then base64 encoded. Then the compression header, which is also base64 encoded, is added in front of it. So maybe:

zlib.inflate("part of eJxjYGjYzwAHRLHtSWMTbe6gtAMA+HsX6Q");

will give you the data (as the string above might represent the compression header and data), and a base64 decode of "AQAAAACAAAAgAQAAGQAAAA" might give you the blob header. I still need to check this, though.

MasterJames commented 8 years ago

Right, if you look at the no-compression file, it's all LittleEndian. After 3 decades and Murphy's Law, I of course had it somewhat backwards. I assumed ABCD = CDAB, but it's actually DCBA. What I was doing initially is called "Mid-Little Endian" (CDAB).
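A short illustration of the byte-order point, using the standard DataView API. The four bytes are the second header word from the samples in this thread.

```javascript
// The four bytes 00 80 00 00, read as one unsigned 32-bit integer.
var bytes = new Uint8Array([0x00, 0x80, 0x00, 0x00]);
var view = new DataView(bytes.buffer);

var littleEndian = view.getUint32(0, true);  // least significant byte first (DCBA)
var bigEndian = view.getUint32(0, false);    // most significant byte first (ABCD)

console.log(littleEndian, bigEndian); // 32768 8388608
```

DataView takes the byte order as an explicit argument, so it reads byte_order="LittleEndian" data correctly whatever the byte order of the machine running the code.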

MasterJames commented 8 years ago

Just noticed that zlib is now included in Node, so no third-party solution is needed. https://nodejs.org/api/zlib.html#zlib_examples shows how to do something like this...

var zlib = require('zlib');
var buf = Buffer.from("eJxjYGjYzwAHRLHtSWMTbe6gtAMA+HsX6Q", "base64"); // new Buffer() is deprecated
zlib.unzip(buf, function(err, buffer) {if(!err){console.log(buffer.toString());}else{console.log(err);}});

The Inflate function returns more than a buffer, but it doesn't complain about headers being wrong etc.

Anyway, it's still not clear to me how it was encoded and what to do to undo it. Maybe it's not base64; maybe I'm close. If we had a string encoded, it might make things clearer. The numbers make it extra confusing. "\u0000\u0000??..." Maybe the question marks are zeros and/or ones and it's working? Or it's \u for unsigned but they have negatives, so ?? comes out in my Node REPL readout?!

I'm sorry, I'm obviously struggling with this. I hope some of this is going to help you. I don't think I will have much more to add without getting the correct data out on this end first. Stick with ASCII is my advice from the beginning. Compressing like this never saves enough time or space, in my opinion. Modern computing's higher transmission rates offset the costs of compressing and decompressing, which simply don't add up to significant savings in most cases. Especially with all the endianisms and the like, it's torture. Still, once you figure it out, I'm sure it will no longer seem as convoluted.

dvenkatsagar commented 8 years ago

True, ASCII would be no problem, but the ASCII file is about 5 times the size of the binary one (for the skull model, roughly 10 MB versus 50 MB), so I'm just trying to convert it somehow.

Thank you for your help, though. For now, I'll try to create the geometry from the ASCII format, and for that I need to figure out how to generate a single buffer from the different DataArrays like "offsets" and "connectivity". :)
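One possible way to combine "connectivity" and "offsets" into a single index list is sketched below. It assumes each offsets entry marks the end of one polygon inside connectivity, and it uses simple fan triangulation, which is only valid for convex polygons. The function name is hypothetical.

```javascript
// Convert VTK polygon cells (connectivity + offsets) into a flat triangle index list.
function polysToTriangleIndices(connectivity, offsets) {
  var indices = [];
  var start = 0;
  for (var c = 0; c < offsets.length; c++) {
    var end = offsets[c]; // one past the last connectivity entry of this cell
    // Fan triangulation: (v0, v1, v2), (v0, v2, v3), ...
    for (var k = start + 1; k < end - 1; k++) {
      indices.push(connectivity[start], connectivity[k], connectivity[k + 1]);
    }
    start = end;
  }
  return indices;
}

// One quad followed by one triangle.
var triangles = polysToTriangleIndices([0, 1, 2, 3, 4, 5, 6], [4, 7]);
console.log(triangles); // [0, 1, 2, 0, 2, 3, 4, 5, 6]
```

For the cube sample, whose offsets are "4 8 12 16 20 24", this would turn each of the six quads into two triangles. The resulting flat array is the kind of index list that an indexed geometry in three.js expects.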

MasterJames commented 8 years ago

Yes, it can even be 1000:1, but it's not your best delivery (client transfer) format. Also, HTTP can send gzip, or in my case, with something like Meteor, BSON and other things make all that more transparent, as it should be.

A more useful format description (PDF): http://www.vtk.org/wp-content/uploads/2015/04/file-formats.pdf On page 15: "The data are encoded in base64 and listed contiguously inside the DataArray element. Data may also be compressed before encoding in base64."

MasterJames commented 8 years ago

Okay, I took the last part, 'offsets'. In ASCII it's "4 8 12 16 20 24". This uses the no_compression version. Edit: it should be Int64Array, not Float32Array, but there's only Int32Array, which gives the same result because they are small numbers, I guess.

> var buf = new Buffer("MAAAAAQAAAAAAAAACAAAAAAAAAAMAAAAAAAAABAAAAAAAAAAFAAAAAAAAAAYAAAAAAAAAA==", "base64");
undefined
> buf
<Buffer 30 00 00 00 04 00 00 00 00 00 00 00 08 00 00 00 00 00 00 00 0c 00 00 00 00 00 00 00 10 00 00 00 00 00 00 00 14 00 00 00 00 00 00 00
18 00 00 00 00 00 ... >
> var f32a = new Float32Array(buf)
undefined
> f32a
Float32Array {
  '0': 48,
  '1': 0,
  '2': 0,
  '3': 0,
  '4': 4,
  '5': 0,
  '6': 0,
  '7': 0,
  '8': 0,
  '9': 0,
  '10': 0,
  '11': 0,
  '12': 8,
  '13': 0,
  '14': 0,
  '15': 0,
  '16': 0,
  '17': 0,
  '18': 0,
  '19': 0,
  '20': 12,
  '21': 0,
  '22': 0,
  '23': 0,
  '24': 0,
  '25': 0,
  '26': 0,
  '27': 0,
  '28': 16,
  '29': 0,
  '30': 0,
  '31': 0,
  '32': 0,
  '33': 0,
  '34': 0,
  '35': 0,
  '36': 20,
  '37': 0,
  '38': 0,
  '39': 0,
  '40': 0,
  '41': 0,
  '42': 0,
  '43': 0,
  '44': 24,
  '45': 0,
  '46': 0,
  '47': 0,
  '48': 0,
  '49': 0,
  '50': 0,
  '51': 0 }
>

So it looks like the first one is the size; minus the double-equals marker, it adds up. There seem to be a lot of extra zeros. How that encodes with larger numbers isn't clear. It seems they go to 65535 = FFFF but are shown as decimal. 32 / 8 = 4, so you get 32 bits in 8 decimals. At this point it could still be partly big-endian: the first one has 00040000, which is 00000004 split in two parts and swapped, big-endian style.

Edit: I'd wager that if there were an Int64Array, it would not look big-endian anymore and would be half the size. https://gist.github.com/lttlrck/4129238#file-int64-js

[Alas this doesn't handle the compression part]
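A sketch of one way to read the uncompressed payload above. It assumes the first UInt32 is the byte length of the data that follows and that each Int64 is stored as a little-endian pair of 32-bit words, which would account for the extra zeros. The function name is hypothetical, and atob() is assumed to be available.

```javascript
// Parse an uncompressed, base64-encoded Int64 DataArray with a UInt32 length prefix.
function parseUncompressedInt64(b64) {
  var binary = atob(b64);
  var bytes = new Uint8Array(binary.length);
  for (var i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  var view = new DataView(bytes.buffer);

  var byteLength = view.getUint32(0, true); // header: payload size in bytes
  var values = [];
  for (var offset = 4; offset < 4 + byteLength; offset += 8) {
    var low = view.getUint32(offset, true);     // lower 32 bits come first
    var high = view.getInt32(offset + 4, true); // upper 32 bits, signed
    values.push(high * 4294967296 + low);       // exact only up to 2^53
  }
  return { byteLength: byteLength, values: values };
}

var parsed = parseUncompressedInt64(
  'MAAAAAQAAAAAAAAACAAAAAAAAAAMAAAAAAAAABAAAAAAAAAAFAAAAAAAAAAYAAAAAAAAAA=='
);
console.log(parsed.byteLength, parsed.values); // 48 [4, 8, 12, 16, 20, 24]
```

Note that passing a Node Buffer or a Uint8Array to a typed array constructor copies one element per byte. That would explain why the Float32Array printed above lists individual byte values such as 48, 0, 0, 0 rather than decoded numbers.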

MasterJames commented 8 years ago

This looks right for the compressed part.

> var zbuf = new Buffer("eJxjYYAADijNA6UFoLQIlJaA0gAHMABV", "base64");
undefined
> zbuf
<Buffer 78 9c 63 61 80 00 0e 28 cd 03 a5 05 a0 b4 08 94 96 80 d2 00 07 30 00 55>
> zlib.unzip(zbuf, function(err, buffer) {if(!err){console.log(zbuf2 = buffer.toString());}else{console.log(err);}});
undefined
> ♦            ♀       ►       ¶       ↑

> zbuf2
'\u0004\u0000\u0000\u0000\u0000\u0000\u0000\u0000\b\u0000\u0000\u0000\u0000\u0000\u0000\u0000\f\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0010\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0014\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0018\u0000\u0000\u0000\u0000\u0000\u0000\u0000'
>

So it's true big-endian order (backwards): 40000000 is 00000004, then b, f, 10, 14, 18.

Oddly, B and F are not 8 and 12 in hex? But the others check out. There must still be a glitch in using these particular testing tools. Maybe the zlib.unzip options?
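A hedged sketch of the same step that keeps the result as raw bytes instead of calling toString(), using Node's built-in zlib module and Buffer methods. In a JavaScript string, '\b' (backspace) and '\f' (form feed) are escape sequences for the byte values 0x08 and 0x0C, so the printed output is consistent with 8 and 12 and no glitch needs to be assumed. With the least significant byte first, the layout also matches the byte_order="LittleEndian" attribute of the file.

```javascript
var zlib = require('zlib');

// Base64 text -> compressed bytes -> inflated bytes (kept as a Buffer, not a string).
var compressed = Buffer.from('eJxjYYAADijNA6UFoLQIlJaA0gAHMABV', 'base64');
var raw = zlib.inflateSync(compressed);

// Read each group of 8 bytes as a little-endian signed 64-bit integer.
var offsets = [];
for (var i = 0; i < raw.length; i += 8) {
  offsets.push(Number(raw.readBigInt64LE(i))); // Number() is exact only up to 2^53
}
console.log(raw.length, offsets);

// The "odd" characters are string escapes for the same byte values.
console.log('\b'.charCodeAt(0), '\f'.charCodeAt(0)); // 8 12
```

In the browser, where Buffer and the zlib module are not available, a library such as pako (mentioned earlier in this thread) could stand in for the inflate step.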

dvenkatsagar commented 8 years ago

@MasterJames Well, that's a big leap, you've almost solved the problem. (You deserve the credit for all of this; how about adding your magic to the code that I'm coming up with?) Now, if only we could do the same without the Node.js API. I think for that we can convert the string to an ArrayBuffer, and from there use an API called TextDecoder that might help us decode it (check this out: link), but it is experimental.

dvenkatsagar commented 8 years ago

@MasterJames

Finally I'm getting somewhere, check this out.

I have adapted a library (base64-js) from npm and was able to get the exact values using pako.

Here is the code that I used :

// Taken from Base64-js
var Base64toByteArray = function(b64) {
  var Arr = typeof Uint8Array !== 'undefined' ? Uint8Array : Array;
  var i;
  var revLookup = [];
  var code = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

  // Reverse lookup table: character code -> 6-bit value
  for (i = 0; i < code.length; ++i) {
    revLookup[code.charCodeAt(i)] = i;
  }
  // Also accept the URL-safe alphabet ('-' and '_')
  revLookup['-'.charCodeAt(0)] = 62;
  revLookup['_'.charCodeAt(0)] = 63;

  var j, l, tmp, placeHolders, arr;
  var len = b64.length;

  if (len % 4 > 0) {
    throw new Error('Invalid string. Length must be a multiple of 4');
  }

  placeHolders = b64[len - 2] === '=' ? 2 : b64[len - 1] === '=' ? 1 : 0;
  arr = new Arr(len * 3 / 4 - placeHolders);
  l = placeHolders > 0 ? len - 4 : len;

  var L = 0;
  for (i = 0, j = 0; i < l; i += 4, j += 3) {
    tmp = (revLookup[b64.charCodeAt(i)] << 18) | (revLookup[b64.charCodeAt(i + 1)] << 12) | (revLookup[b64.charCodeAt(i + 2)] << 6) | revLookup[b64.charCodeAt(i + 3)];
    arr[L++] = (tmp & 0xFF0000) >> 16;
    arr[L++] = (tmp & 0xFF00) >> 8;
    arr[L++] = tmp & 0xFF;
  }

  if (placeHolders === 2) {
    tmp = (revLookup[b64.charCodeAt(i)] << 2) | (revLookup[b64.charCodeAt(i + 1)] >> 4);
    arr[L++] = tmp & 0xFF;
  } else if (placeHolders === 1) {
    tmp = (revLookup[b64.charCodeAt(i)] << 10) | (revLookup[b64.charCodeAt(i + 1)] << 4) | (revLookup[b64.charCodeAt(i + 2)] >> 2);
    arr[L++] = (tmp >> 8) & 0xFF;
    arr[L++] = tmp & 0xFF;
  }
  return arr.buffer;
}

// Separate the blob and content
var txt = 'AQAAAACAAAAwAAAAGAAAAA==eJxjYYAADijNA6UFoLQIlJaA0gAHMABV';
var blob_header = '';
var content = '';
var inblob = true;
for(var i = 0,len = txt.length; i < len; i++){
  if(txt[i-1] === '='){
    if(txt[i-2] === '='){
      inblob = false;
    }
  }
  if(inblob){
    blob_header += txt[i];
  }else{
    content += txt[i];
  }
}

// Get the respective buffers
//var blob_buffer = Base64toByteArray(blob_header);
var content_buffer = Base64toByteArray(content);

// Uncompress it
var uncompressed_content = pako.inflate(content_buffer);

// Create the respective typed arrays (ele is presumably the parsed DataArray element)
if(ele.attributes.type == 'Float32'){
  //console.log(new Float32Array(blob_buffer));
  console.log(new Float32Array(uncompressed_content)); // doesn't give the correct values
}else if (ele.attributes.type == 'Int64'){
  //console.log(new Int32Array(blob_buffer));
  console.log(new Int32Array(uncompressed_content)); // gives the correct values
}

But some DataArrays are not working correctly... I need to check that out.
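One possible explanation, offered as an assumption rather than a confirmed diagnosis: if the inflate step returns a Uint8Array, then passing it directly to a typed array constructor converts element by element, so every byte becomes one number. Creating the typed array over the underlying ArrayBuffer reinterprets the bytes instead. The sketch below uses the four bytes of the Float32 value 1.0 in place of real inflated data.

```javascript
// Bytes of the Float32 value 1.0 in little-endian order,
// standing in for what an inflate step might return.
var inflated = new Uint8Array([0x00, 0x00, 0x80, 0x3f]);

// Converts element by element: each byte becomes one float.
var converted = new Float32Array(inflated);

// Reinterprets the same bytes as 32-bit floats (platform byte order,
// which is little-endian on practically all current hardware).
var viewed = new Float32Array(
  inflated.buffer,
  inflated.byteOffset,
  inflated.byteLength / 4
);

console.log(Array.from(converted)); // [0, 0, 128, 63]
console.log(Array.from(viewed));    // [1]
```

For small Int64 values the byte-wise conversion can look plausible, because most bytes are zero and the low byte equals the value. That might be why the Int64 arrays appeared to work while the Float32 arrays did not.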

MasterJames commented 8 years ago

I see your latest comment and progress, which is great. I wrote the following before seeing that. Yes, well, the first section still comes up with question marks, "u0000 ??", so the next step is to try to integrate/implement what's known in the browser, like you say, and hopefully it will work without as much confusion: no ?? marks and no number type vs. endianism confusion. What seemed like a simple hint that it looked like base64 has turned into my involvement somehow!? See how it goes getting set up with what we now know (plus the conversion tool you suggest), and then let me know your PR's location, and hopefully I can make a difference without taking up too much time, if you haven't sorted it out yet. I think if you do ASCII first, then base64, and then add zlib compression, it will be an easier approach to adjust to fit some test files like the cube and other progressively more complex VTP files, as I'm sure you already are.

dvenkatsagar commented 8 years ago

@MasterJames Yup, I will fork the project and get to that part.

But now the only problem is that it works fine for any DataArray of type Int64, but it doesn't work well if the type is Float32.

MasterJames commented 8 years ago

Right, because JavaScript has only float64, so that has to be addressed. I hoped your tool would handle that, but it could be modified. Let's hope easily.

dvenkatsagar commented 8 years ago

@MasterJames OK, I have forked the project; here is the link to the modified file: link. (Please check the parseXML function.)

Please check the xml_vtk branch of my repository: link

dvenkatsagar commented 8 years ago

@MasterJames I'm able to parse the content correctly now, thank you for your help. Kindly check the code at the link in my previous comment. Now I will concentrate on generating the geometry.

mrdoob / three.js

Loading a VTP file using three.js loaders #8199