Checking the output size

kripken commented 9 years ago

I'm not entirely clear on the purpose of CHECKED_OUTPUT_SIZE. It looks like in one mode, we unpack the entire thing to see what the output size is (in calculate_unpacked_size); in the other, we use an unpacked size that we store in the binary.

The second mode is more efficient, but troublesome because unpacking is not deterministic, due to doubles (#7). To avoid snprintf issues we now use the browser's double printing, which is different than at least my OS libc's printing, but even without that, the libc in emscripten (musl) might print differently than the glibc that my OS uses to do the packing, etc.

If that is correct, it seems like the options are

Don't check the output size strictly. Give an estimate, just enough to allocate a buffer known to be big enough. zlib does this, for example.
Bundle a full double printer in the unpacker, like dtoa.c.

BrendanEich commented 9 years ago

ND is the enemy; this isn't the last time double sprinting ND will bite, I bet (based on decades of bitter experience). I know dtoa.c is ugly old code, and I haven't kept up with its maintenance log or many forks, but it looks to me like the winning option.

/be

lukewagner commented 9 years ago

Is the text output of JavaScript's ToString(number) deterministic?

BrendanEich commented 9 years ago

@lukewagner Yes! See http://www.ecma-international.org/ecma-262/6.0/#sec-tostring-applied-to-the-number-type.

/be

P.S. Recall we had Guy Steele on Ecma TC39 TG1 in 1996-7 courtesy Sun, and he and David M. Gay co-authored the dtoa.c paper.

lukewagner commented 9 years ago

Cool. Could we then build our double printing in terms of FFI calls to use JS ToString(number) (and then apply my hack on top to insert a '.' if there isn't one)?

kripken commented 9 years ago

That is what we do in web builds now - are you suggesting that we run the packer in JS as well, instead of natively (relying on deterministic ToNumber)? Sounds like a reasonable third option alongside the 2 mentioned before. A downside is it means no native packing/unpacking, and a little slower packing.

A fourth option is I suppose to run the packer on NULL, finding out the output size first, then doing the unpacking (as in the other CHECKED_OUTPUT_SIZE option). This would increase unpacking time, but would still allow native packing/unpacking.

kripken commented 9 years ago

It seems like optimizing the web unpacker is the most important thing, so option 3 as I think @lukewagner proposed sounds best. I implemented that. Ran into #10 along the way, but other than that a basic test now works. Packing speed clearly suffers, though.

lukewagner commented 9 years ago

Oh, I forgot that we need to run in both packer and unpacker. Requiring the packer to run as JS is a bit annoying. Since there aren't many double literals, I'm inclined back to what you suggested earlier of just over-allocating at the start. We already over-allocate and copy a Blob at the end (b/c the asm.js heap needs to contain the input, the output, and temporary heap usage), so this isn't a big deal.

kripken commented 9 years ago

The over-allocation would need to know how many doubles are in the code, though, because the worst-case of the size change is quite large. For example, off the top of my head, 1.1e20 vs 110000000000000000000..

I'm inclined to just do the packing in JS. This seems to be slow only because emscripten runs the packer in node by default, but in spidermonkey it should be quite snappy.

lukewagner commented 9 years ago

I'm just a bit reticent to introduce a dependency on JS. At the moment, it's been nice that you can run pack/unpack-asmjs as native or asm.js.

lukewagner commented 9 years ago

OTOH, it would help keep the unpacker small (compared to importing dtoa.c) so I guess this seems fine for now.

lukewagner / polyfill-prototype-1

Checking the output size #9