@raquo I had done something similar a long time ago and it looked like this:
import scala.scalajs.js.typedarray.*

val result: Future[Foo] = dom.Fetch.fetch("/api/foo.json").toFuture
  .flatMap { (response: dom.Response) =>
    response.arrayBuffer().toFuture
  }
  .map { (buf: ArrayBuffer) =>
    val byteBuffer = TypedArrayBuffer.wrap(buf)
    val bytes = new Array[Byte](byteBuffer.remaining())
    byteBuffer.get(bytes)
    Json.decode(bytes).to[Foo].value
  }
I think even this should work:
val result: Future[Foo] = dom.Fetch.fetch("/api/foo.json").toFuture
  .flatMap { (response: dom.Response) =>
    response.arrayBuffer().toFuture
  }
  .map { (buf: ArrayBuffer) =>
    Json.decode(TypedArrayBuffer.wrap(buf)).to[Foo].value
  }
Thanks! I actually had an older version of scala-js checked out locally and didn't notice this new TypedArrayBuffer helper!
I looked into how byteBuffer.get(bytes) is implemented, and it seems to be copying each element in a loop similar to my while loop, but with more abstraction. I haven't benchmarked anything, but I'll assume that the raw JS while loop is a bit faster, or at least not slower.
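For reference, the manual copy I'm comparing against is roughly this (a sketch, not my exact code):

// copy straight out of the typed array, without going through a ByteBuffer
val int8 = new js.typedarray.Int8Array(buf)
val bytes = new Array[Byte](int8.length)
var i = 0
while (i < int8.length) {
  bytes(i) = int8(i)
  i += 1
}
Json.decode(bytes).to[Foo].value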
TypedArrayBuffer.wrap(buf) looks promising, but somehow it does not work; Borer fails to parse the byte array. This:
case class Foo(id: String)
given Codec[Foo] = deriveCodec

dom.Fetch.fetch("/foo.json").toFuture
  .flatMap { (response: dom.Response) =>
    response.arrayBuffer().toFuture
  }
  .map { (buffer: js.typedarray.ArrayBuffer) =>
    val tab = js.typedarray.TypedArrayBuffer.wrap(buffer)
    try {
      Json.decode(tab).to[Foo].value
    } catch {
      case err: Throwable => println(err)
    }
  }
When given foo.json with content {"id": "foo"} (+ newline before EOF), this prints out:
io.bullet.borer.Borer$Error$InvalidInputData: Expected ',' or '}' but got 'o' (input position 9)
I tried to debug this but I'm not sure what's wrong; if I manually ask the typed array buffer for its contents (tab.get(ix)), it gives me the expected bytes: 123 34 105 100 34 58 32 34 102 111 111 34 125 10.
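For what it's worth, decoding those bytes as UTF-8 gives back exactly the expected JSON (a quick sanity check):

import java.nio.charset.StandardCharsets
val expected = Array[Byte](123, 34, 105, 100, 34, 58, 32, 34, 102, 111, 111, 34, 125, 10)
println(new String(expected, StandardCharsets.UTF_8)) // {"id": "foo"} plus a trailing newline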
I'll stick to manual copying in a loop for now, I guess.
> When given foo.json with content {"id": "foo"} (+ newline before EOF), this prints out:
> io.bullet.borer.Borer$Error$InvalidInputData: Expected ',' or '}' but got 'o' (input position 9)
> I tried to debug this but I'm not sure what's wrong; if I manually ask the typed array buffer for its contents (tab.get(ix)), it gives me the expected bytes: 123 34 105 100 34 58 32 34 102 111 111 34 125 10.
Hmm... this is strange. Can you post the debugging output that you get when you enable printLogging, like this?
Json.decode(tab).withPrintLogging().to[Foo].value
Oh, debug mode, nice! Here's what it prints when the TypedArrayBuffer version fails:
1| *{
1| "of"
1| -> ""
And if I use my while-loop logic, which does work, it gives me this:
1| *{
1| "id"
1| -> "foo"
1| }
2| END
Seeing a hint of reversal ("of"), I thought on a whim that maybe a palindrome json ({"id":"di"}) would work, but no, that fails too.
Ok, that looks like an encoding and/or byte order mismatch. I haven't really done much with scala.js myself so far, so I'm afraid I won't really be of much help. But these untyped number arrays the JS platform gives us sure are soo nice! They make things so much easier! No ugly typing issues to think about. Everything just magically works. NOT! </rant>
Well, I've reduced the reproduction down to:
import io.bullet.borer.*
import io.bullet.borer.derivation.MapBasedCodecs.*
import org.scalajs.dom
import scala.scalajs.js
import scala.scalajs.js.JSConverters.*
import java.nio.ByteBuffer

case class Foo(id: String)
given Codec[Foo] = deriveCodec

println("---V1")
val str = """{"id": "foo"}"""
val strByteArray = str.getBytes
val byteBuffer = ByteBuffer.wrap(strByteArray)
Json.decode(byteBuffer).withPrintLogging().to[Foo].value // Works

println("---V2")
val x = strByteArray.toJSArray
println(">>" + x) // all good
val int8Array = new js.typedarray.Int8Array(x)
dom.console.log(int8Array) // all good
val buffer = js.typedarray.TypedArrayBuffer.wrap(int8Array.buffer)

{
  // Optional block for debugging. Same result with or without it
  var s = ""
  while (buffer.hasRemaining) {
    val b = buffer.get()
    s += b.toChar
  }
  println("> manually iterated: " + s)
  buffer.rewind()
}

Json.decode(buffer).withPrintLogging().to[Foo].value // Fails
V1, which works, uses a HeapByteBuffer, whereas V2, which fails, uses a TypedArrayByteBuffer. Both are subclasses of java.nio.ByteBuffer that are re-implemented in Scala.js without looking at the Java source code (see here).

If there is nothing wrong in Borer, I suspect there could be a bug in the Scala.js implementation of TypedArrayByteBuffer. I can't see a difference in the ByteBuffers that the two produce just using the .get method, so it must be some other method.

Is there perhaps a simpler Borer call that I could give the Scala.js folks for debugging, instead of the macro-based Json.decode(buffer).withPrintLogging().to[Foo].value, to help them see which ByteBuffer methods are being called by Borer?
Ok, just looked into this. It's indeed a ByteOrder problem. If you change the final line to this:
buffer.order(ByteOrder.BIG_ENDIAN) // ByteOrder is java.nio.ByteOrder
Json.decode(buffer).withPrintLogging().to[Foo].value // works now
things start working.
borer's FromByteBufferInput requires the ByteBuffer to have BIG_ENDIAN byte order but neither sets nor verifies this, which isn't good.

My current feeling is that it'd be best to simply set the byte order to BIG_ENDIAN before starting to parse. If it was LITTLE_ENDIAN before, then borer will have changed it, but it's still possible to simply reset it to LITTLE_ENDIAN manually afterwards if one really should want to reuse the buffer.
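Until then, something like this on the caller side should do (a sketch; restoring the original order is only needed if you want to reuse the buffer):

import java.nio.ByteOrder

val originalOrder = buffer.order()   // remember the buffer's current byte order
buffer.order(ByteOrder.BIG_ENDIAN)   // what FromByteBufferInput expects
val foo = Json.decode(buffer).to[Foo].value
buffer.order(originalOrder)          // optional: restore for reuse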
One could also question the current behaviour whereby js.typedarray.TypedArrayBuffer.wrap creates a buffer with native byte order while java.nio.ByteBuffer.wrap always creates a BIG_ENDIAN buffer.
Maybe that's worth a ticket on the scala.js side.
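For reference, the difference can be checked directly (a sketch; the TypedArrayBuffer result reflects the platform's native order, typically LITTLE_ENDIAN):

// java.nio.ByteBuffer.wrap always yields a BIG_ENDIAN buffer
println(java.nio.ByteBuffer.wrap(new Array[Byte](4)).order())
// TypedArrayBuffer.wrap yields the platform's native order
println(js.typedarray.TypedArrayBuffer.wrap(new js.typedarray.ArrayBuffer(4)).order())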
Ah, I see, thanks for getting to the bottom of this!
I really don't know enough about what the endianness should be to make a case either way. For all the classes and methods that are available on the JVM, Scala.js needs to match their behaviour for compatibility, and it does so for java.nio.ByteBuffer.wrap. But TypedArrayBuffer is their JS-only construct that is not available on the JVM, so it uses the native byte order instead. It's definitely intentional, as there are comments about it; perhaps they need or want to match JS semantics for that one, for some reason.
I don't feel too good about borer modifying the input's byte order if it will not reliably set it back to its original value at the end. Modifying the input's byte order could cause similar problems with other libraries if they make different assumptions about endianness. Personally, I would rather borer check the endianness, throw an exception describing the problem if the order is wrong, and document this order requirement. That would be an improvement over the status quo without the need for bigger changes.
On my end I'll just set the order manually for now.
I just released version 1.12.0, which contains the verification logic that the consumed ByteBuffer indeed has BIG_ENDIAN byte order configured. If not, it will throw an IllegalArgumentException explaining the problem and possible resolutions.
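Against the repro above, that means something like this (a sketch):

val bb = js.typedarray.TypedArrayBuffer.wrap(int8Array.buffer) // native (usually little-endian) order
// Json.decode(bb).to[Foo].value                               // now rejected with an IllegalArgumentException
bb.order(java.nio.ByteOrder.BIG_ENDIAN)                        // satisfy the byte-order requirement
Json.decode(bb).to[Foo].value                                  // parses as expected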
Thanks for reporting this issue!
Nice, thanks!
Hello, thank you for such a nice library! The docs are great too.
I am trying to use Borer in a Scala.js project, specifically to parse JSON API responses that were encoded with Borer on the backend. The best I can figure out is this:
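Roughly along these lines (a sketch; the path is illustrative, and a Foo case class with a derived Codec plus the usual fetch/Future conversions are assumed to be in scope):

import java.nio.charset.StandardCharsets

val result: Future[Foo] = dom.Fetch.fetch("/api/foo.json").toFuture
  .flatMap { (response: dom.Response) =>
    response.text().toFuture // raw response bytes -> UTF-16 JS string
  }
  .map { (str: String) =>
    // back from the string to UTF-8 bytes, which Json.decode accepts
    Json.decode(str.getBytes(StandardCharsets.UTF_8)).to[Foo].value
  }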
However, this is not ideal: it encodes the raw response bytes into a UTF-16 Javascript string (.text()), then decodes that into an array of UTF-8 bytes (str.getBytes(StandardCharsets.UTF_8)), which is what Borer expects (...right?). I'm actually not sure that this is strictly correct in terms of encodings, but it seems to work at first glance.

This also seems inefficient, because the dom.Response class already provides a js.typedarray.ArrayBuffer, and I can build a js.typedarray.Uint8Array from it, which is the Javascript equivalent of scala.Array[Byte].

But, while Borer can decode scala.Array[Byte] thanks to the ByteAccess.ForByteArray implicit instance, there is no similar instance of ByteAccess[js.typedarray.Uint8Array]. I don't feel solid enough about all this stuff to implement it myself, and I'm not sure it is even likely to provide a performance benefit: I see that I would need to implement a toByteArray method, which would need to convert js.typedarray.Uint8Array to scala.Array[Byte], and well, if I had such a conversion available, I could just call it on the js.typedarray.Uint8Array myself and use the ByteAccess.ForByteArray implicit instance, right? Like so:
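A sketch of what I mean, where uint8ArrayToByteArray stands in for that hypothetical conversion:

// uint8ArrayToByteArray is the hypothetical Uint8Array => Array[Byte] conversion mentioned above
def decodeFoo(buf: js.typedarray.ArrayBuffer): Foo = {
  val bytes: Array[Byte] = uint8ArrayToByteArray(new js.typedarray.Uint8Array(buf))
  Json.decode(bytes).to[Foo].value // picks up ByteAccess.ForByteArray
}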
So I guess the issue is about adding some built-in way to use JS types, or at least documenting how to do that, because right now that's quite a bit more boilerplate than is needed for Scala and Java types.