lightbend / kalix-javascript-sdk

JavaScript and TypeScript SDKs for Kalix
https://docs.kalix.io/javascript/index.html
Apache License 2.0

Partial download of codegen #394

Open johanandren opened 2 years ago

johanandren commented 2 years ago

Created shopping cart quickstart and tried to build it

On Linux x86_64:

/srv/homes/johan/code/lightbend/shopping-cart/node_modules/@kalix-io/kalix-scripts/bin/kalix-codegen-js.bin --typescript --proto-source-dir ./proto --source-dir ./src --generated-source-dir ./lib/generated --test-source-dir ./test --integration-test-source-dir ./integration-test
Segmentation fault

On darwin arm64:

/Users/johan/Code/Lightbend/Kalix/shopping-cart-quickstart/node_modules/@kalix-io/kalix-scripts/bin/kalix-codegen-js.bin --typescript --proto-source-dir ./proto --source-dir ./src --generated-source-dir ./lib/generated --test-source-dir ./test --integration-test-source-dir ./integration-test
fish: Job 1, '/Users/johan/Code/Lightbend/Kal…' terminated by signal SIGKILL (Forced quit)

johanandren commented 2 years ago

None of the usual tools on Linux (gdb, ldd) seem to recognize kalix-codegen-js.bin as an executable, although file says:

file /srv/homes/johan/code/lightbend/shopping-cart/node_modules/@kalix-io/kalix-scripts/bin/kalix-codegen-js.bin
/srv/homes/johan/code/lightbend/shopping-cart/node_modules/@kalix-io/kalix-scripts/bin/kalix-codegen-js.bin: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, missing section headers at 19418976

Output from readelf; maybe these errors are a hint about what's wrong:

$ readelf -h /srv/homes/johan/code/lightbend/shopping-cart/node_modules/@kalix-io/kalix-scripts/bin/kalix-codegen-js.bin
ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              DYN (Shared object file)
  Machine:                           Advanced Micro Devices X86-64
  Version:                           0x1
  Entry point address:               0xbb200
  Start of program headers:          64 (bytes into file)
  Start of section headers:          19416672 (bytes into file)
  Flags:                             0x0
  Size of this header:               64 (bytes)
  Size of program headers:           56 (bytes)
  Number of program headers:         11
  Size of section headers:           64 (bytes)
  Number of section headers:         37
  Section header string table index: 36
readelf: Error: Reading 2368 bytes extends past end of file for section headers
readelf: Error: the dynamic segment offset + size exceeds the size of the file
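
As a rough cross-check of the readelf output above: the section header table starts at byte 19416672 and holds 37 entries of 64 bytes each (2368 bytes), so a complete file would need to be at least 19416672 + 2368 = 19419040 bytes. A minimal sketch of the same check in Node, assuming an ELF64 little-endian header like the one shown; minElf64Size and isTruncatedElf are hypothetical helper names, not part of kalix-scripts:

```javascript
const fs = require('fs');

// Minimum size an ELF64 little-endian file must have to contain its
// section header table, computed from the first 64 bytes of the file.
function minElf64Size(header) {
  if (header.length < 64 || header.readUInt32BE(0) !== 0x7f454c46) {
    throw new Error('not an ELF file');
  }
  if (header[4] !== 2 || header[5] !== 1) {
    throw new Error('this sketch only handles ELF64 little-endian');
  }
  const shoff = header.readBigUInt64LE(0x28); // start of section headers
  const shentsize = header.readUInt16LE(0x3a); // size of one section header
  const shnum = header.readUInt16LE(0x3c); // number of section headers
  return Number(shoff) + shentsize * shnum;
}

// True when the file on disk is smaller than its own header says it must be.
function isTruncatedElf(path) {
  const header = Buffer.alloc(64);
  const fd = fs.openSync(path, 'r');
  try {
    fs.readSync(fd, header, 0, 64, 0);
  } finally {
    fs.closeSync(fd);
  }
  return fs.statSync(path).size < minElf64Size(header);
}
```

This only catches truncation of ELF binaries; the darwin binary is a different format and would need a different check.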

johanandren commented 2 years ago

Hmm, I tried downloading the binary manually and got a lot of disconnects from the Lightbend repo. Maybe node doesn't handle that well, and that's why the file is incomplete: https://repo.lightbend.com/raw/kalix/versions/1.0.0/kalix-codegen-js-x86_64-apple-darwin

Downloading with wget keeps retrying until it has the whole file, and that file can then be run without segfaults.

The successfully (manually) downloaded binary is 17 MB, while the one in node_modules is just 6.3 MB.

johanandren commented 2 years ago

It seems node-fetch doesn't report an error when the connection is closed before all bytes are delivered, but instead tells us the response is "OK".

johanandren commented 2 years ago

Didn't figure out a way to detect this. The body is a stream, so we would have to compare response.headers.get('content-length') with the number of bytes piped through once writing is done, or something similar.
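
A minimal sketch of that comparison, assuming the server sends a Content-Length header and the body is not content-encoded (with gzip, the decoded byte count would not match the header). ByteCounter and assertComplete are hypothetical names, not part of node-fetch:

```javascript
const { Transform } = require('stream');

// Pass-through stream that counts the bytes flowing through it.
class ByteCounter extends Transform {
  constructor() {
    super();
    this.bytes = 0;
  }

  _transform(chunk, _encoding, callback) {
    this.bytes += chunk.length;
    callback(null, chunk);
  }
}

// Throws when the number of bytes received differs from Content-Length.
function assertComplete(contentLength, bytesReceived) {
  const expected = Number.parseInt(contentLength, 10);
  if (Number.isNaN(expected)) {
    return; // header missing or unusable, so nothing to verify against
  }
  if (expected !== bytesReceived) {
    throw new Error(
      `incomplete download: expected ${expected} bytes, got ${bytesReceived}`
    );
  }
}
```

In use, the counter would sit between response.body and the file write stream, and assertComplete(response.headers.get('content-length'), counter.bytes) would run once the write has finished.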

The IT team is looking into why repo downloads are partial or closing the connection, though, so maybe that will sort this out.

pvlugter commented 2 years ago

That's unpleasant. Seems to fail the downloads quite often.

Have also had a play around with detecting this. Couldn't get any errors from the stream until trying it on Node 16, where it signals this error event on the response body:

Error: aborted
    at connResetException (node:internal/errors:692:14)
    at TLSSocket.socketCloseListener (node:_http_client:414:19)
    at TLSSocket.emit (node:events:539:35)
    at node:net:709:12
    at TCP.done (node:_tls_wrap:582:7) {
  code: 'ECONNRESET'
}

Following the changes there, it seems there should be an aborted event on Node 14, but I don't see that emitted for the body stream with node-fetch. Trying out axios in place of node-fetch, I can get the aborted signal (it seems to be a different type of stream as well: IncomingMessage instead of PassThrough).

Can also use stream.pipeline to have error handling attached automatically, and then it has this error:

Error [ERR_STREAM_PREMATURE_CLOSE]: Premature close
    at new NodeError (internal/errors.js:322:7)
    at IncomingMessage.onclose (internal/streams/end-of-stream.js:117:38)
    at IncomingMessage.emit (events.js:400:28)
    at TLSSocket.socketCloseListener (_http_client.js:432:11)
    at TLSSocket.emit (events.js:412:35)
    at net.js:686:12
    at TCP.done (_tls_wrap.js:564:7) {
  code: 'ERR_STREAM_PREMATURE_CLOSE'
}
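
A minimal sketch of the stream.pipeline approach, assuming the response body is a Node readable stream. saveBody is a hypothetical helper name, and util.promisify is used here so the sketch also runs on Node 14:

```javascript
const { pipeline } = require('stream');
const { promisify } = require('util');

const pipelineAsync = promisify(pipeline);

// Copies `body` into `destination`. The returned promise rejects if either
// stream errors or closes before the body has ended, for example with
// ERR_STREAM_PREMATURE_CLOSE when the connection is dropped mid-download.
async function saveBody(body, destination) {
  await pipelineAsync(body, destination);
}
```

Compared with body.pipe(destination), pipeline also destroys both streams on failure, so a partial file is not left with an open handle.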

I'll push up a draft that captures this at least. We could add retries on top.
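
One way retries could sit on top, as a sketch only: withRetries is a hypothetical helper, and the attempt count and fixed delay are assumptions rather than what the draft does.

```javascript
// Runs `task` until it resolves, up to `maxAttempts` times, waiting
// `delayMs` between attempts. Rejects with the last error if all fail.
async function withRetries(task, maxAttempts = 3, delayMs = 500) {
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await task(attempt);
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts) {
        await new Promise((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  throw lastError;
}
```

The task passed in would need to start a fresh download and overwrite the partial file on each attempt, since a retry that appends to the earlier output would still produce a corrupt binary.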

Depending on what the underlying issue is (the repo proxy or Cloudsmith), we could also look at hosting these downloads somewhere else, like downloads.lightbend.com (S3).