Closed zachallaun closed 1 week ago
Thanks! Do you have anything on the terminal logs? :)
Unfortunately I don't. Nothing is logged after the "[Livebook] Application running at..." message. Is there a flag/env var I can use to increase verbosity?
@zachallaun please give it another try, it should work now :)
@jonatanklosko Yep, looks to be working! Thanks!
Hmm, so the deploy succeeded but the app has been stuck as "preparing" for ~20 minutes now.
Possibly related, I'm seeing this back in my Fly logs:
2024-06-25T17:11:42Z app[2874902a1ee428] iad [info][Livebook] Application running at http://localhost:8080/
2024-06-25T17:14:38Z app[2874902a1ee428] iad [info]17:14:38.445 [error] GenServer {Livebook.HubsRegistry, "team-zachallaun"} terminating
2024-06-25T17:14:38Z app[2874902a1ee428] iad [info]** (FunctionClauseError) no function clause matching in Livebook.Hubs.Broadcasts.hub_connection_failed/2
2024-06-25T17:14:38Z app[2874902a1ee428] iad [info] (livebook 0.13.0) lib/livebook/hubs/broadcasts.ex:102: Livebook.Hubs.Broadcasts.hub_connection_failed("team-zachallaun", %Mint.TransportError{reason: :closed})
2024-06-25T17:14:38Z app[2874902a1ee428] iad [info] (livebook 0.13.0) lib/livebook/hubs/team_client.ex:259: Livebook.Hubs.TeamClient.handle_info/2
2024-06-25T17:14:38Z app[2874902a1ee428] iad [info] (stdlib 6.0) gen_server.erl:2173: :gen_server.try_handle_info/3
2024-06-25T17:14:38Z app[2874902a1ee428] iad [info] (stdlib 6.0) gen_server.erl:2261: :gen_server.handle_msg/6
2024-06-25T17:14:38Z app[2874902a1ee428] iad [info] (stdlib 6.0) proc_lib.erl:329: :proc_lib.init_p_do_apply/3
2024-06-25T17:14:38Z app[2874902a1ee428] iad [info]Last message: {:connection_error, %Mint.TransportError{reason: :closed}}
Note: Navigating to https://MY_APP.fly.dev/apps/partsbase-csv issues a 302 redirect to https://MY_APP.fly.dev.
@zachallaun if it's stuck at preparing, it's most likely that Mix.install/2 OOMed. There is currently a bug where the OOM makes the deployment process stuck forever on the app server instance. You can bump memory and restart the server. I will work on a fix.
@jonatanklosko Thanks for the suggestion! I was previously running on 1gb but scaled to 4gb to be sure; unfortunately, it seems to still be stalled. I can't do anything in that deployed Livebook, like open a new notebook, but I think that's because it's an app server in read-only mode...?
@zachallaun oh that's weird, I could reproduce the stalled deployment, but the rest of the Livebook should be operational, like starting a new session. When opening a new notebook, are you getting an error, timeout, or something else? Anything curious in the Fly logs for that?
@jonatanklosko Getting a lot of these in the logs:
2024-06-25T18:40:48Z app[2874902a1ee428] iad [info] WARN Reaped child process with pid: 610 and signal: SIGUSR1, core dumped? false
My Fly.io metrics page shows memory usage averaging around ~300MB, but I know that that doesn't always capture memory spikes that lead to OOM issues.
If you deactivate the app on Livebook Teams and restart the machine, does it become operational?
It seems like it doesn't.
I'm going to try recreating the app completely and see if I can reproduce. This was an existing Livebook deployment (0.12.1) that I upgraded to 0.13.0 and set the various secrets for in order for it to be an app server. Perhaps there are some leftover gremlins in the bits that are causing mischief 😈
Okay, so I'm not sure what the issue was, but creating and connecting a completely new app server deployment seemed to work (and using shared-cpu-1x and 1gb). I'll compare the various deployment configs and will share if I figure out what was causing the issue.
Okay, so the fly.livebook.toml that I was using included the following env vars:
[env]
ELIXIR_ERL_OPTIONS = '-proto_dist inet6_tcp'
LIVEBOOK_DATA_PATH = '/data'
LIVEBOOK_HOME = '/data'
LIVEBOOK_IP = '::'
LIVEBOOK_ROOT_PATH = '/data'
PORT = '8080'
Deleting the entire [env] block and re-deploying seems to fix things, so I suppose at least one of those vars changed with 0.13.0 in a way that Livebook didn't like!
All seems to be well now.
The issue was ELIXIR_ERL_OPTIONS = '-proto_dist inet6_tcp'. Do you know what could possibly be setting that?
Yeah it is, it is ELIXIR_ERL_OPTIONS. This specific env var is no longer passed to the runtime, so with that configuration Livebook would start with proto dist ipv6, while runtimes would start with proto dist ipv4, so connecting to the runtime would always time out. This also explains why "New notebook" would be stuck: the session tries to start the runtime upfront, and in this case it blocks until the connection times out.
Got it.
It was set that way based on these docs on Fly. Perhaps y'all can coordinate with some folks there to get those updated before 0.13 is widely announced. (And maybe worth making a note of it in the 0.13 CHANGELOG?)
PR already sent to Fly!
@zachallaun Docs on Fly are updated now. Thanks!
Environment
livebook server (escript installation)
git rev-parse HEAD if running with mix): 0.13.0
Current behavior
Giving Livebook Teams deployment a try, but having trouble deploying an app to a running app server. Here are the steps I've followed:
https://MY_APP.fly.dev and basic auth ZTA
Expected behavior
The app should be deployed to the app server. 🙂