Closed: WIGGLES-dev closed this issue 2 months ago.
When I try to run this on a random example like todo_app_sqlite_axum, I start seeing errors like this:
ERROR [hyper::server::tcp] accept error: Too many open files (os error 24)
but do not run into any panics. Although, from the text of your steps, it looks like maybe it requires an extractor too?
I'm open to the idea that there's some actual underlying issue here, but I would need an actual reproduction that is more specific than the steps you've given here.
I ran into the same issue with leptos_actix.
Although I was not stress testing nor spamming refresh, I am running an instance of my webapp on a VPS with little resources.
I get the following error quite frequently on a page that runs a handful of server functions at once:
thread 'actix-server worker 1' panicked at /nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-vendor-cargo-deps/c19b7c6f923b580ac259164a89f2577984ad5ab09ee9d583b888f934adbbe8d0/leptos_actix-0.5.6/src/lib.rs:231:66:
called `Option::unwrap()` on a `None` value
I am completely unable to reproduce the error when running the webapp locally, so I can't give you a minimal example or clear steps to reproduce the bug.
@WIGGLES-dev @pnmadelaine Now that #2158 is merged, the server fn implementation on the main branch is now quite different from the previous implementation. It is available for testing either with a dependency on the main branch or version 0.6.0-beta.
Because the underlying implementation has changed significantly, I'm going to close this issue. If you encounter the same problem with the new version, feel free to open a new issue.
this does still happen on 0.6.5:
2024-02-07T17:54:07.636 app[e8247eef726d98] cdg [info] thread '<unnamed>' panicked at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/leptos_reactive-0.6.5/src/context.rs:316:9:
2024-02-07T17:54:07.636 app[e8247eef726d98] cdg [info] Location { file: "/usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/leptos_axum-0.6.5/src/lib.rs", line: 286, col: 31 } expected context of type "leptos_axum::ResponseOptions" to be present
2024-02-07T17:54:07.636 app[e8247eef726d98] cdg [info] stack backtrace:
2024-02-07T17:54:07.636 app[e8247eef726d98] cdg [info] 0: rust_begin_unwind
2024-02-07T17:54:07.636 app[e8247eef726d98] cdg [info] at /rustc/595bc6f00369475047538fdae1ff8cea692ac385/library/std/src/panicking.rs:645:5
2024-02-07T17:54:07.636 app[e8247eef726d98] cdg [info] 1: core::panicking::panic_fmt
2024-02-07T17:54:07.636 app[e8247eef726d98] cdg [info] at /rustc/595bc6f00369475047538fdae1ff8cea692ac385/library/core/src/panicking.rs:72:14
2024-02-07T17:54:07.636 app[e8247eef726d98] cdg [info] 2: <futures_util::abortable::Abortable<Fut> as core::future::future::Future>::poll
2024-02-07T17:54:07.636 app[e8247eef726d98] cdg [info] 3: tokio::runtime::task::core::Core<T,S>::poll
2024-02-07T17:54:07.636 app[e8247eef726d98] cdg [info] 4: tokio::runtime::task::harness::Harness<T,S>::poll
2024-02-07T17:54:07.636 app[e8247eef726d98] cdg [info] 5: tokio::task::local::LocalSet::tick
2024-02-07T17:54:07.636 app[e8247eef726d98] cdg [info] 6: <tokio::task::local::RunUntil<T> as core::future::future::Future>::poll
2024-02-07T17:54:07.636 app[e8247eef726d98] cdg [info] 7: tokio::runtime::scheduler::current_thread::Context::enter
2024-02-07T17:54:07.636 app[e8247eef726d98] cdg [info] 8: tokio::runtime::context::set_scheduler
2024-02-07T17:54:07.636 app[e8247eef726d98] cdg [info] 9: tokio::runtime::scheduler::current_thread::CoreGuard::block_on
2024-02-07T17:54:07.636 app[e8247eef726d98] cdg [info] 10: tokio::runtime::context::runtime::enter_runtime
2024-02-07T17:54:07.636 app[e8247eef726d98] cdg [info] 11: tokio::runtime::runtime::Runtime::block_on
2024-02-07T17:54:07.636 app[e8247eef726d98] cdg [info] 12: tokio_util::task::spawn_pinned::LocalWorkerHandle::run
2024-02-07T17:54:07.636 app[e8247eef726d98] cdg [info] note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
2024-02-07T17:54:07.638 app[e8247eef726d98] cdg [info] thread 'tokio-runtime-worker' panicked at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/leptos_axum-0.6.5/src/lib.rs:332:14:
2024-02-07T17:54:07.638 app[e8247eef726d98] cdg [info] called `Result::unwrap()` on an `Err` value: Canceled
2024-02-07T17:54:07.638 app[e8247eef726d98] cdg [info] stack backtrace:
2024-02-07T17:54:07.638 app[e8247eef726d98] cdg [info] 0: rust_begin_unwind
2024-02-07T17:54:07.638 app[e8247eef726d98] cdg [info] at /rustc/595bc6f00369475047538fdae1ff8cea692ac385/library/std/src/panicking.rs:645:5
2024-02-07T17:54:07.638 app[e8247eef726d98] cdg [info] 1: core::panicking::panic_fmt
2024-02-07T17:54:07.638 app[e8247eef726d98] cdg [info] at /rustc/595bc6f00369475047538fdae1ff8cea692ac385/library/core/src/panicking.rs:72:14
2024-02-07T17:54:07.638 app[e8247eef726d98] cdg [info] 2: core::result::unwrap_failed
2024-02-07T17:54:07.638 app[e8247eef726d98] cdg [info] at /rustc/595bc6f00369475047538fdae1ff8cea692ac385/library/core/src/result.rs:1649:5
2024-02-07T17:54:07.638 app[e8247eef726d98] cdg [info] 3: <F as axum::handler::Handler<(M,T1,T2,T3,T4),S>>::call::{{closure}}
2024-02-07T17:54:07.638 app[e8247eef726d98] cdg [info] 4: <futures_util::future::future::map::Map<Fut,F> as core::future::future::Future>::poll
2024-02-07T17:54:07.638 app[e8247eef726d98] cdg [info] 5: <futures_util::future::future::map::Map<Fut,F> as core::future::future::Future>::poll
2024-02-07T17:54:07.638 app[e8247eef726d98] cdg [info] 6: <tower::util::map_response::MapResponseFuture<F,N> as core::future::future::Future>::poll
2024-02-07T17:54:07.638 app[e8247eef726d98] cdg [info] 7: <tower::util::oneshot::Oneshot<S,Req> as core::future::future::Future>::poll
2024-02-07T17:54:07.638 app[e8247eef726d98] cdg [info] 8: <axum::routing::route::RouteFuture<E> as core::future::future::Future>::poll
2024-02-07T17:54:07.638 app[e8247eef726d98] cdg [info] 9: <futures_util::future::future::map::Map<Fut,F> as core::future::future::Future>::poll
2024-02-07T17:54:07.638 app[e8247eef726d98] cdg [info] 10: <futures_util::future::future::map::Map<Fut,F> as core::future::future::Future>::poll
2024-02-07T17:54:07.638 app[e8247eef726d98] cdg [info] 11: <futures_util::future::future::map::Map<Fut,F> as core::future::future::Future>::poll
2024-02-07T17:54:07.638 app[e8247eef726d98] cdg [info] 12: <tower::util::map_response::MapResponseFuture<F,N> as core::future::future::Future>::poll
2024-02-07T17:54:07.638 app[e8247eef726d98] cdg [info] 13: <tower::util::oneshot::Oneshot<S,Req> as core::future::future::Future>::poll
2024-02-07T17:54:07.638 app[e8247eef726d98] cdg [info] 14: <axum::routing::route::RouteFuture<E> as core::future::future::Future>::poll
2024-02-07T17:54:07.638 app[e8247eef726d98] cdg [info] 15: <axum_session_auth::service::AuthSessionService<S,User,Type,Sess,Pool> as tower_service::Service<http::request::Request<ReqBody>>>::call::{{closure}}
2024-02-07T17:54:07.638 app[e8247eef726d98] cdg [info] 16: <futures_util::future::future::map::Map<Fut,F> as core::future::future::Future>::poll
2024-02-07T17:54:07.638 app[e8247eef726d98] cdg [info] 17: <futures_util::future::future::map::Map<Fut,F> as core::future::future::Future>::poll
2024-02-07T17:54:07.638 app[e8247eef726d98] cdg [info] 18: <futures_util::future::future::map::Map<Fut,F> as core::future::future::Future>::poll
2024-02-07T17:54:07.638 app[e8247eef726d98] cdg [info] 19: <tower::util::map_response::MapResponseFuture<F,N> as core::future::future::Future>::poll
2024-02-07T17:54:07.638 app[e8247eef726d98] cdg [info] 20: <tower::util::oneshot::Oneshot<S,Req> as core::future::future::Future>::poll
2024-02-07T17:54:07.638 app[e8247eef726d98] cdg [info] 21: <axum::routing::route::RouteFuture<E> as core::future::future::Future>::poll
2024-02-07T17:54:07.638 app[e8247eef726d98] cdg [info] 22: <axum_session::service::SessionService<S,T> as tower_service::Service<http::request::Request<ReqBody>>>::call::{{closure}}
2024-02-07T17:54:07.638 app[e8247eef726d98] cdg [info] 23: <futures_util::future::future::map::Map<Fut,F> as core::future::future::Future>::poll
2024-02-07T17:54:07.638 app[e8247eef726d98] cdg [info] 24: <futures_util::future::future::map::Map<Fut,F> as core::future::future::Future>::poll
2024-02-07T17:54:07.638 app[e8247eef726d98] cdg [info] 25: <futures_util::future::future::map::Map<Fut,F> as core::future::future::Future>::poll
2024-02-07T17:54:07.638 app[e8247eef726d98] cdg [info] 26: <tower::util::map_response::MapResponseFuture<F,N> as core::future::future::Future>::poll
2024-02-07T17:54:07.638 app[e8247eef726d98] cdg [info] 27: <tower::util::oneshot::Oneshot<S,Req> as core::future::future::Future>::poll
2024-02-07T17:54:07.638 app[e8247eef726d98] cdg [info] 28: <axum::routing::route::RouteFuture<E> as core::future::future::Future>::poll
2024-02-07T17:54:07.638 app[e8247eef726d98] cdg [info] 29: <tower::util::oneshot::Oneshot<S,Req> as core::future::future::Future>::poll
2024-02-07T17:54:07.638 app[e8247eef726d98] cdg [info] 30: hyper::proto::h1::dispatch::Dispatcher<D,Bs,I,T>::poll_catch
2024-02-07T17:54:07.638 app[e8247eef726d98] cdg [info] 31: <hyper::server::conn::http1::UpgradeableConnection<I,S> as core::future::future::Future>::poll
2024-02-07T17:54:07.638 app[e8247eef726d98] cdg [info] 32: <hyper_util::server::conn::auto::UpgradeableConnection<I,S,E> as core::future::future::Future>::poll
2024-02-07T17:54:07.638 app[e8247eef726d98] cdg [info] 33: <axum::serve::Serve<M,S> as core::future::into_future::IntoFuture>::into_future::{{closure}}::{{closure}}
2024-02-07T17:54:07.638 app[e8247eef726d98] cdg [info] 34: tokio::runtime::task::core::Core<T,S>::poll
2024-02-07T17:54:07.638 app[e8247eef726d98] cdg [info] 35: tokio::runtime::task::harness::Harness<T,S>::poll
2024-02-07T17:54:07.638 app[e8247eef726d98] cdg [info] 36: tokio::runtime::scheduler::multi_thread::worker::Context::run_task
2024-02-07T17:54:07.638 app[e8247eef726d98] cdg [info] 37: tokio::runtime::scheduler::multi_thread::worker::Context::run
2024-02-07T17:54:07.638 app[e8247eef726d98] cdg [info] 38: tokio::runtime::context::scoped::Scoped<T>::set
2024-02-07T17:54:07.638 app[e8247eef726d98] cdg [info] 39: tokio::runtime::scheduler::multi_thread::worker::run
2024-02-07T17:54:07.638 app[e8247eef726d98] cdg [info] 40: <tokio::runtime::blocking::task::BlockingTask<T> as core::future::future::Future>::poll
2024-02-07T17:54:07.638 app[e8247eef726d98] cdg [info] 41: tokio::runtime::task::core::Core<T,S>::poll
2024-02-07T17:54:07.638 app[e8247eef726d98] cdg [info] 42: tokio::runtime::task::harness::Harness<T,S>::poll
2024-02-07T17:54:07.638 app[e8247eef726d98] cdg [info] 43: tokio::runtime::blocking::pool::Inner::run
2024-02-07T17:54:07.638 app[e8247eef726d98] cdg [info] note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
2024-02-07T17:54:07.638 proxy[e8247eef726d98] cdg [error] could not complete HTTP request to instance: connection closed before message completed
I think it originates here and here, respectively. This never happens locally, only in a fly.io deployment. Just a regular free-tier deployment, which has no users other than me, so no real load. It's just a server function that uses sqlx to do a postgres query which returns in <100 ms with ~5 rows. It happens on ~50% of page loads.

I have a page with two widgets that each send a similar server function request (to separate server fns), and they independently fail 50% of the time, so one or the other fails most of the time. Interestingly enough, I don't think I ever saw both requests fail at the same time. Also oddly enough, I have another page on that site that sends many more server requests on page load, and those never fail.
@Panaetius I'm happy to look into this at some point if you can open a new issue with some way of reproducing the problem.
As a note I haven't seen any way to consistently reproduce this problem but it is causing issues. Originally I saw it when making ~40 requests at the same time to the same server function. Just now it appeared to be happening under very little load.
Added context.
More context on this: I added a few strategic logs and found that the issue appears to be that the Runtime has been disposed. At /home/glade/code/leptos/leptos_reactive/src/context.rs:257:13 the log reads: Error getting runtime: Runtime RuntimeId(4294967295v1) has been disposed.
Thereafter it ends up calling RuntimeId::default() after exhausting the other options on the code path that gets the current runtime. That happens here: https://github.com/leptos-rs/leptos/blob/main/leptos_reactive/src/runtime.rs#L143
I don't know enough to debug further at this time, but my guess is that the runtime that is created by default does not have the context that is expected.
@gbj I would love any thoughts on how this might be failing or where I might look to debug further. Regardless, I will dig deeper myself as well.
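For anyone who wants to confirm the same failure mode in their own app, here is a minimal diagnostic sketch along the lines of the strategic logs I added (the helper name and log wording are made up, not anything from leptos itself):

use leptos::use_context;
use leptos_axum::ResponseOptions;

// Made-up helper, not part of leptos or leptos_axum: call it at the top of a server
// function to log whether the context leptos_axum relies on is still reachable. If
// the reactive Runtime has been disposed, use_context falls back to a default
// runtime and this logs a warning instead of the request panicking later.
pub fn log_missing_response_options(server_fn_name: &str) {
    if use_context::<ResponseOptions>().is_none() {
        eprintln!(
            "[{server_fn_name}] ResponseOptions missing from context; \
             the reactive Runtime may have been disposed"
        );
    }
}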
@glademiller I am not able to share any useful thoughts without a way to reproduce the issue. As I said above, I am happy to look into this more if anyone can open a new issue with a reproducible example.
@gbj Thank you. If I find a way to reproduce the issue consistently I will send it over.
For now, for anyone else having this problem: if you are not using ResponseOptions to change a server function's response, then the workaround I am using at the moment is to patch leptos_axum by changing this line
https://github.com/leptos-rs/leptos/blob/main/integrations/axum/src/lib.rs#L306
to
let res_options = use_context::<ResponseOptions>().unwrap_or_default().0;
The root of the problem is that, in some scenario I haven't been able to identify, the Runtime is disposed, so a default runtime is returned which does not have the ResponseOptions in the context that leptos_axum expects.
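For context, "using ResponseOptions to change a server function's response" means something like the following sketch (the server function name, path, status, and header here are made up for illustration):

use leptos::{server, ServerFnError};

// Made-up server function purely to illustrate the pattern: it sets a status code
// and a header on the HTTP response via ResponseOptions. With the
// unwrap_or_default() patch, a missing context would no longer panic, but these
// changes would be silently dropped.
#[server(MarkCreated, "/api")]
pub async fn mark_created() -> Result<(), ServerFnError> {
    use http::{header, HeaderValue, StatusCode};
    use leptos::expect_context;
    use leptos_axum::ResponseOptions;

    let response = expect_context::<ResponseOptions>();
    response.set_status(StatusCode::CREATED);
    response.insert_header(header::CACHE_CONTROL, HeaderValue::from_static("no-store"));
    Ok(())
}

If your server functions do anything like this, the patch would silently drop the status and headers instead of panicking.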
Edit: I am not sure what is causing this, but it seems that some requests work and others do not (kind of random).
Edit 2: Removing mimalloc did not help.
I am facing the same issue after calling a server fn of an app that runs on a gcr.io/distroless/cc-debian12 Docker image with mimalloc set as the allocator.
#[cfg(feature = "ssr")]
pub mod ssr {
    use crate::leptos_ui::app::NoCustomError;
    use crate::middleware::proxy::Route;
    use crate::state::AppState;
    use leptos::use_context;
    use leptos::ServerFnError;

    pub fn get_state_route(
        route_name: &str,
    ) -> Result<(Route, AppState), ServerFnError<NoCustomError>> {
        let state: AppState =
            use_context::<AppState>().ok_or(ServerFnError::ServerError::<NoCustomError>(
                "No server state".to_string(),
            ))?;
        let route = state
            .proxy_api
            .routes
            .get(route_name)
            .ok_or(ServerFnError::ServerError::<NoCustomError>(format!(
                "Could not extract {} API route",
                route_name
            )))?;
        Ok((route.clone(), state))
    }
}
#[server(CasaWorkflowV1Details, "/api_leptos")]
pub async fn call_details(id: i64) -> Result<Details, ServerFnError> {
    use crate::api::abc::api::workflow::details;
    use http_auth_basic::Credentials;
    use crate::leptos_ui::app::ssr::get_state_route;

    let (route, state) = get_state_route("myapp")?;
    let credentials = Credentials::new(
        &route.user.clone().unwrap(),
        &route.password.clone().unwrap(),
    );
    details(id, credentials, route.url.clone(), state.proxy_api.client).await
}
pub async fn details(
    id: i64,
    creds: Credentials,
    base_uri: String,
    client: HyperClient,
) -> Result<Details, ServerFnError> {
    let uri = url::Url::parse(&format!("{base_uri}/api/abc/v1/{id}"))
        .expect("base url")
        .to_string();
    let req = hyper::Request::builder()
        .method(hyper::Method::GET)
        .header("Authorization", format!("Basic {}", creds.encode()))
        .header("content-type", "application/json")
        .uri(uri)
        .body(axum::body::Body::empty())?;
    let resp = client.request(req).await?;
    if !resp.status().is_success() {
        return Err(ServerFnError::ServerError(format!(
            "x returned invalid status code '{}'",
            resp.status()
        )));
    }
    let body = resp.into_body();
    let body = body.collect().await?;
    let body = body.to_bytes().to_vec();
    let debug_str = std::str::from_utf8(&body)?;
    info!("response: '{}'", debug_str);
    Ok(serde_json::from_slice(&body)?)
}
@glademiller Under which Docker image does your service run?
I noticed that the problem might happen when the server fn returns a ServerFnError.
@johnbchron In the case that the issue is memory-limit related, shouldn't the app exit with an OOM error?
Edit: In my case raising the memory did not fix the issue.
@lcmgh Yeah, raising the memory didn't help, but I later moved from a shared-cpu-1x to a shared-cpu-2x and that has fixed it so far. It only happened in the past for me on requests with large payloads, so it might be related to a deadlock under load that only happens when you only have 1 core.
I don't have the bandwidth to build the repro at the moment, but testing with only 1 core is probably a helpful clue.
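If someone does have time to chase the single-core theory, one way to approximate it locally (an untested suggestion on my part, not a confirmed repro recipe) is to pin the Tokio runtime to a single worker thread and then hammer the server functions:

// Untested suggestion: approximate a 1-vCPU machine by pinning Tokio to a single
// worker thread, then stress the server functions (e.g. with wrk) and watch for the
// same panic. Container CPU limits are not identical to this, but it removes the
// "extra core" that seems to hide the problem.
#[tokio::main(flavor = "multi_thread", worker_threads = 1)]
async fn main() {
    // ... your usual Axum + Leptos server setup goes here ...
}

Running the binary under taskset -c 0 on Linux would be an even closer stand-in for a fractional-vCPU container, since it limits real parallelism rather than just Tokio's worker count.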
@glademiller Under which Docker image does your service run?
I am using kubeimages/distroless-cc so distroless is a common denominator. I am also running on ECS Fargate using .25 vCPU so a low powered system to be sure. I don't think I have seen anything in the metrics for the service that would indicate a significant memory or CPU usage spike. That said it does occur for me when I have many simultaneous requests.
Issue also appeared when using debian:bookworm-slim.
Can confirm that raising the CPU settings from a fraction of 1 to 2 fixed the issue (Kubernetes).
Can confirm that the issue appears when having many (in my case 5-10) concurrent requests. The first time such requests are made it works fine most of the time. Subsequent triggers of the same requests lead to more failures.
2 CPUs improved the situation, but when refreshing often the same error occurs.
@gbj I have created a sample app that reproduces the issue. Please see the below repo.
https://github.com/jmsolomon2000/leptos-runtime-dispose-repro
@gbj Thank you. If I find a way to reproduce the issue consistently I will send it over.
For now, for anyone else having this problem: if you are not using ResponseOptions to change a server function's response, then the workaround I am using at the moment is to patch leptos_axum by changing this line
https://github.com/leptos-rs/leptos/blob/main/integrations/axum/src/lib.rs#L306 to let res_options = use_context::<ResponseOptions>().unwrap_or_default().0;
The root of the problem is that, in some scenario I haven't been able to identify, the Runtime is disposed, so a default runtime is returned which does not have the ResponseOptions in the context that leptos_axum expects.
I can confirm this completely resolves the issue. This has been happening on multiple production apps I've built and released to a server, I never have been able to reproduce this locally.
@gbj this is the issue I brought up in discord regarding the 502 errors through a load balancer. I never did get to the bottom of why that was happening but the above has resolved that also.
@mjarvis9541 @glademiller I'm concerned that the solution (which I believe is now merged in #2468) mostly hides the problem rather than fixing it: i.e., if someone has a server function that actually sets something in ResponseOptions, unwrap_or_default() (or printing a warning as #2468 does) doesn't panic but returns a response and fails to set that header/status code/etc. that was supposed to be set in the response. I guess it's better than the panic.
I will plan to revisit this to make sure it is solved in 0.7, where the reactive system works significantly differently.
@gbj I agree my "fix" is a workaround, not a solution. That is why I qualified my comment above with "if you are not using ResponseOptions to change a server function's response". The real solution is to find out what is causing the Runtime to be disposed and fix that. In my case I don't ever change the ResponseOptions at the moment, so the workaround is okay, but I don't think it should be merged, since it just changes the bug into an issue where ResponseOptions changes are not respected in this scenario.
@gbj @glademiller I'm wondering if this has something to do with the fallback route and providing the correct context upon the server page load. Given:
ResponseOptions is provided via context when you use most of the handlers provided in this crate, including .leptos_routes, .leptos_routes_with_context, handle_server_fns, etc. You can find the full set of provided context types in each handler function.
If you provide your own handler, you will need to provide ResponseOptions via context yourself if you want to access it via context.
I recall that when using a custom server fn handler and a custom leptos route handler, the same context was required to be provided for both.
I've noticed some strange behaviour with routing recently in how it handles trailing slashes and :? routes; it thinks there are duplicate routes and throws a panic when there are not, and generally gets confused at times.
The server was throwing panics when it couldn't access the pg pool via context on routes with a trailing slash at the end.
I have quite a large application with many routes, optional path params, multiple nested params, all that fun stuff, so I'm having a great time trying to debug.
The following setup seems to have stopped the panic from server requests on paths with a trailing slash not finding the pg pool in context.
However, I'm unable to use render_route_with_context in the fallback to provide Vec<RouteListing>, which I think may be required.
Any thoughts on this?
#[tokio::main]
async fn main() {
    dotenvy::dotenv().unwrap();
    let database_url = env::var("DATABASE_URL").expect("DATABASE_URL should be set");
    let pool = PgPoolOptions::new()
        .max_connections(5)
        .connect(&database_url)
        .await
        .expect("could not create a database pool");

    let conf = get_configuration(None).await.unwrap();
    let leptos_options = conf.leptos_options;
    let addr = leptos_options.site_addr;
    let routes = generate_route_list(App);

    let state = Arc::new(AppState { pool: pool.clone() });
    let context = move || provide_context(pool.clone());

    let app = Router::new()
        .nest_service("/pkg", ServeDir::new("target/site/pkg"))
        .route_service("/favicon.ico", ServeFile::new("target/site/favicon.ico"))
        .leptos_routes_with_context(&leptos_options, routes, context, App)
        .fallback(leptos_fallback)
        .layer(Extension(state))
        .with_state(leptos_options);

    let listener = tokio::net::TcpListener::bind(&addr).await.unwrap();
    axum::serve(listener, app).await.unwrap();
}

async fn leptos_fallback(
    State(options): State<LeptosOptions>,
    Extension(state): Extension<Arc<AppState>>,
    req: Request<Body>,
) -> impl IntoResponse {
    let context = move || provide_context(state.pool.clone());
    let handler = leptos_axum::render_app_to_stream_with_context(options, context, App);
    handler(req).await.into_response()
}
For reference, render_route_with_context:
Returns an Axum Handler that listens for a GET request and tries to route it using [leptos_router], serving an HTML stream of your application. It allows you to pass in a context function with additional info to be made available to the app. The difference between calling this and render_app_to_stream_with_context() is that this one respects the SsrMode on each Route, and thus requires Vec<RouteListing> for route checking. This is useful if you are using .leptos_routes_with_handler().
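In case it helps, here is a sketch of what the fallback might look like with render_route_with_context, sharing the route list generated in main() via an Extension layer. The argument order is an assumption based on the docs quoted above and on how render_app_to_stream_with_context is called; please double-check it against the leptos_axum version you're on:

// Hedged sketch, reusing the types and imports from the setup above. Assumes:
//   * main() adds `.layer(Extension(Arc::new(routes.clone())))` (cloning before the
//     route list is moved into leptos_routes_with_context), and
//   * render_route_with_context takes (options, routes, context, app) and returns a
//     request handler, like render_app_to_stream_with_context does.
async fn leptos_fallback(
    State(options): State<LeptosOptions>,
    Extension(state): Extension<Arc<AppState>>,
    Extension(routes): Extension<Arc<Vec<RouteListing>>>,
    req: Request<Body>,
) -> impl IntoResponse {
    let context = move || provide_context(state.pool.clone());
    let handler = leptos_axum::render_route_with_context(
        options,
        routes.as_ref().clone(),
        context,
        App,
    );
    handler(req).await.into_response()
}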
@benwis
This one has plagued us for a while, we've never been able to reliably reproduce it enough for testing. The idea that multiple cores helps it seems to be a patch/workaround.
I'm hoping/believing that this is fixed in 0.7, which we should get a sense of as more people test the beta
This use_context/.unwrap() is not present on 0.7/the main branch. I am going to close this issue again — if anyone who has updated to the latest 0.7 releases finds that this is still occurring, please open a new issue with a reproduction.
Describe the bug: In leptos_axum handle_server_fns_inner there are several calls to unwrap() on values that can evidently be None (though I'm not entirely sure how).
Leptos Dependencies
To Reproduce (steps to reproduce the behavior): You'll have to bear with me here, because I can only reproduce this when stress testing.
./wrk -t 8 -c 2000 -d 30s --latency http://localhost:3000
error reaching server to call server function: TypeError: NetworkError when attempting to fetch resource.
Expected behavior: The Option::unwrap() call is removed and replaced with a guard, though I'm not sure what is causing this problem.
Additional context: If you look at the handler you can clearly see that provide_context is called early on, so there's no reason to suspect that getting the context for ResponseOptions would return None, yet it does. My best guess is that there are reads and writes going on at the same time.