Open rotu opened 1 month ago
We actually error when the command doesn't exist on the machine: https://github.com/moonrepo/proto/blob/master/crates/warpgate/src/host.rs#L165
So I'm not entirely sure git missing was the problem here.
I can see now that uninstalling git and trying again, I get the more comprehensible error:
Error: plugin::call_func::failed
× Command or script git does not exist. Unable to execute from plugin.
So I'm not sure what the original issue was either and I can't repeat it as such.
Maybe it had something to do with this. It looks like maybe we're qualifying pwsh
by its full path and extension but not powershell
:
https://github.com/moonrepo/proto/blob/fbba5f20e540a86308ac8db6ba026d7501d3f940/crates/system-env/src/helpers.rs#L107-L108
Hrmm that's even more weird. Anytime I see the "A Tokio 1.x context was found, but it is being shutdown." error, I always assume Rust async just crashed in some weird way. I'm not even sure how to reproduce that.
I think it means that we're ending the program with some task is still in process.
Here's the relevant doc afaict. Hopefully, it makes more sense to you: https://docs.rs/tokio/latest/tokio/runtime/struct.Handle.html#method.block_on
Runs a future to completion on this Handle’s associated Runtime.
This runs the given future on the current thread, blocking until it is complete, and yielding its resolved result. Any tasks or timers which the future spawns internally will be executed on the runtime.
When this is used on a current_thread runtime, only the Runtime::block_on method can drive the IO and timer drivers, but the Handle::block_on method cannot drive them. This means that, when using this method on a current_thread runtime, anything that relies on IO or timers will not work unless there is another thread currently calling Runtime::block_on on the same runtime.
If the runtime has been shut down
If the Handle’s associated Runtime has been shut down (through Runtime::shutdown_background, Runtime::shutdown_timeout, or by dropping it) and Handle::block_on is used it might return an error or panic. Specifically IO resources will return an error and timers will panic. Runtime independent futures will run as normal.
Panics
This function panics if the provided future panics, if called within an asynchronous execution context, or if a timer future is executed on a runtime that has been shut down.
In particular, I don't understand how exec_command
and send_request
in warpgate host.rs
and call
in plugin.rs
let the caller know that subprocess is complete (whether succeeded or failed) and it's safe to shut down now.
In terms of the logging above, I see a lot of "Calling host function" and not a lot of "Called host function". To make sense of the trace, (1) these should have some correlation ID unique to each call of exec_command
so you can tell which end corresponds to which start (2) if the block ends with a premature return using ?
, there's no way to know whether the task is still in flight.
Maybe some of the above is what #[instrument(name = "host_func_exec_command", skip_all)]
is for and the issue template/troubleshooting tools should recommend using --dump
or PROTO_DUMP
? I wish I had gotten that output when I could reproduce the bug!
In particular, I don't understand how exec_command and send_request in warpgate host.rs and call in plugin.rs let the caller know that subprocess is complete (whether succeeded or failed) and it's safe to shut down now.
I agree, there's non-ideal layers here. However, extism (the plugin layer) is synchronous, so in this context, we don't need to wait for anything in the context of tokio/async. So if anything in extism errors, it just bubbles up.
The exception to this is send_request
, which must spawn a blocking task in tokio, because reqwest
requires tokio. This requirement is super annoying, but no other work around until extism supports async.
In terms of the logging above, I see a lot of "Calling host function" and not a lot of "Called host function". To make sense of the trace, (1) these should have some correlation ID unique to each call of exec_command so you can tell which end corresponds to which start (2) if the block ends with a premature return using ?, there's no way to know whether the task is still in flight.
Is the plugin
field not enough data? That's already a UUID.
Maybe some of the above is what #[instrument(name = "host_func_exec_command", skip_all)] is for and the issue template/troubleshooting tools should recommend using --dump or PROTO_DUMP? I wish I had gotten that output when I could reproduce the bug!
These labels were in the logs before but it added sooo much noise that it became hard to read. So yeah, right now they are just in the dumps. I can maybe add an env var or something that toggles them on.
Added --log verbose
in v0.41.5
I think I might have experienced similar, but with proto install
which I understand replaces proto use
. I got the same error, so I tried installing each tool individually instead, in the form proto install rust 1.78.0
etc (for each tool and version in my .prototools
file. I had the odd failure of node
not installing, for the same reason as #615 actually - where an HTTP request failed. Another due to my own plugin not supporting ARM on Linux (which I aim to resolve separately - but for now will comment out).
So I wonder is this error because proto install
spawns off multiple async processes to install each tool, and if one tool fails to install for one of the above reasons, then we're seeing this error?
I'll do some more investigataion myself, but wanted to braindump my initial findings in case it helps.
So I wonder is this error because proto install spawns off multiple async processes to install each tool, and if one tool fails to install for one of the above reasons, then we're seeing this error?
Yeah. Anytime a command has to process multiple tools, either loading or executing something, it's always done in parallel, 1 tool per 1 thread.
I try to provide human-readable error messages everywhere I can, so if something seems cryptic, or a panic occurs, that's usually somewhere outside of proto, most likely in a 3rd party library or the WASM layer. These are a bit harder to track down, especially WASM.
I don't see anything like TaskTracker
and CancellationToken
in the proto
code. Here's how the tokio docs recommend shutting down: https://tokio.rs/tokio/topics/shutdown
I use JoinSet
s, which are very similar: https://docs.rs/tokio/latest/tokio/task/struct.JoinSet.html
I'm not sure a cancellation token would even help here with the panic in the original post.
I use
JoinSet
s, which are very similar: https://docs.rs/tokio/latest/tokio/task/struct.JoinSet.html
It even has a comparison to JoinSet! Gosh I'm jealous of the Rust ecosystem and its documentation!
I'm not sure a cancellation token would even help here with the panic in the original post.
I think I need a trace catching the panic red-handed :-/. My backtrace is unfortunately not very useful. I'm reinstalling proto with cargo install --git https://github.com/moonrepo/proto.git proto_cli --debug --root $HOME/.proto --force
to hopefully catch a better backtrace next time!
So digging into potential failures when installing a tool. I'm currently testing this on Windows, and when trying to install all tools, I was getting the panic and failure, so again, I try and install each tool individually. I just received this:
proto install node 20.12.2
Installing Node.js
Downloading pre-built archive
Error: net::http
× Failed to make HTTP request for https://nodejs.org/download/release/
│ v20.12.2/node-v20.12.2-win-x64.zip.
├─▶ error decoding response body
├─▶ error reading a body from connection
╰─▶ stream error received: unexpected internal error encountered
It might be the case that this is a different error to when I tried to just run proto install
- but is there a reason why I don't get friendly errors currently, and only the tokio panic? It would at least be helpful when running proto install
to know which tool failed, but perhaps, continue installing the others anyway?
Will continue to add any extra information I find - and also happy to help in any way I can. Currently trying to get some self-hosted GitHub Action runners setup across all platforms (macOS, Windows and Linux/Ubuntu), and proto will make this so much easier 🙏 - I really appreciate this tool and all it offers!
Yeah at least for proto install
, I can look into not failing the entire install if 1 tool fails.
FYI the underlying connection issue I was encountering was down to DNS within a Windows VM: https://github.com/moonrepo/proto/issues/615#issuecomment-2422395741
But aside from that, the improvements to the error messages you mentioned above would be a great addition.
What version?
0.41.3
Which command?
proto use
What happened?
I installed
proto
on a recently re-installed machine. Git is not yet installed.proto use
crashes with a confusing error that doesn't explain the reason I expect the error message to mention the failing commandgit
.The output of
proto use
:Note the "help" text which suggests setting
RUST_BACKTRACE=1
(I wonder if there's a way to mention setting the--log=trace
orPROTO_LOG=trace
in addition or instead?)Trace logs?
Operating system?
Windows
Architecture?
x64