Open · award28 opened 1 year ago
I've been successful in creating a proof of concept. I'll break down the different requirements needed to get this working and how we can achieve cross-platform execution. You can find my branch with my PoC progress here. Note that if you want to pull this down and run it (not recommended as of this writing, it's a mess 😅), you'll need to copy the `read_file` function mentioned below into the generated core JavaScript package.
action-validator is heavily integrated with `clap`. `clap` has some fantastic abstractions that make it a really good choice for a Rust-built CLI. At the same time, because of these abstractions, we need to understand the `clap::Parser` internals in order to replace some of the default behaviors with wasm-friendly alternatives.
`std::env::args_os()` is Rust's builtin function for reading arguments. As described by the docstring, this function:

> Returns the arguments that this program was started with (normally passed via the command line).

This returns a `std::env::ArgsOs` struct, but for our purposes all we care about is that it is an iterator whose items can be converted into an `OsString`. This iterator is what gets passed into the `clap::Command`. So for our purposes, we can replace the value of `std::env::ArgsOs` with any iterator that satisfies those conditions; thankfully, a `Vec<String>` converted into an iterator satisfies these conditions as well.
This means we can pass a list of strings to the JavaScript entrypoint function and have the CliConfig parse those instead! There is a catch here, however: the `clap::Command` doesn't simply return when a specific flag is seen. Instead, the flag is consumed behind the scenes and the program exits. At this point, the JavaScript wasm build would raise an `Unreachable` error.
Making `--help` and `--version` Flag Results Consumable

With the default behavior for `ArgAction::Help` and `ArgAction::Version`, the `CliConfig::parse()` function will short-circuit and print to the console. Clap treats the responses of these actions as a `Result::Err`. In order to work around this but still keep the same output, we need to capture the parsing response lower in the clap stack.
To maintain the existing functionality (and any future `clap` defaults), we don't want to manually re-write the default behavior for each builtin flag. Instead, we want to capture the output of that flag and send that as the response for this invocation. To do this, we need to understand how `clap` handles these flags.

When `clap` runs into a "special" flag, it treats it as a `Result::Err` in a `Result` response. The value in that error has an `exit` method, which will format the error, print the error, and exit the program. The second step is where we want to intercept this exchange and capture the formatted error.
```rust
// src/main.rs
use action_validator::CliConfig;

fn main() {
    let res = CliConfig::parse_itr(std::env::args_os());
    ...
}
```
```rust
// src/config.rs
use std::ffi::OsString;

use clap::{CommandFactory, FromArgMatches, Error};

impl CliConfig {
    fn format_error<I: CommandFactory>(err: Error) -> Error {
        let mut cmd = I::command();
        err.format(&mut cmd)
    }

    pub fn parse_itr<I, T>(itr: I) -> Result<Self, String>
    where
        I: IntoIterator<Item = T>,
        T: Into<OsString> + Clone,
    {
        let arg_matches = <Self as CommandFactory>::command()
            // Get the matching argument actions to build the CliConfig
            .try_get_matches_from_mut(itr);

        if let Err(e) = arg_matches {
            return Err(format!("{e}"));
        }

        let mut matches = arg_matches.unwrap();
        let res = <Self as FromArgMatches>::from_arg_matches_mut(&mut matches)
            // Capture the builtin flags' output
            .map_err(Self::format_error::<Self>);

        match res {
            // If no errors occurred, capture the CliConfig instance
            Ok(s) => Ok(s),
            // This is a developer-time exception - we may want to handle this one differently
            Err(e) => Err(format!("{e}")),
        }
    }
}
```
As of this writing, all of the entrypoint functionality for `action-validator` is being manually written in JavaScript, which then calls the core functionality lower in the action-validator stack. Instead, we can pass `action-validator` a list of process arguments, which will then be handled by our above implementation of `parse_itr`. In Node, we can use the `process.argv` var in order to get CLI arguments. For Node, the first argument in the list is always the node binary, and the second is the executed JS file. To match the format of the Rust CLI arguments, we'll want a slice of this list; namely, `process.argv.slice(1)`.
If there are any errors, they will be written to stderr. Otherwise, nothing is written and the process will exit with the exit code specified in the response.
```javascript
// packages/cli/cli.mjs
#!/usr/bin/env node
// @ts-check
import * as actionValidator from "@action-validator/core";

let response = actionValidator.entrypoint(
  JSON.stringify({ "args": process.argv.slice(1) })
);

if (response.errors) {
  console.error(response.errors);
}

process.exit(response.exit_code);
```
Because Rust is unaware of the underlying OS platform when invoked through wasm, it can't read files. Because of this, we need to externally accept a file-reading function and use that to get the file contents. We also need to provide an abstraction for this reader in order to return Rust-wasm-compatible values.
```rust
// src/lib.rs
#[wasm_bindgen]
extern "C" {
    #[wasm_bindgen(catch)]
    fn read_file(path: &str) -> Result<String, JsValue>;
}

pub fn run_path(config: &CliConfig, path: &PathBuf) -> Response {
    let file_name = match path.file_name() {
        Some(file_name) => file_name.to_str(),
        None => {
            return Response {
                errors: Some(format!("Unable to derive file name from src!")),
                exit_code: 1,
            };
        }
    };

    let src = &match read_file(path.to_str().unwrap()) {
        Ok(src) => src,
        ...
```
TODO: Need to move this to a non-generated file.
```javascript
// package/core/action_validator.js
var fs = require('fs');

function read_file(path) {
  return fs.readFileSync(path, { encoding: "utf8" });
}
```
There are a few changes which will be required in order to make an input/output high-level interface for `action-validator` itself:

- The `js` feature is enabled.
- The `run` command returns a value rather than writing anything to stderr/stdout.

We can use the `js` feature flag in order to execute with either Rust (disabled; executed through `src/main.rs`) or JavaScript (enabled; executed through `packages/cli/cli.mjs`).
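As a hedged sketch of how that gating could look (assuming a `js` feature is declared in `Cargo.toml`; the `run` function here is illustrative, not action-validator's real API):

```rust
// Illustrative only: with the `js` feature disabled, output goes straight to
// the console and a process exit code is returned; with it enabled, everything
// comes back as a value so the JS entrypoint can decide what to print.

#[cfg(not(feature = "js"))]
pub fn run(args: Vec<String>) -> i32 {
    // Native path: write directly to the console.
    eprintln!("validating {:?}", args);
    0 // exit code
}

#[cfg(feature = "js")]
pub fn run(args: Vec<String>) -> String {
    // JS path: return a serialized response consumed by packages/cli/cli.mjs.
    format!("{{\"errors\":null,\"exit_code\":0,\"args\":{args:?}}}")
}

fn main() {
    // Without `--features js`, this is the native branch and returns an i32.
    let result = run(vec!["a.yml".to_string()]);
    println!("{result:?}");
}
```

The same symbol name compiles to two different signatures depending on the feature, so each platform's entrypoint only ever sees the variant it expects.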
With the above code in place (and some other minor parts not mentioned above), I was able to run all flags through action-validator and handle all response errors.
```
❯ npx action-validator --verbose .github/workflows/audit.yml
Treating audit.yml as a Workflow definition
WARNING: Glob validation is not yet supported. Glob at /on/push/paths will not be validated.

❯ npx action-validator -V
action-validator 0.0.0-git

❯ npx action-validator --version
action-validator 0.0.0-git

❯ npx action-validator --help
A validator for GitHub Action and Workflow YAML files

Usage: action-validator [OPTIONS] [path_to_action_yaml]...

Arguments:
  [path_to_action_yaml]...  Input file

Options:
  -v, --verbose  Be more verbose
  -h, --help     Print help information
  -V, --version  Print version information
```
@bcheidemann Do you know if this would help with the glob validation issues we're having?

@mpalmer I'd like to get your thoughts on the above implementation. If this sounds good to you, I can get started on this after the `remote-checks` feature.
@award28 thanks for the detailed write up! 😃
One of the solutions for glob validation I considered was exposing functions from JS - much like you've done to support reading files from Rust. So I think this shows it can be done and gives a good template for how to do it well.
The main limitation I see is that it may be difficult to bend some crates we (do|will in future) depend on to our will. Where this is possible, it may require some poking about to figure out how to use these crates in a way which works for WASM (like you've done with clap).
The way I see it, this is probably a limitation we can work around if needed. But I am still hoping we can find a more general solution in WASI or via the Node-API.
Well, nobody can say that you haven't thought deeply about the problem space, and investigated it thoroughly... this is an epic writeup, thank you, it really helps to clarify how you're approaching the problem.
I think you're on the right track with having a common CLI parser/handler, and then calling that from Rust and JS with the argument list from the command line. My intuition is that this will end up with the least platform-specific code, which I'm inclined to think is going to be the best long-term approach.
I'm hopeful that WASI (or similar) will help with the file I/O conundrum, but absent that, a per-platform `read_file` function isn't a showstopper. Hopefully there's a WASM-friendly "fetch this URL" crate that you can use for the action checking, though, otherwise there's likely to end up being a whole bunch of per-platform callbacks by the time you're done with that...
You both bring up excellent points. If WASI (or other) supports the necessary connective tissue to have platform-independent system access, none of this is necessary (A huge win). If the only necessary part is reading the cli arguments, and all of the other system access works, this would lead to a very slim per-platform executable.
Side Note: I'm wondering if this per-platform entrypoint will be required regardless, or if WASI can build a per-platform executable if we provide it with `main.rs`. For example, would Rust's `std::env::args_os()` retrieve the correct CLI arguments? Or would the first argument be the node executable (resulting in failure)?
> I think you're on the right track with having a common CLI parser/handler, and then calling that from Rust and JS with the argument list from the command line. My intuition is that this will end up with the least platform-specific code, which I'm inclined to think is going to be the best long-term approach. - @mpalmer
Yeah, this was definitely one of the goals with this work. My main concern was achieving 1:1 functionality with the native binary, but less platform-dependent boilerplate is absolutely the way to go.
> The main limitation I see is that it may be difficult to bend some crates we (do|will in future) depend on to our will. Where this is possible, it may require some poking about to figure out how to use these crates in a way which works for WASM (like you've done with clap). - @bcheidemann
> I'm hopeful that WASI (or similar) will help with the file I/O conundrum, but absent that, a per-platform read_file function isn't a showstopper. Hopefully there's a WASM-friendly "fetch this URL" crate that you can use for the action checking, though, otherwise there's likely to end up being a whole bunch of per-platform callbacks by the time you're done with that... - @mpalmer
This is not ideal; I was deep in the clap code for a while before I finally understood how all of the gears spun. If we needed to do this for every new feature which required some system access, this would be a massive pain and unmaintainable. Imagine we're 10 platforms down the line, and then we need to find an approach that works for all 10 platforms 🫠
> Imagine we're 10 platforms down the line, and then we need to find an approach that works for all 10 platforms

🙀
On the other hand, I can't think of many other platforms that are as weird and constrained as WASM, that Rust supports or is likely to support. The only other contender off the top of my head is `no_std`/no-alloc embedded systems, and (a) they're unlikely to want/need a GitHub Actions validation CLI (although never say never), but also (b) so few crates support those constraints that I suspect people who code for them are pretty much resigned to having to hand-code everything themselves anyway.
> Side Note: I'm wondering if this per-platform entrypoint will be required regardless, or if WASI can build a per-platform executable if we provide it with `main.rs`. For example, would Rust's `std::env::args_os()` retrieve the correct CLI arguments? Or would the first argument be the node executable (resulting in failure)?
I was able to compile `main.rs` using the `wasm32-wasi` target and execute that from Node without the need for a platform-specific entrypoint. But I hit some issues with WASI around exposing an API to Node and with the glob validation. I haven't given up on WASI yet since it has a lot of upside, but I'm starting to wonder if it may be better to use the Node-API to expose functionality to Node instead, since this is much more mature. In this case, we would still need this entrypoint.
> On the other hand, I can't think of many other platforms that are as weird and constrained as WASM
@mpalmer ah I may be misusing terminology - I meant more similar to something like `node`. I wasn't sure if we would want to make this installable through something like deno, python, etc.
> I was able to compile main.rs using the wasm32-wasi target and execute that from Node without the need for a platform specific entrypoint.
@bcheidemann This is huge! Super excited that this is even possible. I started doing a bit of research into WASI after reading the proposal issue you raised and I'm not sure that this will be much of a problem once WASI hits stage 3. In the current stage, WASI is still working on the system interfaces for the different points of system access (network I/O, disk I/O, etc.). From what I understand, WASI will have these system access APIs available in stage 3. I'm not sure when stage 3 will happen, but we could always use this approach for the time being and migrate to WASI when the time is right.
> I'm not sure when stage 3 will happen, but we could always use this approach for the time being and migrate to WASI when the time is right.
Yeah that's true! I may look into the relative effort of migrating what we have to N-API vs "fixing" glob validation for WASM. I think it should be a relatively minor change to the Rust code and might save us some headaches. If it's a comparable amount of effort, maybe N-API would be a suitable stop-gap while we wait for WASI to mature. What do you think?
I think that makes sense! I was originally hoping that we could use `wasm_bindgen` for something like pip as well to make action validator more accessible, but I misunderstood that this is specific to JavaScript.
If we did want to make action-validator pip installable, we could use https://github.com/PyO3/pyo3. I'm not sure if we would need to abstract the entry point and update the output (I'll look into this if we're interested). What do y'all think?
> I think that makes sense! I was originally hoping that we could use `wasm_bindgen` for something like pip as well to make action validator more accessible, but I misunderstood that this is specific to JavaScript.
>
> If we did want to make action-validator pip installable, we could use https://github.com/PyO3/pyo3. I'm not sure if we would need to abstract the entry point and update the output (I'll look into this if we're interested). What do y'all think?
@award28 that looks interesting! If I understand correctly, `pyo3` would support the standard library and therefore most crates `action-validator` would rely on? I guess if this is the case it would be a fairly thin abstraction over the native Rust implementation 🤔
Packaging `action-validator`-the-CLI as a Python, Ruby, etc. package doesn't make a lot of sense, as either it's shipped as a source package, in which case you'd need a Rust compiler anyway (and if you've got a Rust compiler, you can just build the pure-Rust CLI), or it's shipped as a binary package, which requires more work on our part (to build the binary packages for all the various packaging formats) and is no better from a trustworthiness perspective than just telling people to download the straight Rust binary from the Releases page.
The only situation in which it would make sense to produce bindings for other languages (such as via Pyo3, Rutie, etc) is if people want to call into the `action-validator` library from their own code written in Python, Ruby, etc. As you say, @bcheidemann, that library would have at its disposal the full Rust stdlib, including file handling, networking, and all the rest of the things we might take for granted, and as such would not be an impediment to our use of whatever crates we feel are useful.
@bcheidemann

> if this is the case it would be a fairly thin abstraction over the native rust implementation
That's the way I understood it as well! While it is interesting, I think @mpalmer brings up a really good reason why this shouldn't be done.
> ...and is no better from a trustworthiness perspective than just telling people to download the straight Rust binary from the Releases page.
That's a great point! I agree, I don't see any reason why it would be necessary from this perspective.
> The only situation in which it would make sense to produce bindings for other languages (such as via Pyo3, Rutie, etc) is if people want to call into the action-validator library from their own code

Interesting - is this something people have done in the past with `action-validator`? I agree, this would be the only situation where it makes sense to produce bindings.
> The only situation in which it would make sense to produce bindings for other languages (such as via Pyo3, Rutie, etc) is if people want to call into the action-validator library from their own code
>
> Interesting - is this something people have done in the past with action-validator? I agree, this would be the only situation where it makes sense to produce bindings.
Yes, this use case was my initial motivation for adding WASM/NPM support 🙂
We have a CLI tool at work, which does many things and is installable via an internal NPM registry. Its raison d'être is to ensure consistency between projects, and it handles stuff like formatting, linting, running tests, etc. I wanted to add support for validating GitHub actions to it, but since the CLI tool is distributed as an NPM package, it's preferable to also install something like `action-validator` via NPM. Technically, we could also use `@action-validator/cli`, but it is more convenient to use the API directly for our use case.
This makes perfect sense, and since you had the `core` package already created, there was no reason not to create the `cli` package.
With that in mind, I would put my support behind N-API until WASI stabilizes (assuming it provides 1:1 mapping with the native binary).
@bcheidemann Since N-API provides 1:1 mapping, should we close this issue?
@award28 I guess N-API would still require the entrypoint function you discussed above to handle the arguments for multiple files (and everything else) correctly so I think we should keep it open for now until that's implemented. What do you think?
Ah I understand - if that's the case then what benefits do we gain for switching from wasm_bindgen to N-API? Is it the Glob validation?
> Ah I understand - if that's the case then what benefits do we gain for switching from wasm_bindgen to N-API? Is it the Glob validation?
At present, it's the glob validation. But the future benefit would be that most new features we add would automatically be available in the NPM package without additional work. Currently that wouldn't be the case for anything relying on system access.
Background
As per the discussion with @bcheidemann in https://github.com/mpalmer/action-validator/pull/36#issuecomment-1464698569, this issue will track the necessary work to make the @action-validator/cli npm package behave equivalently to the native binary in terms of multi-file input support.
Scope of Work
Acceptance Criteria
When `action-validator` is installed through npm, multiple files can be passed as input.

Example of Native Binary Behavior