mpalmer / action-validator

Tool to validate GitHub Action and Workflow YAML files
GNU General Public License v3.0
271 stars 25 forks source link

Multi-file support for @action-validator/cli #41

Open award28 opened 1 year ago

award28 commented 1 year ago

Background

As per the discussion with @bcheidemann in https://github.com/mpalmer/action-validator/pull/36#issuecomment-1464698569, this issue will track the necessary work to make the @action-validator/cli npm package behave equivalently to the native binary in terms of multi-file input support.

Scope of Work

Acceptance Criteria

When action-validator is installed through npm, multiple files can be passed as input.

Example of Native Binary Behavior

❯ cargo run -F remote-checks -- .github/workflows/audit.yml .github/workflows/qa.yml .github/workflows/release.yml
    Finished dev [unoptimized + debuginfo] target(s) in 0.07s
     Running `target/debug/action-validator .github/workflows/audit.yml .github/workflows/qa.yml .github/workflows/release.yml`
Fatal error validating .github/workflows/audit.yml
Validation failed: ValidationState {
    action_type: Some(
        Workflow,
    ),
    file_path: Some(
        ".github/workflows/audit.yml",
    ),
    errors: [
        Unknown {
            code: "action_not_found",
            detail: Some(
                "Could not find action: actions/checkout@v",
            ),
            path: "jobs/security_audit/steps/uses/actions/checkout@v",
            title: "Action Not Found",
        },
    ],
}
Fatal error validating .github/workflows/qa.yml
Validation failed: ValidationState {
    action_type: Some(
        Workflow,
    ),
    file_path: Some(
        ".github/workflows/qa.yml",
    ),
    errors: [
        Unknown {
            code: "action_not_found",
            detail: Some(
                "Could not find action: actions/checkout@v",
            ),
            path: "jobs/actions/steps/uses/actions/checkout@v",
            title: "Action Not Found",
        },
    ],
}
award28 commented 1 year ago

1:1 JavaScript Functionality Proof of Concept Notes

I've been successful in creating a proof of concept. I'll breakdown the different requirements in order to get this working and how we can achieve cross-platform execution. You can find my branch with my poc progress here. Note that if you want to pull this down and run it (not recommended as of this writing, it's a mess😅), you'll need to copy the read_file function mentioned below into the generated core javascript package.

Clap

Action validator is heavily integrated with clap. clap has some fantastic abstractions that make it a really good choice for a rust-built CLI. At the same time, because of these abstractions, we need to understand the clap::Parser innards in order to replace some of the default behaviors with wasm-friendly alternatives.

Overriding the Default Arguments Input

std::env::args_os() is rust's builtin function for reading arguments. As described by the docstring, this function:

Returns the arguments that this program was started with (normally passed via the command line).

This returns a std::env::ArgsOs struct, but for our purposes all we care about is that it is an Iterable which can be converted into an OsString. This iterable is what gets passed into the clap::command. So for our purposes, we can replace the value of std::env::ArgsOs with an iterable that satisfies those conditions; thankfully, a Vec<String> converted into an iterable satisfies these conditions as well.

This means we can pass a list of strings to the javascript entrypoint function and have the CliConfig parse those instead! There is a catch here, however, as the clap::command doesn't simply return when a specific flag was seen. Instead, the flag is consumed behind the scenes and the program is exited. At this point, the javascript wasm build would raise an Unreachable error.

Making the --help and --version Flag Results Consumable

With the default behavior for ArgAction::Help and ArgAction::Version , the CliConfig::parse() function will short-circuit and print to the console. Clap treats the responses of these actions as a Result::Err. In order to work around this but still keep the same output, we need to capture the parsing response lower in the clap stack.

To maintain the existing functionality (and any future clap defaults), we don't want to manually re-write the default behavior for each builtin flag. Instead, we want to capture the output of that flag, and send that as the response for this invocation. To do this, we need to understand how clap handles these flags.

When clap runs into a "special" flag, it treats it as a Result::Err in a Result response. The value in that error has an exit method, which will format the error, print the error, and exit the program. The second step is where we want to intercept this exchange and capture the formatted error.

Code Example

// src/main.rs
use action_validator::CliConfig;

fn main() {
    let res = CliConfig::parse_itr(std::env::args_os());
   ...
}

// src/config.rs
use std::ffi::OsString;
use clap::{CommandFactory, FromArgMatches, Error};

impl CliConfig {
    fn format_error<I: CommandFactory>(err: Error) -> Error {
        let mut cmd = I::command();
        err.format(&mut cmd)
    }

    pub fn parse_itr<I, T>(itr: I) -> Result<Self, String>
        where
            I: IntoIterator<Item = T>,
            T: Into<OsString> + Clone,
    {
        let arg_matches = <Self as CommandFactory>::command()
            // Get the matching argument actions to build the CliConfig
            .try_get_matches_from_mut(itr);

        if let Err(e) = arg_matches {
            return Err(format!("{e}"));
        }
        let mut matches = arg_matches.unwrap();
        let res = <Self as FromArgMatches>::from_arg_matches_mut(&mut matches)
            // Capture the builtin flags output
            .map_err(Self::format_error::<Self>);
        match res {
            // If no errors occurred, capture the CliConfig instance
            Ok(s) => Ok(s),
            // This is a developer time exception - we may want to handle this one differently
            Err(e) => Err(format!("{e}")),
        }
    }
}

JavaScript

Passing CLI Arguments

As of this writing, all of the entrypoint functionality for action-validator is being manually written in javascript, which then calls the core functionality lower in the action-validator stack. Instead, we can pass action-validator a list of process arguments which will then be invoked with our above implementation of parse_itr. In node, we can use the process.argv var in order to get cli arguments. For node, the first argument in the list is always the node binary, and the second is the executed js file. To match the format of the rust cli arguments, we'll want a slice of this list. Namely, process.argv.slice(1).

If there are any errors, they will be written to stderr. Otherwise, nothing is written and the process will exit according to the response specified exit code.

Code Example

// packages/cli/cli.mjs
#!/usr/bin/env node
// @ts-check

import * as actionValidator from "@action-validator/core";

let response = actionValidator.entrypoint(
  JSON.stringify({"args": process.argv.slice(1)})
);

if (response.errors) {
  console.error(response.errors);
}
process.exit(response.exit_code);

Reading File Contents

Because rust is unaware of the underlying OS platform when invoked through wasm, rust can't read files. Because of this, we need to externally accept a file reading function, and use that in order to get the contents. We also need to provide an abstraction for this reader in order to return rust-wasm compatible values.

Code Example

  // src/lib.rs
    #[wasm_bindgen]
    extern "C" {
        #[wasm_bindgen(catch)]
        fn read_file(path: &str) -> Result<String, JsValue>; 
    }

    pub fn run_path(config: &CliConfig, path: &PathBuf) -> Response {
        let file_name = match path.file_name() {
            Some(file_name) => file_name.to_str(),
            None => {
                return Response {
                    errors: Some(format!("Unable to derive file name from src!")),
                    exit_code: 1,
                };
            }
        };

        let src = &match read_file(path.to_str().unwrap()) {
            Ok(src) => src,
...

TODO: Need to move this to a non-generated file.

// package/core/action_validator.js

var fs = require('fs');

function read_file(path) {
    return fs.readFileSync(path, { encoding: "utf8" });
}

Action Validator

There are a few changes which will be required in order to make an input/output high-level interface for action-validator itself.

  1. Modify the run command to return a result rather than writing anything to stderr/stdout.
  2. Change all of the print statements to either:
    1. A wasm-bound function enabled when the js feature is enabled.
    2. The run command returns a value rather than writing anything to stderr/stdout.

We can use the js feature flag in order to execute with either rust (disabled; executed through src/main.rs) or javascript (enabled; executed through packages/cli/cli.mjs).


With the above code in place (and some other minor parts not mentioned above), I was able to run all flags through action-validator and handle all response errors.

 npx action-validator --verbose .github/workflows/audit.yml
Treating audit.yml as a Workflow definition
WARNING: Glob validation is not yet supported. Glob at /on/push/paths will not be validated.

❯ npx action-validator -V
action-validator 0.0.0-git

❯ npx action-validator --version
action-validator 0.0.0-git

❯ npx action-validator --help
A validator for GitHub Action and Workflow YAML files

Usage: action-validator [OPTIONS] [path_to_action_yaml]...

Arguments:
  [path_to_action_yaml]...  Input file

Options:
  -v, --verbose  Be more verbose
  -h, --help     Print help information
  -V, --version  Print version information
award28 commented 1 year ago

@bcheidemann Do you know if this would help with the glob validation issues we're having?

@mpalmer would like to get your thoughts on the above implementation. If this sounds good to you, I can get started on this after the remote-checks feature.

bcheidemann commented 1 year ago

@award28 thanks for the detailed write up! 😃

One of the solutions for glob validation I considered was exposing functions from JS - much like you've done to support reading files from Rust. So I think this shows it can be done and gives a good template for how to do it well.

The main limitation I see is that it may be difficult to bend some crates we (do|will in future) depend on to our will. Where this is possible, it may require some poking about to figure out how to use these crates in a way which works for WASM (like you've done with clap).

The way I see it, this is probably a limitation we can work around if needed. But I am still hoping we can find a more general solution in WASI or via the Node-API.

mpalmer commented 1 year ago

Well, nobody can say that you haven't thought deeply about the problem space, and investigated it thoroughly... this is an epic writeup, thank you, it really helps to clarify how you're approaching the problem.

I think you're on the right track with having a common CLI parser/handler, and then calling that from Rust and JS with the argument list from the command line. My intuition is that this will end up with the least platform-specific code, which I'm inclined to think is going to be the best long-term approach.

I'm hopeful that WASI (or similar) will help with the file I/O conundrum, but absent that, a per-platform read_file function isn't a showstopper. Hopefully there's a WASM-friendly "fetch this URL" crate that you can use for the action checking, though, otherwise there's likely to end up being a whole bunch of per-platform callbacks by the time you're done with that...

award28 commented 1 year ago

You both bring up excellent points. If WASI (or other) supports the necessary connective tissue to have platform-independent system access, none of this is necessary (A huge win). If the only necessary part is reading the cli arguments, and all of the other system access works, this would lead to a very slim per-platform executable.

Side Note: I'm wondering if this per-platform entrypoint will be required regardless, or if WASI can build a per-platform executable if we provide it with main.rs. For example, would rusts std::env::args_os() retrieve the correct CLI arguments? Or would the first argument be the node executable (resulting in failure)?


I think you're on the right track with having a common CLI parser/handler, and then calling that from Rust and JS with the argument list from the command line. My intuition is that this will end up with the least platform-specific code, which I'm inclined to think is going to be the best long-term approach. - @mpalmer

Yeah, this was definitely one of the goals with this work. My main concern was achieving 1:1 functionality with the native binary, but less platform-dependent boilerplate is absolutely the way to go.


The main limitation I see is that it may be difficult to bend some crates we (do|will in future) depend on to our will. Where this is possible, it may require some poking about to figure out how to use these crates in a way which works for WASM (like you've done with clap). - @bcheidemann

I'm hopeful that WASI (or similar) will help with the file I/O conundrum, but absent that, a per-platform read_file function isn't a showstopper. Hopefully there's a WASM-friendly "fetch this URL" crate that you can use for the action checking, though, otherwise there's likely to end up being a whole bunch of per-platform callbacks by the time you're done with that... - @mpalmer

This is not ideal; I was deep in the clap code for a while before I finally understood how all of the gears spun. If we needed to do this for every new feature which required some system access, this would be a massive pain and unmaintainable. Imagine we're 10 platforms down the line, and then we need to find an approach that works for all 10 platforms 🫠

mpalmer commented 1 year ago

Imagine we're 10 platforms down the line, and then we need to find an approach that works for all 10 platforms

:scream_cat:

On the other hand, I can't think of many other platforms that are as weird and constrained as WASM, that Rust supports or is likely to support. The only other contender off the top of my head is nostd/noalloc embedded systems, and (a) they're unlikely to want/need a GitHub Actions validation CLI (although never say never), but also (b) so few crates support those constraints that I suspect people who code for them are pretty much resigned to having to hand-code everything themselves anyway.

bcheidemann commented 1 year ago

Side Note: I'm wondering if this per-platform entrypoint will be required regardless, or if WASI can build a per-platform executable if we provide it with main.rs. For example, would rusts std::env::args_os() retrieve the correct CLI arguments? Or would the first argument be the node executable (resulting in failure)?

I was able to compile main.rs using the wasm32-wasi target and execute that from Node without the need for a platform specific entrypoint. But I hit some issues with WASI around exposing an API to Node and with the glob validation. I haven't given up on WASI yet since it has a lot of upside, but I'm starting to wonder if it maybe better to use the Node-API to expose functionality to Node instead, since this is much more mature. In this case, we would still need this entrypoint.

award28 commented 1 year ago

On the other hand, I can't think of many other platforms that are as weird and constrained as WASM

@mpalmer ah I may be misusing terminology - I meant more similar to something like node. I wasn't sure if we would want to make this installable through something like deno, python, etc.

award28 commented 1 year ago

I was able to compile main.rs using the wasm32-wasi target and execute that from Node without the need for a platform specific entrypoint.

@bcheidemann This is huge! Super excited that this is even possible. I started doing a bit of research into WASI after reading the proposal issue you raised and I'm not sure that this will be much of a problem once WASI hits stage 3. In the current stage, WASI is still working on the system interfaces for the different points of system access (network I/O, disk I/O, etc.). From what I understand, WASI will have these system access APIs available in stage 3. I'm not sure when stage 3 will happen, but we could always use this approach for the time being and migrate to WASI when the time is right.

bcheidemann commented 1 year ago

I'm not sure when stage 3 will happen, but we could always use this approach for the time being and migrate to WASI when the time is right.

Yeah that's true! I may look into the relative effort of migrating what we have to N-API vs "fixing" glob validation for WASM. I think it should be a relatively minor change to the Rust code and might save us some headaches. If it's a comparable amount of effort, maybe N-API would be a suitable stop-gap while we wait for WASI to mature. What do you think?

award28 commented 1 year ago

I think that makes sense! I was originally hoping that we could use wasm_bindgen for something like pip as well to make action validator more accessible, but I misunderstood that this is specific for JavaScript.

If we did want to make action-validator pip installable, we could use https://github.com/PyO3/pyo3. I'm not sure if we would need to abstract the entry point and update the output (I'll look into this if we're interested). What do y'all think?

bcheidemann commented 1 year ago

I think that makes sense! I was originally hoping that we could use wasm_bindgen for something like pip as well to make action validator more accessible, but I misunderstood that this is specific for JavaScript.

If we did want to make action-validator pip installable, we could use https://github.com/PyO3/pyo3. I'm not sure if we would need to abstract the entry point and update the output (I'll look into this if we're interested). What do y'all think?

@award28 that looks interesting! If I understand correctly pyo3 would support the standard library and therefore most crates action-validator would rely on? I guess if this is a case it would be a fiarly thin abstraction over the native rust implementation 🤔

mpalmer commented 1 year ago

Packaging action-validator-the-CLI as a Python, Ruby, etc package doesn't make a lot of sense, as either it's shipped as a source package, in which case you'd need a Rust compiler anyway (and if you've got a Rust compiler, you can just build the pure-Rust CLI), or it's shipped as a binary package, which requires more work on our part (to build the binary packages for all the various packaging formats) and is no better from a trustworthiness perspective than just telling people to download the straight Rust binary from the Releases page.

The only situation in which it would make sense to produce bindings for other languages (such as via Pyo3, Rutie, etc) is if people want to call into the action-validator library from their own code written in Python, Ruby, etc. As you say, @bcheidemann, that library would have at its disposal the full Rust stdlib, including file handling, networking, and all the rest of the things we might take for granted, and as such would not be an impediment to our use of whatever crates we feel are useful.

award28 commented 1 year ago

@bcheidemann

if this is a case it would be a fiarly thin abstraction over the native rust implementation

That's the way I understood it as well! While it is interesting, I think @mpalmer brings up a really good reason why this shouldn't be done.

...and is no better from a trustworthiness perspective than just telling people to download the straight Rust binary from the Releases page.

That's a great point! I agree, I don't see any reason why it would be necessary from this perspective.

The only situation in which it would make sense to produce bindings for other languages (such as via Pyo3, Rutie, etc) is if people want to call into the action-validator library from their own code

Interesting - is this something people have done in the past with action-validator? I agree, this would be the only situation where it makes sense to produce bindings.

bcheidemann commented 1 year ago

The only situation in which it would make sense to produce bindings for other languages (such as via Pyo3, Rutie, etc) is if people want to call into the action-validator library from their own code

Interesting - is this something people have done in the past with action-validator? I agree, this would be the only situation where it makes sense to produce bindings.

Yes, this use case was my initial motivation for adding WASM/NPM support 🙂

We have a CLI tool at work, which does many things and is installable via an internal NPM registry. It's raison d'être is to ensure consistency between projects and it handles stuff like formatting, linting, running tests etc. I wanted to add support for validating GitHub actions to it but since the CLI tool is distributed as an NPM package and because, it's preferable to also install something like action-validator via NPM. Technically, we could also use @action-validator/cli but it is more convenient to use the API directly for our usecase.

award28 commented 1 year ago

This makes perfect sense, and since you had the core package already created, there was no reason not to create the cli package.

With that in mind, I would put my support behind N-API until WASI stabilizes (assuming it provides 1:1 mapping with the native binary).

award28 commented 1 year ago

@bcheidemann Since N-API provides 1:1 mapping, should we close this issue?

bcheidemann commented 1 year ago

@award28 I guess N-API would still require the entrypoint function you discussed above to handle the arguments for multiple files (and everything else) correctly so I think we should keep it open for now until that's implemented. What do you think?

award28 commented 1 year ago

Ah I understand - if that's the case then what benefits do we gain for switching from wasm_bindgen to N-API? Is it the Glob validation?

bcheidemann commented 1 year ago

Ah I understand - if that's the case then what benefits do we gain for switching from wasm_bindgen to N-API? Is it the Glob validation?

At present, it's the glob validation. But the future benefit would be that most new features we add would automatically be available in the NPM package without additional work. Currently that wouldn't be the case for anything relying on system access.