jqnatividad / qsv

Blazing-fast Data-Wrangling toolkit
https://qsv.dathere.com
The Unlicense

`--help json` for all commands to get full usage text as JSON #1307

Open rzmk opened 1 year ago

rzmk commented 1 year ago

Is your feature request related to a problem? Please describe.

With fast updates being made to qsv, it would be helpful to have a way to keep track of them for docs, the site, and other applications. One option is a kind of --help json flag (or some other name, or a separate command like qsv metadata) that describes the full command as JSON instead of plain text, e.g. qsv apply --help json, and potentially one for qsv itself where all commands and their details are listed via a single qsv --help json. This could later be useful for auto-updated docs, for example.

Describe the solution you'd like

Here are examples of potential commands; further detail on possible changes to this follows below.

Perhaps we could also make this more modular, where users choose to get only the arguments, only the subcommands or options, or only certain commands, though the same information can be extracted from the broader JSON data too.

This should work regardless of the qsv release/commit the user is running, potentially by parsing each command's usage text and outputting it as JSON, possibly to a file. That way all the data stays current even as usage text is updated and features are added.

A (potentially inaccurate/incomplete) sample structure for all commands including apply may be:

{
    "apply": {
        "description": "",
        "short_description": "",
        "link": "https://github.com/jqnatividad/qsv/blob/master/src/cmd/apply.rs",
        "demoLink": "",
        "subcommands": {
            "operations": {
                "description": "",
                "args": {
                    "<input>": {
                        "label": "Input 1",
                        "type": "file",
                        "allowedFileTypes": [".csv"],
                        "required": true
                    },
                    ...
                },
                "options": {
                    "--quote": {
                        "label": "Quote",
                        "aliases": ["--quote", "-q"],
                        "description": "The quote character to use.",
                        "type": "text"
                    },
                    ...
                },
            },
            ...
        },
        "args": {},
        "options": {}
    },
    ...
}

Note: -q doesn't actually exist; it's only shown as an example. We also assume option names are unique so that the dictionary works, though this can change based on the implementation. There could be another type, "bool", for options that take no input (like --human-readable for count, vs. --rename <name> where <name> must be provided). There must also be a way to handle options that use = instead of a space between the option and the value the user provides.

Link and demo link may not be in the usage text currently, but would again be useful for docs and other use cases (e.g., the site). Other restrictions could also be encoded, such as which options are disabled when a given option is selected, or which file types are allowed for an arg or option. Also, if the qsv variant/version doesn't have certain commands available (such as sqlp), we may want to let users choose between output for only the available commands and output for all commands regardless of availability (which could be expressed with an attribute like "available": true/false).

This JSON output should be auto-generated from each command's usage text, or at least the base JSON should be, with other metadata layered on top of that base data.

This may also be a sample reference for the JSON: https://tauri.app/v1/guides/features/cli

jqnatividad commented 1 year ago

Good idea @rzmk !

This should also help us repurpose the usage text to proper documentation in various formats. Right now, folks are basically dumped into the source code, which is fine for technical users (one may argue, actually better as they can see how the CSV is processed), but too intimidating for non-developers.

This will have to be done in qsv-docopt (https://github.com/jqnatividad/docopt.rs)

jqnatividad commented 1 year ago

Hi @rzmk , I started work on this on the qsv-docopt crate so that all commands will have the ability to generate a JSON file like you described above.

I'm still working out the structure of the JSON file, but the structure you propose above makes sense. For the first iteration though, I may just take advantage of serde_json to serialize the docopt structs to JSON to make it simple.

In the meantime, just create a dummy JSON file so you're not blocked. I'll prioritize stabilizing the JSON schema first before writing the code so we can work in parallel.

github-actions[bot] commented 9 months ago

Stale issue message

rzmk commented 6 months ago

Might be useful as a reference when implementing this (just a proof of concept; it doesn't generate the full JSON, e.g. it doesn't handle subcommands):

use regex::Regex;
use serde_json::{json, Value};
use std::fs;
use std::io::{self, Write};
use std::path::{Path, PathBuf};

#[derive(Debug)]
struct OptionInfo {
    help: String,
    arg: Option<ArgType>,
    arg_name: Option<String>,
    aliases: Vec<String>,
}

#[derive(Debug)]
enum ArgType {
    Named(String),
}

fn main() -> io::Result<()> {
    // Directory containing Rust files
    let dir_path = "cmd";

    // Ensure the output directories exist before writing to them
    fs::create_dir_all("usage/txt")?;
    fs::create_dir_all("usage/json")?;

    // Iterate through files in the directory
    for entry in fs::read_dir(dir_path)? {
        let entry = entry?;
        let file_path = entry.path();

        // Skip directories
        if !file_path.is_file() {
            continue;
        }

        // Extract USAGE from the file
        if let Some(usage) = extract_usage(&file_path) {
            // Export USAGE to a text file
            export_usage_to_txt(&file_path, &usage)?;
        }
    }

    // Directory containing txt files
    let txt_dir = "usage/txt";

    // Iterate through files in the directory
    for entry in fs::read_dir(txt_dir)? {
        let entry = entry?;
        let file_path = entry.path();

        // Skip non-txt files
        if file_path.extension().unwrap_or_default() != "txt" {
            continue;
        }

        // Read usage text from a file
        let usage_text = match fs::read_to_string(&file_path) {
            Ok(text) => text,
            Err(err) => {
                eprintln!("Error reading {}: {}", file_path.display(), err);
                continue;
            }
        };

        let re = Regex::new(r"(?s)(.*)Usage:\n((?:.*(?:\n|$))+)").unwrap();

        if let Some(caps) = re.captures(&usage_text) {
            let description = caps.get(1).unwrap().as_str().trim();
            let options_str = caps.get(2).unwrap().as_str();
            let options: Vec<(&str, OptionInfo)> = options_str
                .split('\n')
                .filter_map(|line| {
                    let parts: Vec<&str> = line.trim().splitn(2, "  ").collect();
                    if parts.len() == 2 {
                        let (option, help) = (parts[0].trim_end_matches(','), parts[1].trim());
                        let (option_name, aliases_str) = parse_option_name(option);
                        let aliases = parse_aliases(aliases_str);
                        let (arg, arg_name) = match option.contains("<") || option.contains("=") {
                            true => {
                                let arg_name = extract_arg_name(option);
                                (Some(ArgType::Named(arg_name.clone())), Some(arg_name))
                            }
                            false => (None, None),
                        };
                        let option_info = OptionInfo {
                            help: help.to_string(),
                            arg,
                            arg_name,
                            aliases,
                        };
                        Some((option_name, option_info))
                    } else {
                        None
                    }
                })
                .collect();

            let mut json_output = json!({
                "description": description,
                "options": {}
            });
            for (option, info) in options {
                let aliases: Vec<Value> = info.aliases.iter().map(|alias| json!(alias)).collect();
                let obj = json!({
                    "help": info.help,
                    "arg": info.arg.is_some(),
                    "arg_name": info.arg_name.unwrap_or_default(), // Include arg_name
                    "aliases": aliases,
                });
                json_output["options"][option] = obj;
            }

            // Write to json file
            let json_dir = "usage/json";
            let json_file_name =
                format!("{}.json", file_path.file_stem().unwrap().to_string_lossy());
            let json_path = Path::new(json_dir).join(json_file_name);
            let json_str = serde_json::to_string_pretty(&json_output).unwrap();
            match fs::write(&json_path, json_str) {
                Ok(_) => println!("Successfully wrote to {}", json_path.display()),
                Err(err) => eprintln!("Error writing to {}: {}", json_path.display(), err),
            };
        }
    }

    Ok(())
}

// Function to extract USAGE from a Rust file
fn extract_usage(file_path: &Path) -> Option<String> {
    // Read the file content
    let content = match fs::read_to_string(file_path) {
        Ok(content) => content,
        Err(_) => return None,
    };

    // Look for the USAGE string
    let usage_prefix = "static USAGE: &str = r#\"\n";
    let usage_suffix = "\"#;";
    let usage_start = match content.find(usage_prefix) {
        Some(start) => start + usage_prefix.len(),
        None => return None,
    };
    let usage_end = match content[usage_start..].find(usage_suffix) {
        Some(end) => usage_start + end,
        None => return None,
    };

    // Extract USAGE
    let usage = &content[usage_start..usage_end];

    Some(usage.to_owned())
}

// Function to export USAGE to a text file
fn export_usage_to_txt(file_path: &Path, usage: &str) -> io::Result<()> {
    let mut txt_path = PathBuf::from("usage/txt").join(file_path.file_stem().unwrap());
    txt_path.set_extension("txt");

    let mut txt_file = fs::File::create(&txt_path)?;
    txt_file.write_all(usage.as_bytes())?;

    println!("Exported USAGE to: {}", txt_path.display());

    Ok(())
}

fn parse_option_name(option: &str) -> (&str, Option<&str>) {
    let mut parts = option.splitn(2, " ");
    let option_name = parts.next().unwrap().trim_end_matches(',');
    let aliases = parts.next();
    (option_name, aliases)
}

fn extract_arg_name(option: &str) -> String {
    let start_index = option.find('<').unwrap_or(0) + 1;
    let end_index = option.find('>').unwrap_or(option.len());
    option[start_index..end_index].trim().to_string()
}

fn parse_aliases(aliases_str: Option<&str>) -> Vec<String> {
    match aliases_str {
        Some(aliases_str) => aliases_str
            .split(", ")
            .flat_map(|alias| {
                alias
                    .split_whitespace()
                    .filter(|part| !part.contains('<') && !part.contains('>'))
                    .map(|part| part.to_string())
            })
            .collect(),
        None => vec![],
    }
}

For example, here's the qsv slice USAGE text:

Returns the rows in the range specified (starting at 0, half-open interval).
The range does not include headers.

If the start of the range isn't specified, then the slice starts from the first
record in the CSV data.

If the end of the range isn't specified, then the slice continues to the last
record in the CSV data.

This operation can be made much faster by creating an index with 'qsv index'
first. Namely, a slice on an index requires parsing just the rows that are
sliced. Without an index, all rows up to the first row in the slice must be
parsed.

Usage:
    qsv slice [options] [<input>]
    qsv slice --help

slice options:
    -s, --start <arg>      The index of the record to slice from.
                           If negative, starts from the last record.
    -e, --end <arg>        The index of the record to slice to.
    -l, --len <arg>        The length of the slice (can be used instead
                           of --end).
    -i, --index <arg>      Slice a single record (shortcut for -s N -l 1).

Common options:
    -h, --help             Display this message
    -o, --output <file>    Write output to <file> instead of stdout.
    -n, --no-headers       When set, the first row will not be interpreted
                           as headers. Otherwise, the first row will always
                           appear in the output as the header row.
    -d, --delimiter <arg>  The field delimiter for reading CSV data.
                           Must be a single character. (default: ,)

And here's the generated JSON:

{
  "description": "Returns the rows in the range specified (starting at 0, half-open interval).\nThe range does not include headers.\n\nIf the start of the range isn't specified, then the slice starts from the first\nrecord in the CSV data.\n\nIf the end of the range isn't specified, then the slice continues to the last\nrecord in the CSV data.\n\nThis operation can be made much faster by creating an index with 'qsv index'\nfirst. Namely, a slice on an index requires parsing just the rows that are\nsliced. Without an index, all rows up to the first row in the slice must be\nparsed.",
  "options": {
    "-d": {
      "aliases": [
        "--delimiter"
      ],
      "arg": true,
      "arg_name": "arg",
      "help": "The field delimiter for reading CSV data."
    },
    "-e": {
      "aliases": [
        "--end"
      ],
      "arg": true,
      "arg_name": "arg",
      "help": "The index of the record to slice to."
    },
    "-h": {
      "aliases": [
        "--help"
      ],
      "arg": false,
      "arg_name": "",
      "help": "Display this message"
    },
    "-i": {
      "aliases": [
        "--index"
      ],
      "arg": true,
      "arg_name": "arg",
      "help": "Slice a single record (shortcut for -s N -l 1)."
    },
    "-l": {
      "aliases": [
        "--len"
      ],
      "arg": true,
      "arg_name": "arg",
      "help": "The length of the slice (can be used instead"
    },
    "-n": {
      "aliases": [
        "--no-headers"
      ],
      "arg": false,
      "arg_name": "",
      "help": "When set, the first row will not be interpreted"
    },
    "-o": {
      "aliases": [
        "--output"
      ],
      "arg": true,
      "arg_name": "file",
      "help": "Write output to <file> instead of stdout."
    },
    "-s": {
      "aliases": [
        "--start"
      ],
      "arg": true,
      "arg_name": "arg",
      "help": "The index of the record to slice from."
    }
  }
}

jqnatividad commented 4 months ago

IMHO, the best place to do this universally is through qsv-docopt, our fork of the docopt command-line parser.

qsv-docopt already parses the Usage text and figures out how to map it to the args struct, so this should be a cleaner implementation.