Refactor the commands parsing

XVilka commented 6 years ago

Rather than allow each command handler to parse itself, there is a need for a proper common parser of the command line, providing just a set of tokens to the handler function. This will allow to improve the syntax parser, improve commands help and syntax error reporting, easier and niftier autocompletion, may be even syntax highlight of the command line, right when you type. And many more things.

XVilka commented 6 years ago

Note, there are not only space separated arguments. There are also macroses, aliases, operators like ~, |, > or >>, etc. And the need to quote the whole string like:

[0x0000000]> "td struct qwe { int a; char* b; };"

will go to the null device. No more confusion from a newbie users!

MaskRay commented 6 years ago

This is the way to go! This must be done piecemeal and it may be most desirable to have a nice command completion first (zsh-like).

% r2 -a 6502
 -- arch --
6502         -- 6502/NES/C64/Tamagotchi/T-1000 CPU
8051         -- 8051 Intel CPU
arc          -- Argonaut RISC Core
arm          -- Capstone ARM disassembler
arm.as       -- as ARM Assembler (use ARM_AS environment)
arm.gnu      -- Acorn RISC Machine CPU
arm.ks       -- ARM keystone assembler
arm.winedbg  -- WineDBG's ARM disassembler
...

I'd like to do some of the heavy lifting (or most? I need to feel familiar with most commands before next Thursday...) to move const char *help_msg*[] arrays to toplevel.

radare commented 6 years ago

Wasnt there already an issue for that? This wasnt possible to be done before because the syntax has changed and right now its pretty stable and consistent. But enforcing this with manual checks is not the way to go obviously.

On 20 Jul 2017, at 07:49, Fangrui Song notifications@github.com wrote:

This is the way to go! This must be done piecemeal and it may be most desirable to have a nice command completion first (zsh-like).

% r2 -a 6502 -- arch -- 6502 -- 6502/NES/C64/Tamagotchi/T-1000 CPU 8051 -- 8051 Intel CPU arc -- Argonaut RISC Core arm -- Capstone ARM disassembler arm.as -- as ARM Assembler (use ARM_AS environment) arm.gnu -- Acorn RISC Machine CPU arm.ks -- ARM keystone assembler arm.winedbg -- WineDBG's ARM disassembler ... I'd like to do some of the heavy lifting (or most? I need to feel familiar with most commands before next Thursday...) to move const char help_msg[] arrays to toplevel.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub, or mute the thread.

radare commented 6 years ago

So i agree on that, will also speedup rcorecmd()

On 20 Jul 2017, at 07:49, Fangrui Song notifications@github.com wrote:

This is the way to go! This must be done piecemeal and it may be most desirable to have a nice command completion first (zsh-like).

% r2 -a 6502 -- arch -- 6502 -- 6502/NES/C64/Tamagotchi/T-1000 CPU 8051 -- 8051 Intel CPU arc -- Argonaut RISC Core arm -- Capstone ARM disassembler arm.as -- as ARM Assembler (use ARM_AS environment) arm.gnu -- Acorn RISC Machine CPU arm.ks -- ARM keystone assembler arm.winedbg -- WineDBG's ARM disassembler ... I'd like to do some of the heavy lifting (or most? I need to feel familiar with most commands before next Thursday...) to move const char help_msg[] arrays to toplevel.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub, or mute the thread.

radare commented 6 years ago

Why thursday

On 20 Jul 2017, at 07:49, Fangrui Song notifications@github.com wrote:

This is the way to go! This must be done piecemeal and it may be most desirable to have a nice command completion first (zsh-like).

% r2 -a 6502 -- arch -- 6502 -- 6502/NES/C64/Tamagotchi/T-1000 CPU 8051 -- 8051 Intel CPU arc -- Argonaut RISC Core arm -- Capstone ARM disassembler arm.as -- as ARM Assembler (use ARM_AS environment) arm.gnu -- Acorn RISC Machine CPU arm.ks -- ARM keystone assembler arm.winedbg -- WineDBG's ARM disassembler ... I'd like to do some of the heavy lifting (or most? I need to feel familiar with most commands before next Thursday...) to move const char help_msg[] arrays to toplevel.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub, or mute the thread.

MaskRay commented 6 years ago

Cool! Glad you agree with moving help_msg* arrays to toplevel.

@radare DEFCON ... And I should learn enough to use r2 in the CTF and evangelize it to my teammates...

karliss commented 6 years ago

I am not sure if moving help messages to top level without considering the way commands will be structured helps too much. Help messages could work by looking up a command, printing associated help message and iterating through direct sub-commands and printing their short messages. Such approach would ensure that there is no command which is not listed in help and allow automatic description generation for builtin aliases like Sj=iSj.

Here an approximate idea of what command description could contain:

command name
parent/child information for structuring commands tree by their functionality, relying only on prefixes doesn't work as rm and r2 shouldn't be considered part of resize commands
argument description with argument types for autocomplete
action
action type
argument offset - for commands that receive all arguments as string (current functions) to indicate if command name and space after it should be included in argument value
help message (internal or external) for this command but not subcommands

Having multiple action types would permit leaving whole command subtrees using the old command description and converting them incrementally.

Negative argument offset would mean that command function receives command name as part of argument this would provide better compatibility with existing command functions during transition. Positive offset would allow automatically skipping space between command name and argument thus simplifying code in command itself.

I remember seeing some commands that changed behavior depending on type and number of arguments, those might be tricky to deal with.

MaskRay commented 6 years ago

@karliss Thanks for your enlightening draft of command description! Do you know how to make help message available for lookup in a centralized registry? I go the toplevel way because __attribute__((constructor)) magic can be used to register the help message.

Maijin commented 6 years ago

@MaskRay if you're going to defcon, make sure you ping there 👍

radare commented 6 years ago

DoNt use constructors for this please :p

On 20 Jul 2017, at 09:28, Fangrui Song notifications@github.com wrote:

@karliss Thanks for your enlightening draft of command description! Do you know how to make help message available for lookup in a centralized registry? I go the toplevel way because attribute((constructor)) magic can be used to register the help message.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.

wargio commented 6 years ago

i would love to do this refactoring.

radare commented 6 years ago

my idea for this is to implement a function that parses the command line and fills a struct with all the info required to run everything:

modifiers
modiifer parameters
grep
prefixes
suffixes
comments
other commands
main command
arguments
...

On 20 Jul 2017, at 10:12, Giovanni notifications@github.com wrote:

i would love to do this refactoring.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/radare/radare2/issues/7967#issuecomment-316630334, or mute the thread https://github.com/notifications/unsubscribe-auth/AA3-lhCKUXfMEUAZ3_0-Z47BOyVkXBS4ks5sPwvQgaJpZM4Odl9L.

wargio commented 6 years ago

something like gnu opt

radare commented 6 years ago

getopt?

On 20 Jul 2017, at 14:22, Giovanni notifications@github.com wrote:

something like gnu opt

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/radare/radare2/issues/7967#issuecomment-316686448, or mute the thread https://github.com/notifications/unsubscribe-auth/AA3-lgSpi8j2978FkwFY0xUTlKo-pfwhks5sP0ZtgaJpZM4Odl9L.

XVilka commented 6 years ago

Since there is mpc in the radare2 sourcecode anyway (libr/egg/rlcc/mpc), may be it makes sense to use for parsing command line as well?

wargio commented 6 years ago

yes, getopt. for mpc, i've never heard of it.

radare commented 6 years ago

getopt parses command arguments, not syntactic stuff, the mpc thing was suposed to parse rlang or C, not sure if that will work for the kind of syntax we have in r2 commands.

also, this refactoring should be done in parallel as an option at runtime, to choose between different prompts otherwise we will have a lot of regressions

On 20 Jul 2017, at 14:41, Giovanni notifications@github.com wrote:

yes, getopt. for mpc, i've never heard of it.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/radare/radare2/issues/7967#issuecomment-316690748, or mute the thread https://github.com/notifications/unsubscribe-auth/AA3-lkVq0BpIIAaH4GLCF5AfFhh-eBD3ks5sP0sOgaJpZM4Odl9L.

MaskRay commented 6 years ago

radare said in the channel that other issues are of higher priority. This refactoring should be implemented in a separate loop that does not break current code.

What about having a boolean exp.cmd variable to enable the refactored command parser? Users can invoke r2 -e exp.cmd=1 to enable the feature.

Moving help messages (https://github.com/radare/radare2/pull/7969 ) can serve as a first step to the refactoring and the effect will be instant: a friendlier command completion system with help messages from const char *help_msg[] arrays scattered all over the code. The case '?' branches in the code will refer to const char *help_msg_ae[].

You may have concerns about having help messages not around the code, but can you shed light on an alternative to have a centralized help message?

radare commented 6 years ago

i would say cmd.parser or cfg.newparser or something like that

there’s no exp. prefix yet and i understand its experimental, but everytihng in r2land is experimental :P so there’s no need for that namespace imho

On 21 Jul 2017, at 03:21, Fangrui Song notifications@github.com wrote:

radare said in the channel that other issues are of higher priority. This refactoring should be implemented in a separate loop that does not break current code.

What about having a boolean exp.cmd variable to enable the refactored command parser? Users can invoke r2 -e exp.cmd=1 to enable the feature.

Moving help messages (#7969 https://github.com/radare/radare2/pull/7969 ) can serve as a first step to the refactoring and the effect will be instant: a friendlier command completion system with help messages from const char *help_msg[] arrays scattered all over the code.

You may have concerns about having help messages not around the code

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/radare/radare2/issues/7967#issuecomment-316874957, or mute the thread https://github.com/notifications/unsubscribe-auth/AA3-lhGVbiifwHz-NxROjrT5B-tzx15uks5sP_0fgaJpZM4Odl9L.

MaskRay commented 6 years ago

Before:

    case 'u': // "?u"
        {
            char unit[32];
            n = r_num_math (core->num, input+1);
            r_num_units (unit, n);
            r_cons_println (unit);
        }
        break;

After:

#include "r_args.h"
// #define ARG_STR(i) ctx.args[i].str_val
// #define ARG_INT(i) ctx.args[i].str_val
// #define ARG_ADDR(i) ctx.args[i].addr_val
// #define ARG_STR_DEF(i, def) (i < ctx.nargs ? ctx.args[i].str_val : def)
// #define ARG_INT_DEF(i, def) (i < ctx.nargs ? ctx.args[i].int_val : def)
// ...

int cmd_help(...) {
  struct Context {
    int nargs;
    union Arg {const char* str_val; int int_val; ut64 addr_val;} args[9];
  } ctx;
  switch (input[0]) {
    ...
  case 'u': { // "?u"
    R_ARGS("s|iA");  // expands to if (parse_fail) {print_error; break;} else fill in ctx.args
    // 0th arg as const char*, 1st arg as int, 2nd arg as ut64
    r_cons_printf (ARG_STR(0), ARG_INT(1), ARG_ADDR(2));
    break;
  }
}

MaskRay commented 6 years ago

The final goal is to move R_ARGS to the command descriptor function so that arguments can be completed. But before that, these R_ARGS will be kept under case labels.

xarkes commented 6 years ago

If I understood your example correctly, you changed the syntax for the help command from u? to ?u, is that right? From what I understand your code wouldn't implement any sort of AST?

radare commented 6 years ago

Wat

On 6 Mar 2018, at 10:11, xarkes notifications@github.com wrote:

If I understood your example correctly, you changed the syntax for the help command from u? to ?u, is that right? From what I understand your code wouldn't implement any sort of AST?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.

MaskRay commented 6 years ago

If I understood your example correctly, you changed the syntax for the help command from u? to ?u, is that right?

No. I just gave an example that argument parsing should be done properly, not strchr strstr r_num_math here and there.

xarkes commented 6 years ago

Yep ok good.

On Sun, Mar 11, 2018, 21:03 Fangrui Song notifications@github.com wrote:

If I understood your example correctly, you changed the syntax for the help command from u? to ?u, is that right?

No. I just gave an example that argument parsing should be done properly, not strchr strstr r_num_math here and there.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/radare/radare2/issues/7967#issuecomment-372144478, or mute the thread https://github.com/notifications/unsubscribe-auth/AKRgfdHP-TzcEl5cMp5gNVDHqgV66sVoks5tdYMTgaJpZM4Odl9L .

xarkes commented 6 years ago

Note that it would be perfect if each command has a public API. On Cutter we mostly use r_core_cmd and R_API functions. It appears that the R_API are less reliable than the commands themselves (which sucks).

MaskRay commented 5 years ago

There is now an autocomplete feature https://github.com/radare/radare2/blob/master/libr/core/core.c#L1923

static void init_autocomplete (RCore* core) {
    core->autocomplete = R_NEW0 (RCoreAutocomplete);
    /* flags */
    r_core_autocomplete_add (core->autocomplete, "s", R_CORE_AUTOCMPLT_FLAG, true);
    r_core_autocomplete_add (core->autocomplete, "s+", R_CORE_AUTOCMPLT_FLAG, true);
    r_core_autocomplete_add (core->autocomplete, "b", R_CORE_AUTOCMPLT_FLAG, true);
    r_core_autocomplete_add (core->autocomplete, "f", R_CORE_AUTOCMPLT_FLAG, true);

The registration should be moved to cmd_*.c files
The registration should be unified with command metadata (information of arguments).

@wargio

XVilka commented 4 years ago

We can start with defining grammar of radare2 commands with MPC: https://github.com/orangeduck/mpc#language-approach @thestr4ng3r already did a good job with porting of some of the C parsing on top of MPC, so it can be used as a starting point: https://github.com/radare/radare2/blob/master/libr/parse/ctype.c

After this is done we can improve

macro syntax - switch to the small LISP-ish syntax (so would allow as complex as user wants, in theory)
alias syntax - simplify
improve variables syntax, and support named arguments in macroses
better loops and conditionals support
allow multiple nested backticks
support functional syntax

XVilka commented 4 years ago

I will offer 0.1 BTC to the one who will fix this issue. Initially, I wanted to use BountySource, but its support for bitcoins is a plain nightmare. So I will offer it directly.

XVilka commented 4 years ago

Bounty is up to 0.2 BTC now

XVilka commented 4 years ago

Scrap the MPC idea, Tree-Sitter is a better approach, would allow parsing as typed

https://tree-sitter.github.io/tree-sitter/implementation

MaskRay commented 4 years ago

neovim is integrating tree-sitter, but I think it may be an overkill for radare2 to integrate it. A simple top-down operator precedence parsing is probably all that is needed. The complicated thing is of course the unclear semantics and expectation of the current syntax. To keep it fully backward compatible is probably unrealistic (the long tail).

thestr4ng3r commented 4 years ago

We are also considering to use tree-sitter for (partially) parsing C (headers, types), so having it in r2 can make sense.

Agreed about backwards compatibility. It should be compatible where it makes sense, but it doesn't have to be in every way. For example I don't like the "?E clippy@rada.re" syntax. I would prefer ?E "clippy@rada.re" like any other shell does.

XVilka commented 4 years ago

Bounty is up, 0.3 BTC now.

About backward compatibility - yes, it should be mostly compatible, with fixes to the tests/other commands where this is unrealistic.

ret2libc commented 4 years ago

I'm going to use this issue to discuss problems I'm encountering while implementing the new parser in tree-sitter. Right now, you can do things like pd 10 +2 or pd 5 @ main +2, which I think is not good because it is not clear how to split arguments.

How do you know pd should be called as pd(12) and not as pd(10,+,2)?

Or, to make the example clearer, let's look at om, which accepts more than one argument om fd vaddr [size] [paddr] [rwx] [name]. If math expressions can have spaces, how do you distinguish arguments? How do you interpret om 3 4 +10 + 20000? Is it om(3, 4, +10, 20000)? Or is it om(3, 14, 20000)? om(3, 20014)? Or something else?

For the reasons above, I think math expressions should either be without spaces or wrapped between () to make it clear it's just one "argument". WDYT?

XVilka commented 4 years ago

I think yes, allowing spaces only in the wrapped math expressions is the way to go.

On Sat, Feb 15, 2020, 00:37 Riccardo Schirone notifications@github.com wrote:

I'm going to use this issue to discuss problems I'm encountering while implementing the new parser in tree-sitter. Right now, you can do things like pd 10 +2 or pd 5 @ main +2, which I think is not good because it is not clear how to split arguments.

How do you know pd should be called as pd(12) and not as pd(10,+,2)?

Or, to make the example clearer, let's look at om, which accepts more than one argument om fd vaddr [size] [paddr] [rwx] [name]. If math expressions can have spaces, how do you distinguish arguments? How do you interpret om 3 4 +10 + 20000? Is it om(3, 4, +10, 20000)? Or is it om(3, 14, 20000)? om(3, 20014)? Or something else?

For the reasons above, I think math expressions should either be without spaces or wrapped between () to make it clear it's just one "argument". WDYT?

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/radareorg/radare2/issues/7967?email_source=notifications&email_token=AABRT7JMNSTDY7E5Z6N2Z4DRC3CFLA5CNFSM4DTWL5F2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOELZT7TQ#issuecomment-586366926, or unsubscribe https://github.com/notifications/unsubscribe-auth/AABRT7LIQYBW4OGGJ5SABJDRC3CFLANCNFSM4DTWL5FQ .

ret2libc commented 4 years ago

Update: basic command structure can now be parsed with the new tree-sitter parser, enabled with cfg.newshell.

XVilka commented 3 years ago

One more example on supporting highlighting with tree-sitter (for OCaml language): https://github.com/tree-sitter/tree-sitter-ocaml/pull/42 cc @ret2libc

trufae commented 3 years ago

We can close this issue. For the curious ones:

old parser is being cleaned up and fixed by @Alucowie
the newshell will be removed in the upcoming 5.3
i have an isolated clean r2 shell implementation in pure C, but this will be introduced in the future

radareorg / radare2

Refactor the commands parsing #7967