qri-io / qri


special characters and their usage in qri #1354

Open dustmop opened 4 years ago

dustmop commented 4 years ago

A topic that frequently comes up is how we should allocate new special characters to expand the syntax of qri commands. This includes both supporting new features like initIDs and adding convenience shortcuts to shrink otherwise verbose commands. We should plan ahead here, in order to avoid running out of characters or needing to reuse a special character in multiple contexts. It would be ideal if each character could have a single meaning or pronunciation in the qri ecosystem. For example, we use "@" to mean not just "at" but "at a specific machine-readable path to a content-aware system" (more or less).

For en-US keyboards, these are the available special characters: ~ ` ! @ # $ % ^ & * ( ) - _ = + [ ] { } ; : ' " , . < > / ?

Something to consider: non-US keyboards may not have these characters as single keys. I'm not an expert on this topic, but it's something we should be mindful of; these characters should hopefully be usable as shortcuts, but not necessarily required to use qri for everyday tasks.

Many of these characters are already in use by bash, meaning they aren't suitable for typing as part of a command in a typical POSIX command-line invocation. In the table below, the test for whether a character is "taken" by bash is whether you can run this command:

echo ab@cde

where @ is the special character in question. If bash outputs "ab@cde" then the character is not taken.

+----+--------+-------+---------------------------------------------------------+
|char|status  |owner  |notes                                                    |
+----+--------+-------+---------------------------------------------------------+
| ~  | OK     | -     | when alone, the user's home directory                   |
+----+--------+-------+---------------------------------------------------------+
| `  | taken  | bash  | run a command, replace expression with the result       |
+----+--------+-------+---------------------------------------------------------+
| !  | taken  | bash  | history expansion                                       |
+----+--------+-------+---------------------------------------------------------+
| @  | USING  | qri   | in a dataset reference, separates machine readable path |
+----+--------+-------+---------------------------------------------------------+
| #  | OK     | -     | used for comments, but okay within a larger string      |
+----+--------+-------+---------------------------------------------------------+
| $  | taken  | bash  | variable expansion                                      |
+----+--------+-------+---------------------------------------------------------+
| %  | OK     | -     | used by windows' cmd, but we should ignore that         |
+----+--------+-------+---------------------------------------------------------+
| ^  | OK     | -     |                                                         |
+----+--------+-------+---------------------------------------------------------+
| &  | taken  | bash  | run a command in the background                         |
+----+--------+-------+---------------------------------------------------------+
| *  | OK     | -     |                                                         |
+----+--------+-------+---------------------------------------------------------+
| () | taken  | all   | used everywhere for nesting (regex, bash expressions)   |
+----+--------+-------+---------------------------------------------------------+
| -_ | USING  | qri   | can appear in dataset names                             |
+----+--------+-------+---------------------------------------------------------+
| =  | OK     | -     |                                                         |
+----+--------+-------+---------------------------------------------------------+
| +  | OK     | -     |                                                         |
+----+--------+-------+---------------------------------------------------------+
| [] | OK     | -     |                                                         |
+----+--------+-------+---------------------------------------------------------+
| {} | OK     | -     |                                                         |
+----+--------+-------+---------------------------------------------------------+
| ;  | taken  | bash  | separates commands                                      |
+----+--------+-------+---------------------------------------------------------+
| :  | OK     | -     |                                                         |
+----+--------+-------+---------------------------------------------------------+
| '" | taken  | text  | used everywhere to represent strings                    |
+----+--------+-------+---------------------------------------------------------+
| ,  | OK     | -     |                                                         |
+----+--------+-------+---------------------------------------------------------+
| .  | OK     | -     |                                                         |
+----+--------+-------+---------------------------------------------------------+
| <> | taken  | bash  | redirecting stdin, stdout, and stderr                   |
+----+--------+-------+---------------------------------------------------------+
| /  | USING  | qri   | path separator                                          |
+----+--------+-------+---------------------------------------------------------+
| ?  | OK     | -     |                                                         |
+----+--------+-------+---------------------------------------------------------+

So that leaves these characters that can be used by qri:

~ # % ^ * = + [] {} : , . ?
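
As a quick cross-check, the list above can be derived mechanically from the status column of the table. This is only a sketch: the `STATUS` dictionary is a hand transcription of the table, not something qri provides.

```python
# Status column of the table above, transcribed by hand (not a qri API).
STATUS = {
    "~": "OK", "`": "taken", "!": "taken", "@": "USING", "#": "OK",
    "$": "taken", "%": "OK", "^": "OK", "&": "taken", "*": "OK",
    "()": "taken", "-_": "USING", "=": "OK", "+": "OK", "[]": "OK",
    "{}": "OK", ";": "taken", ":": "OK", "'\"": "taken", ",": "OK",
    ".": "OK", "<>": "taken", "/": "USING", "?": "OK",
}

# Keep only the rows marked "OK", in table order.
available = [chars for chars, status in STATUS.items() if status == "OK"]
print(" ".join(available))
```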

Suggestions

Some initial thoughts I have:

eq

Use = to represent the initID in a resolved dataset reference. For example:

qri get dustmop/my_dataset=hlrkcslkt6q37sgc356x4oy4farbcwz35tgprvhzphptjsblgkpa@/ipfs/QmHash

Or a bare initID could appear like this:

qri get =hlrkcslkt6q37sgc356x4oy4farbcwz35tgprvhzphptjsblgkpa

hash

Use # to disambiguate usernames. In the p2p context, or fully local context, it's possible to create the same username on multiple peers, but (unless private keys are unwisely copied around) they should have different profileIDs. We can encode profileID prefixes into decimal numbers to disambiguate usernames. Discord does something like this.

MachineA:
dustmop -> dustmop#34197

MachineB:
dustmop -> dustmop#87250
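
One possible way to derive such a suffix, purely as a sketch: read the first four bytes of the profileID as an integer and keep its last five decimal digits. The function name, the four-byte prefix, the five-digit width, and the example bytes are all assumptions made for illustration, not how qri encodes anything.

```python
def disambiguated_username(username: str, profile_id: bytes, digits: int = 5) -> str:
    """Append a decimal suffix derived from a profileID prefix (hypothetical scheme)."""
    # Interpret the first 4 bytes of the profileID as a big-endian integer.
    prefix = int.from_bytes(profile_id[:4], "big")
    # Keep only the last `digits` decimal digits, zero-padded to a fixed width.
    suffix = prefix % 10**digits
    return f"{username}#{suffix:0{digits}d}"


# Made-up profileID bytes, chosen so the output matches the examples above.
print(disambiguated_username("dustmop", bytes.fromhex("00008595")))
print(disambiguated_username("dustmop", bytes.fromhex("000154d2")))
```

With five digits there are only 100000 possible suffixes, so two peers could still collide; a real scheme would need a rule for that case.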

up arrow

Use ^ to represent versions before head. This matches git. I like it because "going up" can mentally map to "going towards ancestors".

qri get dustmop/my_dataset^^^

comma

Use , to represent datasets observed in a list operation. I like comma because it's often associated with "items in a list". The goal here is that I can run qri list:

,1  dustmop/my_first_dataset
    /ipfs/QmHash

,2  dustmop/my_second_dataset
    /ipfs/QmAnother

Then I can use:

qri get ,1
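
A rough sketch of how that lookup could work, assuming qri remembers the references printed by the most recent list operation. The function name and the `last_listing` argument are hypothetical, and the sketch handles only a bare ",N" argument.

```python
def resolve_list_shortcut(arg: str, last_listing: list) -> str:
    """Map ',N' to the N-th reference (1-based) of the most recent listing."""
    if not arg.startswith(","):
        return arg  # not a shortcut, pass through unchanged
    index = int(arg[1:])  # raises ValueError if the rest is not a number
    if not 1 <= index <= len(last_listing):
        raise ValueError(f"no item {index} in the last listing")
    return last_listing[index - 1]


# References as they might be remembered from the list output above.
listing = [
    "dustmop/my_first_dataset@/ipfs/QmHash",
    "dustmop/my_second_dataset@/ipfs/QmAnother",
]
print(resolve_list_shortcut(",1", listing))
```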

Composition

Of course we should make sure these syntaxes can be composed together:

qri get ,3^^^
qri get =hlrkcslkt6q37sgc356x4oy4farbcwz35tgprvhzphptjsblgkpa^^
qri get dustmop#87250/that_dataset@/ipfs/QmVersionWanted
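
To check that the pieces really do compose without ambiguity, here is a small parsing sketch covering the examples above. It is not qri's reference parser. The grammar is an assumption, in particular that "^" comes before "@path" and that a ",N" shortcut cannot be combined with a name or initID.

```python
import re

# Hypothetical grammar: (",N" | [user["#"tag]"/"name]["="initID]) "^"* ["@"path]
_REF = re.compile(
    r"(?:,(?P<index>\d+)"
    r"|(?:(?P<user>[\w-]+)(?:#(?P<tag>\d+))?/(?P<name>[\w-]+))?"
    r"(?:=(?P<init_id>[a-z0-9]+))?)"
    r"(?P<carets>\^*)"
    r"(?:@(?P<path>/.+))?"
)


def parse_ref(text: str) -> dict:
    """Split a reference string into its parts; absent parts are None."""
    m = _REF.fullmatch(text)
    if m is None or not (m["index"] or m["user"] or m["init_id"] or m["path"]):
        raise ValueError(f"not a recognized reference: {text!r}")
    parts = m.groupdict()
    parts["index"] = int(parts["index"]) if parts["index"] else None
    # Each "^" means one version further back from head.
    parts["generations"] = len(parts.pop("carets"))
    return parts


print(parse_ref(",3^^^"))
print(parse_ref("dustmop#87250/that_dataset@/ipfs/QmVersionWanted"))
```

Because each special character introduces exactly one kind of component, the regular expression never has to guess which part it is reading.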

dustmop commented 4 years ago

Related issue: https://github.com/qri-io/qri/issues/868

feep commented 4 years ago

? is a single-character glob, similar to . in a regular expression.

❯ ls ?e*
.rw-r--r--  338 rusty 05-28 13:50 meta.json
.rw-r--r-- 1.0k rusty 05-28 13:50 readme.md

Maybe I’m missing something and globs are safe?

Also used for globbing: [, ], *.

bash-3.2$ ls [bm]*
body.csv    meta.json
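
The two listings above can be reproduced with Python's `fnmatch` module, which implements shell-style wildcards. This is a sketch that assumes the directory holds just the three files shown; the `expand` helper is made up for illustration.

```python
from fnmatch import fnmatchcase

# Assumed directory contents, taken from the two ls outputs above.
files = ["body.csv", "meta.json", "readme.md"]


def expand(pattern: str, names: list) -> list:
    """Return the names a shell-style glob pattern would match."""
    return [n for n in names if fnmatchcase(n, pattern)]


print(expand("?e*", files))    # any one character, then "e", then anything
print(expand("[bm]*", files))  # starts with "b" or "m"
print(expand(",1", files))     # no glob characters: matches only a file named ",1"
```

By default bash passes a pattern through unchanged when nothing matches, so an argument containing ?, * or [ ] only changes when a file in the current directory happens to match it. That makes these characters work most of the time and then fail surprisingly.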