kagisearch / bangs

Repository of bangs used by Kagi Search
https://kagi.com
MIT License
166 stars 57 forks

Improvements to bang.json format/structure. #120

Leftium closed 1 week ago

Leftium commented 1 week ago

Introduction/motivation

After realizing bangs are just bookmarks, I started working on a bang-based launcher/bookmark/notes app. I got some ideas about how to improve the way bangs are stored and used, which may benefit the bang.json file.

In my app, any URL bookmark may have optional bang triggers. Bookmarks are stored as plain text: some metadata at the top followed by the user's notes for that URL. So I want the bookmark (bang) metadata to be as easy to read/edit as possible.

I will be using this format for my app; if there is interest I can share the (WSV) conversion tools + output.

Proposal A

Store bangs in a whitespace-separated format, a simplified version of WhitespaceSV.

Whitespace-separated version:

# Note this is modified WhitespaceSV; normally strings with spaces need to be quoted.
# However this format can assume the format of a line is always:
# Field name, whitespace, then string until newline/comment.

s  Google Translate
d  translate.google.com
t  gt
u  https://translate.google.com/#auto/en/{{{s}}}
c  Online Services
sc Google

JSON version:

{
    "s": "Google Translate",
    "d": "translate.google.com",
    "t": "gt",
    "u": "https://translate.google.com/#auto/en/{{{s}}}",
    "c": "Online Services",
    "sc": "Google"
}
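
As a rough illustration of how little parsing the simplified format needs, here is a minimal Python sketch that reads one record into the same key/value pairs as the JSON version. The function name `parse_record` and the comment rule are assumptions, not part of the proposal: the sketch treats `#` as a comment only at the start of a line or after whitespace, because the URL itself contains `#`.

```python
import re

# Assumption: "#" starts a comment only at the start of a line or after
# whitespace, so the "#" inside ".../#auto/en/..." remains part of the URL.
COMMENT = re.compile(r"(^|\s)#.*$")


def parse_record(text: str) -> dict[str, str]:
    """Parse one record: field name, whitespace, then the value to end of line."""
    record = {}
    for raw in text.splitlines():
        line = COMMENT.sub("", raw).strip()
        if not line:
            continue  # blank or comment-only line
        parts = line.split(None, 1)
        record[parts[0]] = parts[1] if len(parts) > 1 else ""
    return record


sample = """\
# comment-only lines are skipped
s  Google Translate
d  translate.google.com
t  gt
u  https://translate.google.com/#auto/en/{{{s}}}
c  Online Services
sc Google   # trailing comment
"""

record = parse_record(sample)
print(record["u"])
```

Because every line is "name, whitespace, rest of line", values with spaces such as `Google Translate` need no quoting in this sketch.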

Proposal B

Combine entries that share the same (normalized) URL.

How bangs are combined:

  1. The summary (website name) of the highest-ranked entry is used (DuckDuckGo bang ranking).
  2. Triggers are ordered: shortest, longest, then the remaining ones by rank.
  3. Categories and subcategories are merged into Labels (nested, like Gmail).
  4. Alternate summaries become Labels.
  5. The domain is automatically inferred from the URL, unless explicitly specified.

WSV version: (6 lines)

# This contains all the important information from the JSON version below.
# Some unimportant information was lost (like which trigger was associated with which name).

U https://translate.google.com/#auto/en/{q}
S "Detect language to English"
T !gt !gtranslate !tr !translate !gten !gtenglish !gt-english
L Translation Google                   # c=Translation, sc=Google
L "Online Services" Google             # Alternate label
L Alternate-name "Google Translate"    # Alternate names become labels

# Domain d=translate.google.com automatically inferred;  may be explicitly specified.

JSON version: (56 lines)

{
    "s": "Google Translate",
    "d": "translate.google.com",
    "t": "gt",
    "u": "https://translate.google.com/#auto/en/{{{s}}}",
    "c": "Online Services",
    "sc": "Google"
},
{
    "s": "Google Translate",
    "d": "translate.google.com",
    "t": "gtranslate",
    "u": "https://translate.google.com/#auto/en/{{{s}}}",
    "c": "Translation",
    "sc": "Google"
},
{
    "s": "Google Translate",
    "d": "translate.google.com",
    "t": "tr",
    "u": "https://translate.google.com/?text={{{s}}}",
    "c": "Translation",
    "sc": "Google"
},
{
    "s": "Google Translate",
    "d": "translate.google.com",
    "t": "translate",
    "u": "https://translate.google.com/#auto/en/{{{s}}}",
    "c": "Translation",
    "sc": "Google"
},
{
    "s": "Detect language to English",
    "d": "translate.google.com",
    "t": "gten",
    "u": "https://translate.google.com/#auto/en/{{{s}}}",
    "c": "Translation",
    "sc": "Google"
},
{
    "s": "Detect language to English",
    "d": "translate.google.com",
    "t": "gtenglish",
    "u": "https://translate.google.com/#auto/en/{{{s}}}",
    "c": "Translation",
    "sc": "Google"
},
{
    "s": "Detect language to English",
    "d": "translate.google.com",
    "t": "gt-english",
    "u": "https://translate.google.com/#auto/en/{{{s}}}",
    "c": "Translation",
    "sc": "Google"
},
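
The combining rules listed under Proposal B could be sketched roughly as below. This is only an illustration under stated assumptions: entries are taken to be already sorted by rank (so list position stands in for the DuckDuckGo ranking), URLs are matched exactly with no normalization, and the names `order_triggers` and `combine` are made up for the sketch.

```python
from urllib.parse import urlsplit


def order_triggers(triggers: list[str]) -> list[str]:
    """Shortest first, longest second, then the rest in rank (input) order."""
    if len(triggers) < 2:
        return list(triggers)
    shortest = min(triggers, key=len)  # min/max keep the first (best-ranked) on ties
    rest = [t for t in triggers if t != shortest]
    longest = max(rest, key=len)
    rest.remove(longest)
    return [shortest, longest, *rest]


def combine(entries: list[dict]) -> list[dict]:
    """Merge entries that share a URL. Assumes `entries` is sorted by rank."""
    groups: dict[str, list[dict]] = {}
    for entry in entries:
        groups.setdefault(entry["u"], []).append(entry)  # exact match only

    combined = []
    for url, group in groups.items():
        summary = group[0]["s"]  # best-ranked entry supplies the summary
        labels = []
        for e in group:  # categories and subcategories become labels
            if (e["c"], e["sc"]) not in labels:
                labels.append((e["c"], e["sc"]))
        for e in group:  # other summaries become labels as well
            alt = ("Alternate-name", e["s"])
            if e["s"] != summary and alt not in labels:
                labels.append(alt)
        combined.append({
            "U": url,
            "S": summary,
            "T": order_triggers([e["t"] for e in group]),
            "L": labels,
            "D": urlsplit(url).hostname,  # inferred; an explicit D would override
        })
    return combined


URL = "https://translate.google.com/#auto/en/{{{s}}}"
entries = [
    {"s": "Google Translate", "d": "translate.google.com", "t": "gt",
     "u": URL, "c": "Online Services", "sc": "Google"},
    {"s": "Google Translate", "d": "translate.google.com", "t": "gtranslate",
     "u": URL, "c": "Translation", "sc": "Google"},
    {"s": "Detect language to English", "d": "translate.google.com", "t": "gten",
     "u": URL, "c": "Translation", "sc": "Google"},
]
merged = combine(entries)
print(merged[0]["T"], merged[0]["L"])
```

With list order as the rank, the sketch picks "Google Translate" as the summary. The WSV example above uses "Detect language to English" instead, so the real ranking evidently differs from the order in which the JSON entries are listed.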

How to specify D (domains):

# When D specified, domain from URL not automatically inferred.
# Specify domain and alternate domain on same line:

D hn.algolia.com  news.ycombinator.com  # d=hn.algolia.com, ad=news.ycombinator.com

# Specify domain and alternate domain on multiple lines:

D hn.algolia.com        # d=hn.algolia.com (first D becomes domain)
D news.ycombinator.com  # ad=news.ycombinator.com (second D becomes alternate domain)
D algolia.com           # May even specify more than two domains
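
A small sketch of how the D rules might be resolved in code. The function name `resolve_domains` is hypothetical, and so is the Hacker News search URL used in the example; it appears nowhere in the thread. `d_lines` is assumed to hold the value part of each D line with comments already stripped.

```python
from urllib.parse import urlsplit


def resolve_domains(url: str, d_lines: list[str]) -> tuple[str, list[str]]:
    """Return (domain, alternate_domains) for one record."""
    # Several domains may share one D line or be spread over several D lines.
    domains = [name for line in d_lines for name in line.split()]
    if not domains:
        # No D line at all: infer the domain from the URL's host part.
        return urlsplit(url).hostname or "", []
    return domains[0], domains[1:]


hn_url = "https://hn.algolia.com/?q={q}"  # hypothetical URL for illustration

print(resolve_domains("https://translate.google.com/#auto/en/{q}", []))
print(resolve_domains(hn_url, ["hn.algolia.com  news.ycombinator.com"]))
print(resolve_domains(hn_url, ["hn.algolia.com", "news.ycombinator.com", "algolia.com"]))
```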

Proposal C

Finally, the WSV format can be simplified a little more by inferring the field names from the content and/or order of the lines. This format is cleaner, with less boilerplate, but it might involve too much magic...
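
To make the "magic" concrete, the inference could look something like the following sketch. The regular expressions and the name `infer_fields` are assumptions chosen to match the annotated example that follows: a line is a URL if it starts with `http://` or `https://`, a trigger list if every token starts with `!`, and an explicit field name (U, S, T, L, D) always wins.

```python
import re

FIELD_NAMES = {"U", "S", "T", "L", "D"}
URL_RE = re.compile(r"^https?://\S+$")
TRIGGER_RE = re.compile(r"^(!\S+)(\s+!\S+)*$")
COMMENT_RE = re.compile(r"(^|\s)#.*$")  # assumed comment rule


def infer_fields(text: str) -> list[tuple[str, str]]:
    """Assign a field name to every non-empty line of one record."""
    fields = []
    seen_summary = False
    for raw in text.splitlines():
        line = COMMENT_RE.sub("", raw).strip()
        if not line:
            continue
        head, _, tail = line.partition(" ")
        if head in FIELD_NAMES and tail.strip():
            name, value = head, tail.strip()  # explicit field name wins
        elif URL_RE.match(line):
            name, value = "U", line
        elif TRIGGER_RE.match(line):
            name, value = "T", line
        elif not seen_summary:
            name, value = "S", line  # first line matching no format
        else:
            name, value = "L", line  # later lines matching no format
        seen_summary = seen_summary or name == "S"
        fields.append((name, value))
    return fields


sample = """\
https://translate.google.com/#auto/en/{q}
Detect language to English
!gt !gtranslate !tr !translate !gten
L !this-is-label !not-a-trigger
Translation Google
"Online Services" Google
"Alternate name" "Google Translate"
"""
for name, value in infer_fields(sample):
    print(name, value)
```

One cost of the inference shows up immediately: in this sketch a summary that happens to begin with a bare `D` or `S` word would be misread as an explicit field name.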

https://translate.google.com/#auto/en/{q}  # U: matches URL format
Detect language to English                 # S: first string that doesn't match any format.
!gt !gtranslate !tr !translate !gten       # T: matches trigger format.
L !this-is-label !not-a-trigger            # L: matches trigger format, but field name explicitly set.
Translation Google                         # L: doesn't match any format, not first.
"Online Services" Google                   # L
"Alternate name" "Google Translate"        # L

nobodywasishere commented 1 week ago

I would prefer to keep the format as JSON simply due to editor support and the plethora of tooling available for it, which makes it easier to work with.

While the current bang structure isn't ideal, it does work for now, and changing it would require significant changes on our backend.

Leftium commented 1 week ago

Conversion between the new formats and JSON is straightforward. So the proposed structure and/or WSV format can be introduced while maintaining the benefits of JSON tooling, without any changes to the backend.

So the new formats may be used as much or as little as desired.

I will release a small CLI tool that handles the conversion without any installation, hopefully in a few days.

Here is what the top 10 URLs (actually 51 bang triggers) look like in WSV format:

# bangs.wsvs
# - Triggers with common URLs have been merged
# - Extra web site names, category, and sub-category fields have become Labels.

S   Google with SSL
T   !g !google !gssl !п
U   https://google.com/search?q={q}
D   google.com
L   'Online Services' Google
L   Alt-Summary Google
L   Alt-Summary google.com

S   YouTube
T   !yt !youtube !you !ytb !ty !watch !υτ
U   https://www.youtube.com/results?search_query={q}
D   youtube.com
L   Multimedia Video
L   Entertainment Misc

S   English Wikipedia
T   !w !wikipedia !wen !wiki !wk !wikien !enwiki
U   https://en.wikipedia.org/w/index.php?search={q}
D   en.wikipedia.org
L   Research Reference
L   Research Academic
L   Alt-Summary wikipedia

S   Google Maps
T   !gm !googlemaps !gmaps !gmap !googlemap
U   https://maps.google.com/maps?q={q}
D   maps.google.com
L   'Online Services' Google

S   Google Images
T   !gi !googleimages !gimg !gim !gimages !googleimg
U   https://google.com/search?tbm=isch&q={q}&tbs=imgo:1
D   google.com
L   'Online Services' Google
L   Research Reference
L   Alt-Summary 'Google Image'

S   Reddit
T   !r !reddit
U   https://reddit.com/search?q={q}
D   reddit.com
L   'Online Services' Social
L   News Aggregators

S   Amazon.com
T   !a !amazon !az !am !amz !aus !buy !amus !price
U   https://www.amazon.com/s?k={q}
D   amazon.com
L   Shopping Online
L   Alt-Summary Amazon

S   IMDB
T   !imdb !imbd
U   https://www.imdb.com/find?s=all&q={q}
D   imdb.com
L   Multimedia Movies
L   Entertainment Movies
L   Alt-Summary IMBD

S   GitHub
T   !gh !git
U   https://github.com/search?utf8=✓&q={q}
D   github.com
L   Tech Programming

S   Detect language to English
T   !gt !gtranslate !translate !gten !gt-english !gtenglish
U   https://translate.google.com/#auto/en/{q}
D   translate.google.com
L   'Online Services' Google
L   Translation Google
L   Alt-Summary 'Google Translate'
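
For the reverse direction, a combined record like the ones above could be expanded back into one JSON-style entry per trigger. The sketch below is only an illustration with assumed rules: `expand` is a made-up name, each label is assumed to have exactly two tokens, the first label that is not an `Alt-Summary` supplies the category and subcategory, and `{q}` is mapped back to `{{{s}}}` because that is how the same URL appears in the JSON examples. Trailing comments are not handled.

```python
import json
import shlex


def expand(record_text: str) -> list[dict]:
    """Turn one combined WSV record into one JSON-style entry per trigger."""
    fields: dict[str, list[str]] = {}
    for line in record_text.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        name, value = line.split(None, 1)
        fields.setdefault(name, []).append(value.strip())

    # shlex.split honours the quotes around labels such as 'Online Services'.
    labels = [shlex.split(value) for value in fields.get("L", [])]
    category, subcategory = next(
        label for label in labels if label[0] != "Alt-Summary"
    )

    url = fields["U"][0].replace("{q}", "{{{s}}}")  # back to the JSON placeholder
    return [
        {"s": fields["S"][0], "d": fields["D"][0], "t": trigger.lstrip("!"),
         "u": url, "c": category, "sc": subcategory}
        for trigger in fields["T"][0].split()
    ]


sample = """\
S   Reddit
T   !r !reddit
U   https://reddit.com/search?q={q}
D   reddit.com
L   'Online Services' Social
L   News Aggregators
"""
expanded = expand(sample)
print(json.dumps(expanded, indent=4))
```

This direction is lossy in the way already noted: every trigger receives the same summary and category, and the second label (`News Aggregators`) is dropped.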

nobodywasishere commented 1 week ago

I just worry about making sure people can easily edit/modify the bangs as necessary, even if they're not as code-savvy, and I'm not as confident that WSV provides that (in comparison to JSON). Also, keeping a format similar to DDG's means it's easier to pull over their changes as they make them. I do agree that if we were doing this from scratch, I'd probably use a different format where bangs can have multiple triggers, but I'm not sure it's feasible to make that change at this point.

Leftium commented 1 week ago

I personally think plain text would require less code-savviness than JSON, but I understand your reasoning.

For my app, I have decided to use this new structure, saved as WSV plain-text files. To support it, I will build some simple CLI tools that convert back and forth between JSON and WSV losslessly. (They will also convert between the combined and uncombined bang data structures, but that conversion is lossy; some info will be lost.)

Even if the format is not used in this repo in any way, it should be possible to do the following locally:

  1. Convert bangs.json to bangs.wsv
  2. Edit bangs.wsv
  3. Convert bangs.wsv back to bangs.json, where the only diffs will be the edits made in step 2. (Plus maybe some differences in %-encoding?)
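
The three steps above can be sketched for the uncombined structure as follows. This is not the CLI tool itself, just a minimal round trip under assumptions: `to_wsv` and `from_wsv` are made-up names, every value is a single-line string, comments are not supported, and the edited trigger `trans` is invented for the example.

```python
def to_wsv(entries: list[dict]) -> str:
    """One 'name value' line per field; records separated by a blank line."""
    records = ["\n".join(f"{k:<2} {v}" for k, v in e.items()) for e in entries]
    return "\n\n".join(records) + "\n"


def from_wsv(text: str) -> list[dict]:
    """Inverse of to_wsv for single-line string values."""
    entries = []
    for block in text.strip().split("\n\n"):
        entry = {}
        for line in block.splitlines():
            parts = line.split(None, 1)
            entry[parts[0]] = parts[1] if len(parts) > 1 else ""
        entries.append(entry)
    return entries


entries = [
    {"s": "Google Translate", "d": "translate.google.com", "t": "gt",
     "u": "https://translate.google.com/#auto/en/{{{s}}}",
     "c": "Online Services", "sc": "Google"},
    {"s": "Google Translate", "d": "translate.google.com", "t": "tr",
     "u": "https://translate.google.com/?text={{{s}}}",
     "c": "Translation", "sc": "Google"},
]

wsv = to_wsv(entries)                          # step 1: JSON -> WSV
edited = wsv.replace("t  tr\n", "t  trans\n")  # step 2: edit ("trans" is made up)
after = from_wsv(edited)                       # step 3: WSV -> JSON

assert from_wsv(wsv) == entries                # unedited text round-trips exactly
changed = [(a, b) for a, b in zip(entries, after) if a != b]
print(len(changed))
```

Since the sketch never rewrites URLs, it sidesteps the %-encoding question; a real tool that normalizes URLs would have to decide how to keep them byte-identical.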

I'll close this issue, and leave an update when my CLI tool is available.