Command search - Githubissues

rubenvereecken commented 9 years ago

I'd like to raise an idea I've played with for the last couple of days, especially when I noticed @rprieto raised it in #267 as well.

The next step after (or for) tldr is searching for commands by descriptions in natural language. When I thought of it I didn't think of it necessarily as part of tldr but it could be built on top of it. The idea is simple: we can already invoke tldr to remember those pesky command line options for a certain command. Now take a step back into novicehood and you'll want to find commands that do something in the first place. That's where this idea comes in.

A very naive version would consist of tagging commands (again, not necessarily inside tldr) and searching through those tags when a user tries something along the lines of `(tldr) search unpack tar`. This could actually be built straight into tldr using the index proposed in #267, because the client could then search the tags included in the index.
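To make the naive tag-based version concrete, here is a minimal sketch. The `TAGS` table and the `search_tags` function are hypothetical names used only for illustration; nothing like them exists in tldr or in the index proposed in #267.

```python
# Hypothetical tag table: command name -> hand-written tags (an assumption,
# not an existing tldr data structure).
TAGS = {
    "tar": {"archive", "unpack", "extract", "compress", "tar"},
    "unzip": {"archive", "unpack", "extract", "zip"},
    "grep": {"search", "find", "text", "pattern"},
}


def search_tags(query, tags=TAGS):
    """Return commands ranked by how many query words match their tags."""
    words = set(query.lower().split())
    scored = []
    for command, command_tags in tags.items():
        hits = len(words & command_tags)
        if hits:
            scored.append((hits, command))
    # Most matching tags first, then alphabetical for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [command for _, command in scored]


print(search_tags("unpack tar"))
```

With this table, `unpack tar` ranks `tar` ahead of `unzip` because it matches two tags instead of one.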

Take a step further and you need an actual server to do language processing. I'd actually be interested in trying this out but it'd be a considerably bigger project and it couldn't live inside tldr anymore.

Thoughts on this?

rprieto commented 9 years ago

I like this idea :+1:

I agree on a simple version being built into tldr, that'd be very handy. And if it's a bigger service, it could definitely be built on top. I'm not sure it would have to process anything server side though. I thought simplicity was one of the strengths of tldr compared to other tools: for example, there's no need for servers, very little maintenance, etc.

A while ago I thought of adding it to the node client, where searching is easy because it caches all the files locally, but the index file in #267 is great because it would enable it for everyone.

Without getting into complex NLP, I imagine we could still have a text/keywords search that's quite powerful. Keywords would be nice but would also be redundant... otherwise could we just parse & process the text on each page? Thoughts?

If we just use the text on each page, I imagine we could concat all examples, then remove any articles, pronouns etc... For the current tar page, this would give us:

create archive file gzipped extract target folder current directory bzipped compressed suffix determine compression program contents tar

or, once passed through a stemmer:

creat archiv file gzip extract target folder current directori bzip compress suffix determin compress program content tar

Then it should be easy to find all commands that match:

$ tldr --search compressed bzip

Sadly it wouldn't return anything for --search unpack because that wasn't used in the page text.
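The pipeline described above (concatenate the page text, drop stop words, stem, then match) could be sketched roughly as follows. This is only an illustration: `crude_stem` is a deliberately simple suffix stripper standing in for a real stemmer, so its stems differ from the ones shown above, and the `PAGES` text and `STOP_WORDS` list are made-up samples rather than actual tldr data.

```python
import re

# Small illustrative stop-word list (an assumption, not a complete one).
STOP_WORDS = {"a", "an", "the", "to", "of", "in", "from", "using", "its"}

# Hypothetical page text, loosely modelled on example descriptions.
PAGES = {
    "tar": "Create an archive from files. Create a gzipped archive. "
           "Extract a bzipped archive in the current directory. "
           "Create a compressed archive. List the contents of a tar file.",
    "ls": "List directory contents.",
}


def crude_stem(word):
    """Very rough suffix stripping; a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[: -len(suffix)]
            # Undo consonant doubling, e.g. "bzipp" -> "bzip".
            if word[-1] == word[-2]:
                word = word[:-1]
            break
    return word


def keywords(text):
    """Lowercase, tokenize, drop stop words, and stem what is left."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return {crude_stem(token) for token in tokens if token not in STOP_WORDS}


# Build the keyword set for each page once, then reuse it for every query.
INDEX = {name: keywords(text) for name, text in PAGES.items()}


def search(query, index=INDEX):
    """Return commands whose keywords contain every stemmed query word."""
    wanted = keywords(query)
    return sorted(name for name, words in index.items() if wanted <= words)


print(search("compressed bzip"))
print(search("unpack"))
```

As in the example above, `compressed bzip` finds `tar`, while `unpack` finds nothing because that word never appears in the page text.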

rprieto commented 9 years ago

Update: it seems that https://www.npmjs.com/package/stm also takes care of removing articles & extra stop words.

leostera commented 9 years ago

:+1: on the search.

If the clients implement caching, then full-text search is an obvious answer for the cached pages. Not so for the ones that are not available tho.

In this case it'd be necessary to build a local index/cache of all pages the first time tldr search is run, and then issue a version check every second time, npm-style. Think ~/.tldr/pages/*.md.

Then any client can implement either NLP, full-text search, skim and make a keywords list – take your pick, it'll all happen locally.

This way, tldr needs no server components and we skip the added overhead of keywords[1] in the index.
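A minimal sketch of what local full-text search over such a cache might look like, assuming the pages are already stored as `*.md` files in one directory. The function name `search_cached_pages` is hypothetical, and the version check is left out.

```python
from pathlib import Path


def search_cached_pages(query, pages_dir):
    """Return names of cached pages whose text contains every query word."""
    words = query.lower().split()
    matches = []
    for page in sorted(Path(pages_dir).glob("*.md")):
        text = page.read_text(encoding="utf-8").lower()
        # Plain substring matching; no stemming or ranking in this sketch.
        if all(word in text for word in words):
            matches.append(page.stem)
    return matches
```

A client could call it as `search_cached_pages("extract archive", Path.home() / ".tldr" / "pages")`, with the path taken from the layout suggested above.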

It could even work on localStorage-enabled browsers for tldr.js.

[1] here I'm referring to a list of keywords like the stemmer results. If these were more focused keywords ("unpack" for tar comes to mind) then it'd be great to have at least 3 per command in case the cache is not available when the search is invoked.

rubenvereecken commented 9 years ago

You're right, not even a really complex algorithm using NLP would need a server. Although sadly now every client would need to implement its own search in our scenario. I did think the index file would help but I agree with @ostera that putting keywords in the index would be added overhead and an unnecessary imposition on the clients.

I think I'd be interested in trying out or helping out with a first iteration on the node client. Parsing and processing looks like it could already give quite some valuable information, although it looks like it is missing some keywords I wouldn't mind chucking in there. It'd really be too bad to have a user's search for "unpack bzip" fail but "extract bzip" succeed.

We could solve this by adding keywords to command files but that's a pretty big decision. An alternative solution would be for the client to compensate by having a synonym dictionary. Again, I'm just throwing ideas around to see if we can solve issues before they arise.
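The synonym-dictionary idea could look something like this on the client side. It is only a sketch: the `SYNONYMS` table and `normalize_query` are hypothetical, and the entries are examples rather than a proposed list.

```python
# Hypothetical synonym table: words a user might type -> the word that
# page descriptions tend to use (example entries only).
SYNONYMS = {
    "unpack": "extract",
    "untar": "extract",
    "pack": "create",
    "delete": "remove",
}


def normalize_query(query, synonyms=SYNONYMS):
    """Rewrite each query word to its canonical form before searching."""
    return [synonyms.get(word, word) for word in query.lower().split()]


print(normalize_query("unpack bzip"))
print(normalize_query("extract bzip"))
```

Both queries normalize to the same word list, so "unpack bzip" and "extract bzip" would succeed or fail together.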

leostera commented 9 years ago

Hey guys, today I hacked a prototype of the client that would use the index.json to search for commands before making any requests.

Source code available on https://github.com/ostera/tldr.jsx

Browsable at http://ostera.io/tldr.jsx

It's still missing some things: styles, suggesting to add a command when it can't be found, actually implementing a tab-based autocompletion or typeahead. But I guess it's a baby step towards having clients use the index :)
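For anyone curious what an index-first lookup might involve, here is a rough sketch in Python rather than the prototype's own code. It assumes index.json has the shape `{"commands": [{"name": ..., "platform": [...]}]}`; that shape, the sample data, and the `find_commands` name are assumptions for illustration only.

```python
import json

# Sample data in the assumed index.json shape (not the real index).
SAMPLE_INDEX = json.dumps({
    "commands": [
        {"name": "tar", "platform": ["common"]},
        {"name": "tail", "platform": ["common"]},
        {"name": "grep", "platform": ["common"]},
    ]
})


def find_commands(index_json, prefix):
    """Return command names from the index that start with the prefix."""
    index = json.loads(index_json)
    names = [entry["name"] for entry in index["commands"]]
    return sorted(name for name in names if name.startswith(prefix.lower()))


print(find_commands(SAMPLE_INDEX, "ta"))
```

A client could run this check locally and only request a page when the name is actually in the index; the same prefix match could also feed a typeahead.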

Just wanted to share this.

mofosyne commented 8 years ago

A half baked idea here might provide some ideas for you: http://www.halfbakery.com/idea/_22How_22_20command_20line_20program

Keeping to the idea of "do one thing and do it well", the NLP tldr search program could be named `how` and work like this:

> how to create an archive from file?
    # Using `tar` - Archiving utility
        tar cf target.tar file1 file2 file3

I'm not sure how one would have `how` answer further questions like "what is the `cf` term and what does it mean in the tar example above". But you could create another program like `explain`, which could do interesting stuff like query http://www.cdecl.org/, as below:

> explain this C line `char * const (*(* const bar)[5])(int )`
    This C line is declaring bar as const pointer to array 5 of pointer to function (int) returning const pointer to char

In fact... maybe you could create a whole series of similar programs, e.g. `how`, `what`, `where`, `when` and `explain`.

Anyway keeping this effort as a separate program will have the advantage of being able to use multiple databases besides tldr (e.g. cheat or bro).

rubenvereecken commented 8 years ago

Was away for a bit, sorry for the wait.

@mofosyne thanks for sharing the ideas. The `how` example is exactly how I imagined it. Keeping these programs separate is definitely the better choice, if only for development.

Their responsibilities are really broad though. It looks like only StackOverflow could answer most questions. My original thought was to keep it restricted to just the scope we cover with tldr.

Want to have a chat about this some time?

igorshubovych commented 8 years ago

https://www.npmjs.com/package/how2

mofosyne commented 8 years ago

Happy to talk about this anytime. It's something I really hope would exist... as I constantly forget how to do stuff like change my static IP...

Also, something like this would be better and more comprehensive if it had the option of tapping into multiple sources. Locking it to just tldr would also risk this program dying if people lose interest in developing tldr.

I would also expect any such program to have its own offline database of the most common questions, since you cannot always expect to be connected to the net.

agnivade commented 6 years ago

A WIP version of this has been started in https://github.com/tldr-pages/tldr-node-client/pull/161

agnivade commented 6 years ago

PR is merged now. The node client has the search functionality. I am going to close this now as at least one client supports it.

tldr-pages / tldr

Command search #273