tldr-pages / tldr-python-client

Python command-line client for tldr pages
https://pypi.org/project/tldr/
MIT License

Feature Request: Search for appropriate command #172

Closed gotlougit closed 2 years ago

gotlougit commented 2 years ago

There are many situations where you know which tool to use and what you want done with it, but you can't exactly recall which command is the one that will do the job. For simpler utilities, tldr <utility-name> will often do the job, but for programs with multiple subcommands (for example, git), this will become very tedious.

I propose having an option to search the tldr pages for keywords. In order to speed it up, the user could specify which utility they want to search in.

It would work as such:

  1. User wants to know how to delete a branch in git
  2. Open up terminal and type tldr git --search="delete branch"
  3. tldr will output something like this:

    
    git branch
    
    Main Git command for working with branches.
    More information: https://git-scm.com/docs/git-branch.
    
    - **Delete** a local **branch** (must not have it checked out to do this):
    git branch -d branch_name
    
    - **Delete** a remote **branch**:
    git push remote_name --delete remote_branch_name

Essentially, it shows the basic overview of the page the information came from, plus the relevant entries (the keywords are highlighted, but this is optional).

This way we can add yet another good use case for tldr.

marchersimon commented 2 years ago

Thanks for the request @gotlougit. I'm not sure about Python, but generally this is a rather hard task to do. So maybe someone wants to work on this, but I can't promise. If you really need this feature you could try out my C++ client. It's not perfect, but it works for most cases. For example, when you search for "delete branch", you get: [image]

gotlougit commented 2 years ago

> I'm not sure about Python, but generally this is a rather hard task to do. So maybe someone wants to work on this, but I can't promise. If you really need this feature you could try out my C++ client

I saw your C++ client, and although I'm not too familiar with C++, I think I could create a search function for the Python version as well.

My method is very basic: search every page related to the command for the requested keywords. If it doesn't find anything, it should print the "More information" link given in the tldr pages, and if the page doesn't even have that, the program should say "No output found".

If it does find commands related to the keywords a user searches for, then it should do what the C++ program does, print out those specific commands.
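The approach described above could be sketched roughly as follows. This is only an illustration, not the code from the eventual pull request: the `PAGES` dictionary, the `search` function, and the "a page is related if its name starts with the command" rule are all assumptions made for the example.

```python
import re

# Hypothetical in-memory stand-in for downloaded tldr pages (name -> markdown).
PAGES = {
    "git-branch": (
        "# git branch\n\n"
        "> Main Git command for working with branches.\n"
        "> More information: <https://git-scm.com/docs/git-branch>.\n\n"
        "- Delete a local branch:\n\n"
        "`git branch -d {{branch_name}}`\n"
    ),
    "git-commit": (
        "# git commit\n\n"
        "> Commit files to the repository.\n\n"
        "- Commit staged files with a message:\n\n"
        "`git commit -m {{message}}`\n"
    ),
}

MORE_INFO = re.compile(r"More information: <([^>]+)>")


def search(pages, command, keywords):
    """Return names of related pages containing every keyword, or a fallback."""
    words = keywords.lower().split()
    # Assumption: a page is "related" if it is the command or one of its subcommands.
    related = {
        name: text
        for name, text in pages.items()
        if name == command or name.startswith(command + "-")
    }
    hits = [
        name
        for name, text in related.items()
        if all(word in text.lower() for word in words)
    ]
    if hits:
        return hits
    # Fallback 1: the first "More information" link among the related pages.
    for text in related.values():
        match = MORE_INFO.search(text)
        if match:
            return [match.group(1)]
    # Fallback 2: nothing useful at all.
    return ["No output found"]


print(search(PAGES, "git", "delete branch"))
```

Requiring every keyword to appear is the strictest possible match; a scoring scheme would be more forgiving of extra or misspelled words.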

Would that be a good approach to take?

gotlougit commented 2 years ago

I've started to integrate this feature into the Python client myself, and I think it works OK. I've tried to make sure the code follows the guidelines and would like further suggestions.

I've only made sure to output a page that's relevant to the search query, so as to simplify usage.

Right now, the usage is as such:

tldr --search "search here for what you want the command to do"

Right now it outputs just the git send-email page with no extra tweaks. I don't know how to highlight the words like the C++ version does, and the search currently runs over the entire page rather than just the command descriptions; restricting it to the descriptions would certainly give better results.

The ranking gives the most weight to the first word searched, then the second, and so on, so you have to use relevant keywords as well.
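A position-weighted ranking of this kind might look like the sketch below. The thread does not say which weights are actually used, so the linear weights here (the first word is worth the most points, the last word one point) and the `rank_weighted` name are assumptions.

```python
def rank_weighted(pages, query):
    """Rank page names so that earlier query words count for more."""
    words = query.lower().split()
    scores = {}
    for name, text in pages.items():
        lowered = text.lower()
        # Assumed weighting: the word at position i is worth len(words) - i points.
        score = sum(
            len(words) - i for i, word in enumerate(words) if word in lowered
        )
        if score:
            scores[name] = score
    return sorted(scores, key=scores.get, reverse=True)


pages = {
    "a": "delete a branch",
    "b": "create a branch",
    "c": "delete a file",
}
print(rank_weighted(pages, "delete branch"))
print(rank_weighted(pages, "branch delete"))
```

With this scheme the same two keywords give a different order depending on which one comes first, which is why the choice of leading keyword matters.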

I'll open a pull request shortly.

marchersimon commented 2 years ago

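Some tweaks I'd make are:

  • Search for each term separately and give a point every time one is found within a page. The pages with the highest scores are the best results.
  • Only search in lines starting with `-` or `>`.

If you want to highlight the found word, you'd have to remember where you found it, but that completely depends on how you implemented the search.

If you're more interested in search engines, you could have a look at TF/IDF, which determines how relevant a result is, and the Porter Stemmer, which ensures that `manager`, `managing`, `management`, ... are all treated as the same word. Overall, this is a really great video about this topic: https://www.youtube.com/watch?v=2OY4tE2TrcI&t=1891s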
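For readers unfamiliar with TF/IDF, a minimal standard-library sketch is shown below. It is not taken from either client; the `tokenize` and `tf_idf_rank` names and the `log(N / df) + 1` form of the inverse document frequency are assumptions (several variants exist), and no stemming is applied.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tf_idf_rank(pages, query):
    """Rank page names by the summed TF-IDF of the query terms."""
    docs = {name: Counter(tokenize(text)) for name, text in pages.items()}
    n_docs = len(docs)
    scores = {}
    for name, counts in docs.items():
        total = sum(counts.values()) or 1
        score = 0.0
        for term in set(tokenize(query)):
            # Document frequency: how many pages contain the term at all.
            df = sum(1 for other in docs.values() if term in other)
            if df == 0 or counts[term] == 0:
                continue
            tf = counts[term] / total          # share of the page made up by the term
            idf = math.log(n_docs / df) + 1.0  # rarer terms weigh more (assumed variant)
            score += tf * idf
        if score > 0:
            scores[name] = score
    return sorted(scores, key=scores.get, reverse=True)


pages = {
    "git-branch": "delete a branch",
    "git-push": "push a branch to a remote",
    "rm": "delete a file",
}
print(tf_idf_rank(pages, "delete branch"))
```

Short pages that are mostly about the query terms rank above long pages that mention a term once, which is the main difference from plain occurrence counting.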

gotlougit commented 2 years ago

> • Search for each term separately and give a point every time one is found within a page. The pages with the highest scores are the best results.

The algorithm does search for each word separately in each page, but it gives greater preference to the first couple of words. I tried running it with every word given the same value, and it seems to work all right, so I'll add this to the PR.
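The equal-weight variant suggested in the quote could be sketched like this. The `score_pages` name and the word-based tokenization are assumptions; the sketch gives one point per occurrence of any search term, which is one reading of "a point every time one is found".

```python
import re
from collections import Counter


def score_pages(pages, query):
    """Return (page name, score) pairs, best first, one point per term occurrence."""
    terms = query.lower().split()
    results = []
    for name, text in pages.items():
        counts = Counter(re.findall(r"\w+", text.lower()))
        score = sum(counts[term] for term in terms)
        if score > 0:
            results.append((name, score))
    # sorted() is stable, so pages with equal scores keep their original order.
    return sorted(results, key=lambda item: item[1], reverse=True)


pages = {
    "a": "Delete a branch. Delete a remote branch.",
    "b": "Create a branch",
    "c": "List files",
}
print(score_pages(pages, "delete branch"))
```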

> • Only search in lines starting with `-` or `>`.

This will be pretty easy to implement in Python; I'll add it to the pull request.
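In tldr's page format, lines starting with `>` hold the page description and lines starting with `-` hold the example descriptions, so the filter can be a one-liner. The `searchable_lines` name and the sample page below are made up for illustration.

```python
def searchable_lines(page_text):
    """Keep only description lines ('>') and example description lines ('-')."""
    return [
        line for line in page_text.splitlines() if line.startswith(("-", ">"))
    ]


page = (
    "# git branch\n\n"
    "> Main Git command for working with branches.\n\n"
    "- Delete a local branch:\n\n"
    "`git branch -d {{branch_name}}`\n"
)
for line in searchable_lines(page):
    print(line)
```

Skipping the title and the command lines keeps words such as flag names or placeholders from producing accidental matches.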

> If you want to highlight the found word you'd have to remember where you found it, but that completely depends on how you implemented the search.

I'm not sure how to proceed here, so I'll tackle this later.
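One possible way to "remember where you found it" is to record the match positions with `re.finditer` and wrap those spans afterwards. The `find_spans` and `highlight` helpers are hypothetical, and the `**` markers simply mirror the example output at the top of this issue; a terminal client could pass ANSI escape codes instead.

```python
import re


def find_spans(line, words):
    """Return (start, end) positions of every case-insensitive keyword match."""
    if not words:
        return []
    pattern = re.compile("|".join(re.escape(w) for w in words), re.IGNORECASE)
    return [match.span() for match in pattern.finditer(line)]


def highlight(line, words, start="**", end="**"):
    """Wrap each matched keyword in the given start/end markers."""
    pieces, last = [], 0
    for begin, stop in find_spans(line, words):
        pieces.append(line[last:begin])               # text before the match
        pieces.append(start + line[begin:stop] + end)  # the match, marked up
        last = stop
    pieces.append(line[last:])                         # text after the last match
    return "".join(pieces)


print(highlight("- Delete a local branch:", ["delete", "branch"]))
```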

> If you're more interested in search engines you could have a look at TF/IDF, which determines how relevant a result is, and the Porter Stemmer, which ensures that `manager`, `managing`, `management`, ... are all treated as the same word.
>
> Overall, this (https://www.youtube.com/watch?v=2OY4tE2TrcI&t=1891s) is a really great video about this topic.

Oh thanks! I'll definitely check it out!

gotlougit commented 2 years ago

I've also noticed that the code I wrote was failing the style guidelines, so I'll fix that as well.

gotlougit commented 2 years ago

This commit pretty much solves this, so closing issue.