kiwanami / emacs-anything-books

Opening your PDF books by the anything interface

Search using directory names #8

Open tkf opened 10 years ago

tkf commented 10 years ago

It would be nice if I could use directory names to search in anything-books. You could put the path in the candidates. You could even use separate fonts for the book name and the directory path (for fun!), like I do in pinot.el (https://github.com/tkf/emacs-pinot-search):

[Screenshot: screenshot-2014-02-25-111553]
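
A minimal sketch of the idea of putting the path in the candidates, written in Python rather than Emacs Lisp and not taken from anything-books or pinot.el. The helper names `make_candidate` and `filter_candidates` and the sample paths are hypothetical; the matching rule (every space-separated term must appear in the candidate) is an assumption about how the filtering could behave.

```python
from pathlib import PurePosixPath

def make_candidate(path):
    """Return (display, path); the display string holds the book name and its directory."""
    p = PurePosixPath(path)
    display = f"{p.name}  [{p.parent}]"
    return display, path

def filter_candidates(candidates, query):
    """Keep candidates whose display string contains every space-separated term."""
    terms = query.lower().split()
    return [c for c in candidates if all(t in c[0].lower() for t in terms)]

# Hypothetical book locations, for illustration only.
books = [
    "/home/user/books/physics/mechanics.pdf",
    "/home/user/books/math/algebra.pdf",
    "/home/user/papers/physics/optics.pdf",
]
candidates = [make_candidate(b) for b in books]

# Both terms match directory names, not the file name.
matches = filter_candidates(candidates, "physics books")
```

Because the directory is part of the displayed string, a query can narrow the list by directory names alone, and the original path is still available for opening the file.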

kiwanami commented 10 years ago

Thank you for your suggestion. pinot.el seems nice! Yes, I think it is not difficult to use directory names with anything filtering. I'll try it later.

Well, I'm planning to make an emacs(vim) application for full-text searching of PDF files, such as papers and scanned books. Is pinot-search suitable for this purpose?

tkf commented 10 years ago

planning to make an emacs(vim) application for the full-text searching from PDF

Cooool! Yes, I think pinot is a good choice. At the time I started using it, it was the only desktop search tool with a hackable set of APIs. Pinot has D-BUS and CLI interfaces, so it is quite easy to use from Emacs. Recoll (http://www.lesbonscomptes.com/recoll/index.html) may be an alternative, but at the time I tried it, it did not install its API (Python interface) by default.

kiwanami commented 10 years ago

Thank you for the information about pinot. I looked at pinot and recoll. They use xapian as their full-text search engine: http://xapian.org/. It can use an N-gram index for Japanese text. Recently, I usually use groonga, which has many functions for full-text searching, especially for Japanese text, such as a normalizer and a tokenizer.
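
To illustrate what an N-gram index works with, here is a minimal sketch of character bigram tokenization. It is not the tokenizer of xapian or groonga; the function name `ngrams` is hypothetical and whitespace handling is an assumption.

```python
def ngrams(text, n=2):
    """Split text into overlapping character n-grams, ignoring whitespace."""
    chars = "".join(text.split())
    if len(chars) < n:
        # Text shorter than n yields itself as a single token (or nothing if empty).
        return [chars] if chars else []
    return [chars[i:i + n] for i in range(len(chars) - n + 1)]

# Japanese has no spaces between words, so overlapping character
# pairs are indexed instead of whitespace-separated words.
tokens = ngrams("全文検索")
```

With `n=2`, the four-character string above yields the three tokens 全文, 文検 and 検索, so a query can match without a dictionary-based word splitter.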

I need per-page indexing so that I can find text on individual pages, not just in the whole text of a PDF file. Because the text of a book is very large, with a search engine that indexes whole files I cannot reach the matched position in one action. What do you think about per-page indexing?
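
Per-page indexing can be sketched as an inverted index whose postings are (file, page) pairs. This assumes the page texts have already been extracted from the PDFs by some other tool; `build_page_index`, `search` and the sample data are hypothetical, and splitting on whitespace stands in for a real tokenizer.

```python
from collections import defaultdict

def build_page_index(books):
    """books maps a file name to a list of page texts (page 1 first).

    Returns a dict mapping each word to a set of (file, page_number) pairs.
    """
    index = defaultdict(set)
    for filename, pages in books.items():
        for page_no, text in enumerate(pages, start=1):
            for word in text.lower().split():
                index[word].add((filename, page_no))
    return index

def search(index, word):
    """Return the sorted (file, page_number) pairs where the word occurs."""
    return sorted(index.get(word.lower(), set()))

# Hypothetical extracted page texts, for illustration only.
books = {
    "book_a.pdf": ["intro to search", "inverted index basics", "search ranking"],
    "book_b.pdf": ["cooking rice", "index of recipes"],
}
index = build_page_index(books)
hits = search(index, "index")
```

Each hit names a page, so a viewer could jump straight to the matched position instead of opening the book at page 1.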

tkf commented 10 years ago

I think pinot has OK CJK support, but I guess groonga is better than xapian in terms of Japanese search. I guess pinot has a CJK tokenizer on top of xapian. I have to try synonyms when I search, so I assume neither pinot nor xapian has a normalizer, even for English.

Index-per-page would be great! My concern is that sometimes you could get a lot of results from one book, if the word you searched for appears in that book many times. How about indexing per file and per page separately, then searching per file first and showing some "top pages" per file using the per-page index? That would be really optimized for searching books, so it makes sense to have a tool for that instead of a general desktop search tool.
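
The two-level idea above could be sketched like this: one index counts occurrences per file, another counts them per page, and a search ranks files first and then picks a few top pages in each. All names (`build_indexes`, `search`, `top_files`, `top_pages`) and the ranking by raw occurrence count are assumptions made for illustration, not a proposal for the actual scoring.

```python
from collections import Counter, defaultdict

def build_indexes(books):
    """books maps a file name to a list of page texts (page 1 first)."""
    file_index = defaultdict(Counter)  # word -> {file: occurrences}
    page_index = defaultdict(Counter)  # (word, file) -> {page: occurrences}
    for filename, pages in books.items():
        for page_no, text in enumerate(pages, start=1):
            for word in text.lower().split():
                file_index[word][filename] += 1
                page_index[(word, filename)][page_no] += 1
    return file_index, page_index

def search(file_index, page_index, word, top_files=5, top_pages=3):
    """Rank files by total occurrences, then list the best pages of each file."""
    word = word.lower()
    # .get avoids inserting an empty entry into the defaultdict.
    file_counts = file_index.get(word, Counter())
    ranked = sorted(file_counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top_files]
    results = []
    for filename, total in ranked:
        pages = page_index[(word, filename)]
        best = sorted(pages.items(), key=lambda kv: (-kv[1], kv[0]))[:top_pages]
        results.append((filename, total, [p for p, _ in best]))
    return results

# Hypothetical extracted page texts, for illustration only.
books = {
    "book_a.pdf": ["emacs emacs lisp", "vim", "emacs", "emacs tips", "more emacs"],
    "book_b.pdf": ["emacs intro", "python"],
}
file_index, page_index = build_indexes(books)
results = search(file_index, page_index, "emacs")
```

A book that mentions the word on many pages then takes up one entry in the result list, with only its `top_pages` best pages shown, rather than flooding the results.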