phiresky / ripgrep-all

rga: ripgrep, but also search in PDFs, E-Books, Office documents, zip, tar.gz, etc.

Use page numbers as line numbers where appropriate #34

Open cyruseuros opened 4 years ago

cyruseuros commented 4 years ago

Editor tools that integrate with rg(a) rely on line-number information to jump to the matching location. Reporting the page number in that position would let them jump to the right page. The line number is arbitrary anyway: it is generated by pdf2text, docx2text, and so on, and LaTeX (at compile time), LibreOffice, mobi, and Word all reflow text to the page and margin size.

Old way:

60:Page 5:       Helpman, 2019)
654:Page 51:            Grossman-Helpman Model)
745:Page 56:    Grossman, G. and Helpman, E., 2019, “Identity politics and trade policy”, Working Paper

New way:

5:60:       Helpman, 2019)
51:654:            Grossman-Helpman Model)
56:745:    Grossman, G. and Helpman, E., 2019, “Identity politics and trade policy”, Working Paper

This could be either the default or a command-line --option.
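To make the requested change concrete, here is a minimal sketch of the reordering as a post-processing step over the output shown above. It is not part of rga; the name `page_first` and the regular expression are assumptions made for illustration, based only on the "Old way" and "New way" samples in this comment.

```python
import re

# Matches the current output shape from the samples: "<line>:Page <page>:<text>".
OLD_FORMAT = re.compile(r"^(?P<line>\d+):Page (?P<page>\d+):(?P<text>.*)$")


def page_first(result_line: str) -> str:
    """Rewrite "60:Page 5: text" as "5:60: text" (hypothetical helper).

    Lines that do not carry a "Page N:" prefix are returned unchanged.
    """
    match = OLD_FORMAT.match(result_line)
    if match is None:
        return result_line
    return f"{match['page']}:{match['line']}:{match['text']}"


if __name__ == "__main__":
    print(page_first("60:Page 5:       Helpman, 2019)"))
```

With this ordering, a tool that reads the first numeric column as the jump target would receive the page number instead of the extracted-text line number.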

phiresky commented 4 years ago

This is a good idea and something I would have liked to have from the start, but it's not currently possible: The only way rga communicates with ripgrep is by giving it a stream of data, with no metadata anywhere.

So one of the following two things would need to happen for this to be possible:

I might open an issue at ripgrep regarding the first point, since I don't think that has been brought up there yet, but it's kind of specific and I don't think @burntsushi would be interested in integrating this.

Can you say why specifically you want this? Considering I'm intentionally hiding the line numbers by default in rga, the only annoyance I've encountered so far is that the page number itself is searched (so you can't search documents for e.g. Page 3). So that alone as motivation for rg to add the possibility for doing this would probably be fairly thin.

cyruseuros commented 4 years ago

I actually use rga with Emacs (adding interactivity to it absolutely rocks). helm-ag works with any searcher that indexes with line numbers, and in pdf-view-mode, page numbers are treated as the best approximation of line numbers (so that the standard go-to-line commands are more like go-to-page). So if page numbers were in the first column, separated by a colon, everything would work out of the box.

I set `(setq helm-ag-base-command "rga --no-heading --smart-case --line-number")`, and everything works like a charm, even zip files, which Emacs can decompress on entry. Except PDFs.
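The reason PDFs are the exception can be sketched by parsing a result line the way a grep-style front end might. This is only an illustration, not helm-ag's actual code: `parse_result` is a hypothetical helper, the file name `notes.pdf` is made up, and the `path:number:text` shape assumes the search is run with --no-heading, --with-filename, and --line-number.

```python
import re
from typing import NamedTuple, Optional


class Result(NamedTuple):
    path: str
    number: int  # the value an editor would use as its jump target
    text: str


# Non-greedy path so the first ":<digits>:" group is taken as the number.
RESULT_LINE = re.compile(r"^(?P<path>.*?):(?P<number>\d+):(?P<text>.*)$")


def parse_result(line: str) -> Optional[Result]:
    """Split "path:number:text" into its parts (hypothetical helper)."""
    match = RESULT_LINE.match(line)
    if match is None:
        return None
    return Result(match["path"], int(match["number"]), match["text"])


if __name__ == "__main__":
    # Current output: the jump target is the text line number (60), not page 5.
    print(parse_result("notes.pdf:60:Page 5:       Helpman, 2019)"))
    # Proposed output: the jump target is the page number (5).
    print(parse_result("notes.pdf:5:60:       Helpman, 2019)"))
```

In the first case the number handed to the editor is 60, which a page-based viewer would read as page 60; in the second it is 5, which is the page the match is on.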

AtomicNess123 commented 4 years ago

I am actually looking for an all-search like rga in Emacs across my text files, zip files, and PDFs. I was thinking of using helm-rg with it (as I don't use helm-ag). Would that be possible? You mentioned that with helm-ag you can't search within PDFs, if I am not mistaken.

AtomicNess123 commented 4 years ago

@jjzmajic Would you share your code / settings to have all ready for searching helm-ag with rga as you explained above? I'd love to give it a try, as I have plenty of ZIP, PDF and text notes I must look through for my research, and I am no programmer to come up with a method by mixing features.

cyruseuros commented 4 years ago

I have completely nuked my Emacs config since then and started using helm-recoll, but try adding `(setq helm-ag-base-command "rga --color never --line-number --no-heading --with-filename")` to your init.el. Do note that when you hit enter on a search result, it will take you to the wrong page, because the line number will be treated as a page number. Still, finding the matches can be helpful.

I suspect there is a similar variable you can configure in helm-rg. Let me know if you need help finding it.

phiresky commented 4 years ago

Regarding my first comment (https://github.com/phiresky/ripgrep-all/issues/34#issuecomment-554359798), fixing this is actually a lot simpler than I thought: just outputting "123:content" instead of "Page 123:content" here, and then invoking rga without --line-number, should work. Not really clean, but eh. Still, that depends on #52 first, since I don't want to do that by default.
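A rough sketch of that idea, written in Python rather than as rga's actual adapter code: it assumes the extracted text separates pages with form-feed characters (as pdftotext output does) and prefixes every line with its page. The function name `prefix_with_page` and the `bare` switch are hypothetical, standing in for whatever option #52 would eventually provide.

```python
def prefix_with_page(text: str, bare: bool = True) -> list[str]:
    """Prefix each extracted line with its page number (hypothetical helper).

    Assumes pages are separated by form feeds ("\f").
    bare=True  emits "123:content"      (the proposed form)
    bare=False emits "Page 123:content" (the current form)
    """
    prefixed = []
    for page_number, page in enumerate(text.split("\f"), start=1):
        prefix = f"{page_number}:" if bare else f"Page {page_number}:"
        for line in page.splitlines():
            prefixed.append(prefix + line)
    return prefixed


if __name__ == "__main__":
    sample = "first page, line one\nfirst page, line two\n\fsecond page\n\f"
    for line in prefix_with_page(sample):
        print(line)
```

Searched without --line-number, lines in the bare form already look like `page:content` to a consumer expecting `line:content`, which is why no change on the ripgrep side would be needed.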

cyruseuros commented 4 years ago

@phiresky that makes sense, and it's necessary if any kind of global config such as the one mentioned above is to work. How would this be done? When it comes to something this fine-grained, I think a config file is a better option than just CLI arguments, but having both wouldn't hurt.

AtomicNess123 commented 4 years ago

What does "--with-filename" do? Does it remove the filename from each search result?