tmalsburg / helm-bibtex

Search and manage bibliographies in Emacs
GNU General Public License v2.0

Cannot open some PDFs even though the file name contains only ASCII characters #330

Open tshu-w opened 4 years ago

tshu-w commented 4 years ago

Hi, I've run into an issue when opening the PDF of some entries, even though the file name contains only ASCII characters. Here is an example:

@inproceedings{binmohdazir2017wrapper,
  title = {Wrapper Approaches for Web Data Extraction : {{A}} Review},
  shorttitle = {Wrapper Approaches for Web Data Extraction},
  booktitle = {2017 6th {{International Conference}} on {{Electrical Engineering}} and {{Informatics}} ({{ICEEI}})},
  author = {Bin Mohd Azir, Mohd Amir and Ahmad, Kamsuriah Binti},
  year = {2017},
  month = nov,
  pages = {1--6},
  publisher = {{IEEE}},
  address = {{Langkawi}},
  doi = {10.1109/ICEEI.2017.8312458},
  abstract = {Relational databases are known as collections of structured data within the digital structure and are normally arranged in rows and columns. However, most business data are present in the form of unstructured. Data extraction is a process of extracting unstructured, semi-structured, and structured data from the user requirement upon the web pages on the internet, in any type of automation level. Web pages contain data region which is formally in a structured data format. Manipulating and analyzing data using tools always required massive computing server resources. This paper will review existing techniques on data extraction for heterogeneous data in the Big Data environment. This review is aimed to discuss different data extraction approaches together with the basic tools algorithm for extracting favored data from various web sources. The various types of approaches that will be examined are Information Extraction Approaches, Automatic Wrapper Generation, SemiAutomatic Wrapper Generation, Wrapper Induction, and Wrapper Maintenance. Although, many required techniques from web sources have been tested and developed, but the reviews on these techniques are still lacking. This paper reviews data extraction using wrapper approaches and compares each to identify the best approach to extract data from online sites.},
  file = {/Users/wangtianshu/Zotero/storage/556VGSG6/Bin Mohd Azir and Ahmad - 2017 - Wrapper approaches for web data extraction  A rev.pdf},
  isbn = {978-1-5386-0475-5},
  language = {en},
  note = {00003}
}

When I try to open it, I end up in the root directory instead of the PDF.

My ivy-bibtex settings:

(use-package ivy-bibtex
  :ensure t
  :init
  (setq biblio-crossref-user-email-address user-mail-address
        bibtex-autokey-year-length 4
        bibtex-completion-additional-search-fields '(keywords)
        bibtex-completion-bibliography '("~/Documents/Bibliography/references.bib"
                                         "~/Documents/Zotero/references.bib")
        bibtex-completion-library-path '("~/Documents/Bibliography/pdfs/"
                                         "~/Documents/Zotero/storage/")
        bibtex-completion-notes-path "~/Documents/Org/research_notes.org"
        bibtex-completion-notes-template-one-file "\n* TODO ${author-or-editor} (${year}): ${title}\n  :PROPERTIES:\n  :Custom_ID: ${=key=}\n  :END:\n\n"
        bibtex-completion-pdf-field "file"
        bibtex-dialect 'biblatex)
  :general
  (tyrant-def "ab" 'ivy-bibtex))

GNU Emacs 26.3, macOS 10.15.4

tmalsburg commented 4 years ago

What PDF viewer are you using to open PDFs? I wonder whether the spaces in the file name are causing problems. Could you check what happens when you remove the spaces?

tshu-w commented 4 years ago

My bibtex-completion-pdf-open-function is find-file, and I use PDF Tools to view the file. You're right, it's caused by spaces. After removing the double space before "A rev.pdf", I can open it. By the way, I didn't need to remove all the spaces, only the double one.
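
For anyone who wants to check whether other files are affected, here is a minimal sketch that lists files whose names contain two or more consecutive spaces. The function name my/files-with-repeated-spaces is hypothetical and not part of helm-bibtex; it only relies on the built-in directory-files-recursively (Emacs 25 or later).

```elisp
;; Hypothetical helper, not part of helm-bibtex or ivy-bibtex.
(defun my/files-with-repeated-spaces (directory)
  "Return files under DIRECTORY whose names contain consecutive spaces."
  ;; The regexp is two literal spaces; it is matched against each
  ;; file name without its directory part.
  (directory-files-recursively directory "  "))

;; Example call, using one of the library paths from the settings above:
;; (my/files-with-repeated-spaces "~/Documents/Zotero/storage/")
```

Each path it returns is a candidate for the same problem as the entry above.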

tmalsburg commented 4 years ago

There appears to be a bug that causes sequences of whitespace to be collapsed. I tried to track it down but failed, because it doesn't happen when I debug. :( Will try again soon with fresh eyes.
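
To make the suspected effect concrete, here is a small sketch of what collapsing whitespace does to the file name from the example entry. It only imitates the symptom; the actual code path in helm-bibtex has not been identified in this thread, and the helper name my/collapse-whitespace is hypothetical.

```elisp
;; Hypothetical helper, for illustration only: it mimics the suspected bug
;; by replacing every run of spaces, tabs, or newlines with a single space.
(defun my/collapse-whitespace (string)
  "Return STRING with each run of whitespace replaced by one space."
  (replace-regexp-in-string "[ \t\n]+" " " string))

(let* ((original "Wrapper approaches for web data extraction  A rev.pdf")
       (collapsed (my/collapse-whitespace original)))
  ;; First element: are the two names still equal?
  ;; Second element: how many characters were lost?
  (list (string= original collapsed)
        (- (length original) (length collapsed))))
;; evaluates to (nil 1)
```

The collapsed name is one character shorter than the name on disk, so opening it would fail to find the file, which is consistent with the report that removing only the double space was enough.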