progfolio / doct

DOCT: Declarative Org Capture Templates for Emacs
GNU General Public License v3.0
380 stars 8 forks source link

%{KEYWORD} Expansion fails when KEYWORD is a function modifying match-data #17

Closed yantar92 closed 4 years ago

yantar92 commented 4 years ago

Describe the problem

When using %{KEYWORD} expansion with KEYWORD set to (lambda () ... (re-search-forward ...))), capturing fails silently. I tracked this down to doct--replace-template-strings where the following code assumes that match data is not modified by the user function.

(replace-match (if (functionp val)
                                 (doct--replace-template-strings (funcall val))
                               (or val ""))
                             nil t))

I think that the funcall must be wrapped into save-match-data to avoid replace-match throwing error.

Version information

progfolio commented 4 years ago

Thank you for the report and analysis. Can you please provide a specific example of a function you're using that alters the match-data? Your assessment is correct, I'm just curious as to what your use case is.

yantar92 commented 4 years ago

I am trying to generate bibtex entry for captured webpages. The code is rather long and only half-way done, but since you asked, see the below. Note several instances of save-match-data in the code. I had to add it to work around the issue.

(require 'org-ref-url-utils)

(use-package org-cliplink :straight t)

(setq bibtex-autokey-year-length 4)

(defvar yant/org-capture-current-bibtex nil
  "Parsed bibtex entry for the currently captured item.")

(defvar yant/org-capture-html-buffer nil
  "Buffer containing the captured webpage html.")

(defun yant/org-capture-get-html-buffer ()
  "Get buffer containing the captured webpage html."
  (if yant/org-capture-html-buffer
      yant/org-capture-html-buffer
    (setq yant/org-capture-html-buffer nil)
    (when-let ((link (plist-get org-store-link-plist :link)))
      (setq yant/org-capture-html-buffer
        (let* ((query (plist-get org-store-link-plist :query))
           (html (plist-get query :html)))
          (if html
          (find-file-noselect html)
        (url-retrieve-synchronously link)))))))

(setq org-ref-url-bibtex-template
      "@misc{key,
      title        = {${:title}},
      author       = {${:author}},
      howpublished = {${:url}},
      url          = {${:url}},
      year         = {${:year}},
      note         = {Online; accessed ${:urldate}}
      }")

(defvar yant/org-capture-youtube-bibtex-template
  "@misc{key,
      title          = {${:title}},
      author         = {${:author}},
      authorhomepage = {${:author-homepage}},
      howpublished   = {Youtube},
      url            = {${:url}},
      year           = {${:year}},
      note           = {Online; accessed ${:urldate}}
      }")

(defun yant/org-capture-get-bibtex-doi-url ()
  "Try to scrape DOI from current url and return the bibtex entry."
  (when-let* ((link (plist-get org-store-link-plist :link))
          (dois (org-ref-url-scrape-dois link))
              (doi (car dois)))
    (with-temp-buffer
      (bibtex-mode)
      (insert (doi-utils-doi-to-bibtex-string doi))
      (backward-char)
      ;; set date added for the record
      (let ((ts (funcall doi-utils-timestamp-format-function)))
    (when ts
      (bibtex-set-field doi-utils-timestamp-field
                ts)))
      (org-ref-clean-bibtex-entry)
      (buffer-string))))

(defun yant/org-capture-get-bibtex-youtube ()
  "Parse youtube link and generate bibtex entry."
  (when-let ((link (plist-get org-store-link-plist :link)))
    (when (string-match "youtube\\.com" link)
      (with-current-buffer (yant/org-capture-get-html-buffer)
    (let ((fields `((:url . ,link)
            (:urldate . ,(format-time-string "%d %B %Y"))))
          entry)
      ;; find channel url
      (goto-char (point-min))
      (when (re-search-forward "/channel/\\([^\"]+\\)\"" nil t)
        (let ((channel-url (s-concat "https://youtube.com/channel/" (match-string 1))))
          (push (cons :author-homepage channel-url) fields)))
      ;; find author
      (goto-char (point-min))
      (when (re-search-forward "channelName\":\"\\([^\"]+\\)\"" nil t)
        (let ((channel-name (match-string 1)))
          (push (cons :author channel-name) fields)))
      ;; find title
      (goto-char (point-min))
      (when (re-search-forward "title\":\"\\([^\"]+\\)\"" nil t)
        (let ((title (match-string 1)))
          (push (cons :title title) fields)))
      ;; find year
      (goto-char (point-min))
      (when (re-search-forward "publishDate\":\"\\([^\"]+\\)\"" nil t)
        (let ((year (match-string 1)))
          (string-match "[0-9]\\{4\\}" year)
          (push (cons :year (match-string 0 year)) fields)))
      (org-ref-url-add-nil fields)
      (setq entry (s-format
               yant/org-capture-youtube-bibtex-template
               'aget fields))
      (with-temp-buffer
        (insert (if (require 'org-cliplink nil 'noerror)
            ;; Sanitize values by replacing html entities
            (org-ref-url-html-replace entry)
              entry))
        (bibtex-mode)
        (bibtex-beginning-of-entry)
        (let ((bibtex-autokey-prefix-string "youtube-"))
          (org-ref-clean-bibtex-entry))
        (buffer-string)))))))

(defvar yant/org-capture-get-bibtex-functions '(yant/org-capture-get-bibtex-youtube
                                                yant/org-capture-get-bibtex-doi-url)
  "List of functions to generate bibtex entry during capture.
The functions are called in order with defined `org-store-link-plist'.
First non-nil return value is used as a headline.
If all the functions return nil, `org-ref-url-html-to-bibtex' is used to generate the bibtex.")

(defun yant/org-ref-url-html-to-bibtex (url)
  "Convert URL to a bibtex entry."
  (save-match-data
    (with-temp-buffer
      (let* ((alist (org-ref-url-html-read url))
         entry)
        (string-match "\\(?:https?://\\)?\\(?:www\\.\\)?\\([^.]+\\)" (alist-get :howpublished alist))
        (when (match-string 1 (alist-get :howpublished alist))
          (setf (alist-get :howpublished alist) (match-string 1 (alist-get :howpublished alist))))
        (setq (entry (s-format
              org-ref-url-bibtex-template
              'aget alist)))
    (insert (if (require 'org-cliplink nil 'noerror)
            ;; Sanitize values by replacing html entities
            (org-ref-url-html-replace entry)
          entry))
    (bibtex-mode)
    (bibtex-beginning-of-entry)
    (org-ref-clean-bibtex-entry)
    (buffer-string)))))

(defun yant/org-capture-get-bibtex ()
  "Generate bibtex entry for currently captured item."
  (if yant/org-capture-current-bibtex
      (alist-get "raw-string" yant/org-capture-current-bibtex nil nil #'string=)
    (save-match-data
      (let ((result (or (run-hook-with-args-until-success 'yant/org-capture-get-bibtex-functions)
            (when (plist-get org-store-link-plist :link)
              (yant/org-ref-url-html-to-bibtex (plist-get org-store-link-plist :link))))))
    (when result
      (with-temp-buffer
        (insert result)
            (goto-char 1)
            (bibtex-mode)
            (setq yant/org-capture-current-bibtex (bibtex-parse-entry))
            (when (or (not (alist-get "title" yant/org-capture-current-bibtex nil nil #'string=))
              (string= "{nil}" (alist-get "title" yant/org-capture-current-bibtex nil nil #'string=)))
          (setf (alist-get "title" yant/org-capture-current-bibtex nil nil #'string=) (format "{%s}" (plist-get org-store-link-plist :description))))
            (setf (alist-get "raw-string" yant/org-capture-current-bibtex) result)))
    result))))

(defvar yant/org-capture-get-headline-functions '(yant/org-capture-get-title-from-bibtex)
  "List of functions to generate headline (without tags) during capture.
The functions are called in order with defined `org-store-link-plist'.
First non-nil return value is used as a headline.
If that is also not defined, `%:description' property from `org-store-link-plist' is used.")

(defun yant/org-capture-get-title-from-bibtex ()
  "Try to get title from bibtex entry."
  (yant/org-capture-get-bibtex)
  (when-let ((title (org-ref-reftex-get-bib-field "title" yant/org-capture-current-bibtex))
         (author (org-ref-reftex-get-bib-field "author" yant/org-capture-current-bibtex))
             (year (org-ref-reftex-get-bib-field "year" yant/org-capture-current-bibtex))
             (journal (org-ref-reftex-get-bib-field "journal" yant/org-capture-current-bibtex))
         (howpublished (org-ref-reftex-get-bib-field "howpublished" yant/org-capture-current-bibtex)))
    (unless (or (string= "" title)
        (string= "" author)
        (string= "" year)
                (and (string= "" howpublished)
             (string= "" journal)))
      (format "%s [%s] (%s) %s" author (if (seq-empty-p journal) howpublished journal) year title))))

(defun yant/org-capture-get-title ()
  "Calculate headline for currently captured item."
  (save-match-data
    (with-temp-buffer
      (insert
       (or (run-hook-with-args-until-success 'yant/org-capture-get-headline-functions)
       (plist-get org-store-link-plist :description)))
      (shr-render-region (point-min) (point-max))
      (s-replace "\n" "" (buffer-string)))))

(defvar yant/org-capture-get-extra-functions '(yant/org-capture-extra-papers)
  "List of functions to generate body during capture.
The functions are called in order with defined `org-store-link-plist'.
First non-nil return value is inserted.")

(defun yant/org-capture-extra-papers ()
  "Insert checklist for reading scientific papers."
  (let ((link (plist-get org-store-link-plist :link)))
    (pcase link
      ((pred (lambda (link) (s-matches? "sciencedirect\\|elsevier\\|cambridge\\|asme\\.org\\|hanser-elibrary\\.com\\|tandfonline\\|scholarsarchive\\.byu\\.edu\\|osti\\.gov\\|royalsocietypublishing\\.org" link)))
       (s-join "\n"
           '("- [ ] download and attach pdf"
         "- [ ] check if bibtex entry has missing fields"
                 "- [ ] read paper"
         "- [ ] check citing articles"
         "- [ ] check related articles"
         "- [ ] check references")))
      (_
       nil))))

(defun yant/org-capture-get-extra ()
  "Calculate text to be inserted into body of currently captured item."
  (save-match-data
    (or (run-hook-with-args-until-success 'yant/org-capture-get-extra-functions)
    ;; (when-let ((fifo (plist-get (plist-get org-store-link-plist :query) :fifo)))
    ;;   (start-process-shell-command "Send message to qutebrowser"
    ;;                 nil
    ;;                                (format "echo 'message-warning \"Bookmarking unknown website\"' >> %s" fifo)))
    "")))

(defun yant/org-capture-get-id ()
  "Generate :ID: property."
  (save-match-data
    (yant/org-capture-get-bibtex)
    (if-let ((id (org-ref-reftex-get-bib-field "=key=" yant/org-capture-current-bibtex)))
    (if (org-id-find id)
        (user-error "Existing entry with id %s" id)
      id)
      (org-id-new))))

Capture template

(let ((templates (doct '(:group
             "Browser link"
             :type entry
             :file "~/Org/inbox.org"
                         :fetch-bibtex (lambda ()
                     (setq yant/org-capture-current-bibtex nil)
                     (funcall #'yant/org-capture-get-bibtex)
                                         "") ; this must run first
                         :fetch-html (lambda ()
                       (setq yant/org-capture-html-buffer nil)
                       (yant/org-capture-get-html-buffer)
                                       "")
                         :bibtex (lambda () (funcall #'yant/org-capture-get-bibtex))
                         :title (lambda () (funcall #'yant/org-capture-get-title))
                         :extra (lambda () (funcall #'yant/org-capture-get-extra))
                         :id (lambda () (funcall #'yant/org-capture-get-id))
             :template
             ("%{fetch-bibtex}%{fetch-html}* TODO %? %{title} :BOOKMARK:"
              ":PROPERTIES:"
                          ":ID: %{id}"
              ":CREATED: %U"
              ":Source: %:link"
              ":END:"
                          "%{extra}"
                          "# the following bibtex entry should be moved to bibliography if it is good enough"
                          "#+begin_src bibtex"
                          "%{bibtex}"
                          "#+end_src")
             :children (("Interactive link"
                     :keys "b"
                     :clock-in t
                     :clock-resume t
                     )
                    ("Silent link"
                     :keys "B"
                     :immediate-finish t))))))
  (dolist (template templates)
    (asoc-put! org-capture-templates
           (car template)
               (cdr  template)
               'replace)))
progfolio commented 4 years ago

Pushed a patch that should take care of saving the match data, restriction and excursion. So you should be able to modify any of that without worry. If you wouldn't mind testing it, I'd appreciate it.

That's quite a setup! If it proves useful for you, I encourage you to make it into a package to share with others.

One small tip since you're on the latest Emacs: Recently indentation for data lists was fixed. If you prefix the first element with a space in a plist, the remainder of the list should indent correctly. e.g.

'(:group
  "Browser link"
  :type entry
  ;;etc
  )

Can be:

'( :group "Browser link"
   :type entry
   ;;etc.
   )
yantar92 commented 4 years ago

Pushed a patch that should take care of saving the match data, restriction and excursion. So you should be able to modify any of that without worry. If you wouldn't mind testing it, I'd appreciate it.

I tried to remove all the save-match-data from my code. With your patch, nothing got broken and everything works as expected.

That's quite a setup! If it proves useful for you, I encourage you to make it into a package to share with others.

I may share the code on github, but I have little interest (and often time) for maintaining it other than for my specific use-cases. The code is going to be written to work in conjunction with qutebrowser userscripts.

Recently indentation for data lists was fixed. If you prefix the first element with a space in a plist, the remainder of the list should indent correctly.

Thanks! I did not know that.

yantar92 commented 4 years ago

That's quite a setup! If it proves useful for you, I encourage you to make it into a package to share with others.

I finally decided to make it into a package since it can really use community contributions to various web-site parsers. https://github.com/yantar92/org-capture-ref