mpedramfar / zotra

GNU General Public License v3.0

Handle multiple responses from Zotero server #2

Closed: SterlingHooten closed this issue 1 year ago

SterlingHooten commented 1 year ago

When the Zotero server detects multiple DOIs or other resource identifiers, it returns a JSON object that lists the possible resources under `items`. The user is expected to select from these and then POST the selection back to the Zotero server to actually fetch the citation JSON for them.

These responses come back wrapped in curly braces (a JSON object) rather than square brackets (a JSON array), so Zotra, which expects an array, currently throws an error and fails.

I've hacked together a solution, but I still haven't decided where to put the selection framework (discussed below).

The Zotero server project website gives an overview of how to deal with multiple results, but I'll summarize it here. First, POST a standard request like:

curl -d 'https://www.ncbi.nlm.nih.gov/pubmed/?term=crispr' -H 'Content-Type: text/plain' http://127.0.0.1:1969/web

This returns a response like:

{
    "url": "https://www.ncbi.nlm.nih.gov/pubmed/?term=crispr",
    "session": "9y5s0EW6m5GgLm0",
    "items": {
        "u30044970": {
            "title": "RNA Binding and HEPN-Nuclease Activation Are Decoupled in CRISPR-Cas13a."
        },
        "u30044923": {
            "title": "Knockout of tnni1b in zebrafish causes defects in atrioventricular valve development via the inhibition of the myocardial wnt signaling pathway."
        },
        // more results
    }
}
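A minimal sketch of reading this object from Emacs Lisp with the `json.el` library that ships with Emacs: it reduces `items` to an alist of `(KEY . TITLE)` pairs that a completion prompt could offer. `my-zotra-multiple-candidates` is a hypothetical helper, not part of Zotra. Because the server documentation shows an item value both as an object with a `title` field (above) and as a bare title string (in the selection request below), the sketch accepts either form.

```elisp
(require 'json)

(defun my-zotra-multiple-candidates (json-string)
  "Return an alist of (KEY . TITLE) strings from JSON-STRING.
JSON-STRING is a multiple-choice response from the Zotero server,
i.e. an object whose `items' field maps item keys to titles."
  (let* ((json-object-type 'alist)   ; decode JSON objects as alists
         (json-key-type 'symbol)     ; with symbol keys
         (response (json-read-from-string json-string))
         (items (alist-get 'items response)))
    (mapcar (lambda (item)
              (let ((value (cdr item)))
                (cons (symbol-name (car item))
                      ;; An item value is either {"title": ...} or a bare string.
                      (if (stringp value)
                          value
                        (alist-get 'title value)))))
            items)))
```

For example, decoding `{"items":{"u30044970":{"title":"RNA Binding ..."}}}` would give `(("u30044970" . "RNA Binding ..."))`.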

To make a selection, delete the unwanted results from the `items` object and POST the edited object back to the server as `application/json`:

curl -d '{"url":"https://www.ncbi.nlm.nih.gov/pubmed/?term=crispr","session":"iq0qE4Xkqx1yCzE","items":{"u32200959":"CRISPR-Cas12a: Functional overview and applications.","u26470680":"Advances in therapeutic CRISPR/Cas9 genome editing."}}' -H 'Content-Type: application/json' 'http://127.0.0.1:1969/web'

This returns a JSON array of full item records:

[
  {
    "key": "G88I57NI",
    "version": 0,
    "itemType": "journalArticle",
    "creators": [
      {"firstName": "Bijoya", "lastName": "Paul", "creatorType": "author"},
      {"firstName": "Guillermo", "lastName": "Montoya", "creatorType": "author"}
    ],
    "tags": [
      {"tag": "Animals", "type": 1},
      {"tag": "CRISPR-Cas Systems", "type": 1},
      {"tag": "Clustered Regularly Interspaced Short Palindromic Repeats", "type": 1},
      {"tag": "Endonucleases", "type": 1},
      {"tag": "Gene Editing", "type": 1},
      {"tag": "Humans", "type": 1},
      {"tag": "RNA", "type": 1},
      {"tag": "CRISPR-Cas12a", "type": 1},
      {"tag": "Endonuclease recycling", "type": 1},
      {"tag": "Genome editing", "type": 1},
      {"tag": "Indiscriminate ssDNAse", "type": 1},
      {"tag": "RNA guided endonucleases", "type": 1},
      {"tag": "crRNA biogenesis", "type": 1}
    ],
    "title": "CRISPR-Cas12a: Functional overview and applications",
    "pages": "8-17",
    "ISSN": "2320-2890",
    "journalAbbreviation": "Biomed J",
    "publicationTitle": "Biomedical Journal",
    "volume": "43",
    "issue": "1",
    "date": "2020-02",
    "language": "eng",
    "abstractNote": "Prokaryotes have developed an adaptive immune system biotechnology applications.",
    "DOI": "10.1016/j.bj.2019.10.005",
    "extra": "PMID: 32200959\nPMCID: PMC7090318",
    "libraryCatalog": "PubMed",
    "shortTitle": "CRISPR-Cas12a"
  },
  {
    "key": "ULS7Q6Z4",
    "version": 0,
    "itemType": "journalArticle",
    "creators": [
      {"firstName": "Nataša", "lastName": "Savić", "creatorType": "author"},
      {"firstName": "Gerald", "lastName": "Schwank", "creatorType": "author"}
    ],
    "tags": [
      {"tag": "Animals", "type": 1},
      {"tag": "Bacterial Proteins", "type": 1},
      {"tag": "CRISPR-Associated Protein 9", "type": 1},
      {"tag": "CRISPR-Cas Systems", "type": 1},
      {"tag": "Clustered Regularly Interspaced Short Palindromic Repeats", "type": 1},
      {"tag": "Endonucleases", "type": 1},
      {"tag": "Gene Expression Regulation", "type": 1},
      {"tag": "Genetic Therapy", "type": 1}
    ],
    "title": "Advances in therapeutic CRISPR/Cas9 genome editing",
    "pages": "15-21",
    "ISSN": "1878-1810",
    "journalAbbreviation": "Transl Res",
    "publicationTitle": "Translational Research: The Journal of Laboratory and Clinical Medicine",
    "volume": "168",
    "date": "2016-02",
    "language": "eng",
    "abstractNote": "Targeted nucleases are widely used as tools for genome editing. Two years ago the clustered regularly reports.",
    "DOI": "10.1016/j.trsl.2015.09.008",
    "extra": "PMID: 26470680",
    "libraryCatalog": "PubMed"
  }
]
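The selection step itself (deleting unwanted keys from `items` before POSTing) can be sketched with `json.el` as follows. It decodes the first response, keeps only the chosen keys in `items`, and re-encodes everything else, including `url` and `session`, unchanged. The helper name `my-zotra-selection-payload` is hypothetical and not part of Zotra; its return value is the body that would be POSTed as `application/json`.

```elisp
(require 'json)
(require 'seq)

(defun my-zotra-selection-payload (json-string selected-keys)
  "Return JSON-STRING with its `items' reduced to SELECTED-KEYS.
JSON-STRING is the multiple-choice response; SELECTED-KEYS is a list
of item keys as strings.  All other fields, including `url' and
`session', are re-encoded unchanged."
  (let* ((json-object-type 'alist)
         (json-key-type 'symbol)
         (response (json-read-from-string json-string))
         ;; Keep only the items whose key the user selected.
         (kept (seq-filter (lambda (item)
                             (member (symbol-name (car item)) selected-keys))
                           (alist-get 'items response))))
    (unless kept
      (user-error "No items selected"))
    ;; Replace the value of `items', then serialize the whole object.
    (setf (alist-get 'items response) kept)
    (json-encode response)))
```

Refusing an empty selection avoids POSTing an object whose `items` would serialize as `null`.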

To implement this in Zotra, I first check whether the response contains multiple items:

(defun swh-zotra-json-multiple-p (json)
  "Return non-nil if JSON string is a collection of items to choose from.
A single bibliography result is a JSON array, whereas a response
with several candidates is a JSON object with an \"items\" key."
  ;; Match an opening brace at the very start of the string (ignoring
  ;; leading whitespace); a single result starts with `[' instead.
  ;; TODO Might be improved by using json.el to check for an items element.
  (and (string-match-p "\\`[ \t\n\r]*{" json) t))
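The comment in the function suggests `json.el` as a more robust alternative to a regex. A hedged sketch of that variant, under the hypothetical name `my-zotra-json-multiple-p`: it treats the payload as a multiple-choice response only when it decodes to an object containing an `items` field, so pretty-printed JSON and plain-text error messages are both handled.

```elisp
(require 'json)

(defun my-zotra-json-multiple-p (json-string)
  "Return non-nil if JSON-STRING is a multiple-choice response.
A single result decodes to a vector (JSON array); a multiple-choice
response decodes to an object with an `items' field."
  (condition-case nil
      (let* ((json-object-type 'alist)
             (json-array-type 'vector)
             (data (json-read-from-string json-string)))
        (and (consp data)            ; an object, not an array
             (assq 'items data)      ; that carries candidate items
             t))
    ;; Plain-text replies such as "No identifiers found" are not JSON.
    (error nil)))
```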

If it does, `swh-zotra-get-json-from-multiple` fetches the citation data by querying the Zotero server again with content type "application/json":

(defun swh-zotra-get-json-from-multiple (json-multiple &optional is-search)
  "POST the multiple-item JSON-MULTIPLE back to the Zotero server.
Return the citation data, in Zotero JSON format, of every item
left in the `items' object of JSON-MULTIPLE."
  (let* ((endpoint (concat zotra-server-path
                           ;; FIX shouldn't really allow search, but that's fine
                           (if is-search "/search" "/web")))
         (json
          (if zotra-use-curl
              ;; SWH 2022-12-05 changing from text to json
              (zotra-run-external-curl json-multiple "application/json" endpoint)
            (let* ((url-request-method "POST")
                   (url-request-extra-headers
                    '(("Content-Type" . "application/json")))
                   ;; url.el rejects multibyte request data, so encode the
                   ;; JSON (titles may contain non-ASCII characters) as UTF-8.
                   (url-request-data (encode-coding-string json-multiple 'utf-8))
                   (response-buffer (url-retrieve-synchronously
                                     endpoint nil nil zotra-url-retrieve-timeout)))
              (unless response-buffer
                (user-error "Request failed. If this issue persists, try again with `zotra-use-curl'."))
              ;; Always kill the response buffer, even if reading it fails.
              (unwind-protect
                  (with-current-buffer response-buffer
                    ;; Skip the HTTP headers and return only the body.
                    (goto-char (point-min))
                    (search-forward "\n\n")
                    (buffer-substring (point) (point-max)))
                (kill-buffer response-buffer))))))
    (cond ((string= json "URL not provided")
           (user-error "URL not provided"))
          ((string= json "No identifiers found")
           (user-error "No identifiers found"))
          (t json))))

The result can now be treated like a typical response and passed to `zotra-get-entry-from-json`. I've rewritten `zotra-get-entry` so that it checks whether the initial response contains items and, if so, queries the server again:

(defun swh-zotra-get-entry (url-or-search-string &optional is-search entry-format)
  "Get the entry for URL-OR-SEARCH-STRING in ENTRY-FORMAT.
Like `zotra-get-entry', but allow for the possibility of multiple items."
  (let ((json (zotra-get-json url-or-search-string is-search)))
    (zotra-get-entry-from-json
     (if (swh-zotra-json-multiple-p json)
         ;; Okay, we have a multiple-item list: POST it back for the full records.
         (swh-zotra-get-json-from-multiple json)
       json)
     entry-format)))

In this configuration, `swh-zotra-get-entry` returns more than one BibTeX entry when the initial response contains multiple items. Without a selection step at that point, I expect some Zotra functions will fail.

The Zotero developers assume there is an interactive pop-up from which the user can select. In my own setup, a separate capture frame comes up and fetches the URL data from Safari.

One possibility would be to select from the `items` in the JSON object initially returned by the Zotero server. In some cases (e.g., searches on SCOPUS or the NIH site), a large number of candidates (>300) may be returned. Selecting at this point might be faster (untested), since the Zotero server would then only need to fetch the selected items.

Another possibility would be to fetch all of the items automatically, get the BibTeX data back from the Zotero server, and then select among the BibTeX entries. This has the advantage of simplifying the Zotra library, since selection becomes the responsibility of whatever system is calling it. It would also be non-interactive.
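For the second option, selection can be a plain predicate over entries, which keeps it non-interactive. A minimal sketch, assuming the BibTeX output is a single string in which each entry begins with `@` at the start of a line; `my-zotra-split-bibtex` and `my-zotra-select-entries` are hypothetical names, not Zotra functions.

```elisp
(require 'cl-lib)
(require 'seq)
(require 'subr-x)

(defun my-zotra-split-bibtex (bibtex)
  "Split the BibTeX string BIBTEX into a list of entry strings.
Assumes every entry starts with \"@\" at the beginning of a line."
  (let ((starts '())
        (pos 0))
    ;; Record the position of each "@" that begins a line.
    (while (string-match "^@" bibtex pos)
      (push (match-beginning 0) starts)
      (setq pos (match-end 0)))
    ;; Each entry runs from its start to the next entry's start (or the end).
    (cl-loop for (beg next) on (nreverse starts)
             collect (string-trim (substring bibtex beg next)))))

(defun my-zotra-select-entries (bibtex pred)
  "Return the entries of BIBTEX for which PRED returns non-nil."
  (seq-filter pred (my-zotra-split-bibtex bibtex)))
```

A caller could then pass, say, `(lambda (entry) (string-match-p "CRISPR" entry))` as the predicate, or hand the list to its own interactive selector.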

Perhaps it's best to implement both, with a variable or argument that determines where the selection is made?

mpedramfar commented 1 year ago

Hi! Thank you for raising this issue and #1. I've been slowly implementing some changes in zotra that I haven't yet cleaned up enough to push here. The one I've implemented uses `completing-read-multiple` to ask the user which of the items should be fetched, and then only those items are fetched from the server. I'll clean up the code and push the changes. If it turns out to be slow, we can always add more options and change the implementation.
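The approach described here can be illustrated with a short sketch; it is not the code that was pushed to Zotra. `my-zotra-select-keys` is a hypothetical name: it takes an alist of `(KEY . TITLE)` strings, prompts with `completing-read-multiple`, and maps the chosen titles back to item keys. Because article titles often contain commas, which the default `crm-separator` splits on, the sketch binds the separator to a semicolon instead (an assumption, not something stated in the thread).

```elisp
(require 'crm)

(defun my-zotra-select-keys (candidates)
  "Prompt for one or more titles from CANDIDATES and return their keys.
CANDIDATES is an alist of (KEY . TITLE) strings."
  (let* ((titles (mapcar #'cdr candidates))
         ;; Titles often contain commas, so separate choices with `;'.
         (crm-separator "[ \t]*;[ \t]*")
         (chosen (completing-read-multiple "Items to fetch (separate with ;): "
                                           titles nil t)))
    ;; Map each chosen title back to its item key.
    (mapcar (lambda (title) (car (rassoc title candidates))) chosen)))
```

The returned keys are what would remain in the `items` object before it is POSTed back, so only the selected items are fetched.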

mpedramfar commented 1 year ago

I've added the option to handle multiple responses, along with variables to configure the behavior when there are multiple responses. I'm closing this issue as resolved. Feel free to raise another issue if there are any problems with it.

SterlingHooten commented 1 year ago

Thank you so much for implementing this!

I've been (casually) testing it for the past month and it seems to work well so far.