mkmcc / bibslurp

retrieve BibTeX entries from NASA ADS
http://astro.berkeley.edu/~mkmcc/software/bibslurp.html
GNU General Public License v3.0

ADS Classic is being retired #12

Open Sbozzolo opened 5 years ago

Sbozzolo commented 5 years ago

ADS Classic will be deprecated in May 2019 and retired in October 2019.

Any hope this awesome package will be updated for the new ADS?

giordano commented 5 years ago

> Any hope this awesome package will be updated for the new ADS?

Well, if someone can write the interface with the official API (that is issue #10), yes ☺️

Sbozzolo commented 5 years ago

> Well, if someone can write the interface with the official API (that is issue #10), yes

Well, updating bibslurp to the new ADS amounts exactly to writing the new interface. If updating the package is left to the community, I will try to implement the new backend.

giordano commented 5 years ago

> If updating the package is left to the community, I will try to implement the new backend.

That's the free software spirit :wink: I am part of the community as much as you are: I found something that I could improve here, and I contributed to this package.

Knusper commented 5 years ago

> > Well, if someone can write the interface with the official API (that is issue #10), yes

> Well, updating bibslurp to the new ADS amounts exactly to writing the new interface. If updating the package is left to the community, I will try to implement the new backend.

Unfortunately I don't know enough elisp, but I think a lot of the existing code can be reused, and instead of parsing the HTML we parse the output JSON from the API.

Knusper commented 5 years ago

OK - I started to experiment a bit on how to approach the transition to the new interface.

The emacs-request library seems to be what we should be using.

After some experimentation I found the interface quite intuitive. However, accessing the API requires an authentication token, which in turn requires the users of this bibslurp-query-ads to have an ADS account.

With emacs-request we then can access the ADS API (which is documented here) like this (<token> is placeholder for the ADS token, and <query> for the actual query):

(require 'request)

(defvar auth-token "<token>")
(setq response
      (request "https://api.adsabs.harvard.edu/v1/search/query"
        :type "GET"
        :headers `(("Authorization" . ,(concat "Bearer " auth-token)))
        :params '(("q" . "<query>") ("fl" . "author, title, bibcode"))
        :parser 'json-read
        ;; Block until the reply arrives, so the data can be read right away.
        :sync t))

Now (request-response-data response) contains all the information we need to build the bibslurp-query-ads buffer.
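A minimal sketch of how the parsed data could be turned into rows for such a buffer. It assumes the reply has a `response` object holding a `docs` array, and that `author` and `title` come back as arrays; the sample JSON, the author and the title are made-up placeholders, and the `my/` names are hypothetical.

```elisp
(require 'json)

;; Made-up sample shaped like a search reply; author and title are
;; placeholders, not real ADS data.
(defvar my/ads-sample-json
  "{\"response\": {\"numFound\": 1, \"start\": 0, \"docs\": [{\"bibcode\": \"2002AJ....124..266P\", \"author\": [\"Placeholder, A.\"], \"title\": [\"Placeholder title\"]}]}}")

(defun my/ads-docs (data)
  "Return the result documents in parsed reply DATA as a list."
  (let ((response (cdr (assq 'response data))))
    ;; `json-read' returns JSON arrays as vectors by default;
    ;; `append' with a final nil converts the vector to a list.
    (append (cdr (assq 'docs response)) nil)))

(defun my/ads-doc-summary (doc)
  "Return (BIBCODE FIRST-AUTHOR TITLE) for one result DOC."
  (let ((authors (cdr (assq 'author doc)))
        (title (cdr (assq 'title doc))))
    (list (cdr (assq 'bibcode doc))
          (and (> (length authors) 0) (aref authors 0))
          (and (> (length title) 0) (aref title 0)))))

(mapcar #'my/ads-doc-summary
        (my/ads-docs (json-read-from-string my/ads-sample-json)))
```

With a live query, the argument to `my/ads-docs` would be the value of (request-response-data response) instead of the sample string.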

Using the export service of the API, we can get the BibTeX for the individual entries from their bibcodes.

I can also look into this... Given that I have ZERO previous experience in Lisp, I feel optimistic.

P.S: Please feel free to close #10.

giordano commented 5 years ago

> However, accessing the API requires an authentication token, which in turn requires the users of this bibslurp-query-ads to have an ADS account.

Is it possible to programmatically generate a token? I'm thinking about something similar to what magit forge does

Sbozzolo commented 5 years ago

A while ago I started attacking this problem (it is still my intention to work on this before October). What needs to be done is straightforward, and most of the work is gymnastics with lists.

I have the feeling that considering the userbase of this package it is not unreasonable to ask for a personal ADS API token. That is, I think that the first step should be to modernize the backend. After that, we could maybe worry about the tokens.
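A sketch of how a personal token could be supplied without hard-coding it in the package. It does not generate a token; it only reads one that the user already created. The variable and function names are hypothetical, and the auth-source entry format in the comment is an assumption about how a user would store the token.

```elisp
(require 'auth-source)

(defvar my/ads-token nil
  "Personal ADS API token, or nil to look it up with auth-source.")

(defun my/ads-get-token ()
  "Return the ADS API token, or signal a `user-error' if none is found."
  (or my/ads-token
      ;; Assumed entry in one of the files listed in `auth-sources':
      ;;   machine api.adsabs.harvard.edu password <token>
      (auth-source-pick-first-password :host "api.adsabs.harvard.edu")
      (user-error "No ADS API token found")))

(defun my/ads-auth-header ()
  "Return the Authorization header as a cons cell for request's :headers."
  (cons "Authorization" (concat "Bearer " (my/ads-get-token))))
```

The result of `my/ads-auth-header` can be placed directly in the :headers list of a request call, in place of a literal "Bearer ..." string.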

Knusper commented 5 years ago

> I have the feeling that considering the userbase of this package it is not unreasonable to ask for a personal ADS API token.

Let's see where they are going with the API. At the moment I think that is the only option we have.

> ... it is still my intention to work on this before October.

Excellent news - good luck! If you want help with testing and bug-fixing, then create a fork / testing-branch and then I volunteer.

@mkmcc will you accept PRs and maintain the repo, or should it be forked?

giordano commented 5 years ago

I may be able to merge pull requests

Sbozzolo commented 5 years ago

I finally have some time to spend on this. I forked the repo and implemented some basic functionalities using the APIs. At the moment, it is possible to search for papers and copy the bibtex entry. It is working okay-ish, and there's certainly more work to be done to make it usable. I expect to continue working over the next weekends and hopefully we will have an acceptable replacement before ADS Classic is gone.

smaret commented 4 years ago

FYI: there's a discussion on implementing a backend for ADS (using the new API) in the biblio.el package: https://github.com/cpitclaudel/biblio.el/issues/28

Unfortunately, I'm stuck on authentication, which isn't implemented in biblio.el yet.

Sbozzolo commented 4 years ago

An update on this. The code I have is pretty much working, even if internally it requires much polishing. What currently is not implemented is (1) additional actions on each link (say, get pdf), (2) advanced search.

@smaret I am not familiar with biblio.el, can you compare it with bibslurp?

I am asking with the following in mind. I love bibslurp and I use it extensively in my workflow. I have some ideas for extensions that could be useful (for instance, slurping multiple BibTeX entries at once, appending to files, checking for duplicates, et cetera). However, the package has no active maintainer. I can probably fill that role, but I am wondering if it makes sense to pour my time into this package if biblio.el offers a more polished and complete experience.

That said, I have to add that it is not difficult to complete a new version of bibslurp that does everything the old one did (except the advanced search, which I've never used and I haven't looked into yet). So, in any case, I'll release that so that we can continue enjoying this package.

giordano commented 4 years ago

> However, the package has no active maintainer

I can push to this repository. I definitely don't have the time to write the new interface myself, but I'd be happy to review pull requests.

Knusper commented 4 years ago

I really like the bibslurp workflow. @Sbozzolo thank you so much for working on it.

smaret commented 4 years ago

> @smaret I am not familiar with biblio.el, can you compare it with bibslurp

They have a similar purpose: fetching references from a bibliographic database and inserting them in BibTeX format into a buffer. The main difference is that biblio.el can fetch references from multiple databases (e.g. arXiv), while bibslurp works only with ADS. biblio.el also has more functionality, such as inserting several references into a file.

You can give it a try with e.g.

M-x arxiv-lookup "maret 2014"

vs

M-x bibslurp-query-ads "maret 2014"

With that said, I don't mean that bibslurp users should switch over to biblio.el. My comment was just a heads-up so that we can share efforts in implementing ADS support in both packages.

jdtsmith commented 4 years ago

ADS classic is now officially retired, and the 2015 vintage of bibslurp is now sadly broken. How's the port to the new API going? In the meantime, should probably put a notice up on README.

Sbozzolo commented 4 years ago

There is a bare-bones working implementation (https://github.com/Sbozzolo/bibslurp) that supports unstructured queries. You can search and get the BibTeX entries. The problem is that the way to query ADS is now fundamentally different. Before, I would simply search for an author's name and the year, and I would immediately find the result I was looking for. Now, it is pretty much impossible to find meaningful results without applying filters. At the moment, I haven't implemented a way to use filters.

I care about this package and project, and I renew my commitment. However, I had overestimated the amount of time I have, and I have to confess that this is quite a low-priority project compared to others. Now, things will probably change considering the retirement of ADS Classic, but I cannot give any guarantee.

Knusper commented 4 years ago

@Sbozzolo You need to add request.el to the dependencies for your fork. I also cannot run any queries with it, not even simple queries. I get the error message: REQUEST [error] Error (error) while connecting to https://api.adsabs.harvard.edu/v1/search/query. bibslurp/prepare-entry-list: Symbol’s function definition is void: seq-map-indexed ...

giordano commented 4 years ago

> In the meantime, should probably put a notice up on README.

Done :slightly_smiling_face:

Knusper commented 4 years ago

OK - the first error was caused by my API key being entered incorrectly - the second is that seq-map-indexed is not part of Emacs 25.3 - so in order to ensure compatibility with Emacs 25:

(require 'seq)
(unless (fboundp 'seq-map-indexed)
  (defun seq-map-indexed (function sequence)
    (let ((index 0))
      (seq-map (lambda (elt)
                 (prog1
                     (funcall function elt index)
                   (setq index (1+ index))))
               sequence))))

Sbozzolo commented 4 years ago

Thank you very much, I'll commit that change later today.

Knusper commented 4 years ago

So with this it works for me - I was only using simple queries, and querying for DOIs ... I guess more complicated use cases (incl. the advanced search) may be addressed later... Thanks so far!
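For reference, a small usage sketch of seq-map-indexed, which passes each element together with its zero-based index to the function. The helper name and the sample titles are made up for illustration; this is not taken from the fork's code.

```elisp
(require 'seq)

(defun my/number-entries (titles)
  "Return TITLES as strings prefixed with a 1-based index."
  (seq-map-indexed (lambda (title index)
                     ;; INDEX starts at 0, so add 1 for display.
                     (format "%d. %s" (1+ index) title))
                   titles))

(my/number-entries '("First paper" "Second paper"))
```

On Emacs 25 this relies on the compatibility definition above being loaded first.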

jdtsmith commented 4 years ago

Glad to see some progress there. I actually spent a bit of time playing with the new API and mocked up:

(require 'request)
(request "https://api.adsabs.harvard.edu/v1/search/query"
     :headers '(("Authorization" . "Bearer x....X"))
     :params '(( "q" . "author:\"smith,j\"")
           ("fl" . "bibcode,author,title,abstract,pubdate")
           ("fq" . "database:astronomy")
           ("fq" . "property:refereed")) ;notrefereed article
     :parser 'json-read
     :success (cl-function
           (lambda (&key data &allow-other-keys)
             (with-current-buffer (get-buffer-create "*ADS BIBSLURP*")
               (erase-buffer)
               (insert (pp-to-string (assq 'response data)))))))

which works pretty well. Seems sensible to grab the abstract and just hide it until the user wants to pop it up. I also investigated the links, which are:

https://ui.adsabs.harvard.edu/link_gateway/THEBIBCODE/PUB_PDF
https://ui.adsabs.harvard.edu/link_gateway/THEBIBCODE/PUB_HTML
https://ui.adsabs.harvard.edu/link_gateway/THEBIBCODE/EPRINT_PDF
https://ui.adsabs.harvard.edu/link_gateway/THEBIBCODE/EPRINT_HTML
https://ui.adsabs.harvard.edu/link_gateway/THEBIBCODE/SIMBAD
https://ui.adsabs.harvard.edu/link_gateway/THEBIBCODE/NED

So similar but different. And you can link to the main ADS abstract page with:

https://ui.adsabs.harvard.edu/abs/THEBIBCODE
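A small sketch that builds these URLs from a bibcode, using only the patterns listed above. The function and constant names are hypothetical, and the bibcode is inserted as-is, without any extra escaping.

```elisp
(defconst my/ads-link-types
  '("PUB_PDF" "PUB_HTML" "EPRINT_PDF" "EPRINT_HTML" "SIMBAD" "NED")
  "Link types seen in the link_gateway URLs.")

(defun my/ads-link (bibcode type)
  "Return the link_gateway URL of TYPE for BIBCODE."
  (unless (member type my/ads-link-types)
    (error "Unknown ADS link type: %s" type))
  (format "https://ui.adsabs.harvard.edu/link_gateway/%s/%s" bibcode type))

(defun my/ads-abstract-url (bibcode)
  "Return the URL of the main ADS abstract page for BIBCODE."
  (format "https://ui.adsabs.harvard.edu/abs/%s" bibcode))
```

For example, (browse-url (my/ads-link "2002AJ....124..266P" "PUB_PDF")) would open the publisher PDF link for that entry, if one exists.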

So that's worth having. Once you have a bibcode THEBIBCODE, you can grab the BibTeX à la:

(request "https://api.adsabs.harvard.edu/v1/export/bibtex"
     :type "POST"
     :data "{\"bibcode\": [\"THEBIBCODE\"]}"
     :headers '(("Authorization" . "Bearer x...X")
            ("Content-Type" . "application/json"))
     :parser 'json-read
     :success (cl-function
           (lambda (&key data &allow-other-keys)
             (with-current-buffer (get-buffer-create "*ADS BIBSLURP*")
               (erase-buffer)
               (insert (cdr (assq 'export data)))))))

jdtsmith commented 4 years ago

For an interface, I think something like transient would be phenomenal. This is the abstracted transient-pop-up interface Magit uses, if you're familiar. It includes all sorts of useful history of former commands (aka entire searches), individual arguments (e.g. authors), etc. Documentation is a bit... opaque. But if you've ever used Magit you'll understand how powerful and intuitive it is. This would be super useful for narrowing down searches, since you can perform a search, see that it's too broad, pop up BibSlurp again, set one more filter, and search again.

And one other brain dump, while I'm thinking of it. By default the new API only returns 10 results, which is probably very speedy. But you can check the numFound key to see if it's larger than 10, and pass in (+ start numReturned) to get the next page. So perhaps a key binding to load up the next (or previous) page in the bibslurp buffer would be useful.
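A minimal sketch of the paging arithmetic described above. It assumes the parsed `response` object carries `numFound`, `start` and `docs` keys, and the function name is hypothetical.

```elisp
(defun my/ads-next-start (response)
  "Return the offset of the next page after RESPONSE, or nil if none is left.
RESPONSE is the alist stored under the `response' key of the parsed reply."
  (let* ((num-found (cdr (assq 'numFound response)))
         (start (or (cdr (assq 'start response)) 0))
         (returned (length (cdr (assq 'docs response))))
         (next (+ start returned)))
    ;; There is another page only if results remain beyond this one.
    (and num-found (> returned 0) (< next num-found) next)))
```

The returned number would then be sent back as an extra entry in :params, for example ("start" . "10"); the parameter name `start` is an assumption based on the key of the same name in the reply.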

Sbozzolo commented 4 years ago

> Glad to see some progress there. I actually spent a bit of time playing with the new API and mocked up: [...]

This is pretty much my implementation in the aforementioned link.

> For an interface, I think something like transient would be phenomenal. This is the abstracted transient-pop-up interface Magit uses, if you're familiar. It includes all sorts of useful history of former commands (aka entire searches), individual arguments (e.g. authors), etc. Documentation is a bit... opaque. But if you've ever used Magit you'll understand how powerful and intuitive it is. This would be super useful for narrowing down searches, since you can perform a search, see that it's too broad, pop up BibSlurp again set one more filter & search again, etc.

At the moment, I think that the old interface is good enough.

> And one other brain dump, while I'm thinking of it. By default the new API only returns 10 results, which is probably very speedy. But you can check the numFound key to see if it's larger than 10, and pass in (+ start numReturned) to get the next page. So perhaps a key binding to load up the next (or previous) page in the bibslurp buffer would be useful.

Yes, I had the same thought. I will implement a pager to do what you describe.

Knusper commented 4 years ago

The difference here is that JD uses the callback from request, so we don't have to block all of Emacs with :sync t.

jdtsmith commented 4 years ago

Sorry hadn't had a chance to look at it yet.

> The difference here is that JD uses the callback from requests, so we don't have to block all emacs with :sync t.

Right. And the callback can kick off formatting the buffer, etc. BTW, the new API is rather powerful, which might be a good motivation for a more feature-rich interface. For example, this search:

(require 'request)
;; request percent-encodes '+', but ADS wants it raw, so add it to the
;; "unreserved" list.  `cons' keeps the change local to this `let';
;; `push' would also modify the global value on every evaluation.
(let ((request--url-unreserved-chars
       (cons ?+ request--url-unreserved-chars)))
  (request "https://api.adsabs.harvard.edu/v1/search/query"
    :headers '(("Authorization" . "Bearer x...X"))
    :params '(("q" . "abs:quasar year:1990- citation_count:1000-")
              ("sort" . "citation_count+desc")
              ("fl" . "bibcode,first_author,title,pubdate,citation_count,read_count")
              ("fq" . "database:astronomy")
              ("fq" . "property:refereed"))
    :parser 'json-read
    :success (cl-function
              (lambda (&key data &allow-other-keys)
                (with-current-buffer (get-buffer-create "*ADS BIBSLURP*")
                  (erase-buffer)
                  (insert (pp-to-string (assq 'response data))))))
    :error (cl-function
            (lambda (&rest args &key error-thrown &allow-other-keys)
              (message "Got error: %S" error-thrown)))))

delivers the top 10 quasar-related papers since 1990 sorted by descending citation count. Note that I found that request translates the + character but ADS wants it raw, so I had to add it to the "unreserved" list.
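To illustrate the encoding issue, here is what Emacs's built-in url-hexify-string does with such a value. This is only an analogy: request uses its own encoder, which according to the comment above treats + the same way.

```elisp
(require 'url-util)

;; A literal "+" is not in the unreserved set, so it is percent-encoded.
(url-hexify-string "citation_count+desc")   ; "citation_count%2Bdesc"

;; A space is percent-encoded as well.
(url-hexify-string "citation_count desc")   ; "citation_count%20desc"
```

In form-style query strings a + conventionally stands for an encoded space, so passing "citation_count desc" and letting the library encode the space might be an alternative to changing the unreserved list. That is an untested assumption about how ADS parses the sort parameter.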

Sbozzolo commented 4 years ago

> BTW, the new API is rather powerful, which might be a good motivation for a more feature-rich interface.

I certainly agree with that, and I picture in my head a nice package with transient, as you suggested. My previous comment was probably to be read as "I don't think this is the priority at the moment". I think the top priority is to restore most of the functionalities and remove legacy code.

> > The difference here is that JD uses the callback from requests, so we don't have to block all emacs with :sync t.

> Right. And the callback can kick off formatting the buffer, etc

Yes, you are right.

Knusper commented 4 years ago

> Note that I found that request translates the + character but ADS wants it raw, so I had to add it to the "unreserved" list.

That is interesting. I think the same holds for the : character; right now, queries containing a colon appear to fail for me with request in the @Sbozzolo fork, which I use on a daily basis.

Knusper commented 3 years ago

So somehow the @Sbozzolo fork now also stopped working for me - slurping some bibtex from ADS fails with the following:

Debugger entered--Lisp error: (wrong-type-argument stringp nil)
  string-match("@\\sw+{\\([^,]+\\)," nil)
  (progn (string-match "@\\sw+{\\([^,]+\\)," bibtex) (replace-match new-label t t bibtex 1))
  (progn (progn (string-match "@\\sw+{\\([^,]+\\)," bibtex) (replace-match new-label t t bibtex 1)))
  (if (not (string-equal new-label "")) (progn (progn (string-match "@\\sw+{\\([^,]+\\)," bibtex) (replace-match new-label t t bibtex 1))))
  (let ((bibtex (bibslurp/request-bibtex bibcode))) (if (not (string-equal new-label "")) (progn (progn (string-match "@\\sw+{\\([^,]+\\)," bibtex) (replace-match new-label t t bibtex 1)))))
  (if (not (equal bibslurp-bibtex-label-format 'author-year)) (bibslurp/request-bibtex bibcode) (let ((bibtex (bibslurp/request-bibtex bibcode))) (if (not (string-equal new-label "")) (progn (progn (string-match "@\\sw+{\\([^,]+\\)," bibtex) (replace-match new-label t t bibtex 1))))))
  bibslurp/bibcode-to-bibtex("2002AJ....124..266P" "Peng2002")
  (kill-new (bibslurp/bibcode-to-bibtex bibcode (bibslurp/suggest-label authors date)))
  (let ((bibcode (get-text-property (point) 'bibcode)) (authors (get-text-property (point) 'authors)) (date (get-text-property (point) 'date))) (kill-new (bibslurp/bibcode-to-bibtex bibcode (bibslurp/suggest-label authors date))) (message "Saved bibtex entry to kill-ring."))
  bibslurp-slurp-bibtex()
  funcall-interactively(bibslurp-slurp-bibtex)
  call-interactively(bibslurp-slurp-bibtex nil nil)
  command-execute(bibslurp-slurp-bibtex)

Sbozzolo commented 3 years ago

Yes, I noticed that too. I haven't found the time to fix it yet. I think that ADS updated the APIs to accept the payload in only a specific way (I didn't see any news about this). I think I have already pinpointed the issue with curl and I have to implement the change in elisp using the request module.
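The backtrace above shows string-match being called with nil in place of the BibTeX text. A defensive sketch of the relabelling step, reusing the regular expression from the backtrace; the function name is hypothetical and this is not the fork's actual code.

```elisp
(defun my/relabel-bibtex (bibtex new-label)
  "Return BIBTEX with its citation key replaced by NEW-LABEL.
Signal a `user-error' if BIBTEX is not a string, for example
because the export request returned no data."
  (unless (stringp bibtex)
    (user-error "No BibTeX entry was returned by ADS"))
  (if (and (not (string-equal new-label ""))
           (string-match "@\\sw+{\\([^,]+\\)," bibtex))
      ;; Replace only subexpression 1, the original key.
      (replace-match new-label t t bibtex 1)
    bibtex))
```

With this guard, a failed export produces a readable message instead of a wrong-type-argument error deep inside the command.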

Knusper commented 3 years ago

I was digging through this as well, and I also couldn't find any update on the ADS site regarding the change. Very strange.

Sbozzolo commented 3 years ago

The issue is the following. What we do at the moment is more or less (if I remember correctly):

curl -d "bibcode=2010PhRvD..82j4014W" -H 'Authorization: Bearer TOKEN' 'https://api.adsabs.harvard.edu/v1/export/bibtex'

What works is

curl -d '{"bibcode":"2010PhRvD..82j4014W"}' -H 'Authorization: Bearer TOKEN' 'https://api.adsabs.harvard.edu/v1/export/bibtex'

So, we need to change the way we pass the data. It shouldn't be too difficult, but I have to understand how request.el wants me to do it.
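A sketch of building the JSON body in elisp with the built-in json library instead of a form-encoded string. It sends the bibcodes as a JSON array, as in the earlier request example in this thread; the function names are hypothetical.

```elisp
(require 'json)

(defun my/ads-export-payload (&rest bibcodes)
  "Return a JSON request body asking for the BibTeX of BIBCODES."
  ;; A vector is encoded as a JSON array.
  (json-encode `((bibcode . ,(vconcat bibcodes)))))

(defun my/ads-export-text (data)
  "Return the BibTeX text from parsed export reply DATA, or nil."
  (cdr (assq 'export data)))

(my/ads-export-payload "2010PhRvD..82j4014W")
```

The resulting string can be passed as :data to request, together with a "Content-Type" header of "application/json".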

jdtsmith commented 3 years ago

My use of request above still works fine (:headers keyword). BTW, transient has now been moved into emacs itself, so everyone will have access. I continue to think exposing more of this API is the perfect use of transient.

Sbozzolo commented 3 years ago

> My use of request above still works fine (:headers keyword).

The specific issue is for retrieving the bibtex, does the request work for that?

And yes, it would be very nice to do this with transient.

Sbozzolo commented 3 years ago

I fixed it in the latest commit. It should be working now.