htrc / htrc-feature-reader

Tools for working with HTRC Feature Extraction files

Include convenience function for displaying readable htids in IPython notebooks. #28

Closed bmschmidt closed 5 years ago

bmschmidt commented 5 years ago

Forgive me if I've said this before, but a minor note: I wonder if it would be useful, as part of the metadata fetching, to include helper functions that display HT books inside an IPython notebook.

I end up copying and pasting the following cell into different notebooks to get a clickable link to a book with minimal metadata. It doesn't make sense as a function in any library I use, but it might be useful here.

I don't know if it still works with the current API.

import json
import urllib.request
from html import escape

from IPython.display import HTML

# Module-level cache of Bib API responses, keyed by htid.
hathi_cache = {}

def jsonify(htid, force=False):
    # Fetch (and cache) the HathiTrust Bib API brief record for one volume.
    if htid in hathi_cache and not force:
        return hathi_cache[htid]
    # Undo filename-safe id cleaning: "+" -> ":" and "=" -> "/".
    real_id = htid.replace("+", ":").replace("=", "/")
    url = "http://catalog.hathitrust.org/api/volumes/brief/htid/%s.json" % real_id
    with urllib.request.urlopen(url) as response:
        hathi_cache[htid] = json.loads(response.read().decode("utf-8"))
    return hathi_cache[htid]

def descend(record):
    # Parse a Hathi API call response: return the first record it contains.
    # Raises IndexError if the API found no matching record.
    return list(record['records'].values())[0]

def pretty_print(htid, text):
    try:
        return HTML(Printable_Hathi(htid, text)._repr_html_())
    except IndexError:
        # No record (or no title/date) came back for this id.
        print('no index', htid)
        return HTML("")

class Printable_Hathi:
    def __init__(self, htid, text):
        self.htid = htid
        self.desc = descend(jsonify(htid))
        self.text = text

    def _repr_html_(self):
        # IPython calls this to render the object as rich HTML output.
        # Fields are HTML-escaped rather than stripped to ASCII.
        url = "https://babel.hathitrust.org/cgi/pt?id=" + self.htid
        return '<li><a href="{}">{} ({})</a><br>{}</li>'.format(
            escape(url),
            escape(self.desc['titles'][0]),
            escape(str(self.desc['publishDates'][0])),
            escape(self.text))

    def title(self):
        return self.desc['titles'][0]

Printable_Hathi("inu.30000026383574", "Some sample text to go with, ❤")

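`jsonify` reverses only two of the filename-safe substitutions (`+` → `:` and `=` → `/`). A minimal sketch of a full round trip, assuming the pairtree-style convention seen in HTRC EF filenames (the library prefix such as `uc2.` is kept, then `:` → `+`, `/` → `=` and `.` → `,` in the rest of the id). `clean_htid` and `unclean_htid` are hypothetical names, not functions from this library.

```python
# Hypothetical helpers (not part of htrc-feature-reader): round-trip an htid
# through an assumed pairtree-style, filename-safe form.
CLEAN_MAP = {":": "+", "/": "=", ".": ","}
UNCLEAN_MAP = {v: k for k, v in CLEAN_MAP.items()}


def _map_volume_part(htid: str, table: dict) -> str:
    # Keep the library prefix (e.g. "uc2.") and substitute characters after it.
    library, sep, volume = htid.partition(".")
    if not sep:
        # No prefix found: substitute across the whole string.
        return "".join(table.get(ch, ch) for ch in htid)
    return library + sep + "".join(table.get(ch, ch) for ch in volume)


def clean_htid(htid: str) -> str:
    """Filename-safe form, e.g. uc2.ark:/13960/t1zc80k1p -> uc2.ark+=13960=t1zc80k1p."""
    return _map_volume_part(htid, CLEAN_MAP)


def unclean_htid(clean: str) -> str:
    """Inverse of clean_htid."""
    return _map_volume_part(clean, UNCLEAN_MAP)


print(unclean_htid("uc2.ark+=13960=t1zc80k1p"))  # uc2.ark:/13960/t1zc80k1p
```

Reversing `,` as well matters only for ids whose volume part contains a dot; the ids used above are unaffected.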
organisciak commented 5 years ago

That's a great idea! Adapting it now.

Although the newest updates to vol.metadata include HT Bib API parsing, I'd rather not make the extra call. Instead, I'll try to use the data that's already in the EF file.
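A minimal sketch of that approach, assuming the EF file's volume-level metadata is available as a dict with `title` and `pubDate` keys (an assumption about the schema). `VolumeLink` is a hypothetical name, not this library's API. IPython renders any object that defines `_repr_html_`, so the link can be built without a Bib API request.

```python
from html import escape


class VolumeLink:
    """Hypothetical display object built only from EF volume metadata.

    Assumes `metadata` is a dict with "title" and "pubDate" keys; no
    HathiTrust Bib API request is made.
    """

    BASE_URL = "https://babel.hathitrust.org/cgi/pt?id="

    def __init__(self, htid, metadata):
        self.htid = htid
        self.metadata = metadata

    def _repr_html_(self):
        # IPython calls this when the object is displayed in a notebook cell.
        url = self.BASE_URL + self.htid
        title = self.metadata.get("title", self.htid)  # fall back to the id
        year = self.metadata.get("pubDate", "n.d.")
        return '<a href="{}">{} ({})</a>'.format(
            escape(url), escape(str(title)), escape(str(year))
        )


# In a notebook, ending a cell with this object renders a clickable link.
# The title and year here are placeholder values.
VolumeLink("inu.30000026383574", {"title": "Example title", "pubDate": 1900})
```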

organisciak commented 5 years ago

How's something like this for the output?

[image: screenshot of the proposed output]

organisciak commented 5 years ago

I should include the id there for convenience, but it looks inelegant any way I try. Here it is wrapped in a code tag:
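One possible layout, sketched with a hypothetical `link_with_id` helper: the htid follows the link inside a `<code>` element, with every field HTML-escaped. This is an illustration, not necessarily the format used in the release.

```python
from html import escape


def link_with_id(htid, title, year):
    """Hypothetical helper: a title (year) link with the htid in a <code> tag."""
    url = "https://babel.hathitrust.org/cgi/pt?id=" + htid
    # Escape every field, since titles can contain &, <, > and quotes.
    return '<a href="{}">{} ({})</a> <code>{}</code>'.format(
        escape(url), escape(title), escape(str(year)), escape(htid)
    )


# Placeholder title and year, for illustration only.
print(link_with_id("inu.30000026383574", "Example title", 1900))
```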

[image: screenshot of the output with the id wrapped in a code tag]

organisciak commented 5 years ago

The above is implemented in version 1.96, now available on PyPI and conda.

Closing this ticket, but I would gladly re-open if the format needs more discussion.