Closed rburke2233 closed 6 years ago
Here's a solution using an endpoint we already have. I won't have time for a few days to package it and write tests.
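import pandas as pd

EF_CHECK_URL = "https://data.analytics.hathitrust.org/features/get?ids={}"

def files_available(ids):
    # Query the endpoint once, with all IDs joined by commas.
    url = EF_CHECK_URL.format(",".join(ids))
    results = pd.read_json(url, orient='index', typ='series', convert_dates=False)
    # Select by the input list so the output order matches `ids`.
    return results[ids].tolist()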
Using it:
ids = ['fakeid', 'mdp.39015033487193', 'fakeid2', 'hvd.hn3fhi', 'hvd.hxdan3']
files_available(ids)
[False, True, False, True, True]
If you use this before I add it to the library, you'll need pandas imported as pd.
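For anyone who would rather not depend on pandas in the meantime, here is a rough standard-library sketch of the same check. It assumes the endpoint returns a JSON object mapping each ID to true or false, which is what the read_json call with orient='index' implies. The names files_available_stdlib and fetch_json are made up for illustration and are not part of the library. Treating IDs that are absent from the response as unavailable is also an assumption.

```python
import json
from urllib.request import urlopen

EF_CHECK_URL = "https://data.analytics.hathitrust.org/features/get?ids={}"

def fetch_json(url):
    """Download a URL and parse the response body as JSON."""
    with urlopen(url) as response:
        return json.load(response)

def files_available_stdlib(ids, fetch=fetch_json):
    """Return one boolean per ID, in the same order as `ids`.

    `fetch` is injectable so the function can be tested without network access.
    """
    results = fetch(EF_CHECK_URL.format(",".join(ids)))
    # Assumption: an ID missing from the response counts as unavailable.
    return [bool(results.get(htid, False)) for htid in ids]
```

Passing a stub for fetch makes it easy to test the ordering logic offline.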
I would like to have a feature whereby one can check if a given Hathi ID is part of the EF dataset or not. As far as I can tell, the only way to do this now is to try to download (via rsync) and see if you are successful.
An API call something like:
file_available(ids)
It would return a list of true or false values, one per ID, in the same order as the input.
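As a sketch of how a caller might use such a list of booleans, for example to skip the rsync attempt for IDs that are not in the dataset, something like the following could work. The helper name split_by_availability is hypothetical and only illustrates pairing the flags back up with the IDs.

```python
def split_by_availability(ids, flags):
    """Split IDs into (available, missing) using a parallel list of booleans."""
    if len(ids) != len(flags):
        raise ValueError("ids and flags must have the same length")
    available = [htid for htid, ok in zip(ids, flags) if ok]
    missing = [htid for htid, ok in zip(ids, flags) if not ok]
    return available, missing
```

Only the IDs in the first list would then need to be passed on to the download step.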