zepheira / librarylink_collections

Library.Link Collections

Cull lists by low holdings count and by language group #5

uogbuji opened this issue 5 years ago (status: open)

uogbuji commented 5 years ago

Some of the lists are unwieldy. One way to trim them is to omit ISBNs that are uncommon in the Library.Link network, i.e. those with low holdings counts. One can also constrain them, chauvinistically, by ISBN registration group (i.e., language area or country).

Add this capability.

uogbuji commented 5 years ago

I've added a tool to the Library.Link library that can do this when combined with jq. The following command outputs JSON that annotates each ISBN found in a list with its holdings count and ISBN group:

liblinklist iteminfo --holdings --isbn-groups https://raw.githubusercontent.com/zepheira/librarylink_collections/master/lists/libraryreads.json > isbninfo.json

The resulting isbninfo.json looks something like this (only the top section is shown):

{
  "9780399590504": {
    "holdings_count": 732,
    "isbn_group": 0
  },
  "9781616201340": {
    "holdings_count": 753,
    "isbn_group": 1
  },
  "9780316556347": {
    "holdings_count": 535,
    "isbn_group": 0
  },
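For context on the isbn_group field: in an ISBN-13, the 978 prefix is followed by a registration group identifier, and groups 0 and 1 cover the English-language area. The following is a simplified sketch of how such a group number might be extracted. It is not how liblinklist computes isbn_group, and the function name isbn13_group is hypothetical. It assumes 978-prefixed ISBN-13s, strips hyphens, skips check-digit validation, and only recognizes single-digit groups (0 to 5 and 7), returning None otherwise, because groups starting with 6, 8 or 9 use longer identifiers.

```python
from typing import Optional

# Registration groups under the 978 prefix that are a single digit.
# Groups beginning with 6, 8 or 9 are longer and are deliberately not handled.
SINGLE_DIGIT_GROUPS = {"0", "1", "2", "3", "4", "5", "7"}


def isbn13_group(isbn: str) -> Optional[int]:
    """Return the single-digit registration group of a 978-prefixed ISBN-13, else None.

    Simplified sketch: no check-digit validation, no 979 prefixes,
    no multi-digit groups.
    """
    digits = isbn.replace("-", "")
    if len(digits) != 13 or not digits.isdigit() or not digits.startswith("978"):
        return None
    first = digits[3]  # the digit right after the 978 prefix
    return int(first) if first in SINGLE_DIGIT_GROUPS else None


# ISBNs taken from the isbninfo.json excerpt
for isbn in ["9780399590504", "9781616201340", "9780316556347"]:
    print(isbn, isbn13_group(isbn))
```

For the three excerpt ISBNs this yields 0, 1 and 0, matching the isbn_group values in the sample output.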

Now you can use jq to do the rest. First, git clone this repo to get a local copy of the list for processing. Then, to keep only items with more than 500 holdings:

$ jq --arg mincount 500 --slurpfile ii isbninfo.json '.isbns=(.isbns | map(select($ii[0][.].holdings_count > ($mincount|tonumber))) )' libraryreads.json
{
  "label": "LibraryReads Favorite of Favorites 2018",
  "description": "A list of the top 10 favorites picked by librarians through LibraryReads in 2018. LibraryReads is the monthly nationwide library staff picks list for adult fiction and non-fiction.",
  "isbns": [
    "9780399590504",
    "9781616201340",
    "9780316556347",
    "9781501156212",
    "9780312577230",
    "9780735213180",
    "9780062678416"
  ]
}
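To make the filter's logic concrete outside jq, here is a minimal Python sketch of the same operation, assuming the list and isbninfo.json have already been loaded into dicts (e.g. with json.load). The function name filter_by_holdings is hypothetical, the sample data is just the three entries from the isbninfo.json excerpt, and the threshold of 600 is purely illustrative.

```python
import json


def filter_by_holdings(lst: dict, info: dict, mincount: int) -> dict:
    """Copy lst, keeping only ISBNs whose holdings_count exceeds mincount."""
    kept = []
    for isbn in lst["isbns"]:
        count = info.get(isbn, {}).get("holdings_count")
        # An ISBN missing from info is dropped, like jq's null > number (false).
        if count is not None and count > mincount:
            kept.append(isbn)
    # Like .isbns=(...), every other key of the list is carried over unchanged.
    return {**lst, "isbns": kept}


# Sample data: the three entries from the isbninfo.json excerpt.
sample_info = {
    "9780399590504": {"holdings_count": 732, "isbn_group": 0},
    "9781616201340": {"holdings_count": 753, "isbn_group": 1},
    "9780316556347": {"holdings_count": 535, "isbn_group": 0},
}
sample_list = {
    "label": "LibraryReads Favorite of Favorites 2018",
    "isbns": ["9780399590504", "9781616201340", "9780316556347"],
}

print(json.dumps(filter_by_holdings(sample_list, sample_info, 600), indent=2))
```

With a threshold of 600, only the two ISBNs with 732 and 753 holdings survive; the label is passed through untouched, just as jq's update assignment leaves label and description alone.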

Let me break down this jq command, because it took me a few swings to get it right.

--arg mincount 500: defines a parameter, $mincount, so the threshold is easy to tweak

--slurpfile ii isbninfo.json: pull in the isbninfo.json file, made available as $ii

'.isbns=(.isbns … )': copy the input to output, changing just the isbns array

| map( … ): work over each item in the array

select( … ): keep only the array items that match this criterion

$ii[0][.].holdings_count …: look up the current array item (just an ISBN string) in the slurped-in isbninfo.json. The [0] is needed because --slurpfile binds $ii to an array of every JSON text parsed from the file (there could be more than one), so you have to select the first. [.] uses the current value (the ISBN string) as a key, so that .holdings_count can be looked up.

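… > ($mincount|tonumber)): compares against the parameter defined earlier. The parameter must be converted to a number because --arg always binds a string; without tonumber the test would be number > string, which jq always evaluates as false because it orders every number before every string.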
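The [0] step makes more sense once you see what --slurpfile produces. The sketch below imitates it in Python by decoding every whitespace-separated JSON text in a string into a list. slurp_json_texts is a hypothetical helper, not part of jq or liblinklist, and the one-line input stands in for isbninfo.json. It also shows the string-to-number conversion that tonumber performs in the jq command.

```python
import json
from typing import Any, List


def slurp_json_texts(text: str) -> List[Any]:
    """Decode every whitespace-separated JSON text in `text` into a list.

    Roughly imitates what jq's --slurpfile binds to its variable.
    """
    decoder = json.JSONDecoder()
    values: List[Any] = []
    idx = 0
    while True:
        # raw_decode does not skip leading whitespace, so do it here.
        while idx < len(text) and text[idx].isspace():
            idx += 1
        if idx == len(text):
            break
        value, idx = decoder.raw_decode(text, idx)
        values.append(value)
    return values


# Stand-in for isbninfo.json: a file holding a single JSON object.
isbninfo_text = '{"9780399590504": {"holdings_count": 732, "isbn_group": 0}}\n'

ii = slurp_json_texts(isbninfo_text)  # a one-element list, hence ii[0]
mincount = "500"  # --arg always binds a string
count = ii[0]["9780399590504"]["holdings_count"]

# Python raises TypeError for 732 > "500"; jq instead returns false,
# so in both cases the string must be converted first (tonumber in jq).
print(len(ii), count > int(mincount))  # prints: 1 True
```

Because isbninfo.json holds a single object, the slurped list has exactly one element, which is why the jq filter always indexes it with [0].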

Whew! OK, extending this to get only items with more than 500 holdings and in group 0 or 1 (English):

jq --arg mincount 500 --slurpfile ii isbninfo.json '.isbns=(.isbns | map(select(($ii[0][.].holdings_count > ($mincount|tonumber) and ($ii[0][.].isbn_group == 0 or $ii[0][.].isbn_group == 1)))) )' libraryreads.json

This gives the same result in this case, since all the items that pass the holdings filter are in group 0 or 1.
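As the filters grow, they may be easier to maintain in a script. Here is a hedged Python sketch of the combined filter; filter_isbns and its groups parameter are hypothetical names, and passing groups=None skips the group test entirely. The sample data consists only of the three entries from the isbninfo.json excerpt.

```python
from typing import Iterable, Optional


def filter_isbns(
    lst: dict,
    info: dict,
    mincount: int,
    groups: Optional[Iterable[int]] = None,
) -> dict:
    """Copy lst, keeping ISBNs with holdings_count > mincount and,
    if groups is given, an isbn_group contained in groups."""
    allowed = set(groups) if groups is not None else None
    kept = []
    for isbn in lst["isbns"]:
        entry = info.get(isbn)
        if entry is None:
            continue  # no annotation for this ISBN: drop it
        count = entry.get("holdings_count")
        if count is None or count <= mincount:
            continue
        if allowed is not None and entry.get("isbn_group") not in allowed:
            continue
        kept.append(isbn)
    return {**lst, "isbns": kept}


# Sample data: the three entries from the isbninfo.json excerpt.
sample_info = {
    "9780399590504": {"holdings_count": 732, "isbn_group": 0},
    "9781616201340": {"holdings_count": 753, "isbn_group": 1},
    "9780316556347": {"holdings_count": 535, "isbn_group": 0},
}
sample_list = {"isbns": ["9780399590504", "9781616201340", "9780316556347"]}

print(filter_isbns(sample_list, sample_info, 500, groups={0, 1})["isbns"])
# prints: ['9780399590504', '9781616201340', '9780316556347']
print(filter_isbns(sample_list, sample_info, 500, groups={0})["isbns"])
# prints: ['9780399590504', '9780316556347']
```

Using a set of allowed groups replaces the chained == tests in the jq command, so widening or narrowing the language filter means editing one collection rather than the filter expression.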