wellcomecollection / platform

Wellcome Collection Digital Platform
https://developers.wellcomecollection.org/
MIT License

Investigate the use of EPB/ numbers for shelfmarks #5203

Open alexwlchan opened 3 years ago

alexwlchan commented 3 years ago

This is a piece spun out of https://github.com/wellcomecollection/platform/issues/5147

Alexandra has pointed to early printed books, with shelfmarks that start with 'EPB/', as the next place to look at for cleaning up the shelfmarks. These are ~12% of the shelfmarks currently in the API.
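As a rough illustration of how a figure like that ~12% could be checked, here is a minimal sketch that measures the share of shelfmarks starting with a given prefix. The function name `prefix_share` and the sample values are made up for this example; they are not taken from the API or the catalogue.

```python
def prefix_share(shelfmarks, prefix="EPB/"):
    """Return the fraction of non-empty shelfmarks that start with `prefix`."""
    # Drop empty or whitespace-only values so they don't skew the ratio.
    cleaned = [s.strip() for s in shelfmarks if s and s.strip()]
    if not cleaned:
        return 0.0
    matching = sum(1 for s in cleaned if s.startswith(prefix))
    return matching / len(cleaned)


# Invented sample values, purely for illustration.
sample = ["EPB/EXAMPLE-1", "EPB/EXAMPLE-2", "EPH/EXAMPLE-3", "OTHER-4", ""]
share = prefix_share(sample)
```

Running the same function with `prefix="EPH/"` would give the equivalent share for ephemera.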

I've got a report on how these shelfmarks are used that I need to read, then Alex Hill is the go-to person for this.

alexwlchan commented 3 years ago

Alex and I had a call this afternoon to discuss what's going on here.

Unsurprisingly, this problem has fractal complexity.


Here are my notes:

Here's what I think could work:

amme2 commented 3 years ago

  • There are older locations that we should also make searchable (but not put on the page). These former locations are either in 949 ind1 0 ind2 0 or in field tag t. I need to look at how we'd find these in more detail.

This is true, but I don't think the older shelfmarks should be an immediate priority. And most of these 'former locations' relate in some way to the current shelfmarks anyway, so work on making the current shelfmarks searchable should also help you along with bringing in former shelfmarks at some later date.

  • We copy both the former and current EPB/ and EPH/ numbers into the identifiers block on the item, so they're searchable.

Personally I think I would want to tackle EPB/ first, not least because the linkages are different and potentially much more complicated for EPH/. (Also the EPH/ (ephemera) structure might suit more of an archives-style display format over the longer term, so my instinct is to suggest the issues with bringing in these as identifiers will be different between EPB/ and EPH/. Although I concede there's an obvious search similarity/potential for confusion which needs to be taken into account from the beginning.)

I'm not sure what the identifier type should be; maybe early-printed-book-number and ephemera-number.

I think - having looked at the API and found an equivalent 'iconographic-number' somewhat to my horror - this might warrant a longer conversation... I'm not convinced there is much benefit (but there are certainly downsides) to continuing to use these rather arbitrary categories, and I am desperately trying to move away from them in our collections information applications. If they are all surfaced on wellcomecollection.org as Reference Number, (why) do we need to further define them?

jtweed commented 3 years ago

On that last point, if the distinction isn't as useful as we thought, in the identifiers list it might make sense to go with William's suggestion of call number (we do get those from 001...)

alexwlchan commented 3 years ago

There are older locations that we should also make searchable (but not put on the page). These former locations are either in 949 ind1 0 ind2 0 or in field tag t. I need to look at how we'd find these in more detail.

This is true, but I don't think the older shelfmarks should be an immediate priority. And most of these 'former locations' relate in some way to the current shelfmarks anyway, so work on making the current shelfmarks searchable should also help you along with bringing in former shelfmarks at some later date.

True, although once we've agreed an approach for the current shelfmarks, the older locations should fall out pretty easily.
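To make the "949 ind1 0 ind2 0 or field tag t" rule concrete, here is a minimal sketch of how the former locations might be picked out of a record. It assumes each varfield is a dict with `marcTag`, `ind1`, `ind2`, `fieldTag`, `content` and `subfields` keys, and that the location text is either in `content` or spread across the subfields. Both the record shape and the function name `former_locations` are assumptions for illustration, not a confirmed description of the data.

```python
def former_locations(var_fields):
    """Collect candidate former-location strings from a list of varfield dicts."""
    found = []
    for field in var_fields:
        # Rule 1: MARC 949 with both indicators set to "0".
        is_949_00 = (
            field.get("marcTag") == "949"
            and field.get("ind1") == "0"
            and field.get("ind2") == "0"
        )
        # Rule 2: any varfield whose field tag is "t".
        is_tag_t = field.get("fieldTag") == "t"
        if not (is_949_00 or is_tag_t):
            continue

        # Assumption: the value is in "content" or spread across subfields.
        if field.get("content"):
            text = field["content"]
        else:
            text = " ".join(
                sf.get("content", "") for sf in field.get("subfields", [])
            )

        text = text.strip()
        if text and text not in found:  # skip blanks and duplicates
            found.append(text)
    return found


# Invented sample record, purely for illustration.
sample_fields = [
    {"marcTag": "949", "ind1": "0", "ind2": "0",
     "subfields": [{"tag": "a", "content": "FORMER-1"}]},
    {"fieldTag": "t", "content": "FORMER-2"},
    {"marcTag": "949", "ind1": " ", "ind2": " ",
     "subfields": [{"tag": "a", "content": "NOT-MATCHED-3"}]},
    {"marcTag": "245", "ind1": "0", "ind2": "0",
     "subfields": [{"tag": "a", "content": "A title"}]},
]
locations = former_locations(sample_fields)
```

The two rules are kept as separate booleans so either can be tightened once we've looked at the real data in more detail.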

We copy both the former and current EPB/ and EPH/ numbers into the identifiers block on the item, so they're searchable.

Personally I think I would want to tackle EPB/ first, not least because the linkages are different and potentially much more complicated for EPH/. (Also the EPH/ (ephemera) structure might suit more of an archives-style display format over the longer term, so my instinct is to suggest the issues with bringing in these as identifiers will be different between EPB/ and EPH/.)

👍

If they are all surfaced on wellcomecollection.org as Reference Number, (why) do we need to further define them?

Not all the identifiers in this list get surfaced as reference number. We do this for the numbers-that-end-in-i (although even that might need revisiting, see https://github.com/wellcomecollection/platform/issues/5205), but there are identifiers in this list that we don't display on the page. I expect to add more as we continue to work through the shelfmark data, and we need some way to describe them.

There is the "wellcome-library-reference" bucket that we could throw them all into, but treating them as an amorphous soup of strings doesn't feel like a great approach.

I do feel it's useful to distinguish identifiers based on their original context, in the same way we describe "Miro image numbers" in the API even though we'd never put the word "Miro" on the page. Especially as we have overlapping identifier schemes, e.g. "171i" has been used as both a reference number and a manuscript identifier.
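A minimal sketch of the distinction I'm describing, assuming an identifier is just a (type, value) pair. The `Identifier` class, the prefix mapping, and the type names `reference-number` and `manuscript-identifier` are hypothetical; `early-printed-book-number`, `ephemera-number` and `wellcome-library-reference` are the names floated in this thread, not a settled scheme.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Identifier:
    identifier_type: str
    value: str


# Hypothetical mapping from shelfmark prefix to identifier type.
PREFIX_TYPES = {
    "EPB/": "early-printed-book-number",
    "EPH/": "ephemera-number",
}
FALLBACK_TYPE = "wellcome-library-reference"


def identifier_for(shelfmark):
    """Pick an identifier type from the shelfmark prefix, else the catch-all."""
    for prefix, id_type in PREFIX_TYPES.items():
        if shelfmark.startswith(prefix):
            return Identifier(id_type, shelfmark)
    return Identifier(FALLBACK_TYPE, shelfmark)


def find(identifiers, value, identifier_type=None):
    """Return identifiers matching `value`, optionally narrowed by type."""
    return [
        i for i in identifiers
        if i.value == value
        and (identifier_type is None or i.identifier_type == identifier_type)
    ]


# The same string under two schemes stays distinguishable by type.
ids = [
    Identifier("reference-number", "171i"),
    Identifier("manuscript-identifier", "171i"),
]
both = find(ids, "171i")
only_ms = find(ids, "171i", "manuscript-identifier")
```

Searching by value alone still finds both, so nothing is lost for search; the type is only there for when we need to tell them apart.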

amme2 commented 3 years ago

I do feel it's useful to distinguish identifiers based on their original context, in the same way we describe "Miro image numbers" in the API even though we'd never put the word "Miro" on the page. Especially as we have overlapping identifier schemes, e.g. "171i" has been used as both a reference number and a manuscript identifier.

But that is a good example of where distinguishing between types of reference number is unhelpful: it encourages the allocation of additional reference numbers to fit format. That manuscript already had a reference number as a Chinese manuscript (currently in a print catalogue only, you won't find it in Sierra); it didn't need another to fit a different schema/format silo. Now it looks like there are two items, when in fact there's only one.

However, I'd reluctantly settle for Jonathan's compromise of more neutral language to describe the 'iconographic number'.