unt-libraries / pycallnumber

Parse, model, and manipulate any type of call number string.
BSD 3-Clause "New" or "Revised" License

Is there a way to classify call numbers without cutters as LC? #30

Closed misilot closed 4 years ago

misilot commented 4 years ago

Hello,

Is there a way to classify call numbers without cutters as LC, so I can access the classification function?

B355 1899
B355 1927a
B355 1960

These all return Local instead of LC.

Thank you! Tom

jthomale commented 4 years ago

Hi Tom,

There are a few ways you could do this. The clearest might be to create a custom call number type, subclassed from pycallnumber.units.LC, that doesn't use a cutter, and then include it in the list of call number types you pass to a factory function such as pycallnumber.callnumber.

Your custom class might look something like this, where you slice the groups from the main LC class to exclude the cutter and preceding period.

import pycallnumber as pycn

class LCNoCutter(pycn.units.LC):
    lc_groups = pycn.units.LC.template.groups
    definition = 'an LC call number without a Cutter'
    template = pycn.template.CompoundTemplate(
        separator_type=pycn.units.simple.DEFAULT_SEPARATOR_TYPE,
        groups=lc_groups[0:1] + lc_groups[3:]
    )

Then, depending on what you're expecting in your data, you can create a list of types including LCNoCutter to pass via the optional unittypes kwarg when you call the various factory functions, like pycallnumber.callnumber. (This is described in the README.) As long as LCNoCutter appears before pycallnumber.units.Local, those call numbers should match that type instead of Local.

For instance, continuing the above code:

# Assume something is a valid LC call number, otherwise maybe an LC
# without the cutter, or treat it as a local call number if all else
# fails:
unittypes = [pycn.units.LC, LCNoCutter, pycn.units.Local]
pycn.callnumber('B355 1899', unittypes=unittypes)
pycn.callnumber('B355 1927a', unittypes=unittypes)
pycn.callnumber('B355 1960', unittypes=unittypes)

The type of the object you'll get back should be LCNoCutter, but it will behave like the LC type, just without the Cutter.
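To make the ordering point concrete, here is a minimal standard-library sketch of "first matching type wins" detection. It is not pycallnumber's implementation: the regular expressions, the PATTERNS list, and the detect function are hypothetical and much cruder than the library's real templates. They only illustrate why a catch-all type has to come last.

```python
import re

# Hypothetical, simplified patterns for illustration only.
# pycallnumber's real templates are far more thorough than these.
PATTERNS = [
    # Class letters + class number, then a Cutter such as ".C691"
    ("LC", re.compile(r"^[A-Z]{1,3}\d+(\.\d+)?\s*\.?[A-Z]\d+(\s.*)?$")),
    # Class letters + class number, optionally followed by a year
    ("LCNoCutter", re.compile(r"^[A-Z]{1,3}\d+(\.\d+)?(\s+\d{4}[a-z]?)?$")),
    # Catch-all: matches any non-empty string, so it must come last
    ("Local", re.compile(r"^.+$")),
]


def detect(cn, patterns=PATTERNS):
    """Return the name of the first pattern in the list that matches cn."""
    for name, pattern in patterns:
        if pattern.match(cn):
            return name
    return None


for cn in ["B355 1899", "B355 1927a", "E11 .C691 no.1-2"]:
    print(cn, "->", detect(cn))

# Without the no-Cutter pattern in the list, the catch-all picks it up:
print(detect("B355 1899", [PATTERNS[0], PATTERNS[2]]))
```

With the full list, the two B355 strings in the loop resolve to the no-Cutter pattern; dropping that pattern from the list sends them to the catch-all, which is the behavior reported at the top of this thread.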

I hope that helps!

misilot commented 4 years ago

Thank you, this worked great! Now on to figuring out how to sort values.

misilot commented 4 years ago

Is it possible to improve sorting for these call numbers? They all seem to be sorted above all the LC values.

Thanks!

misilot commented 4 years ago

Hi @jthomale, it looks like sorting splits the list by call number type instead of sorting everything together? For example,

Type: <class 'pycallnumber.units.callnumbers.local.Local'> CN: E725.45 1st .W35 1998
Type: <class 'pycallnumber.units.callnumbers.local.Local'> CN: E725.45 10th .U53 1993
Type: <class 'pycallnumber.units.callnumbers.local.Local'> CN: E748v.T2 W5
Type: <class 'pycallnumber.units.callnumbers.local.Local'> CN: E806 .H67a
Type: <class 'pycallnumber.units.callnumbers.lc.LC'> CN: E11 .C691 no.1-2
Type: <class 'pycallnumber.units.callnumbers.lc.LC'> CN: E11 .C691 no.8-11 no.9-10
Type: <class 'pycallnumber.units.callnumbers.lc.LC'> CN: E11 .C691 no.8-11 no.11

Also, is there a way to have the matched custom class name show up in the following string, instead of Local, no matter which custom class it matches? <class 'pycallnumber.units.callnumbers.local.Local'>

Thanks!

jthomale commented 4 years ago

Hi Tom,

It looks like there are a couple of things that may be going on here.

First, make sure you're including your custom unittypes list in every call to pycallnumber.callnumber. When I started testing out the example list in your last comment, I made the mistake of leaving it out and I got the exact same results you did. However, when I do include it and generate a sorted list, I get better results. (Still not perfect and I'll talk about why in a minute.)

Assuming I'm just continuing the code from my earlier comment:

cn_strings = [
    'E725.45 1st .W35 1998',
    'E725.45 10th .U53 1993',
    'E748v.T2 W5',
    'E806 .H67a',
    'E11 .C691 no.1-2',
    'E11 .C691 no.8-11 no.9-10',
    'E11 .C691 no.8-11 no.11'
]
cn_objs = [pycn.callnumber(cn, unittypes=unittypes) for cn in cn_strings]

Then sorted(cn_objs) yields:

[
    <Local 'E748v.T2 W5'>,
    <LC 'E11 .C691 no.1-2'>,
    <LC 'E11 .C691 no.8-11 no.9-10'>,
    <LC 'E11 .C691 no.8-11 no.11'>,
    <LCNoCutter 'E725.45 1st .W35 1998'>,
    <LCNoCutter 'E725.45 10th .U53 1993'>,
    <LCNoCutter 'E806 .H67a'>
]

So, it's an improvement but there are still a couple of oddities caused by irregularities in the data.

The E748 and E725 call numbers make me think of what sometimes happens when LC call numbers are formatted into columns for spine labels and then recombined into a single string.

Also, yes—different types of call numbers when sorted in the same list will have a tendency to group together if the different call number types have different rules for how their sort keys are generated. If you're curious to see exactly why things are sorting the way they are, you can call the for_sort method directly to see the sort key. Example, again continuing from above:

for cn in sorted(cn_objs):
    print(cn.for_sort())

yields:

e!0000000748!v!t!0000000002!w!0000000005
e!0011!c!691!!0000000001!0000000002
e!0011!c!691!!0000000008!0000000011!!0000000009!0000000010
e!0011!c!691!!0000000008!0000000011!!0000000011
e!0725.45!!0000000001!st!!w!0000000035!!0000001998
e!0725.45!!0000000010!th!!u!0000000053!!0000001993
e!0806!!h!0000000067!a

For example, in this case Local call numbers zero-pad integers to 10 digits by default, while LC class numbers are padded to only 4 digits. That is why the Local key beginning e!0000000748 sorts ahead of the LC keys beginning e!0011.
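A quick way to see the effect of the padding widths is to compare the keys as plain strings. The sketch below uses only the standard library. The keys list is copied from the for_sort() output above, while the names local_style, lc_style, and same_width are made up for illustration, and building a key with zfill is only a stand-in for whatever pycallnumber does internally.

```python
# Sort keys copied from the for_sort() output above, in the order shown.
keys = [
    "e!0000000748!v!t!0000000002!w!0000000005",
    "e!0011!c!691!!0000000001!0000000002",
    "e!0011!c!691!!0000000008!0000000011!!0000000009!0000000010",
    "e!0011!c!691!!0000000008!0000000011!!0000000011",
    "e!0725.45!!0000000001!st!!w!0000000035!!0000001998",
    "e!0725.45!!0000000010!th!!u!0000000053!!0000001993",
    "e!0806!!h!0000000067!a",
]

# Ordinary string comparison reproduces the sorted order shown above.
print(sorted(keys) == keys)

# Illustrative stand-ins: the same class letter with the number padded
# to different widths (10 digits vs. 4 digits).
local_style = "e!" + "748".zfill(10)  # 'e!0000000748'
lc_style = "e!" + "11".zfill(4)       # 'e!0011'
same_width = "e!" + "748".zfill(4)    # 'e!0748'

# With mixed widths, 748 lands before 11 because of the extra zeros.
print(local_style < lc_style)   # True

# With matching widths, 11 correctly comes before 748.
print(lc_style < same_width)    # True
```

So keys built with different padding widths are not really comparable with each other, which is why getting each string detected as the intended type matters for sorting.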

misilot commented 4 years ago

Thank you! This helped a lot. I happened to not be including the unittypes when I was sorting the array of call numbers.

Thanks again!