scriptotek / mc2skos

Command line script for converting Marc21 Classification and Authority records to SKOS/RDF
The Unlicense

Handle cases when subfield 4 is missing #69

Closed. CaptSolo closed this pull request 3 years ago.

CaptSolo commented 4 years ago

If subfield $4 is missing when mapping 5XX record relations, the program crashes with an AttributeError (because sf_4 is None).

This pull request gets rid of this AttributeError by checking for None.
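
A minimal sketch of the kind of guard described here, not the actual patch: the function name `relation_code` and the sample 550 field are hypothetical, and the real mc2skos code is structured differently. It reads subfield $4 from a MARCXML datafield and returns `None` instead of calling a string method on a missing value.

```python
import xml.etree.ElementTree as ET

# MARCXML namespace, bound to the (arbitrary) prefix "mx" for lookups.
MARC_NS = {"mx": "http://www.loc.gov/MARC21/slim"}


def relation_code(datafield):
    """Return the stripped text of subfield $4, or None if it is absent.

    Hypothetical helper for illustration; not part of mc2skos.
    """
    sf_4 = datafield.find("mx:subfield[@code='4']", MARC_NS)
    # Element.find returns None when nothing matches. Using .text on None
    # would raise AttributeError, so check before touching the value.
    if sf_4 is None or sf_4.text is None:
        return None
    return sf_4.text.strip()


# Hypothetical 550 field that has no subfield $4.
field = ET.fromstring(
    '<datafield xmlns="http://www.loc.gov/MARC21/slim" tag="550" ind1=" " ind2=" ">'
    '<subfield code="a">Example term</subfield>'
    '</datafield>'
)
print(relation_code(field))
```

With a guard like this, a caller can decide what to do with a relation whose type is unknown (for example, skip it or fall back to a generic relation) rather than crashing.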

See also: https://github.com/scriptotek/mc2skos/issues/68

danmichaelo commented 4 years ago

Thanks! It would be good to have this covered by tests too. Is it OK if I add the test record from https://github.com/scriptotek/mc2skos/issues/68#issuecomment-630187466 ?

Speaking of which, I was not able to convert that file on the first attempt. I got "Could not find classification scheme or subject vocabulary code", since it has 008[11]="n" (Not applicable) and no vocabulary code in 040 $f. How do you see from the authority record which vocabulary it's part of, or is that not possible?

CaptSolo commented 4 years ago

Yes, you are welcome to use the test record from #68.

Initially, I was getting the same error as you ("Could not find classification scheme or subject vocabulary code"). I worked around it by:

  1. adding a new scheme (nll) to vocabularies.yml
  2. supplying the scheme name via a command line parameter: mc2skos --scheme nll 150_record-2.xml

Could you suggest how we should change our MARC records in order for mc2skos to find the scheme automatically?

Note: the vocabulary URI added to vocabularies.yml is a test / work-in-progress value and may change in the future.

CaptSolo commented 4 years ago

> How do you see from the authority record which vocabulary it's part of, or is that not possible?

I am not a MARC expert but here's information that I got from colleagues:

danmichaelo commented 4 years ago

Thanks for the update. Best practice (afaik) is to add a vocabulary code to 040 $f (in combination with 008[11]="z"), since, in general, a single organization can produce multiple vocabularies/authority files. The 040 $f vocabulary code links a concept (or authority record) to a vocabulary (or authority file).
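
As a rough illustration of that convention (a sketch only, not how mc2skos itself resolves vocabularies): the hypothetical helper `vocabulary_code` below reads position 11 of field 008 and, when it is "z", returns the code found in 040 $f. The sample record, and the use of `nll` as a 040 $f value, are made up for the example.

```python
import xml.etree.ElementTree as ET

# MARCXML namespace, bound to the (arbitrary) prefix "mx" for lookups.
MARC_NS = {"mx": "http://www.loc.gov/MARC21/slim"}


def vocabulary_code(record):
    """Return the 040 $f code if 008[11] is "z", otherwise None.

    Hypothetical helper for illustration; not part of mc2skos.
    """
    field_008 = record.find("mx:controlfield[@tag='008']", MARC_NS)
    if field_008 is None or field_008.text is None or len(field_008.text) < 12:
        return None
    # 008[11] == "z" signals that the vocabulary is named in 040 $f.
    if field_008.text[11] != "z":
        return None
    sf_f = record.find("mx:datafield[@tag='040']/mx:subfield[@code='f']", MARC_NS)
    if sf_f is None or sf_f.text is None:
        return None
    return sf_f.text.strip()


# Made-up authority record: 40-character 008 with "z" at position 11.
FIELD_008 = "200518" + "|" * 5 + "z" + "|" * 28
record = ET.fromstring(
    '<record xmlns="http://www.loc.gov/MARC21/slim">'
    f'<controlfield tag="008">{FIELD_008}</controlfield>'
    '<datafield tag="040" ind1=" " ind2=" ">'
    '<subfield code="a">Example agency</subfield>'
    '<subfield code="f">nll</subfield>'
    '</datafield>'
    '</record>'
)
print(vocabulary_code(record))
```

A record with 008[11]="n" and no 040 $f, like the test record discussed above, would yield `None` here, which corresponds to the situation where the scheme has to be supplied by hand.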

The code should also be listed at https://www.loc.gov/standards/sourcelist/subject.html . To have a new vocabulary added to that list, you can contact NDMSO (the Network Development and MARC Standards Office at the Library of Congress).

Btw, there are some example MARC records from different vocabularies at https://github.com/scriptotek/mc2skos/tree/master/examples

CaptSolo commented 4 years ago

Looks like I made the mistake of opening this PR from the master branch of my fork. As a result, subsequent commits to the master branch unintentionally get included in this PR. I propose to close this PR.

The important part of this PR that addresses #69 is in https://github.com/scriptotek/mc2skos/pull/69/commits/8b0b51557f8bea75874b798785c642083e1f14eb#diff-a8ebb6cae69594050b537bd50ca451ebR691

Please let me know whether I should create a new PR with just this change, or whether you can take it from here without a new PR.

CaptSolo commented 4 years ago

> Thanks for the update. Best practice (afaik) is to add a vocabulary code to 040 $f (in combination with 008[11]="z"), since, in general, a single organization can produce multiple vocabularies/authority files. The 040 $f vocabulary code links a concept (or authority record) to a vocabulary (or authority file).

Thank you! :)

I'll advise colleagues that we need to get a code for our authority file and add it to 040 $f. That might take some time. In the meantime, we'll have to rely on the workaround mentioned in earlier comments on this PR.

danmichaelo commented 3 years ago

Thanks! I cherry-picked 810148900b6a44f0c8eeb4f860bf8ffad55fcd63, and also published a new release. Sorry it took so long!