mlibrary / traject_umich_format

Extract format/type information from MARC records, as used at the University of Michigan's University Library
MIT License

too generous DVD detection? #4

Open jrochkind opened 11 years ago

jrochkind commented 11 years ago

This line:

https://github.com/billdueber/traject_umich_format/blob/master/lib/traject/umich_format/bib_types.rb#L101

Marks something as "VD", "Video (DVD)" if it contains the string "dvd" (case insensitive) in a 538.

I have a record that has "DVD-ROM" in the 538. It is a software product, containing archival serials content, and is not something most people would consider as "Video", although it may be accurate to call it a "DVD".
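To make the edge case concrete, here is a minimal sketch in plain Ruby. It is not the gem's actual code: `LOOSE_DVD`, `STRICT_DVD`, and both helper methods are hypothetical names, and the stricter pattern is just one possible fix, assuming the goal is to ignore "dvd" when it is immediately followed by "ROM".

```ruby
# Hypothetical stand-ins for the check described above; NOT the gem's real code.

# The behaviour as described: any "dvd" in the 538 text, case-insensitive.
LOOSE_DVD = /dvd/i

# One possible stricter variant (an assumption, not the maintainer's fix):
# "dvd" only counts when it is not directly followed by "ROM",
# with or without a hyphen or whitespace in between.
STRICT_DVD = /dvd(?![\s-]?rom)/i

# Takes the text of a 538 field as a plain string.
def loose_dvd?(field_538_text)
  LOOSE_DVD.match?(field_538_text)
end

def strict_dvd?(field_538_text)
  STRICT_DVD.match?(field_538_text)
end
```

With this sketch, `loose_dvd?("System requirements: DVD-ROM drive")` is true while `strict_dvd?` on the same text is false. A 538 that mentions both, such as "DVD-ROM; also plays as DVD video", still passes the strict check. A record whose 538 says only "DVD-ROM" but which really is video would no longer match on the 538 alone, so it would have to be picked up some other way.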

I don't know if this is or should be considered a bug or not.

Maybe I should just fork my own format stuff using yours as a very useful starting point? On the other hand, the benefit of sharing is that I notice edge cases like this and share them with you, who may want to fix them too.

Let me know what you think. If I keep not hearing from you, I'll assume that means 'fork'.

jrochkind commented 11 years ago

Bah, now I can't find my test case. Most DVD-ROMs do not seem to be labelled as 'VD', but I did have one that was. What complicated logic this stuff is to go through.

jrochkind commented 11 years ago

Ah, okay, I still have my Solr index built with this gem, so I can find 'em!

I have 316 items with the string "DVD-ROM" that end up given type "VD", which is about 50% of all my records that include the string "DVD-ROM" (I can't say for sure how many of them really are DVD-ROMs!). So okay, seems worth fixing.

jrochkind commented 11 years ago

Sorry, didn't mean to close.

http://mirlyn.lib.umich.edu/Search/Home?type%5B%5D=all&lookfor%5B%5D=%22DVD-ROM%22&filter%5B%5D=format%3AVideo%20%28DVD%29

Easy enough to do the same search in Mirlyn. SOME of those records that say "DVD-ROM" really ARE "Video"; others are not. Hopefully the video ones will still be caught by the other fixed-field-based indexing.

Hey, look at that, I can add ".marc" to the end of a Mirlyn-specific record URL and get MARC, woot! I'll pick a random Mirlyn record that is "actually Video DVD-ROM", add it as a test case, and write a test for it. Hopefully it'll still be "VD".

billdueber commented 11 years ago

For readability you can also add '.xml' and get MARC-XML, or '.json' to get MARC-in-JSON.
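A small sketch of that suffix trick, in case it helps when collecting test records. The three suffixes come from this thread; the helper name `export_url`, the constant, and the example.org record URL are hypothetical placeholders, not a real Mirlyn path.

```ruby
# Suffixes mentioned in this thread and the serialization each one returns.
MIRLYN_EXPORT_SUFFIXES = {
  marc: ".marc", # MARC
  xml:  ".xml",  # MARC-XML
  json: ".json"  # MARC-in-JSON
}.freeze

# Hypothetical helper: append the export suffix to a record page URL.
# A trailing slash on the record URL is dropped before the suffix is added.
def export_url(record_url, format)
  suffix = MIRLYN_EXPORT_SUFFIXES.fetch(format) do
    raise ArgumentError, "unknown format: #{format.inspect}"
  end
  record_url.chomp("/") + suffix
end

# Placeholder URL for illustration only.
puts export_url("http://example.org/Record/123", :json)
```

Substitute a real record URL for the placeholder; the sketch only builds the string and does not fetch anything.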

In reply to Jonathan Rochkind's comment of Wed, Nov 13, 2013 at 12:43 PM.

Bill Dueber, Library Systems Programmer, University of Michigan Library