Open spurioso opened 11 years ago
Steve,
I think you could pretty easily write a Python script to loop through a list of ISSNs and query LOC's catalog API for each one. Then you could use Python's 're' (regular expressions) module to parse the resulting records and locate a particular field or string.
The question is, is there a single field that you could expect to find in each record that would be useful for sorting the journals in question? Subject fields are the sort of thing you're looking for, I think, but the question is how do you deal with results that return multiple subject fields for a given record? Maybe some MARC specialists could point us in the right direction. Is there a single field we could look for in the records retrievable from LOC's catalog that would allow us to sort journals into meaningful sub-categories?
Here's more about LOC's API: http://www.loc.gov/standards/sru/
A sample query is: http://z3950.loc.gov:7090/voyager?version=1.1&operation=searchRetrieve&query=0596002815&maximumRecords=1&recordSchema=dc
The bit after "&query=" is where you could put in the ISBN/ISSN. This query asks for one result only, and that the result be Dublin Core metadata. I put in the ISBN for Mark Lutz's Learning Python book.
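To make the query string less error-prone to edit by hand, here is a minimal sketch that builds it from its parts. The base URL and parameter values are copied from the sample query above; the function name `build_sru_url` is made up for illustration, and whether the endpoint still responds at that address is an assumption.

```python
from urllib.parse import urlencode

# Base URL copied from the sample query in this thread (assumed still valid).
SRU_BASE = "http://z3950.loc.gov:7090/voyager"


def build_sru_url(identifier, schema="dc", max_records=1):
    """Return a searchRetrieve URL for a single ISBN or ISSN."""
    params = {
        "version": "1.1",
        "operation": "searchRetrieve",
        "query": identifier,            # the ISBN/ISSN goes here
        "maximumRecords": max_records,  # 1 = ask for one result only
        "recordSchema": schema,         # "dc" = Dublin Core
    }
    return SRU_BASE + "?" + urlencode(params)


if __name__ == "__main__":
    # Rebuilds the sample query for Learning Python.
    print(build_sru_url("0596002815"))
```

Passing a different `schema` value changes only the end of the query string, so the same helper works for other record formats the service offers.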
I have a bit of Python that you can use to query a website. It requires a module called url, but it is pretty easy to work with. I will post it on GitHub and put a link here.
Josh
Because of their specificity, LC subject headings are not particularly useful for categorizing things. LC Classification numbers are better (you would have to translate them into words), but I don't see one in the DC output for Josh's record below.
Linda
From: Joshua Westgard [notifications@github.com] Sent: Wednesday, August 28, 2013 4:16 PM To: umd-coding-workshop/website Subject: Re: [website] Find subject categories for a list of ISSNs and add them to a spreadsheet (#22)
Linda, maybe this is better?
I only changed the very end of the query string, from 'dc' to 'marcxml'.
There might be another option that's better. For a full list of the schemata available via this service, see: http://www.loc.gov/standards/sru/resources/schemas.html
Yes. Datafield tag "050", subfield code "a" is the classification number.
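For anyone who wants to try this, here is a rough sketch of pulling 050 $a out of a MARCXML record. It uses the standard library's XML parser rather than regular expressions, which is a swap on my part because namespaced XML is awkward to match with 're'. The sample record is invented for illustration and is not real LOC output; the function name is also made up.

```python
import xml.etree.ElementTree as ET

# Namespace used by MARCXML ("MARC21 slim") records.
MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}


def lc_class_numbers(marcxml_text):
    """Return every 050 $a value in the record(s), in document order."""
    root = ET.fromstring(marcxml_text)
    path = ".//marc:datafield[@tag='050']/marc:subfield[@code='a']"
    return [sf.text.strip() for sf in root.findall(path, MARC_NS) if sf.text]


# Invented sample record, for illustration only.
SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="050" ind1="0" ind2="0">
    <subfield code="a">ML1</subfield>
    <subfield code="b">.J68</subfield>
  </datafield>
</record>"""

if __name__ == "__main__":
    print(lc_class_numbers(SAMPLE))
```

The function returns a list because a record can carry more than one 050 field; a record with none gives an empty list.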
Cool. Thanks, Linda! So I think it would be pretty easy to take the list of ISSNs, query the LOC for the marcxml record for each one, use regular expressions to pull out the contents of the 050a field from each record, and write it into a spreadsheet next to the ISSN. The question is, Steve, is that the sort of thing you were after?
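The loop-and-write part could be sketched roughly like this. The `lookup` argument is a stand-in for whatever ends up fetching and parsing a record; here it is faked with a dictionary so the sketch runs without touching the network. The ISSNs, the function name, and the column names are all placeholders.

```python
import csv
import io


def write_class_report(issns, lookup, out_file):
    """Write one CSV row per ISSN: the ISSN, then its class number(s).

    lookup(issn) should return a list of class numbers. Multiple values
    are joined with "; " and a failed lookup leaves the cell empty.
    """
    writer = csv.writer(out_file)
    writer.writerow(["issn", "lc_class"])
    for issn in issns:
        try:
            numbers = lookup(issn)
        except OSError:
            # Network errors raised by urllib are OSError subclasses.
            numbers = []
        writer.writerow([issn, "; ".join(numbers)])


if __name__ == "__main__":
    # Fake catalog so the example runs offline; the ISSNs are placeholders.
    fake_catalog = {"1111-1111": ["ML1"], "2222-2222": ["ML5", "ML410"]}
    out = io.StringIO()
    write_class_report(
        ["1111-1111", "2222-2222", "3333-3333"],
        lambda issn: fake_catalog.get(issn, []),
        out,
    )
    print(out.getvalue())
```

Keeping the lookup separate means the data source (LOC, Aleph, or something else) can change without rewriting the spreadsheet step.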
Thanks, Josh and Linda. Yes, this is more or less what I'm after. Deciding on the data source will be the key, I think. For music, LC Class won't work well. Most of the music journals fall into ML1 (for music journals published in the U.S.) or ML5 (for journals published elsewhere), so there isn't much granularity there. A few of them fall into specific classifications, like ML410 for journals devoted to a single composer. Still, just within ML5 you might have a journal on Medieval music and one on 20th century music theory. LC Class might work better for other disciplines, though.
As Linda mentioned, LCSH would probably be problematic too, because they're so specific. However, we could try it by looking for 650 fields and then the word "Periodicals," which is used as a subdivision. Worth trying.
I was hoping that either EBSCOnet or Ulrich's would have its own taxonomy, but I just checked both and they seem to use LC Class and Dewey. I don't know if Dewey would be useful.
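A rough sketch of that 650 idea, for experimenting: it keeps the main heading ($a) of any 650 field that has "Periodicals" as a subdivision. I am assuming the subdivision shows up in subfield $v or $x, which should be checked against real records. The sample record is invented, and the function name is made up.

```python
import xml.etree.ElementTree as ET

# Namespace used by MARCXML ("MARC21 slim") records.
MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}


def periodical_topics(marcxml_text):
    """Return $a of each 650 field that has a 'Periodicals' subdivision."""
    root = ET.fromstring(marcxml_text)
    topics = []
    for field in root.findall(".//marc:datafield[@tag='650']", MARC_NS):
        subfields = field.findall("marc:subfield", MARC_NS)
        # Assumption: the subdivision is coded $v or $x.
        subdivisions = [
            (sf.text or "").strip().rstrip(".")
            for sf in subfields
            if sf.get("code") in ("v", "x")
        ]
        if "Periodicals" in subdivisions:
            topics.extend(
                (sf.text or "").strip().rstrip(".")
                for sf in subfields
                if sf.get("code") == "a"
            )
    return topics


# Invented sample record, for illustration only.
SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="650" ind1=" " ind2="0">
    <subfield code="a">Music theory</subfield>
    <subfield code="v">Periodicals.</subfield>
  </datafield>
  <datafield tag="650" ind1=" " ind2="0">
    <subfield code="a">Jazz</subfield>
    <subfield code="x">History and criticism.</subfield>
  </datafield>
</record>"""

if __name__ == "__main__":
    print(periodical_topics(SAMPLE))
```

A record can still return several headings this way, so the multiple-subjects question from earlier in the thread does not go away; this only narrows the list.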
Hmm...
If you’re going to get a report from Aleph for the ISSNs, then you could get the classification numbers in the same report. Then your challenge would be writing code that would convert the class numbers to their subject categories.
Linda
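The conversion step could start from a small lookup table like the sketch below. The four entries come only from examples mentioned in this thread (ML1, ML5, ML410, and the ML3505.8-ML3509 jazz range); the labels are my own wording, and a real table would have to be built from the LC Classification schedules. The function and variable names are hypothetical.

```python
import re

# (class letters, low number, high number, label).
# Entries are taken from examples in this thread; labels are informal.
CATEGORY_RANGES = [
    ("ML", 1, 1, "Music periodicals (U.S.)"),
    ("ML", 5, 5, "Music periodicals (non-U.S.)"),
    ("ML", 410, 410, "Individual composers"),
    ("ML", 3505.8, 3509, "Jazz"),
]

# Leading class letters followed by the class number, e.g. "ML3505.8".
CLASS_PATTERN = re.compile(r"^\s*([A-Z]{1,3})\s*(\d+(?:\.\d+)?)")


def categorize(class_number, ranges=CATEGORY_RANGES):
    """Return the label whose range contains class_number, else None."""
    match = CLASS_PATTERN.match(class_number.upper())
    if not match:
        return None
    letters, number = match.group(1), float(match.group(2))
    for prefix, low, high, label in ranges:
        if letters == prefix and low <= number <= high:
            return label
    return None


if __name__ == "__main__":
    for value in ["ML1", "ML410.B4", "ML3506", "PN1993"]:
        print(value, "->", categorize(value))
```

Anything not covered by the table comes back as `None`, which makes it easy to see which class numbers still need a category assigned.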
From: Steve Henry [mailto:notifications@github.com] Sent: Thursday, November 07, 2013 9:26 AM To: umd-coding-workshop/website Cc: Linda Seguin Subject: Re: [website] Find subject categories for a list of ISSNs and add them to a spreadsheet (#22)
Thanks, @lseguin!
I wonder if LC's Linked Data service might be useful:
Here's their record for books or journals about jazz: http://id.loc.gov/authorities/classification/ML3505.8-ML3509.html
This project now has a repo! https://github.com/umd-coding-workshop/journal-review
This is something that came up in a meeting of UMD selectors and collection management people last week.
The Libraries will be doing a survey with faculty to determine how important our serials subscriptions are.
Faculty will be asked to look at a list of our subscriptions and rank them in some way (e.g., Can NEVER cancel, could do without, etc.).
We'll be using some kind of tool developed by NC State for this. I can add the link when I find it.
The lists of subscriptions will be sorted by fund code only. So, all the music journals will be listed on one spreadsheet. This includes academic musicology journals, trade magazines for instrumentalists, music education journals, and so on.
At the meeting, some selectors expressed a desire to further subdivide the lists. So, for me, maybe I would want to separate out the music education journals from the music theory journals, since a music theorist might not be interested at all in music education. The situation is maybe even more acute for something like English literature that has several hundred subscriptions.
So, I'm wondering if there's a way to take the list of ISSNs that will be in the spreadsheet, feed them through something, and have the something spit back out useful subject categories. I know this is possible with Aleph or WorldCat. I'm also wondering if Ulrich's offers a Web service that might do something similar, maybe with simpler subcategories. There may be other options out there too.