jiaola / marc4js

A Node.js API for handling MARC
Apache License 2.0
38 stars, 7 forks

Parse MARC 21 records? #2

Closed philgooch closed 8 years ago

philgooch commented 8 years ago

Hi there, most MARC records I come across look like this:

LEADER 00000cam a2200361 i 4500 
001    GR111154 
008    780609s1975    nyuaf    b    00110 eng u 
010    78102239 
020    0156309351 :|c$3.75 
035    (CaOTULAS)154869186 
035    (UtOrBLW)b10796848 
041 1  eng|hrus 
082 0  791.45/01 
090    PN1995|b.E52 1975 
100 1  Eisenstein, Sergei,|d1898-1948 
245 14 The film sense /|cby Sergei M. Eisenstein ; translated and
       edited by Jay Leyda 
250    [Rev. ed.] 
260    New York :|bHarcourt Brace Jovanovich,|cc1975 
300    x, 288 p., [2] leaves of plates :|bill. ;|c21 cm 
336    text|btxt|2rdacontent 
337    unmediated|bn|2rdamedia 
338    volume|bnc|2rdacarrier 
500    Includes index 
504    "Bibliography of Eisenstein's writings availabe in 
       English": p. 269-276 
650  0 Motion pictures|xAesthetics 
700 1  Leyda, Jay,|d1910-1988 
900    unlv|lmain 
910    rdae 
949 0  UNLM|p31147002913863 
989    unlv*PN1995 .E52 1975 
999    UNLM 

Do you have a parser that can handle this format? Thanks!

jiaola commented 8 years ago

Hi,

marc4js doesn't include a parser for this format by default, but it's very similar to a format that marc4js can parse. See this file for example: https://github.com/jiaola/marc4js/blob/master/test/data/PGA_2records.txt At a glance, the differences I notice are "LEADER" vs "LDR", and "|" vs "$" as the subfield delimiter. I think that with a small tweak to the TextParser (https://github.com/jiaola/marc4js/blob/master/lib/parse/text_parser.js), marc4js would be able to parse your format.

philgooch commented 8 years ago

Thanks very much for your speedy response! This makes sense, I'll create a new parser based on your text_parser.js as you suggest.

I think the format is MARC21 or UNIMARC:

http://www.loc.gov/marc/bibliographic/ http://www.ifla.org/publications/unimarc-formats-and-related-documentation

Example:

http://webpac.library.unlv.edu/search~S1?/XFilm+sense&searchscope=1&SORT=D/XFilm+sense&searchscope=1&SORT=D&SUBKEY=Film+sense/1%2C139%2C139%2CE/marc&FF=XFilm+sense&searchscope=1&SORT=D&1%2C1%2C

philgooch commented 8 years ago

In the end it made more sense to transform the source data into the format required by TextParser. I'll close this issue. Thanks again!
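
For anyone landing here with the same kind of data, this is a rough sketch of what such a pre-processing step could look like. It is not the code actually used, and `normalizeMarcText` is a hypothetical helper, not part of marc4js. It assumes the only differences are the two noted earlier in this thread ("LEADER" vs "LDR" and "|" vs "$"), plus the indented continuation lines visible in the sample record. Compare the output against PGA_2records.txt before relying on it.

```javascript
// Hypothetical helper, not part of marc4js. Rewrites a catalog-style MARC
// text dump so that it uses "LDR" and "$" subfield delimiters.
function normalizeMarcText(input) {
  const fields = [];
  for (const rawLine of input.split(/\r?\n/)) {
    const line = rawLine.replace(/\s+$/, ''); // drop trailing whitespace
    if (line === '') continue;                // skip blank lines
    if (/^\s/.test(line) && fields.length > 0) {
      // Assumption: an indented line continues the previous field.
      fields[fields.length - 1] += ' ' + line.trim();
    } else {
      fields.push(line);
    }
  }
  return fields
    .map((line) =>
      line
        .replace(/^LEADER\b/, 'LDR')
        // "|" followed by a subfield code (lowercase letter or digit) -> "$".
        // In a replacement string, "$$" is a literal "$" and "$1" is the code.
        .replace(/\|([a-z0-9])/g, '$$$1')
    )
    .join('\n');
}

// Example call on two lines from the record above.
console.log(normalizeMarcText(
  'LEADER 00000cam a2200361 i 4500 \n100 1  Eisenstein, Sergei,|d1898-1948 '
));
```

Two caveats the sketch does not handle: a literal "$" in the data (such as the price in the 020 field) becomes ambiguous once "$" is the delimiter, and the first subfield in each field of the sample has no explicit code, which TextParser may or may not expect.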