jiaola / marc4js

A Node.js API for handling MARC
Apache License 2.0


A Node.js module for handling MARC records

Installation

npm install marc4js

Features

marc4js provides the following features

Examples

Examples can be found in marc4js_examples. You can also find examples in the test directory.

Usage

var marc4js = require('marc4js');

Parsers

Parsers take various MARC formats and convert them to marc4js.marc.Record objects. Marc4js supports ISO2709, text (MarcEdit .mrk file), MARCXML and MARC-in-JSON formats.

There are three ways to use a parser.

Callback API

marc4js.parse(data, options, function(err, records) {
});

Stream API

var parser = marc4js.parse(options);
parser.on('data', function(record) {
});
parser.on('end', function() {
});
parser.on('error', function(err) {
});
parser.write(data);
parser.end();

All events are based on the Node.js stream API.

Note that the parsers always work in flowing mode, so listening to the readable event will throw an error; use the data event instead. The objectMode option of the stream API cannot be changed for parsers and is always set to true.
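To illustrate the data-event pattern used with parsers, here is a minimal sketch built only on Node's stream module. createStandInParser and collect are hypothetical names, and the stand-in stream is not the marc4js parser; it only mimics an object-mode stream that emits one object per write.

```javascript
var stream = require('stream');

// Stand-in for a parser (NOT marc4js): an object-mode Transform that wraps
// each written value in a plain object.
function createStandInParser() {
  return new stream.Transform({
    objectMode: true,
    transform: function (line, encoding, callback) {
      callback(null, { raw: String(line) });
    }
  });
}

// Attaching a 'data' listener switches the stream into flowing mode.
// The promise resolves with every emitted object once 'end' fires.
function collect(readable) {
  return new Promise(function (resolve, reject) {
    var items = [];
    readable.on('data', function (item) { items.push(item); });
    readable.on('end', function () { resolve(items); });
    readable.on('error', reject);
  });
}

var standIn = createStandInParser();
var collected = collect(standIn);
standIn.write('first');
standIn.write('second');
standIn.end();
collected.then(function (records) {
  console.log('received ' + records.length + ' objects');
});
```

With the real parser, the same listeners would be attached to the object returned by marc4js.parse(options).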

Pipe function

var fs = require('fs');
var parser = marc4js.parse(options);
var transformer = marc4js.transform(options);
fs.createReadStream('/path/to/your/file').pipe(parser).pipe(transformer).pipe(process.stdout);

options

format: default iso2709; possible values iso2709, marc, text, mrk, marcxml, xml, json, mij

Different types of parsers

Iso2709Parser

Parses ISO2709 format. Used by default or when format is iso2709 or marc.
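For readers unfamiliar with ISO2709, the sketch below shows the record layout the format defines: a 24-character leader, a directory of 12-character entries (tag, field length, start offset), and field data that begins at the base address stored in leader positions 12 to 16. readIso2709Fields is a hypothetical helper, not the marc4js implementation, and it assumes an ASCII-only record held in a string. Real records count lengths in bytes, so non-ASCII data would need a Buffer.

```javascript
var FIELD_TERMINATOR = '\x1e';
var SUBFIELD_DELIMITER = '\x1f';
var RECORD_TERMINATOR = '\x1d';

// Hypothetical helper (not marc4js): split one ASCII-only ISO2709 record
// into its leader and raw fields by walking the directory.
function readIso2709Fields(record) {
  var leader = record.slice(0, 24);
  var baseAddress = parseInt(leader.slice(12, 17), 10);
  var fields = [];
  // The directory starts at offset 24 and ends with a field terminator.
  for (var pos = 24;
       pos < baseAddress && record.charAt(pos) !== FIELD_TERMINATOR;
       pos += 12) {
    var tag = record.slice(pos, pos + 3);
    var length = parseInt(record.slice(pos + 3, pos + 7), 10);
    var start = parseInt(record.slice(pos + 7, pos + 12), 10);
    // The stored length includes the trailing field terminator; drop it.
    var begin = baseAddress + start;
    fields.push({ tag: tag, data: record.slice(begin, begin + length - 1) });
  }
  return { leader: leader, fields: fields };
}

// A tiny hand-built record with a 001 control field and a 245 data field.
var sampleLeader = '00066' + 'nam' + '  ' + '22' + '00049' + '   ' + '4500';
var sampleRecord = sampleLeader +
  '001000600000' + '245001000006' + FIELD_TERMINATOR +
  '12345' + FIELD_TERMINATOR +
  '10' + SUBFIELD_DELIMITER + 'aTitle' + FIELD_TERMINATOR +
  RECORD_TERMINATOR;

readIso2709Fields(sampleRecord).fields.forEach(function (field) {
  console.log(field.tag, JSON.stringify(field.data));
});
```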

MrkParser

Parses MarcEdit text format (.mrk files). Used when format is mrk.
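As a rough illustration of what a .mrk line looks like, the sketch below parses a single field line such as =245  10$aTitle$cAuthor. parseMrkLine is a hypothetical helper, not the MrkParser itself. It assumes the usual MarcEdit conventions: a leading =, a three-character tag, two spaces, a backslash for a blank, and $ before each subfield code. It ignores escapes such as {dollar}.

```javascript
// Hypothetical helper (not marc4js): parse one MarcEdit mnemonic line.
function parseMrkLine(line) {
  var match = /^=(\w{3})  (.*)$/.exec(line);
  if (!match) {
    throw new Error('Not a .mrk field line: ' + line);
  }
  var tag = match[1];
  var rest = match[2];
  // The leader and tags 001-009 hold plain data, no indicators or subfields.
  if (tag === 'LDR' || /^00\d$/.test(tag)) {
    return { tag: tag, data: rest.replace(/\\/g, ' ') };
  }
  // A backslash stands for a blank indicator.
  var indicators = rest.slice(0, 2).replace(/\\/g, ' ');
  // Everything after the indicators is a run of $<code><value> chunks.
  var subfields = rest.slice(2).split('$').slice(1).map(function (chunk) {
    return { code: chunk.charAt(0), value: chunk.slice(1) };
  });
  return {
    tag: tag,
    ind1: indicators.charAt(0),
    ind2: indicators.charAt(1),
    subfields: subfields
  };
}

console.log(parseMrkLine('=245  10$aArithmetic /$cCarl Sandburg.'));
```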

Other options:

TextParser

Parses a text format that is slightly different from mrk format. Used when format is text.

MarcxmlParser

Parses MARCXML format. Used when format is marcxml or xml.

The stream and pipe APIs are SAX based, so they don't require in-memory storage of the records. This makes them suitable for processing large MARCXML files. The callback API reads all records into memory and returns them in the callback function, so it is not advised for processing large MARCXML files.
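The memory trade-off described above can be sketched with plain Node streams. In this example, fakeRecords and countStreaming are hypothetical names and the source is a generator rather than a MARCXML parser; the point is only that a streaming consumer handles one record at a time instead of holding the whole set in an array.

```javascript
var stream = require('stream');

// Stand-in record source (NOT marc4js): yields n small objects lazily.
function* fakeRecords(n) {
  for (var i = 0; i < n; i++) {
    yield { id: i };
  }
}

// Count records as they pass through, keeping none of them in memory.
function countStreaming(source) {
  return new Promise(function (resolve, reject) {
    var count = 0;
    var counter = new stream.Writable({
      objectMode: true,
      write: function (record, encoding, callback) {
        count += 1;
        callback();
      }
    });
    // pipeline() wires the streams together and reports any error once.
    stream.pipeline(source, counter, function (err) {
      if (err) {
        reject(err);
      } else {
        resolve(count);
      }
    });
  });
}

countStreaming(stream.Readable.from(fakeRecords(100000))).then(function (n) {
  console.log('counted ' + n + ' records');
});
```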

Other options:

MijParser

Parses MARC-in-JSON format. Used when format is json or mij.

The stream and pipe APIs use a SAX-like JSON stream parser, so they don't require in-memory storage of the records. Thus they can process a large number of MARC-in-JSON records.
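For reference, a MARC-in-JSON record is an object with a leader string and a fields array, where each entry is a single-key object keyed by tag. The sketch below uses a hand-written sample record and a hypothetical helper, getSubfields, to pull subfield values out of that structure. It is not part of marc4js.

```javascript
// A small MARC-in-JSON record written by hand for illustration.
var sampleMij = {
  leader: '00000nam a2200000 a 4500',
  fields: [
    { '001': '12345' },
    {
      '245': {
        ind1: '1',
        ind2: '0',
        subfields: [{ a: 'Arithmetic /' }, { c: 'Carl Sandburg.' }]
      }
    }
  ]
};

// Hypothetical helper (not marc4js): collect the values of one subfield
// code across every field with the given tag.
function getSubfields(record, tag, code) {
  var values = [];
  record.fields.forEach(function (field) {
    var body = field[tag];
    // Skip other tags and control fields, whose value is a plain string.
    if (!body || typeof body === 'string') {
      return;
    }
    body.subfields.forEach(function (subfield) {
      if (Object.prototype.hasOwnProperty.call(subfield, code)) {
        values.push(subfield[code]);
      }
    });
  });
  return values;
}

console.log(getSubfields(sampleMij, '245', 'a'));
```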

Transformers

Transformers transform marc4js.marc.Record objects into various MARC formats. Marc4js supports ISO2709, text (MarcEdit .mrk file), MARCXML and MARC-in-JSON formats.

Like parsers, transformers can also be used in three different ways.

Callback API

marc4js.transform(records, options, function(err, output) {
});

Stream API

var transformer = marc4js.transform(options);
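transformer.on('readable', function() {
    // The readable event passes no arguments; pull output with read() until it returns null.
    var output;
    while ((output = transformer.read()) !== null) {
        // handle output
    }
});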
transformer.on('end', function() {
});
transformer.on('error', function(err) {
});
transformer.write(record); // one record
// or to write an array of records
// records.forEach(function(record) {
//     transformer.write(record);
// });
transformer.end();

Note that although parsers work only in flowing mode, transformers can use either flowing or paused (aka non-flowing) mode in the stream API. The example above uses paused mode; in flowing mode, use a data event handler instead.
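The paused-mode pattern can be tried out with a stand-in stream. In this sketch, createStandInTransformer and readPaused are hypothetical names; the stand-in accepts objects and emits JSON text, which is not what marc4js produces, but the readable event and read() loop work the same way on any Node readable stream.

```javascript
var stream = require('stream');

// Stand-in for a transformer (NOT marc4js): accepts objects on the writable
// side and emits one line of JSON text per object on the readable side.
function createStandInTransformer() {
  return new stream.Transform({
    writableObjectMode: true,
    transform: function (record, encoding, callback) {
      callback(null, JSON.stringify(record) + '\n');
    }
  });
}

// Paused mode: wait for 'readable', then call read() until it returns null.
function readPaused(readable) {
  return new Promise(function (resolve, reject) {
    var chunks = [];
    readable.on('readable', function () {
      var chunk;
      while ((chunk = readable.read()) !== null) {
        chunks.push(chunk.toString());
      }
    });
    readable.on('end', function () { resolve(chunks.join('')); });
    readable.on('error', reject);
  });
}

var standInTransformer = createStandInTransformer();
var pausedOutput = readPaused(standInTransformer);
standInTransformer.write({ id: 1 });
standInTransformer.write({ id: 2 });
standInTransformer.end();
pausedOutput.then(function (text) {
  process.stdout.write(text);
});
```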

Pipe function

var fs = require('fs');
var parser = marc4js.parse(options);
var transformer = marc4js.transform(options);
fs.createReadStream('/path/to/your/file').pipe(parser).pipe(transformer).pipe(process.stdout);

options

format: default iso2709; possible values iso2709, marc, text, mrk, marcxml, xml, json, mij

objectMode: default false. Used to switch between the flowing and paused (aka non-flowing) mode in the stream API.

Different types of Transformers

Iso2709Transformer

Outputs ISO2709 format. Used by default or when format is iso2709 or marc.

MrkTransformer

Outputs MarcEdit text format (.mrk files). Used when format is mrk.

Other options:

TextTransformer

Outputs text format, which is slightly different from mrk format. Used when format is text.

MarcxmlTransformer

Outputs MARCXML format. Used when format is marcxml or xml.

Other options:

MijTransformer

Outputs MARC-in-JSON string. Used when format is json or mij.

Other options: