rufuspollock-okfn / bibserver

BibServer is open-source software that makes it easy to publish, manage and find bibliographies. BibServer is RESTful and web-friendly.
MIT License

MARC parser #146

Open markmacgillivray opened 12 years ago

markmacgillivray commented 12 years ago

MARC parsing will give access to large amounts of library data

markmacgillivray commented 12 years ago

To be done either as a parser for inclusion in the repo, or as an external parser that runs remotely and sends an import to bibserver - either way it will be a good example of particular functionality

edchamberlain commented 12 years ago

I've a barebones Perl based parser up as a gist:

https://gist.github.com/1836836

It should accept stdin. The JSON seems valid but does not upload to bibsoup; I get the error "'unicode' object has no attribute 'get'". I'm not familiar with the JSON module, but am wondering if I need to be more explicit about headers...

epoz commented 12 years ago

Ed, the first record in your JSON output is not a dictionary, but a string.

The BibServer importer was failing here: https://github.com/okfn/bibserver/blob/ecc08d230027a0a3fc2c788f9730bcf9825b92b5/bibserver/importer.py#L163 It was trying to assign values to a unicode string.

We are improving the parser/importer to give better feedback on these kinds of errors. Ideally it should have just failed on that record, given feedback, and continued. Looking into how to do this in a structured manner.
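As a rough illustration of the "fail on that record, give feedback and continue" idea, here is a minimal Python sketch. It is not the actual BibServer importer code: the function name `split_records` and the assumption that the payload is either a bare list or a dict with a `records` list are hypothetical, chosen only for this example.

```python
import json


def split_records(raw_json):
    """Separate usable records (dicts) from malformed entries.

    Hypothetical helper, not part of BibServer. Assumes the payload is
    either a list of records or a dict holding them under "records".
    """
    data = json.loads(raw_json)
    records = data.get("records", []) if isinstance(data, dict) else data
    if not isinstance(records, list):
        raise ValueError("expected a list of records")

    good, errors = [], []
    for index, record in enumerate(records):
        if isinstance(record, dict):
            good.append(record)
        else:
            # Note the position and type of the bad entry instead of aborting
            errors.append((index, type(record).__name__))
    return good, errors
```

With an input whose first entry is a string rather than a dictionary, the remaining records are kept and the string is reported by index, so the uploader can be told which record to fix.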

edchamberlain commented 12 years ago

Thanks. I'll take a look at the blank first line.

edchamberlain commented 12 years ago

Caused by a bad declaration, now fixed.

edchamberlain commented 12 years ago

A few more tweaks made. Manual upload of the output seems fine; all 953 records imported:

http://bibsoup.net/edchamberlain/marc21_sample

epoz commented 12 years ago

Can we add a -bibserver command line switch that outputs: {"display_name": "MARC", "format": "marc", "contact": "Edmund Chamberlain emc59@cam.ac.uk", "bibserver_plugin": true}

The latest version can be found at: https://github.com/okfn/bibserver/blob/master/parserscrapers_plugins/marc2BibJson.pl
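For illustration only, the switch behaviour described above could look like the following Python sketch. The real plugin is the Perl script linked above; the function name `plugin_output`, the `parse` callback, and the `{"records": [...]}` output shape are assumptions made for this example, and only the descriptor fields come from the request itself.

```python
import json

# Descriptor taken from the request above
PLUGIN_INFO = {
    "display_name": "MARC",
    "format": "marc",
    "contact": "Edmund Chamberlain emc59@cam.ac.uk",
    "bibserver_plugin": True,
}


def plugin_output(argv, input_text="", parse=lambda text: []):
    """Return what a parser plugin would print for the given arguments.

    Hypothetical sketch: `parse` stands in for the real MARC conversion,
    and the {"records": [...]} wrapper is an assumed output shape.
    """
    if "-bibserver" in argv:
        # Self-description mode: emit the descriptor, do not parse anything
        return json.dumps(PLUGIN_INFO)
    # Normal mode: convert the input and emit the records as JSON
    return json.dumps({"records": parse(input_text)})
```

A command-line wrapper would then print `plugin_output(sys.argv[1:], sys.stdin.read(), parse)`, so the same script both describes itself and converts input read from stdin.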

edchamberlain commented 12 years ago

This is done, along with a few other tweaks.

markmacgillivray commented 12 years ago

What is left to be done to get the MARC parser working? @epoz can you let @edchamberlain know what is required? Then we can make the MARC parser available too.

epoz commented 12 years ago

We need to install the Perl MARC modules on the bibsoup server. I mailed Nils about that asking permission, but need to ping him again as I did not receive a reply. On my local machine the MARC parser works.

markmacgillivray commented 12 years ago

I added the Perl requirement to the ticket about moving to a different server and got no complaints, so we can install it there. The new server, by the way, is s063. Let me know if you can't log in to it.

edchamberlain commented 12 years ago

Additional tweaks made to parser code. Should be fairly complete. Currently testing on Harvard data.