solrmarc / solrmarc

SolrMarc is a utility that reads in MARC records, extracts information from various fields as specified in an indexing specification, and sends that information to a specified Apache Solr index.
Apache License 2.0

Error: Illegal character <Z> found in scanner state startspec #82

Open Witiko opened 5 years ago

Witiko commented 5 years ago

Ex Libris Aleph produces dumps that contain non-standard MARC fields such as Z30, CAT, and AVA:

AVA    [a]: MED50
       [b]: KUK
       [e]: available
       [f]: 1
       [g]: 0
       [h]: N
       [i]: 0
       [j]: VV

To parse these non-standard fields, I added rules into the index specification file such as:

barcode =  Z305, first

However, SolrMarc produces the following error:

barcode : Error: Illegal character <Z>  found in scanner state startspec

Would you be in favor of accepting a PR that adds configurable support for arbitrary alphabetical characters in MARC fields?
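
To make the proposal concrete, here is a minimal Python sketch of what "configurable" tag validation could look like. It is not SolrMarc code: `parse_field_spec`, `STRICT_TAG`, and `RELAXED_TAG` are hypothetical names, and the numeric-only default is an assumption based on the error message above, not on the actual scanner grammar.

```python
import re

# Assumption: the current scanner only accepts three-digit tags.
STRICT_TAG = re.compile(r"[0-9]{3}")
# Relaxed alternative that also admits tags such as Z30, CAT, and AVA.
RELAXED_TAG = re.compile(r"[0-9A-Z]{3}")


def parse_field_spec(spec, tag_pattern=STRICT_TAG):
    """Split a spec such as 'Z305' into (tag, subfield codes).

    The first three characters are the tag; anything after them is
    treated as subfield codes. Raises ValueError if the tag does not
    match the configured pattern.
    """
    spec = spec.strip()
    tag, subfields = spec[:3], spec[3:]
    if not tag_pattern.fullmatch(tag):
        raise ValueError(f"illegal tag {tag!r} in field spec {spec!r}")
    return tag, subfields
```

With the strict pattern, `parse_field_spec("Z305")` raises an error, much like the scanner does; passing `RELAXED_TAG` lets the same spec through as tag `Z30` with subfield `5`.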

haschart commented 5 years ago

Vit,

I would be amenable to a pull request that adds support for such fields, but I'm also working on some new functionality that may make it unnecessary.

In addition to the current method of specifying which fields to use, I have been working on a way to specify a regular expression that matches the fields you want.

It would work like this:

barcode = fields(Z30)

I tested the not-yet-released code on a record I have with non-numeric field tags, and it does in fact work.

The initial idea for the regex method of field specification was to better support discovery questions about the records themselves, like "How many records have a 336 field?" But after some refinement it seems it will suit your needs as well.
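
As a rough illustration of the regex idea (a Python sketch, not the unreleased SolrMarc code), the snippet below selects fields by matching a regular expression against the tag and reuses the same matcher for the "how many records have a 336 field?" style of question. The record layout (a list of `(tag, [(code, value), ...])` pairs) and the function names `fields_matching`, `first_subfield`, and `count_records_with` are assumptions made for the example.

```python
import re


def fields_matching(record, tag_regex):
    """Return the fields of a record whose tag fully matches tag_regex."""
    pattern = re.compile(tag_regex)
    return [(tag, subs) for tag, subs in record if pattern.fullmatch(tag)]


def first_subfield(record, tag_regex, code):
    """Return the first value of subfield `code` in any matching field."""
    for _tag, subs in fields_matching(record, tag_regex):
        for sub_code, value in subs:
            if sub_code == code:
                return value
    return None


def count_records_with(records, tag_regex):
    """Count the records that contain at least one matching field."""
    return sum(1 for record in records if fields_matching(record, tag_regex))
```

Because the tag is compared as a plain string, non-numeric tags such as `Z30` or `AVA` need no special handling, and a pattern like `Z3.` would pick up a whole family of tags at once.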

-Bob


From: Vít Novotný notifications@github.com
Sent: Wednesday, April 24, 2019 9:27:46 PM
To: solrmarc/solrmarc
Cc: Subscribed
Subject: [solrmarc/solrmarc] Error: Illegal character <Z> found in scanner state startspec (#82)
