Punctuation in Instance "Index Title" in imported records

michelleif commented 1 year ago

Submitted By: hsammons@stanford.edu
Description: When I import a record to FOLIO using single-record import, the punctuation preceding MARC field 245, subfield c is included in the Instance "Index Title" (usually a / or , ). But this doesn't seem to be the case for records we migrated. I don't know if this punctuation mark is supposed to be included in Index Title, I am just noting the inconsistency.
Instance HRID: For an example of such an imported record, see HRID in00000000049
Patron barcode: ``
FOLIO App: Inventory
Urgency: It's not urgent.
Browser: Firefox

shelleydoljack commented 1 year ago

My best guess is that our FOLIO migration tools are properly stripping the ending punctuation of 245 subfields b, n, or p, and that data import and single record import are not. The mapping rules only specify trimming the start of subfield a based on the value of the 2nd indicator ("remove prefix by indicator"). This page explains a bit about the RuleProcessorApi. This section refers to normalization done to the data and links to a page describing the normalization functions, but that link returns a 404. So, is normalization no longer being done? The only code in the part of mod-source-record-manager-server where NormalizationFunction used to live is folio > services > mappers > processor > MappingParametersProvider, which doesn't contain any info about normalization functions.

Maybe this info is also outdated or it doesn't work anymore? "Processing rules on concatenated data":

By default rules run on the data in a single sub-field. In order to concatenate un-normalized data, and run the rules on the concatenated data add the following field: applyRulesOnConcatenatedData: true. This can be used when punctuation should only be removed from the end of a concatenated string.
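To make the distinction in that passage concrete, here is a minimal Python sketch of running a rule per subfield versus running it once on the concatenated string. It is not FOLIO code: the function names and the punctuation-stripping rule are hypothetical and only illustrate the idea.

```python
import re

# Hypothetical rule (not a FOLIO function): strip one trailing
# punctuation mark (/ , : ; =) and any surrounding whitespace.
def strip_trailing_punctuation(value: str) -> str:
    return re.sub(r"\s*[/,:;=]\s*$", "", value)

# Default behaviour described above: the rule runs on each subfield.
def apply_per_subfield(subfields, rule):
    return " ".join(rule(value) for value in subfields)

# Behaviour with the concatenation flag: join first, then run the rule once.
def apply_on_concatenated(subfields, rule):
    return rule(" ".join(subfields))

subfields = ["The journal :", "a history /"]
per_subfield = apply_per_subfield(subfields, strip_trailing_punctuation)
concatenated = apply_on_concatenated(subfields, strip_trailing_punctuation)
```

With these inputs, the per-subfield run also drops the colon between the two subfields, while the concatenated run removes only the final slash.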

The marc bib mapping rules on folio-test (and folio nolana demo site) have this for field 245:

"245":[
  {
    "target": "title",
    "subfield":["a", "n", "p", "b", "c", "f", "g", "h", "k",…],
    "description": "Resource Title",
    "applyRulesOnConcatenatedData": true
  },
  {
    "rules":[
      {"conditions":[{"type": "remove_prefix_by_indicator, capitalize" } ]}
    ],
    "target": "indexTitle",
    "subfield":["a", "n", "p", "b"],
    "description": "Index title",
    "applyRulesOnConcatenatedData": true
  }
]

The ending "/" from the subfields a, n, p, or b should be removed according to this rule. Is there a Jira ticket reporting that this is not working in data import? Is it working in the demo site?

shelleydoljack commented 1 year ago

Yes, it also does not strip the ending "/" on the demo site. https://folio-nolana.dev.folio.org/inventory/view/ff3a7960-293d-446f-bdb8-664d13c879f9?sort=title&xidtype=f26df83c-aa25-40b6-876e-96852c3d4fd4

shelleydoljack commented 1 year ago

Oh wait, I see what's happening. The rules being applied on the concatenated data are "remove_prefix_by_indicator, capitalize". There is no "get rid of ending punctuation" rule. Are the EBSCO folio_migration_tools doing that for us? Should there be a "get rid of ending punctuation" rule we can add to our marc bib mapping?

jermnelson commented 1 year ago

Just added a ticket about this issue to the upstream repository. The migration tools apply a regular expression substitution that removes the trailing slash, and that substitution isn't present in the data-import app. The indexTitle mapping for the 245 is the only impacted mapping in our Nolana MARC bib mapping.
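As a rough illustration of the kind of substitution being discussed, the following Python sketch builds an index title from 245 subfields a, n, p, and b. The function name, the regular expression, and the reading of "capitalize" as upper-casing the first character are all assumptions made for the example; this is not the code from folio_migration_tools or the data-import app.

```python
import re

def index_title(subfields, nonfiling_chars):
    """Hypothetical helper: subfields are the 245 a/n/p/b values in record order."""
    joined = " ".join(subfields)
    # remove_prefix_by_indicator: skip the number of leading characters
    # given by the field's 2nd indicator.
    trimmed = joined[nonfiling_chars:]
    # Assumed trailing-slash substitution; the actual pattern may differ.
    cleaned = re.sub(r"\s*/\s*$", "", trimmed)
    # capitalize: assumed here to upper-case only the first character.
    return cleaned[:1].upper() + cleaned[1:]

title = index_title(["The art of cataloging :", "a primer /"], 4)
with_slash_only_rules = "Art of cataloging : a primer /"
```

Without the substitution step, the same input would keep the trailing " /", which matches what the imported records show.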

sul-dlss / FOLIO-Project-Stanford

Punctuation in Instance "Index Title" in imported records #356