primousers / primo-ve-norm

A repository to collect and track community Primo VE Normalization Rules.
MIT License

Separate values delimited by ; to display individually #31

Open alsalatkajr opened 1 week ago

alsalatkajr commented 1 week ago

I need help determining whether I can use normalization rules to separate values delimited by a ; so that they display individually in Primo VE. These records are being imported from CONTENTdm. Public display: https://utc.primo.exlibrisgroup.com/permalink/01UTC_INST/b3svvu/alma991004703897103991 See below:

Example record:

Chattanooga Venture News, 1986 January Chattanooga Venture News, vol. III, no. 1 Aiken, Ann; Alexander, Frances; Chattanooga Venture (Organization); Lublin, Joann S. Newsletter of Chattanooga Venture News published by Chattanooga Venture, an organization dedicated to revitalizing Chattanooga, Tennessee, that contains information about upcoming projects and continued work in the community. Wrapping - Up the $15.1 Million Package; Getting Ready for Homecoming ‘86; A Shelter for Battered Lives; Watson Designs Downtown; Memorial Auditorium Final Report; Learning from CHARLOTTE; Results of the Vision 2000 Community Issues Survey; Venture’s First Report Card; State Aquarium to Be “Museum of the Living Room”; Forget Big Convention Halls; Now Cities See Aquariums as Urban Renovation Tools; Saving the Earth; Signs, Signs, Everywhere Signs; Sharpening Chattanooga’s Competitive Edge; Backyard Wildlife Rally January 18; Healthcare Coalition Achieves “Wellness”; David Birch Focuses On Quality; Venture Launches Human Relations Task Force; Venture Seeks Violence Shelter; Arts Makes the Difference; Sign Control Task Report Chattanooga Venture (Organization) Chattanooga (Tenn.) 1986-01 **City planning; Community development; Nonprofit organizations; Urban renewal** Chattanooga Venture (Organization) Chattanooga (Tenn.) Text newsletters image/jp2; text/plain 10 leaves English eng CHC-2010-055-006 Chattanooga Venture News newsletters Chattanooga History Collections Chattanooga Public Library; University of Tennessee at Chattanooga Chattanooga Venture News newsletters http://rightsstatements.org/vocab/InC/1.0/ Chattanooga Venture (Organization) http://cdm16877.contentdm.oclc.org/cdm/ref/collection/p16877coll46/id/89 oai:cdm16877.contentdm.oclc.org:p16877coll46/89 Archival Material

Instead of City planning; Community development; Nonprofit organizations; Urban renewal displaying in one string, can a normalization rule allow for this string to display as follows:

City planning
Community development
Nonprofit organizations
Urban renewal

I started with the following:

rule "separate subjects by semicolon"
when
exist "dc"."subject"
then
set TEMP"1" to dc value "dc"."subject"
remove substring using regex (TEMP"1",";.*")
set "dc"."subject" to TEMP"1"
end

All this does is remove everything after the first semicolon. Is it possible to move content to another TEMP after it's been removed using the "remove substring using regex" command? Addressing this in CONTENTdm is not an option, as per our Special Collections team (there are downstream impacts for other systems).
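To make the behaviour described above concrete, here is a minimal sketch in Python (not Primo VE rule syntax) of what the pattern ";.*" does to the example subject string. It assumes that "remove substring using regex" simply deletes whatever the pattern matches, which is consistent with the result reported above; the variable names are hypothetical.

```python
import re

# Example subject string taken from the record above.
subject = "City planning; Community development; Nonprofit organizations; Urban renewal"

# Assumed equivalent of: remove substring using regex (TEMP"1",";.*")
# The pattern matches from the first semicolon to the end of the string,
# so everything after the first value is deleted rather than kept anywhere.
first_only = re.sub(r";.*", "", subject)

print(first_only)  # City planning
```

The deleted text is not captured by this operation, so getting the second and later values means starting again from a fresh copy of the full string for each one.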

Thanks for your help!

mwan-work commented 4 days ago

Update: I forgot to mention that the rule below is an XML normalization rule. I'm not entirely sure whether this is possible using DC, but you could try it without the XPath conditions when setting the values into TEMP.

A while back I created an overly convoluted rule to do this. I haven't used it in any of my imports yet, and it's been a while since I last looked at it, but I think it should do the trick.

It requires multiple TEMP fields and regex to 'extract' each value. The number of TEMP fields depends on how many values you want to extract.

rule "Separating Values"
when

# Run this rule only when a semicolon exists in the subject field
exist "//*[local-name()='subject'][1][contains(., ';')]"

then

# The following XPath conditions are wild and overcomplicated, but they make each TEMP field get a value only when the subject contains enough semicolons for that position.
# Without these conditions, the rules to remove the substrings would run regardless and could pick out the wrong values
set TEMP"1" to xpath "//*[local-name()='subject'][1][contains(., ';')]"
set TEMP"2" to xpath "//*[local-name()='subject'][1][contains(., ';')]"
set TEMP"3" to xpath "//*[local-name()='subject'][1][contains(., ';') and contains(substring-after(., ';'), ';')]"
set TEMP"4" to xpath "//*[local-name()='subject'][1][contains(., ';') and contains(substring-after(., ';'), ';') and contains(substring-after(substring-after(., ';'), ';'), ';')]"
set TEMP"5" to xpath "//*[local-name()='subject'][1][contains(., ';') and contains(substring-after(., ';'), ';') and contains(substring-after(substring-after(., ';'), ';'), ';') and contains(substring-after(substring-after(substring-after(., ';'), ';'), ';'), ';')]"
set TEMP"6" to xpath "//*[local-name()='subject'][1][contains(., ';') and contains(substring-after(., ';'), ';') and contains(substring-after(substring-after(., ';'), ';'), ';') and contains(substring-after(substring-after(substring-after(., ';'), ';'), ';'), ';') and contains(substring-after(substring-after(substring-after(substring-after(., ';'), ';'), ';'), ';'), ';')]"
set TEMP"7" to xpath "//*[local-name()='subject'][1][contains(., ';') and contains(substring-after(., ';'), ';') and contains(substring-after(substring-after(., ';'), ';'), ';') and contains(substring-after(substring-after(substring-after(., ';'), ';'), ';'), ';') and contains(substring-after(substring-after(substring-after(substring-after(., ';'), ';'), ';'), ';'), ';') and contains(substring-after(substring-after(substring-after(substring-after(substring-after(., ';'), ';'), ';'), ';'), ';'), ';')]"

# Value before the first semicolon: [1];2;3;4;5;6;7
remove substring using regex (TEMP"1","(;.*)")

# Value between the first and second semicolons: 1;[2];3;4;5;6;7
remove substring using regex (TEMP"2","^.*?;")
remove substring using regex (TEMP"2","(;.*)")

# Value between the second and third semicolons: 1;2;[3];4;5;6;7
remove substring using regex (TEMP"3","^([^;]+;[^;]+);")
remove substring using regex (TEMP"3","(;.*)")

# Value between the third and fourth semicolons: 1;2;3;[4];5;6;7
remove substring using regex (TEMP"4","^([^;]+;[^;]+;[^;]+);")
remove substring using regex (TEMP"4","(;.*)")

# Value between the fourth and fifth semicolons: 1;2;3;4;[5];6;7
remove substring using regex (TEMP"5","^([^;]+;[^;]+;[^;]+;[^;]+);")
remove substring using regex (TEMP"5","(;.*)")

# Value between the fifth and sixth semicolons: 1;2;3;4;5;[6];7
remove substring using regex (TEMP"6","^([^;]+;[^;]+;[^;]+;[^;]+;[^;]+);")
remove substring using regex (TEMP"6","(;.*)")

# Value after the sixth semicolon: 1;2;3;4;5;6;[7]
remove substring using regex (TEMP"7","^([^;]+;[^;]+;[^;]+;[^;]+;[^;]+;[^;]+);")
remove substring using regex (TEMP"7","(;.*)")

set TEMP"1" in "dc"."subject"
set TEMP"2" in "dc"."subject"
set TEMP"3" in "dc"."subject"
set TEMP"4" in "dc"."subject"
set TEMP"5" in "dc"."subject"
set TEMP"6" in "dc"."subject"
set TEMP"7" in "dc"."subject"
end

So the above rule would change this <dc:subject>City planning; Community development; Nonprofit organizations; Urban renewal</dc:subject>

Into this: [image: screenshot of the rule output]

There is some 'blank' output because the rule looks for 7 values, but if I remember correctly, the 'blanks' won't display in Primo.
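For anyone who wants to check the regex logic outside Primo VE, the rule above can be approximated in Python. This is only a sketch under assumptions: it treats "remove substring using regex" as deleting the match, treats a TEMP field whose XPath condition fails as an empty string, and assumes Python's regex engine behaves like Primo VE's for these patterns. The function name extract_slot is hypothetical.

```python
import re

def extract_slot(value: str, slot: int) -> str:
    """Approximate what the rule does for TEMP"<slot>" (1-based)."""
    # XPath guard: TEMP"1" and TEMP"2" need at least one semicolon,
    # TEMP"n" for n >= 3 needs at least n-1 semicolons.
    if value.count(";") < max(1, slot - 1):
        return ""  # assumption: an unmatched XPath leaves the TEMP field blank
    if slot == 2:
        # Drop the first value and its semicolon.
        value = re.sub(r"^.*?;", "", value)
    elif slot > 2:
        # Drop the first slot-1 values, e.g. ^([^;]+;[^;]+); for slot 3.
        prefix = ";".join(["[^;]+"] * (slot - 1))
        value = re.sub(rf"^({prefix});", "", value)
    # Drop everything from the next semicolon onwards.
    return re.sub(r"(;.*)", "", value)

subject = "City planning; Community development; Nonprofit organizations; Urban renewal"
results = [extract_slot(subject, n) for n in range(1, 8)]
print(results)
# ['City planning', ' Community development', ' Nonprofit organizations',
#  ' Urban renewal', '', '', '']
```

Two things show up in the simulated output: slots 5 to 7 are blank, matching the 'blank' output mentioned above, and values 2 to 4 keep the space that followed each semicolon. Whether Primo VE trims that leading space on display is not confirmed in this thread.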

Feel free to amend the rule to your own requirements.
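If you need a different number of values, the repetitive lines can be generated rather than typed by hand. The sketch below is a hypothetical helper (xpath_guard and build_rule_body are made-up names) that prints the "then" part of the rule for a chosen number of values, following the same pattern as the rule above. It only produces text to paste into a rule; whether Primo VE accepts TEMP numbers beyond those used above is not confirmed here.

```python
def xpath_guard(slot: int) -> str:
    """Build the XPath condition used when setting TEMP"<slot>"."""
    needed = max(1, slot - 1)  # semicolons required for this position
    conditions, inner = [], "."
    for _ in range(needed):
        conditions.append(f"contains({inner}, ';')")
        inner = f"substring-after({inner}, ';')"
    return "//*[local-name()='subject'][1][" + " and ".join(conditions) + "]"

def build_rule_body(slots: int) -> str:
    """Return the set / remove / output lines for the given number of values."""
    sets, removes, outputs = [], [], []
    for slot in range(1, slots + 1):
        sets.append(f'set TEMP"{slot}" to xpath "{xpath_guard(slot)}"')
        if slot == 2:
            removes.append(f'remove substring using regex (TEMP"{slot}","^.*?;")')
        elif slot > 2:
            prefix = ";".join(["[^;]+"] * (slot - 1))
            removes.append(f'remove substring using regex (TEMP"{slot}","^({prefix});")')
        removes.append(f'remove substring using regex (TEMP"{slot}","(;.*)")')
        outputs.append(f'set TEMP"{slot}" in "dc"."subject"')
    return "\n".join(sets + removes + outputs)

# Print the lines for four values; wrap them in rule/when/then/end as in the rule above.
print(build_rule_body(4))
```

The generated lines keep the same grouping as the original rule (all the set lines, then the regex removals, then the output lines), so they can replace the corresponding parts between "then" and "end".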