Open alsalatkajr opened 1 week ago
Update: I forgot to mention that the rule below is an XML normalization rule. I'm not entirely sure whether this is possible with a DC normalization rule, but you could try it without the XPath conditions when setting the values into TEMP.
-------------------
A while back I created an overly convoluted rule to do this. I haven't used it in any of my imports yet, and it's been a while since I last looked at it, but I think it should do the trick.
It requires multiple TEMP fields and regex to 'extract' each value. The number of TEMP fields depends on how many values you want to extract.
rule "Separating Values"
when
# Run this rule only when a semicolon exists in the subject field
exist "//*[local-name()='subject'][1][contains(., ';')]"
then
# The following uses some wild and overcomplicated XPath conditions so that each TEMP is only set when the subject contains enough semicolons for that value to exist.
# Without these conditions, the rules to remove the substrings would run regardless and could pick out the wrong values
set TEMP"1" to xpath "//*[local-name()='subject'][1][contains(., ';')]"
set TEMP"2" to xpath "//*[local-name()='subject'][1][contains(., ';')]"
set TEMP"3" to xpath "//*[local-name()='subject'][1][contains(., ';') and contains(substring-after(., ';'), ';')]"
set TEMP"4" to xpath "//*[local-name()='subject'][1][contains(., ';') and contains(substring-after(., ';'), ';') and contains(substring-after(substring-after(., ';'), ';'), ';')]"
set TEMP"5" to xpath "//*[local-name()='subject'][1][contains(., ';') and contains(substring-after(., ';'), ';') and contains(substring-after(substring-after(., ';'), ';'), ';') and contains(substring-after(substring-after(substring-after(., ';'), ';'), ';'), ';')]"
set TEMP"6" to xpath "//*[local-name()='subject'][1][contains(., ';') and contains(substring-after(., ';'), ';') and contains(substring-after(substring-after(., ';'), ';'), ';') and contains(substring-after(substring-after(substring-after(., ';'), ';'), ';'), ';') and contains(substring-after(substring-after(substring-after(substring-after(., ';'), ';'), ';'), ';'), ';')]"
set TEMP"7" to xpath "//*[local-name()='subject'][1][contains(., ';') and contains(substring-after(., ';'), ';') and contains(substring-after(substring-after(., ';'), ';'), ';') and contains(substring-after(substring-after(substring-after(., ';'), ';'), ';'), ';') and contains(substring-after(substring-after(substring-after(substring-after(., ';'), ';'), ';'), ';'), ';') and contains(substring-after(substring-after(substring-after(substring-after(substring-after(., ';'), ';'), ';'), ';'), ';'), ';')]"
# Value before the first semicolon: [1];2;3;4;5;6;7
remove substring using regex (TEMP"1","(;.*)")
# Value between the first and second semicolons: 1;[2];3;4;5;6;7
remove substring using regex (TEMP"2","^.*?;")
remove substring using regex (TEMP"2","(;.*)")
# Value between the second and third semicolons: 1;2;[3];4;5;6;7
remove substring using regex (TEMP"3","^([^;]+;[^;]+);")
remove substring using regex (TEMP"3","(;.*)")
# Value between the third and fourth semicolons: 1;2;3;[4];5;6;7
remove substring using regex (TEMP"4","^([^;]+;[^;]+;[^;]+);")
remove substring using regex (TEMP"4","(;.*)")
# Value between the fourth and fifth semicolons: 1;2;3;4;[5];6;7
remove substring using regex (TEMP"5","^([^;]+;[^;]+;[^;]+;[^;]+);")
remove substring using regex (TEMP"5","(;.*)")
# Value between the fifth and sixth semicolons: 1;2;3;4;5;[6];7
remove substring using regex (TEMP"6","^([^;]+;[^;]+;[^;]+;[^;]+;[^;]+);")
remove substring using regex (TEMP"6","(;.*)")
# Value after the sixth semicolon: 1;2;3;4;5;6;[7]
remove substring using regex (TEMP"7","^([^;]+;[^;]+;[^;]+;[^;]+;[^;]+;[^;]+);")
remove substring using regex (TEMP"7","(;.*)")
set TEMP"1" in "dc"."subject"
set TEMP"2" in "dc"."subject"
set TEMP"3" in "dc"."subject"
set TEMP"4" in "dc"."subject"
set TEMP"5" in "dc"."subject"
set TEMP"6" in "dc"."subject"
set TEMP"7" in "dc"."subject"
end
So the above rule would change this
<dc:subject>City planning; Community development; Nonprofit organizations; Urban renewal</dc:subject>
Into this:
There is some 'blank' output because the rule looks for 7 values, but if I remember correctly, the 'blanks' won't display in Primo.
Feel free to amend the rule to your own requirements.
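If you want to sanity-check the regex steps before amending the rule, here is a minimal Python sketch that mimics the two "remove substring using regex" calls per TEMP field. It is only an emulation: `extract_value` is a made-up helper, Python's `re` module stands in for the regex engine the rule uses, and an empty string stands in for a TEMP field that was never set. Actual behavior in Alma may differ.

```python
import re


def extract_value(subject: str, position: int) -> str:
    """Emulate the rule's regex removals for the value at `position` (1-based)."""
    # The XPath conditions only set TEMP"n" when enough semicolons exist:
    # TEMP"1" and TEMP"2" need one, TEMP"3" needs two, and so on.
    # Assumption: an unset TEMP field is represented here by "".
    required = max(position - 1, 1)
    if subject.count(";") < required:
        return ""

    value = subject
    if position == 2:
        # Same pattern as the rule: strip up to and including the first semicolon.
        value = re.sub(r"^.*?;", "", value)
    elif position > 2:
        # Build ^([^;]+;[^;]+...); with (position - 1) leading values.
        prefix = ";".join(["[^;]+"] * (position - 1))
        value = re.sub(rf"^({prefix});", "", value)

    # Drop everything from the next semicolon onward.
    return re.sub(r"(;.*)", "", value)


subject = "City planning; Community development; Nonprofit organizations; Urban renewal"
values = [extract_value(subject, n) for n in range(1, 8)]
print(values)
```

In this emulation, positions 5 to 7 come back empty for the four-value example, and values 2 to 4 keep the space that followed each semicolon in the source string, so it may be worth checking whether the imported values need trimming.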
I need help determining whether I can use normalization rules to separate values delimited by semicolons so that they display individually in Primo VE. These records are being imported from CONTENTdm. Public display: https://utc.primo.exlibrisgroup.com/permalink/01UTC_INST/b3svvu/alma991004703897103991 See below:
Example record:
Instead of "City planning; Community development; Nonprofit organizations; Urban renewal" displaying in one string, can a normalization rule allow for this string to display as follows:
City planning
Community development
Nonprofit organizations
Urban renewal
I started with the following:
rule "separate subjects by semicolon"
when
exist "dc"."subject"
then
set TEMP"1" to dc value "dc"."subject"
remove substring using regex (TEMP"1",";.*")
set "dc"."subject" to TEMP"1"
end
All this does is remove everything after the first semicolon. Is it possible to move content to another TEMP after it's been removed using the "remove substring using regex" command? Addressing this in CONTENTdm is not an option, as per our Special Collections team (there are downstream impacts for other systems).
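For illustration, the behavior described here can be reproduced outside Alma with a short Python sketch. Python's `re` module stands in for the rule engine's regex handling, which is an assumption, and the variable names are made up for the example.

```python
import re

subject = "City planning; Community development; Nonprofit organizations; Urban renewal"

# Same pattern as in the rule: delete from the first semicolon to the end.
first_only = re.sub(r";.*", "", subject)

# The desired outcome: every value kept, with surrounding whitespace trimmed.
wanted = [part.strip() for part in subject.split(";") if part.strip()]

print(first_only)
print(wanted)
```

The single removal leaves only the first value, whereas the desired display needs all four values kept as separate entries.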
Thanks for your help!