Closed by johnhuck 9 years ago
The FOXML used for the migration has <dcterms:subject>Rites &amp; ceremonies</dcterms:subject>.
But encoded ampersands in other fields are being imported correctly, e.g. dcterms:title in https://newport.library.ualberta.ca/files/gb19f580q#.Vg8Mpd9zjJ8 from <dcterms:title>“Nowhere to Turn, Nowhere to Go”: Library &amp; Information Services for Sexual &amp; Gender (LGBTQ) Minorities</dcterms:title>.
But here's one where a committee member listing had an ampersand that was not decoded: https://newport.library.ualberta.ca/files/4m90dv490#.Vg8ORN9zjJ8 . So perhaps the problem is limited to multivalue fields?
Rails should be taking care of this for us:
irb(main):001:0> xml = "<a>foo&amp;bar</a>"
=> "<a>foo&amp;bar</a>"
irb(main):002:0> dom = Nokogiri.XML(xml)
=> #<Nokogiri::XML::Document:0x529b774 name="document" children=[#<Nokogiri::XML::Element:0x529b3f0 name="a" children=[#<Nokogiri::XML::Text:0x529b1d4 "foo&bar">]>]>
irb(main):026:0> t = dom.xpath("a/text()",NS)
=> [#<Nokogiri::XML::Text:0x529b1d4 "foo&bar">]
but
irb(main):027:0> t = dom.xpath("a/text()",NS).map(&:to_s)
=> ["foo&amp;bar"]
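This explains why single-value fields are unaffected: presumably only the multivalue path maps each node through to_s, which re-serializes the text as XML and escapes the ampersand again. We need to use Nokogiri's text method instead: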
irb(main):028:0> t = dom.xpath("a/text()",NS).map(&:text)
=> ["foo&bar"]
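For anyone wanting to reproduce the difference outside the migration code, here is a minimal sketch. It substitutes REXML (bundled with Ruby) for Nokogiri so it runs without extra gems; REXML's `to_s` and `value` on a text node play the roles of Nokogiri's `to_s` and `text` in the transcript above. The sample record, the element names, and the helper names (`subject_nodes`, `serialized_subjects`, `decoded_subjects`) are all hypothetical and are not taken from the migration scripts.

```ruby
require "rexml/document"

# Hypothetical sample record; element names and values are illustrative only.
SAMPLE_XML = <<~RECORD
  <record>
    <subject>Rites &amp; ceremonies</subject>
    <subject>Library &amp; Information Services</subject>
  </record>
RECORD

# Collect the text nodes of every <subject> element (a multivalue field).
def subject_nodes(xml)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, "//subject/text()")
end

# Serialized form: the text as it appears in XML, so "&" stays "&amp;".
def serialized_subjects(xml)
  subject_nodes(xml).map(&:to_s)
end

# Decoded form: the character data itself, suitable for indexing or display.
def decoded_subjects(xml)
  subject_nodes(xml).map(&:value)
end

p serialized_subjects(SAMPLE_XML)
p decoded_subjects(SAMPLE_XML)
```

The point of keeping two helpers is only to make the contrast visible: the serialized form is what ended up in the subject facet, and the decoded form is what should be stored.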
Testing that now.
https://hydranorth.library.ualberta.ca/catalog?f%5Bsubject_sim%5D%5B%5D=Rites+%26amp%3B+ceremonies