samvera / hyrax

Hyrax is a Ruby on Rails Engine built by the Samvera community. Hyrax provides a foundation for creating many different digital repository applications.
http://hyrax.samvera.org/
Apache License 2.0

Encoding::UndefinedConversionError in CharacterizeJob #5671

Closed conorom closed 1 year ago

conorom commented 2 years ago

Descriptive summary

This affects Hyrax versions 2.9.5 through 3.4.1 (and the main branch). It is the bug that led to the discovery of #5670.

I'm unsure whether it's worth merging a fix for this if that one gets fixed, but I have a PR ready for this niggle, which I will be merging into heliotrope anyway, and it's nice to have the issue findable here in case someone else runs into it before #5670 is fixed.

So. Files with non-ASCII characters in the name will break CharacterizeJob if the job tries to set the FileSet title to its original_file's original_name. This is the result:

/hydra-dev/heliotrope-staging/shared/vendor/bundle/ruby/2.7.0/gems/rdf-3.1.15/lib/rdf/model/literal.rb:169:in `encode'
/hydra-dev/heliotrope-staging/shared/vendor/bundle/ruby/2.7.0/gems/rdf-3.1.15/lib/rdf/model/literal.rb:169:in `initialize'
/hydra-dev/heliotrope-staging/shared/vendor/bundle/ruby/2.7.0/gems/rdf-3.1.15/lib/rdf/model/literal.rb:130:in `new'
/hydra-dev/heliotrope-staging/shared/vendor/bundle/ruby/2.7.0/gems/rdf-3.1.15/lib/rdf.rb:148:in `Literal'
/hydra-dev/heliotrope-staging/shared/vendor/bundle/ruby/2.7.0/gems/active-triples-1.1.1/lib/active_triples/relation.rb:530:in `value_to_node'
/hydra-dev/heliotrope-staging/shared/vendor/bundle/ruby/2.7.0/gems/active-triples-1.1.1/lib/active_triples/relation.rb:516:in `set_value'
/hydra-dev/heliotrope-staging/shared/vendor/bundle/ruby/2.7.0/gems/active-triples-1.1.1/lib/active_triples/relation.rb:399:in `block in set'
/hydra-dev/heliotrope-staging/shared/vendor/bundle/ruby/2.7.0/gems/active-triples-1.1.1/lib/active_triples/relation.rb:399:in `each'
/hydra-dev/heliotrope-staging/shared/vendor/bundle/ruby/2.7.0/gems/active-triples-1.1.1/lib/active_triples/relation.rb:399:in `set'
/hydra-dev/heliotrope-staging/shared/vendor/bundle/ruby/2.7.0/gems/active-triples-1.1.1/lib/active_triples/rdf_source.rb:479:in `set_value'
/hydra-dev/heliotrope-staging/shared/vendor/bundle/ruby/2.7.0/gems/active-fedora-12.1.1/lib/active_fedora/fedora_attributes.rb:33:in `set_value'
/hydra-dev/heliotrope-staging/shared/vendor/bundle/ruby/2.7.0/gems/active-fedora-12.1.1/lib/active_fedora/attribute_methods/dirty.rb:11:in `set_value'
/hydra-dev/heliotrope-staging/shared/vendor/bundle/ruby/2.7.0/gems/active-fedora-12.1.1/lib/active_fedora/attributes/property_builder.rb:53:in `label='
/hydra-dev/heliotrope-staging/releases/20220314184755/app/jobs/characterize_job.rb:45:in `perform'
/hydra-dev/heliotrope-staging/shared/vendor/bundle/ruby/2.7.0/gems/activejob-5.2.6.2/lib/active_job/execution.rb:39:in `block in perform_now'

aside: I assume label gets set correctly initially, i.e. in IngestJob, because it's pulled from the UploadedFile in the standard UI workflow.
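
For anyone landing here from a search, the failure at the top of the trace (`encode` inside `RDF::Literal#initialize`) can be imitated in plain Ruby. This is only a sketch under an assumption: that the filename reaches the setter tagged as ASCII-8BIT (binary) even though its bytes are valid UTF-8. `utf8_label` is a hypothetical helper written for illustration, not something in Hyrax or heliotrope, and not necessarily what the PR does.

```ruby
# Assumption: the filename's bytes are valid UTF-8, but the String is tagged
# as ASCII-8BIT. String#b returns a binary-tagged copy to simulate that.
binary_name = "ファイル.txt".b

# Transcoding a binary string with high bytes to UTF-8 raises, which is the
# same exception class reported in this issue.
error = begin
  binary_name.encode(Encoding::UTF_8)
  nil
rescue Encoding::UndefinedConversionError => e
  e
end

# Hypothetical helper (not part of Hyrax): re-tag the bytes as UTF-8 instead
# of transcoding them, and replace any invalid bytes as a last resort.
def utf8_label(name)
  candidate = name.dup.force_encoding(Encoding::UTF_8)
  return candidate if candidate.valid_encoding?

  candidate.scrub("?")
end

label = utf8_label(binary_name)
```

The distinction is between `encode`, which tries to convert each byte from one encoding to another, and `force_encoding`, which only changes the tag on bytes that are already correct. Whether re-tagging is safe depends on the bytes really being UTF-8, hence the `valid_encoding?` check.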

Rationale

It's a new bug and should be fixed.

Expected behavior

You should be able to import a file with non-ASCII characters in the name (potentially assigning the original_name to a Fedora field).

Actual behavior

CharacterizeJob fails with such a file. Ingest does still succeed, however.

Steps to reproduce the behavior

Simply try to add this File to a Work/Monograph and see CharacterizeJob fail: ファイル.txt

Related work

#5670

conorom commented 1 year ago

Reopening this. As discovered in heliotrope, the error still occurs if the FileSet previously had a title distinct from its label (filename).

Don't know how I missed it TBH. First PR must've gotten a tad rushed.