ucsdlib / damsmanager

DAMS Manager
Other
3 stars 1 forks source link

Fixes #251 - Ignore control characters to avoid the com.hp.hpl.jena.shared.CannotEncodeCharacterException #252

Closed lsitu closed 6 years ago

lsitu commented 6 years ago

Fixes #251

Local Checklist

What does this PR do?

Ignore control characters during importing objects from Excel source to avoid the com.hp.hpl.jena.shared.CannotEncodeCharacterException while retrieving the object RDF that contains control characters.

Why are we doing this? Any context of related work?

References #250, #251

@ucsdlib/developers - please review

lsitu commented 6 years ago

Thanks @mcritchlow. @remerjohnson Does ignoring the control characters sound good to you? I think we may need to find a way to escape/replace them if you prefer to do the other way.

remerjohnson commented 6 years ago

@mcritchlow @lsitu Agreed, this was open-ended, because I wasn't sure what the system would allow. I am fine with the offending text being altered... is at all possible to have this action reported in the report somehow?

lsitu commented 6 years ago

@remerjohnson Do you think making it as part of the validation process would be better or not? I think we can reject any ingest requests if the source contains control characters until the issue is fixed.

remerjohnson commented 6 years ago

@lsitu That's a good question. If it's possible to:

For this specific issue/fix, I don't think the first part is a blocker or necessary for this work. It would just be nice to have in order to avoid the issue

lsitu commented 6 years ago

@mcritchlow / @remerjohnson I've added the commits to report the control characters as warning for both validation and ingest logging. The control characters will be replaced in the report as [Character Name] like [END OF TEXT]. For validation, the result message is summarized as Pre-processing succeeded with warning(s):

screen shot 2018-09-11 at 9 03 00 am

If deciding to ingest with control characters, those fields with control characters will be attached at the end of ingest damslog file as Warnings:

screen shot 2018-09-11 at 10 24 55 am

Does this sound good?

remerjohnson commented 6 years ago

@lsitu This looks great! That's what I wanted... for it to be reported but not trigger a failure necessarily. What do you think @mcritchlow ?

mcritchlow commented 6 years ago

Excellent work @lsitu :clap: