Closed lsitu closed 6 years ago
Thanks @mcritchlow. @remerjohnson Does ignoring the control characters sound good to you? I think we may need to find a way to escape/replace them if you prefer to do the other way.
@mcritchlow @lsitu Agreed, this was open-ended, because I wasn't sure what the system would allow. I am fine with the offending text being altered... is at all possible to have this action reported in the report somehow?
@remerjohnson Do you think making it as part of the validation process would be better or not? I think we can reject any ingest requests if the source contains control characters until the issue is fixed.
@lsitu That's a good question. If it's possible to:
For this specific issue/fix, I don't think the first part is a blocker or necessary for this work. It would just be nice to have in order to avoid the issue
@mcritchlow / @remerjohnson I've added the commits to report the control characters as warning for both validation and ingest logging. The control characters will be replaced in the report as [Character Name]
like [END OF TEXT]
.
For validation, the result message is summarized as Pre-processing succeeded with warning(s)
:
If deciding to ingest with control characters, those fields with control characters will be attached at the end of ingest damslog file as Warnings
:
Does this sound good?
@lsitu This looks great! That's what I wanted... for it to be reported but not trigger a failure necessarily. What do you think @mcritchlow ?
Excellent work @lsitu :clap:
Fixes #251
Local Checklist
master
branch?What does this PR do?
Ignore control characters during importing objects from Excel source to avoid the com.hp.hpl.jena.shared.CannotEncodeCharacterException while retrieving the object RDF that contains control characters.
Why are we doing this? Any context of related work?
References #250, #251
@ucsdlib/developers - please review