Closed remerjohnson closed 4 years ago
@remerjohnson I see the following title in record b10629746:
Current state and future perspectives of adjuvant and neoadjuvant targeted therapy in the management of clinically localized renal cell carcinoma : Renal cell carcinoma adjuvant and neoadjuvant \
And yes, the backslash at the end causes the invalid JSON syntax error during parsing the converted JSON data like {"Title":"Current state and future perspectives of adjuvant and neoadjuvant targeted therapy in the management of clinically localized renal cell carcinoma : Renal cell carcinoma adjuvant and neoadjuvant \"}
, which means escaping the double quote at the end. I think we can provide more information on the error instead of null
. Does it sounds good?
Also, would you like to check for control characters and throw an error for the occurrence of any control characters at this step?
@lsitu Ah, that makes sense that it is control characters.
Yes, having a report of records with control characters in them sounds good 👍
@remerjohnson Actually it's not a control character, just a backslash \
. The error is coming from our internal conversion json result, which is a special case for the \
at the end.
If we do control character error check at this step, you might need to fix it before you can export the Roger record. I think you'll have the control characters check in the later step when you ingest the record that won't interrupt you.
Would you still like to have the control characters check here?
@remerjohnson While dipping deeper into the error you got, I think the control characters won't be an issue here since the internal JSON source is converting from MARCXML. The problem is a special case in Roger records that appended the backslash \
at the end of the title field. I think we can handle it as a special case to remove it or escape it, and provide more information regarding the error if it occurs instead of the null
. What do you think?
@lsitu Okay, that sounds good. I'll leave it up to you whether removing it or escaping it is better, and yes to the better error reporting. Thanks
@remerjohnson I've added PR https://github.com/ucsdlib/damsmanager/pull/396 to report the conversion errors for each Roger record instead of null
so that you can recognize the metadata problem more easily. And the special backslash at the end is escaped to avoid interrupting the conversion process .
@mcritchlow I think PR https://github.com/ucsdlib/damsmanager/pull/396 is ready for review now. Thanks.
@mcritchlow @lsitu Okay great. Should I test this now or wait?
@remerjohnson We may need to make a new release for dmasmanager to deploy it to staging for you to test. So just wait and I'll let you know once it's ready for you to test. Thanks.
@lsitu @remerjohnson tested converting roger titles with the backslash, and one bib that always fails and gave the thumbs up on the improved character handling and reporting
@arwenhutt @remerjohnson Thanks. I think another improvement is that with batch of bib's, it'll to give the error report on record base (record by record) when conversion error occurred now.
@lsitu - @remerjohnson verified this on production ✔️
Descriptive summary
When using the MARC Import tool on Roger records with backslash (
\
) or other control characters in the title, the tool fails and only sends a report with the body consisting of:Rationale
When submitting a large number of bib records for DAMS ingest, a report that lacks which records failed makes it very difficult to isolate issues. Also, MARC conversion tends to be messy and include some strange characters so it is a common issue.
Expected behavior
The MARC/Roger Import tool should either:
Actual behavior
The MARC Import fails, and the report only contains the message "null"
Steps to reproduce the behavior
Related Work
251