ucsdlib / damsmanager

DAMS Manager
Other
3 stars 1 forks source link

Backslash characters in Roger records cause MARC Import tool to fail, and not report errors #390

Closed remerjohnson closed 4 years ago

remerjohnson commented 4 years ago

Descriptive summary

When using the MARC Import tool on Roger records with backslash (\) or other control characters in the title, the tool fails and only sends a report with the body consisting of:

null

Rationale

When submitting a large number of bib records for DAMS ingest, a report that lacks which records failed makes it very difficult to isolate issues. Also, MARC conversion tends to be messy and include some strange characters so it is a common issue.

Expected behavior

The MARC/Roger Import tool should either:

Actual behavior

The MARC Import fails, and the report only contains the message "null"

Steps to reproduce the behavior

  1. Go to the DAMS Manager "MARC/MODS Import" tool
  2. Select "Roger Record" in the pulldown for "Metadata Souce"
  3. Enter in a bib number where the title has a backslash or other control character in it (e.g. 'b10629746')
  4. Select values for the remaining required options
  5. Select 'Submit'
  6. Observe report sent to email

Related Work

251

lsitu commented 4 years ago

@remerjohnson I see the following title in record b10629746:

 Current state and future perspectives of adjuvant and neoadjuvant targeted therapy in the management of clinically localized renal cell carcinoma : Renal cell carcinoma adjuvant and neoadjuvant \

And yes, the backslash at the end causes the invalid JSON syntax error during parsing the converted JSON data like {"Title":"Current state and future perspectives of adjuvant and neoadjuvant targeted therapy in the management of clinically localized renal cell carcinoma : Renal cell carcinoma adjuvant and neoadjuvant \"}, which means escaping the double quote at the end. I think we can provide more information on the error instead of null. Does it sounds good? Also, would you like to check for control characters and throw an error for the occurrence of any control characters at this step?

remerjohnson commented 4 years ago

@lsitu Ah, that makes sense that it is control characters.

Yes, having a report of records with control characters in them sounds good 👍

lsitu commented 4 years ago

@remerjohnson Actually it's not a control character, just a backslash \. The error is coming from our internal conversion json result, which is a special case for the \ at the end. If we do control character error check at this step, you might need to fix it before you can export the Roger record. I think you'll have the control characters check in the later step when you ingest the record that won't interrupt you. Would you still like to have the control characters check here?

lsitu commented 4 years ago

@remerjohnson While dipping deeper into the error you got, I think the control characters won't be an issue here since the internal JSON source is converting from MARCXML. The problem is a special case in Roger records that appended the backslash \ at the end of the title field. I think we can handle it as a special case to remove it or escape it, and provide more information regarding the error if it occurs instead of the null. What do you think?

remerjohnson commented 4 years ago

@lsitu Okay, that sounds good. I'll leave it up to you whether removing it or escaping it is better, and yes to the better error reporting. Thanks

lsitu commented 4 years ago

@remerjohnson I've added PR https://github.com/ucsdlib/damsmanager/pull/396 to report the conversion errors for each Roger record instead of null so that you can recognize the metadata problem more easily. And the special backslash at the end is escaped to avoid interrupting the conversion process . @mcritchlow I think PR https://github.com/ucsdlib/damsmanager/pull/396 is ready for review now. Thanks.

remerjohnson commented 4 years ago

@mcritchlow @lsitu Okay great. Should I test this now or wait?

lsitu commented 4 years ago

@remerjohnson We may need to make a new release for dmasmanager to deploy it to staging for you to test. So just wait and I'll let you know once it's ready for you to test. Thanks.

arwenhutt commented 4 years ago

@lsitu @remerjohnson tested converting roger titles with the backslash, and one bib that always fails and gave the thumbs up on the improved character handling and reporting

lsitu commented 4 years ago

@arwenhutt @remerjohnson Thanks. I think another improvement is that with batch of bib's, it'll to give the error report on record base (record by record) when conversion error occurred now.

arwenhutt commented 4 years ago

@lsitu - @remerjohnson verified this on production ✔️