ucsdlib / damsmanager

DAMS Manager

Batch export bug - Data from Scripps Research Expeditions #356

Closed arwenhutt closed 5 years ago

arwenhutt commented 5 years ago

Descriptive summary

Batch export to Excel or CSV fails for the "Data from Scripps Research Expeditions" collection.

It looks like all items are processed: the progress bar reaches 100% and stays there for several minutes. It then returns this message:

Execution failed: nullEXCEL metadata export failed (0 of 717 failed): - Total items found 717. - Number of items processed 717. The exported content is ready for download.

The link provided for download returns this message:

File /pub/data1/damsmanager/tmp/batchExport-1832756173.xlsx doesn't exist.

RDF export works. I haven't noticed this occurring with other projects.

lsitu commented 5 years ago

@arwenhutt I think I found the problem that is causing the batch export error. Several authority records are corrupted: they are serialized with a generic rdf:Description element instead of a class element such as mads:Language. This triggers an exception during the XSL transform that is hard to detect. For example:

mads:Language: http://lib-hydratail-prod.ucsd.edu:8080/dams/api/objects/bb78201908

mads:CorporateName: http://lib-hydratail-prod.ucsd.edu:8080/dams/api/objects/bb3346116s
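To make the kind of check involved concrete, here is a minimal sketch in Python of scanning an RDF/XML document for records serialized as a bare rdf:Description. It is not the damsmanager implementation. The function name `untyped_records`, the sample document, and the example.org URIs are all hypothetical, and the sketch assumes that the element name alone is enough to identify a corrupted record.

```python
import xml.etree.ElementTree as ET
from typing import List

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical sample: one typed record and one serialized as rdf:Description.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:mads="http://www.loc.gov/mads/rdf/v1#">
  <mads:Language rdf:about="http://example.org/ark/good1">
    <mads:authoritativeLabel>English</mads:authoritativeLabel>
  </mads:Language>
  <rdf:Description rdf:about="http://example.org/ark/bad1">
    <mads:authoritativeLabel>Scripps Institution of Oceanography</mads:authoritativeLabel>
  </rdf:Description>
</rdf:RDF>"""


def untyped_records(rdf_xml: str) -> List[str]:
    """Return the rdf:about URI of every rdf:Description element in the document.

    Assumption: a record that appears as rdf:Description (rather than a class
    element such as mads:Language) is treated as corrupted.
    """
    root = ET.fromstring(rdf_xml)
    description_tag = f"{{{RDF_NS}}}Description"
    about_attr = f"{{{RDF_NS}}}about"
    # iter() walks the whole tree, so nested records are found as well.
    return [el.get(about_attr, "") for el in root.iter(description_tag)]


if __name__ == "__main__":
    for uri in untyped_records(SAMPLE):
        print("possibly corrupted:", uri)
```

Running a check like this before the transform would give a list of record URIs to review, rather than a failure that only surfaces at the end of the export.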

I am not sure how these authority records got corrupted, but it may have happened when we ran into the corrupted-records issue during Update DOI with full RDF/XML editing last year.

I am thinking about how to report this issue in the Excel export output so that you can still export the collection and fix the records. How about using a header like LanguageCorrupted, CorporateNameCorrupted:creation, or CorporateName:creationCorrupted for corrupted authority records?

lsitu commented 5 years ago

@arwenhutt I've gone ahead and created PR https://github.com/ucsdlib/damsmanager/pull/357, which appends the text "Corrupted" to the Excel header for authority records that have an rdf:Description element instead of a class name. I think we may need to review those authority records and fix them.
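The header rule described above could look roughly like the following Python sketch. It is an illustration only, not the code in the PR. The function name `export_header` and its parameters are hypothetical, and it assumes the suffix goes at the very end of whatever header the export would otherwise use.

```python
def export_header(element_name: str, base_header: str) -> str:
    """Return the Excel column header for an authority record.

    Assumption: a record whose element is rdf:Description has lost its class
    name, so its header is flagged by appending "Corrupted".
    """
    if element_name == "rdf:Description":
        return base_header + "Corrupted"
    return base_header


if __name__ == "__main__":
    print(export_header("mads:Language", "Language"))
    print(export_header("rdf:Description", "Language"))
```

Flagging the column instead of aborting lets the export finish, and the flagged headers show which records need review.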

@mcritchlow The PR https://github.com/ucsdlib/damsmanager/pull/357 is ready for review now. Thanks.

arwenhutt commented 5 years ago

Thanks @lsitu !

jessicahilt commented 5 years ago

@lsitu Where can @arwenhutt test this? On staging? Or is this waiting to be deployed?

lsitu commented 5 years ago

@jessicahilt / @arwenhutt Yes. It was deployed to staging already so please test it there. Thanks.

arwenhutt commented 5 years ago

@lsitu okay, once the sync is done I'll test it.

arwenhutt commented 5 years ago

@lsitu Thanks, I've verified that I am able to export that collection now.

I do agree with @mcritchlow's comment in PR 357 that we need to look at the issue of corrupted/incomplete metadata more generally, but that would be something for the future!