uws-eresearch / ADELTA


Batch ingestor failing due to 'ascii' codec can't decode some characters #25

Closed ifeanyeg closed 10 years ago

ifeanyeg commented 10 years ago

Hi Lloyd,

Some records contain characters (e.g. —) that cause the batch ingester to fail during ingestion. The error message is shown below: UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 17: ordinal not in range(128)

Is there a way to skip such records, or to make the batch ingester decode such characters?
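
For context, byte 0xe2 is the first byte of the UTF-8 encoding of an em dash (U+2014, bytes e2 80 94), which fits the example character above. Below is a minimal Python 3 sketch of both options: decoding explicitly as UTF-8, and skipping a record when decoding fails. It assumes the records are UTF-8 encoded; `decode_field` and the sample values are hypothetical and not part of the batch ingester.

```python
def decode_field(raw, encoding="utf-8"):
    """Return the decoded text, or None so the caller can skip the record."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None


# Sample field containing an em dash (U+2014), stored as UTF-8 bytes.
raw = "Title \u2014 subtitle".encode("utf-8")

# Decoding the same bytes as ASCII fails in the way the error message describes.
try:
    raw.decode("ascii")
    ascii_error = None
except UnicodeDecodeError as exc:
    ascii_error = str(exc)

# Option 1: decode explicitly as UTF-8.
text = decode_field(raw)

# Option 2: skip records whose bytes are not valid in the expected encoding.
skipped = decode_field(b"\xff\xfe broken")
```

Here `text` holds the full title including the dash, while `skipped` is None, so the caller could log that record and move on.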

Regards, Ifeanyi

lloyd-h commented 10 years ago

This issue occurred only on the server. To work around it, we have decided to split the original batch ingestion process into two parts.

  1. Generate MODS xml files using csv_to_xml_converter.py
  2. Ingest generated MODS xml files along with the images into the repository using batch-ingester.php

The first part will be run in an environment that does not have this problem. The second part will be run on the server.
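
A rough Python 3 sketch of what the first part could look like is given here for illustration only. It is not the actual csv_to_xml_converter.py: the `title` column, the function names, and the choice of a single `titleInfo/title` element are assumptions, and the input is assumed to be UTF-8. The point is that the CSV bytes are decoded explicitly, so non-ASCII characters end up in the generated MODS XML instead of triggering the ASCII codec.

```python
import csv
import io
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)


def row_to_mods(row):
    """Build a very small MODS record from one CSV row (hypothetical 'title' column)."""
    mods = ET.Element("{%s}mods" % MODS_NS)
    title_info = ET.SubElement(mods, "{%s}titleInfo" % MODS_NS)
    title = ET.SubElement(title_info, "{%s}title" % MODS_NS)
    title.text = row["title"]
    # Serialise as UTF-8 bytes so non-ASCII characters are kept as-is.
    return ET.tostring(mods, encoding="utf-8")


def csv_to_mods(csv_bytes):
    """Decode the CSV explicitly as UTF-8 and return one XML document per row."""
    text = csv_bytes.decode("utf-8")
    reader = csv.DictReader(io.StringIO(text))
    return [row_to_mods(row) for row in reader]


# Sample input with an em dash (U+2014) in the title.
sample = "id,title\n1,Harvest \u2014 1920\n".encode("utf-8")
docs = csv_to_mods(sample)
```

Each entry in `docs` could then be written to its own .xml file in binary mode and handed to the second part on the server.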