Testing is a pain because the relationship between data sources, dataset IDs and redbox ingest files is not well thought out.
The coataglue script can be made to re-harvest datasets that it has already seen, but the redbox XML files it produces keep their existing IDs, so they won't be ingested: once redbox has seen a filename and harvested it, it will not revisit it.
What the system needs to do is always increment the ID of a history file, even if it has seen the dataset before, so that every redbox output file has a unique filename even when the source dataset is the same.
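
A minimal sketch of the "always increment" idea, in Python. It is not the actual coataglue code: the function name `next_output_name`, the JSON history file and the `<dataset_id>_<counter>.xml` naming scheme are all assumptions made for illustration, and it assumes dataset IDs are safe to use in filenames.

```python
import json
from pathlib import Path


def next_output_name(history_path, dataset_id):
    """Return a fresh output filename for dataset_id, bumping its counter on every call."""
    path = Path(history_path)
    # Hypothetical history format: a JSON object mapping dataset ID -> last counter used.
    history = json.loads(path.read_text()) if path.exists() else {}
    # Increment unconditionally, even when the dataset has been harvested before.
    counter = history.get(dataset_id, 0) + 1
    history[dataset_id] = counter
    path.write_text(json.dumps(history, indent=2))
    # Assumed naming scheme; the real redbox ingest filenames may look different.
    return f"{dataset_id}_{counter:04d}.xml"
```

Calling this twice for the same dataset gives two different filenames, so a re-harvest during testing would produce a file that redbox has not seen yet.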