What do we do with the DTD part of every single article record? Are we going to store it in MongoDB together with the raw article record?
alternative (my preferred solution): we could keep it separate from the article record inside the NoSQL document, using a dedicated key for the DTD
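To make the alternative concrete, here is a minimal sketch of how the DOCTYPE declaration could be split off and stored under its own key. It assumes the raw article arrives as a single XML string; the field names `dtd` and `record` and the helper names `split_dtd` and `build_document` are hypothetical and not part of the existing pipeline. The regular expression is a simplification and would not handle a `]` inside a quoted string of an internal subset.

```python
import re

# Matches a DOCTYPE declaration, with or without an internal subset in [...].
# Simplified: assumes no "]" appears inside quoted strings of the subset.
DOCTYPE_RE = re.compile(r"<!DOCTYPE[^>\[]*(?:\[.*?\])?\s*>", re.DOTALL)


def split_dtd(raw_xml):
    """Return (dtd, record): the DOCTYPE declaration and the XML without it."""
    match = DOCTYPE_RE.search(raw_xml)
    if match is None:
        return None, raw_xml
    dtd = match.group(0)
    record = raw_xml[:match.start()] + raw_xml[match.end():]
    return dtd, record


def build_document(doc_id, raw_xml):
    """Build the dict to be stored; 'dtd' and 'record' are hypothetical keys."""
    dtd, record = split_dtd(raw_xml)
    return {"_id": doc_id, "dtd": dtd, "record": record}
```

With this shape, a later step that prepares data for CBS could read only `record` and ignore `dtd`.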
the main part of the id used in the raw data store (MongoDB) is the system ID of the record in the source system (together with network-related subparts), for example: "_id" : "(IDSBB)oai:aleph.unibas.ch:DSV01-000050103"
how can we do this for the articles? there is something like a journal id, but I can't imagine this is the proper value
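For reference, the existing id scheme can be sketched as below. This assumes that the parenthesised prefix (`IDSBB` in the example) is the network-related part and the remainder is the source system ID. The function names are hypothetical. Which article-level value should take the place of the system ID is exactly the open question, so the sketch does not pick one.

```python
def make_record_id(network, system_id):
    """Compose an _id as '(<network>)<system id>' (assumed scheme)."""
    if not network or not system_id:
        raise ValueError("both network and system_id are required")
    return "({}){}".format(network, system_id)


def parse_record_id(record_id):
    """Split an _id of the assumed form back into (network, system_id)."""
    if not record_id.startswith("(") or ")" not in record_id:
        raise ValueError("unexpected id format: " + record_id)
    network, system_id = record_id[1:].split(")", 1)
    return network, system_id
```

Whatever identifier is chosen for articles would need to be unique per article and stable across deliveries, which a journal-level id alone cannot guarantee.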
do we get deleted records?
can we recognize if an article is new, updated or deleted?
I guess the DTD shouldn't be sent to CBS?
what do we do with the records created by the XSLT process (and collected by contentCollector) once the aggregated file for CBS is done? do we write them into some kind of file archive, as we do with other pipes?
@liowalter