I just noticed that of course we HAVE already fixed this once, namely in the LexisNexis importer. There we do it as follows:
```python
def run(self, path, *args, **kwargs):
    """Uses the documents from the load method in batches."""
    # This method is overridden because, in contrast to
    # other importers, we do not have a single doctype:
    # each document can have a different one.
    for doc in self.load(path, *args, **kwargs):
        self._ingest(iterable=doc, doctype=doc['doctype'])
        self.processed += 1
```
Anyhow, we need to make sure it works for JSON (and in principle also CSV) as well.
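For reference, a minimal sketch of how the same override could look on the JSON importer. The class name `JSONImporter`, the import path, and the assumption that `load()` yields dicts carrying a `doctype` key are all mine, not taken from the codebase:

```python
# Sketch only: the class name and import path below are assumptions,
# not the actual code in importers_exporters/.
from core.import_export_classes import Importer


class JSONImporter(Importer):

    def run(self, path, *args, **kwargs):
        """Ingest documents one by one, taking the doctype from each document."""
        # Same pattern as the LexisNexis override: there is no single
        # batch-wide doctype, so each loaded document supplies its own.
        for doc in self.load(path, *args, **kwargs):
            self._ingest(iterable=doc, doctype=doc['doctype'])
            self.processed += 1
```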
It seems that the `class Importer(BaseImportExport)` as defined in `core/import_export_classes` assumes that the batch to be imported is of a single doctype (`doctype` is a mandatory argument of the `.run()` method). This makes the importer incompatible with the exporter: if I use the exporter to export a bunch of JSON documents that happen to have multiple doctypes, I cannot import them back using the importer.
It would be nice if this could be fixed, so that the JSON importers/exporters in the `importers_exporters/` folder can be used to transfer documents between ES instances and for backup purposes. This needs to be resolved to solve https://github.com/uvacw/inca/issues/291.
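Alternatively, here is a sketch of one possible direction (not a worked-out patch): the base class could make `doctype` optional in `.run()` and fall back to each document's own field, which would keep existing importers working while allowing the JSON round-trip. This assumes `load()` yields dicts and that documents written by the JSON exporter carry a `doctype` key:

```python
# Sketch of a possible backward-compatible change in core/import_export_classes;
# the surrounding class body is elided and the exact signature is an assumption.
class Importer(BaseImportExport):

    def run(self, path, doctype=None, *args, **kwargs):
        """Ingest a batch; `doctype` is now optional.

        If no batch-wide doctype is given, each document must carry
        its own 'doctype' field (as written by the JSON exporter).
        """
        for doc in self.load(path, *args, **kwargs):
            self._ingest(iterable=doc, doctype=doctype or doc['doctype'])
            self.processed += 1
```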