yusefnapora closed 8 years ago
Yeah, the case that I found really interesting is reprocessing an existing metadata source w/new translator -- there's probably a neat way of doing addressing for that w/IPLD
I think we want to have some sort of validation here, perhaps with a user data schema?
@yusefnapora I'll update docs for this
@vyzo right now the validation happens in the translator and (given the generally permissive nature of python) I think that's an OK place for it -- it will run if you do an update based on an external source, and you could have always bypassed it easily anyway.
fair enough.
Noting down new CLI commands (to come) from slack w/yusef:
update Qm00.. translatorname@Qm11... single_file.someformat : translate metadata from single_file.someformat and apply it to the chain of the Qm00... canonical, possibly creating "satellite" canonicals (basically like ingest but with single file input and no "central" canonical creation)
update --direct Qm00.. translatorname@Qm11... < something.json : update the Qm00... chain with the contents of something.json directly (json needs to conform to our metaschema)
reprocess Qm00... translatorname@Qm22... : reprocess metadata in raw_ref using a different translator and update the chain with the new translation -- need to figure out how to specify which chain cell to reprocess, and potentially add an "unfolded" results view in the client so you can find the right ref
Probably going to punt on the last one in this release, since it does involve some complexity
lgtm, good to merge?
Yeah, I think we can merge it. The update-with-translator step is kind of inefficient, since it will re-add the artefactCreatedBy cell (and any other cells the translator spits out). That ends up being essentially a no-op when you fold over them if they've got the same content, but it adds a bit of overhead since you have to fetch the extra cell, and we have to store it, etc.
But avoiding that would be quite annoying; you'd have to first fetch the existing chain and compare the new cells to the existing ones to weed out duplicates. I'm fine with punting on that for the moment.
actually, now that I think of it, in real-world usage, you'd probably never have 100% identical cells, because if you're retranslating (or using a completely different translator), the translator id would be different, even if they end up producing the same metadata. So I'm fine with it after all 😄
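For reference, a minimal sketch of the duplicate check discussed above, assuming cells are plain JSON-serializable dictionaries. The field names (`type`, `entity`, `translator`) and the translator ids are hypothetical and chosen only for illustration; this is not the project's actual cell schema.

```python
import json


def cell_key(cell):
    """Canonical JSON string used to compare two cells by content."""
    return json.dumps(cell, sort_keys=True)


def drop_duplicate_cells(existing_chain, new_cells):
    """Return only the new cells whose content is not already in the chain."""
    seen = {cell_key(c) for c in existing_chain}
    fresh = []
    for cell in new_cells:
        key = cell_key(cell)
        if key not in seen:
            seen.add(key)
            fresh.append(cell)
    return fresh


# Hypothetical cells: the field names and ids below are assumptions.
existing = [
    {"type": "artefactCreatedBy", "entity": "example-entity", "translator": "example@v1"},
]
same_translator = [dict(existing[0])]
new_translator = [dict(existing[0], translator="example@v2")]

print(drop_duplicate_cells(existing, same_translator))
print(drop_duplicate_cells(existing, new_translator))
```

With the same translator id the re-added cell is filtered out; with a different translator id it is kept, which is why identical cells would be rare in practice.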
Yeah, I can live with that for now
oh one more thing before I merge; I need to add the --skip-image-downloads flag to the update command parsers.
oh, hmm. actually it doesn't make sense for update-direct; that one doesn't try to download images at all... still experiencing some coffee-brain latency this morning
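A rough sketch of how that could look with `argparse`, assuming the two commands are modelled as separate subcommands. The program name, the `update-direct` subcommand layout, and the positional argument names are assumptions for illustration, not the client's actual parser.

```python
import argparse


def build_parser():
    # Hypothetical layout: prog name and argument names are assumptions.
    parser = argparse.ArgumentParser(prog="client")
    commands = parser.add_subparsers(dest="command", required=True)

    # update: runs a translator, so skipping image downloads is meaningful.
    update = commands.add_parser("update")
    update.add_argument("canonical_ref")
    update.add_argument("translator")
    update.add_argument("input_file")
    update.add_argument("--skip-image-downloads", action="store_true")

    # update-direct: reads metadata JSON from stdin and never downloads
    # images, so the flag is deliberately not registered here.
    direct = commands.add_parser("update-direct")
    direct.add_argument("canonical_ref")

    return parser


args = build_parser().parse_args(
    ["update", "some-ref", "some-translator", "input.json", "--skip-image-downloads"]
)
print(args.command, args.skip_image_downloads)
```

Leaving the flag off the direct variant means passing it there is rejected by the parser instead of being silently ignored.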
78
So far just adds a very simple Writer.update_artefact method that accepts a canonical ref and a meta dictionary, and submits an artefactUpdate cell. It doesn't try to do any translation or schema validation, etc.
I fixed up a couple things in the reader.api - we were applying the chain updates in reverse order, and we weren't merging dictionaries, so e.g. updating meta.data would overwrite any existing value there.
I'm not yet sure what the best way to expose the update command is... having this freeform method is fine (once we run it through the schema validation), but we should also support updating an existing record with new translations of external data, which is a bit trickier.
Open to suggestions on that :)
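To illustrate the reader behaviour described in this description (updates applied oldest first, nested dictionaries merged instead of overwritten), here is a minimal sketch. The function names `deep_merge` and `fold_chain` and the cell layout are hypothetical, not the actual reader.api code.

```python
import copy


def deep_merge(base, update):
    """Merge `update` into a copy of `base`.

    Nested dictionaries are merged key by key; any other value replaces
    the existing one.
    """
    merged = copy.deepcopy(base)
    for key, value in update.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def fold_chain(base_meta, update_cells):
    """Apply update cells in chain order (oldest first) so later updates win."""
    result = copy.deepcopy(base_meta)
    for cell in update_cells:
        result = deep_merge(result, cell["meta"])
    return result


# Hypothetical record and update cells, for illustration only.
base = {"data": {"title": "Old title", "author": "A. Person"}}
updates = [
    {"type": "artefactUpdate", "meta": {"data": {"title": "First fix"}}},
    {"type": "artefactUpdate", "meta": {"data": {"title": "Second fix", "year": 2016}}},
]

folded = fold_chain(base, updates)
print(folded)
```

In this sketch the `author` value survives both updates because `data` is merged rather than replaced, and the later `title` wins because the cells are folded in order.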