Provenance parsing Initial sketch

qiime2 / provenance-lib

QIIME 2 Provenance Replay Tools

BSD 3-Clause "New" or "Revised" License

Provenance parsing Initial sketch #1

Closed ChrisKeefe closed 3 years ago

ChrisKeefe commented 3 years ago

Changes in https://github.com/ChrisKeefe/provenance_py/pull/1/commits/49a2d53132a57f9e6820ae1a1d73750afcf2e95f reduce coupling between ProvNode and ProvTree (which was previously responsible for assigning parentage relationships to all ProvNodes). In exchange, ProvNode must be associated with (and store a reference to) one or more Archives.

For the prior approach, see https://github.com/ChrisKeefe/provenance_py/pull/1/commits/1281878510acdc42cb5ba3ee40c9ad8b62dacf0e
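To illustrate the trade-off described above, here is a minimal sketch of a node that resolves its own parents through the archives it references, so that no tree object has to assign parentage. Only the class names `ProvNode` and `Archive` come from this PR; the attributes and methods (`add`, `get`, `parents`, `archives`) are hypothetical and are not taken from the actual commits.

```python
class Archive:
    """Hypothetical stand-in: maps UUIDs to the nodes parsed from one archive."""

    def __init__(self):
        self._nodes = {}

    def add(self, node):
        self._nodes[node.uuid] = node

    def get(self, uuid):
        # Returns None when this archive does not contain the UUID.
        return self._nodes.get(uuid)


class ProvNode:
    """Hypothetical node that looks up its own parents via its archives."""

    def __init__(self, uuid, parent_uuids, archive):
        self.uuid = uuid
        self._parent_uuids = list(parent_uuids)
        # A node may be associated with more than one archive.
        self.archives = [archive]
        archive.add(self)

    @property
    def parents(self):
        # Resolve each parent UUID against the referenced archives, in order.
        found = []
        for uuid in self._parent_uuids:
            for archive in self.archives:
                node = archive.get(uuid)
                if node is not None:
                    found.append(node)
                    break
        return found
```

The cost of this design is visible in the constructor: every node needs an archive reference before it can answer questions about its ancestry.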

ChrisKeefe commented 3 years ago

Traversal logic is currently weak, producing a depth-first traversal of UUIDs, with duplication, rather than a DAG of unique nodes. Question: what kind of data does this thing need to ship to q2view in order for it to do its lovely tree-building? Is YAML a reasonable vehicle? If so, do we need !ref tags of some kind to point q2view at duplicate nodes?
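For reference, the difference between the two traversals can be sketched as follows. This assumes provenance is available as a plain mapping from each UUID to the UUIDs of its parents; the function names and the `parents_of` mapping are hypothetical and do not reflect the code in this PR.

```python
def naive_dfs(uuid, parents_of):
    """Depth-first walk that repeats any node reachable by more than one path."""
    visited = [uuid]
    for parent in parents_of.get(uuid, []):
        visited.extend(naive_dfs(parent, parents_of))
    return visited


def collect_dag(root, parents_of):
    """Return {uuid: [parent uuids]} with every reachable node recorded once."""
    dag = {}
    stack = [root]
    while stack:
        uuid = stack.pop()
        if uuid in dag:
            # Already recorded via another path; skip to avoid duplication.
            continue
        parents = list(parents_of.get(uuid, []))
        dag[uuid] = parents
        stack.extend(parents)
    return dag
```

An adjacency mapping like the one returned by `collect_dag` refers to shared nodes by UUID, so a consumer could rebuild the graph without needing reference tags for duplicates. Whether q2view would accept that shape is still the open question above.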

ChrisKeefe commented 3 years ago

Here's the PR in question, @gregcaporaso. I've invited you as a collaborator here for now. Should we move this repo into @caporaso-lab or @qiime2? Alternately, I can just keep hacking away here for now, and we can move it later.

ChrisKeefe commented 3 years ago

FYI: the "runner.py" script has been superseded by proper unit tests. I'm going to keep it around for a little while in case it's useful for diagnostic purposes, but it will be dropped eventually and need not be reviewed.

ChrisKeefe commented 3 years ago

~Going forward, let's replace filepath filtering for root metadata by capturing a root uuid and then telling zipfile to open the file directly (we know the filename).~

Addressed in 12a2cdd6df382b2fc43b8b85b263d755efd6f155
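A minimal sketch of the idea, using only the standard-library `zipfile` module. It assumes the archive has a single top-level directory named after the root UUID, with the root metadata stored at `<root uuid>/metadata.yaml`; the helper names are hypothetical and this is not the code from the commit above.

```python
import zipfile


def get_root_uuid(zf: zipfile.ZipFile) -> str:
    """Return the single top-level directory name, assumed to be the root UUID."""
    roots = {name.split("/", 1)[0] for name in zf.namelist()}
    if len(roots) != 1:
        raise ValueError(f"expected one top-level directory, found {sorted(roots)}")
    return roots.pop()


def read_root_metadata(zf: zipfile.ZipFile) -> str:
    """Open the root metadata file directly instead of filtering every path."""
    root_uuid = get_root_uuid(zf)
    # The filename is known once the root UUID is known (layout is an assumption).
    with zf.open(f"{root_uuid}/metadata.yaml") as fh:
        return fh.read().decode("utf-8")
```

Opening the member by name lets `zipfile` locate it through the archive's central directory, so no per-path string matching is needed once the root UUID has been captured.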

ChrisKeefe commented 3 years ago

Merging this now for readability reasons. The remaining comments will be addressed in a future PR.