whole-tale / wt-prov-model

Experiments, design documents, and prototypes supporting a provenance model for Tales and runs.
MIT License

Distinguish files specific to runs #9

Open tmcphillips opened 4 years ago


The files accessed by a run traced by ReproZip include those provided by the base operating system, by software packages installed system-wide, and by software installed for a particular user, as well as files specific to the run (e.g. data files explicitly read or written by processes during the run).

It will be useful to distinguish between these different kinds of files to support queries and visualizations that answer specific questions about a run.

The config.yml file in the ReproZip trace lists the files that were used during a run and provided by particular installed software packages. We can harvest these and represent them as Prolog facts for easy querying.
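
A minimal sketch of the harvesting step, assuming config.yml has a top-level `packages` list whose entries carry `name`, `version`, and `files` keys. The function names, the `package_file/3` predicate, and the example data are hypothetical, chosen only for illustration. The function takes the already-parsed YAML as a dict so the sketch stays free of third-party imports.

```python
def prolog_atom(text):
    """Quote a string as a Prolog atom, escaping backslashes and single quotes."""
    escaped = str(text).replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def package_file_facts(config):
    """Return one package_file(Package, Version, Path) fact per packaged file.

    `config` is the parsed content of config.yml, for example the result of
    yaml.safe_load() if PyYAML is available. The key names used here are an
    assumption about the config.yml layout.
    """
    facts = []
    for package in config.get("packages") or []:
        name = prolog_atom(package["name"])
        version = prolog_atom(package.get("version", "unknown"))
        # Packages with no recorded files simply contribute no facts.
        for path in package.get("files") or []:
            facts.append(f"package_file({name}, {version}, {prolog_atom(path)}).")
    return facts


# Hand-written stand-in for a parsed config.yml (illustrative data only).
example_config = {
    "packages": [
        {"name": "coreutils", "version": "8.30", "files": ["/bin/cat", "/bin/ls"]},
        {"name": "python3", "version": "3.8.2", "files": []},
    ]
}

for fact in package_file_facts(example_config):
    print(fact)
```

With facts in this shape, a query such as `package_file(P, _, '/bin/cat')` would return the package that provided a given file.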

Other OS-provided files are not associated with particular packages, but can be identified by the directories in which they are stored (e.g. /etc, /lib64). We can provide a configuration option for listing such directories so that these files can be identified.

Conversely, we can provide capabilities for explicitly identifying directories containing files of special significance to the research associated with a run.
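
The two directory-based configurations described above could be sketched roughly as follows. This is only an illustration: the `OS_DIRS` and `RUN_DIRS` lists, the category labels, and the function names are assumptions, and the run directory shown is a made-up path. Run-specific directories are checked first so that an explicit marking by the researcher overrides the OS-level default.

```python
from pathlib import PurePosixPath

# Hypothetical configuration values; a real tool would read these from a file.
OS_DIRS = ["/etc", "/lib64"]
RUN_DIRS = ["/home/user/project/data"]


def is_under(path, directory):
    """True if `path` is `directory` itself or lies anywhere below it."""
    p = PurePosixPath(path)
    d = PurePosixPath(directory)
    # Comparing path components avoids treating /etcetera as part of /etc.
    return p == d or d in p.parents


def classify(path, run_dirs=RUN_DIRS, os_dirs=OS_DIRS):
    """Label a traced file as 'run', 'os', or 'other' by its location."""
    if any(is_under(path, d) for d in run_dirs):
        return "run"
    if any(is_under(path, d) for d in os_dirs):
        return "os"
    return "other"


for traced in ["/etc/passwd", "/home/user/project/data/in.csv", "/etcetera/x"]:
    print(traced, classify(traced))
```

Files labelled `other` here would still need to be checked against the package listings from config.yml before being treated as unclassified.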