Analysis/Session Provenance

cjsifuen commented 1 year ago

Enable users to capture with ease, fidelity, and accuracy the actions/analysis performed on a dataset or sets of datasets.

Information to capture

Files uploaded, data filtering steps and parameters used
Dataset/visualization subsetting and parameters
Selection of visualization types, positions, sizes, etc.

Potential implementations

Capture information in a file that can be used to "rerun" what was done
Capture and save as an "instance", in a more permanent form
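
To make the "rerun" option concrete, here is a minimal sketch of recording steps to a serializable log and replaying them. It is written in Python purely as a language-neutral illustration (SODA itself is an R app), and every name in it (`filter_samples`, `select_features`, `record_step`, `replay`, the `STEPS` registry) is hypothetical, not part of SODA.

```python
import json

# Hypothetical processing steps; names and signatures are illustrative only.
def filter_samples(rows, column, value):
    """Keep only rows whose `column` equals `value`."""
    return [r for r in rows if r.get(column) == value]

def select_features(rows, features):
    """Keep only the listed feature columns in each row."""
    return [{k: r[k] for k in features if k in r} for r in rows]

# Registry mapping a logged step name back to the function that performs it.
STEPS = {"filter_samples": filter_samples, "select_features": select_features}

def record_step(log, name, **params):
    """Append one step and its parameters to the provenance log."""
    log.append({"step": name, "params": params})

def replay(rows, log):
    """Re-apply every logged step, in order, to a fresh copy of the data."""
    for entry in log:
        rows = STEPS[entry["step"]](rows, **entry["params"])
    return rows

# Usage: record two steps, round-trip the log through JSON, then rerun it.
rows = [{"id": 1, "group": "a", "x": 1.0}, {"id": 2, "group": "b", "x": 2.0}]
log = []
record_step(log, "filter_samples", column="group", value="a")
record_step(log, "select_features", features=["id", "x"])
restored = json.loads(json.dumps(log))
print(replay(rows, restored))  # [{'id': 1, 'x': 1.0}]
```

Because the log only stores step names and parameters, it stays small regardless of dataset size, but replay only works if the original input files are still available.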

Considerations

This might look different for datasets of different sizes
This might look different for a hosted vs local version

cjsifuen commented 1 year ago

I spoke with the imaging group about learning from napari. Their strategy is different in that they support a local instance only. They capture the commands run, but I'm not sure that's actually the type of provenance we're talking about.

ergonyc commented 1 year ago

From what I understand about SODA's current workflow, this should be pretty straightforward. The "output" of SODA is either a downloaded dataset or a visualization. So I think there are just 3 states to log or capture:

data origin (file name / path / source?) + metadata
sample filter
feature filter
visualization
- vis type + parameters
- subset selection

I think a generic R logging module can handle the capture, so the log just needs to be added to the metadata and saved along with the visualization / data.
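
As a rough illustration of what such a log record might hold, here is a minimal sketch in Python (used only as a language-neutral stand-in for the R module). The field names mirror the states listed above; `ProvenanceRecord`, `attach_provenance`, the `"provenance"` key, and the example values are all assumptions, not existing SODA structures.

```python
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone

@dataclass
class ProvenanceRecord:
    """One snapshot of the states to capture (hypothetical layout)."""
    data_origin: dict                                    # file name / path / source + metadata
    sample_filter: dict = field(default_factory=dict)    # sample filtering parameters
    feature_filter: dict = field(default_factory=dict)   # feature filtering parameters
    visualization: dict = field(default_factory=dict)    # vis type, parameters, subset selection
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

def attach_provenance(metadata, record):
    """Return a copy of the output's metadata with the record added."""
    out = dict(metadata)
    out["provenance"] = asdict(record)
    return out

# Usage with made-up values:
record = ProvenanceRecord(
    data_origin={"file_name": "example.csv", "source": "upload"},
    sample_filter={"column": "group", "keep": ["a"]},
    visualization={"type": "heatmap", "parameters": {}, "subset": []},
)
export_metadata = attach_provenance({"title": "example export"}, record)
```

The record is a plain dictionary once attached, so it can be written out in whatever format the downloaded dataset or visualization already uses for its metadata.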

cjsifuen commented 1 year ago

This would be a light way to implement the first option.

A few more things to flag if this approach was taken:

Could add in a "save" or "log" button to actively log metadata, but would also want to log changes automatically.
Might want to ensure no additional filtering takes place in the UI
Should check that the R logging captures interactive visualizations
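
One way the first point could work, combining automatic change logging with an explicit "save" action, is sketched below. This is Python used as a language-neutral illustration, not Shiny code; `SessionLog`, its methods, and the event fields are hypothetical.

```python
import copy

class SessionLog:
    """Tracks UI state, logging changes automatically plus manual snapshots."""

    def __init__(self):
        self.state = {}   # current value of each tracked setting
        self.events = []  # ordered log of automatic and manual entries

    def set(self, key, value):
        """Update a setting; log an 'auto' event only if the value changed."""
        if key in self.state and self.state[key] == value:
            return
        old = self.state.get(key)
        self.state[key] = value
        self.events.append({"kind": "auto", "key": key, "old": old, "new": value})

    def save(self, label):
        """Handler for a 'save'/'log' button: snapshot the full current state."""
        self.events.append(
            {"kind": "manual", "label": label, "state": copy.deepcopy(self.state)}
        )

# Usage: two real changes are logged, the repeated value is ignored.
log = SessionLog()
log.set("vis_type", "heatmap")
log.set("vis_type", "heatmap")
log.set("vis_type", "pca")
log.save("before export")
```

The automatic events give a full history, while the manual snapshots mark the states a user considered worth keeping.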

A possible way to do more complex logging/debugging could be to use a shiny logger to capture events and interactions -- though perhaps this is unnecessary. Just wanted to add some options that I found here.

ndcn / soda-ndcn

Analysis/Session Provenance #18