vatlab / sos

SoS workflow system for daily data analysis
http://vatlab.github.io/sos-docs
BSD 3-Clause "New" or "Revised" License
274 stars 45 forks source link

Value of shared='step_*' in the presence of output variables #1177

Open BoPeng opened 5 years ago

BoPeng commented 5 years ago

In #1033, we went a long way to share variables from substeps, and we ends up with something like

%run
[1: shared='step_seed']
input: for_each={'i': range(10)}
import random
seed = random.randint(0, 1000)

[2]
print(step_seed)

(documented at https://vatlab.github.io/sos-docs/doc/documentation/SoS_Step.html#Option_shared). I believe this should be reverted since attaching variables to _output is a much cleaner solution.

gaow commented 5 years ago

Okay -- without looking at the example in detail, is an alternative implementation straightforward? we are reducing the use of shared anyways. If it can be safely replaced with cleaner solution we should just go for teh better solution

BoPeng commented 5 years ago

First, if the seed do not need to be picked out,

[1]
input: for_each={'i': range(10)}
import random
seed = random.randint(0, 1000)
_output.set(seed=seed)

[2]
print(seed)

is the currently preferred method since it bundles the seed with the result.

Second, if there is a need to return the seed, we could shared the entire step_output or grab them from _output using something like

[10: shared={'seeds': 'step_output.get_all("seed")`}]

(get_all does not exist and we should perhaps introduce one) or

[10: shared={'output': 'step_output'}]
[20]
output.get_all('seeds')

The second approach is uglier than the step_seed stuff so I do not know if we need to introduce a convention step_{VARNAME} for a corner case...

BoPeng commented 5 years ago

In any case, we might need functions to reverse pared_with and group_with, something like

# [x.get('var') for x in step_input]
step_input.get_vars('var')

# [x.get('var') for x in step_input.groups]
step_input.get_group_vars('var')
gaow commented 5 years ago

The second approach is uglier than the stepseed stuff so I do not know if we need to introduce a convention step{VARNAME} for a corner case...

I'm not sure. I would avoid the step_{VARNAME} for now because this convention in itself is not very clean either.

I'd say we revert it for this ticket for now, and use separate tickets to discuss addition of get_* functions (possibly for a future release). For now it does not feel very pressing than some of the other recent tickets or improving documentation.

BoPeng commented 5 years ago

I can imagine the usefulness of step_RESULT for heavy computations that return RESULT from substeps. _output.set() plus shared={"seeds": step_output.get_all("seed")} is long but it gets the job done and more importantly, keep seed in signature. I would suggest that we keep the ticket open, mark the feature as experimental in documentation (and not advertise it in tutorials), and see if any convincing arguments will come up later.