Non-linear pipeline file naming convention issue

gaow commented 5 years ago

I want to document the problem here because I investigated it, and we can possibly improve it down the road. In my view it is a non-crucial enhancement.

In this line of a DSC the execution sequence can be abstractly written as "A B B' * C" and is non-linear because of B and B': B' does not depend on B, but C depends on it. By our current convention the output file name for C is expected to look something like:

CABB'

but it actually generates

CAB'B

where the two branches, B and B', seem to switch places.

This can cause confusion when looking at the output file, but it's not wrong. It's just an artifact of the non-linear logic. To improve it, we basically have to rewrite the function so that it converts:

CAB'ABA --> CABB'

instead of

CAB'ABA --> CAB'B

The line in the above link can reproduce the problem. The fix is simple in this overly simplified example but I need to think more carefully if I want to invent a new rule to deal with all possible branch name switch problems in such non-linear pipeline logic.
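
For illustration only, here is a minimal Python sketch of the two conversions above. It is not DSC's actual code: the function names, the list representation of the module chain, and the `declared` order are all assumptions made for this example. The observed name is consistent with keeping the first occurrence of each module, while the expected name could be obtained by ordering the remaining modules as they are declared in the pipeline.

```python
def dedup_first_seen(chain):
    """Keep the first occurrence of each module, in encounter order.

    Hypothetical helper; this reproduces the observed CAB'B name.
    """
    return list(dict.fromkeys(chain))


def dedup_declared_order(chain, declared):
    """Keep the leading (target) module, then order the remaining
    unique modules by their position in the declared pipeline.

    Hypothetical helper; this yields the expected CABB' name.
    """
    target, *deps = chain
    rank = {name: i for i, name in enumerate(declared)}
    unique = [m for m in dict.fromkeys(deps) if m != target]
    return [target] + sorted(unique, key=rank.__getitem__)


chain = ["C", "A", "B'", "A", "B", "A"]   # the CAB'ABA input above
declared = ["A", "B", "B'", "C"]          # assumed declaration order in "A B B' * C"

print("".join(dedup_first_seen(chain)))                # CAB'B
print("".join(dedup_declared_order(chain, declared)))  # CABB'
```

This only covers the simplified example; whether sorting by declaration order is the right rule for every branch layout is exactly the open question.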

pcarbo commented 5 years ago

I agree it would be nice to fix, but it is not critical.

gaow commented 4 years ago

To be fixed as we revamp the data storage format.

stephenslab/dsc, issue #199