Closed Ivoah closed 1 month ago
Thanks @Ivoah. This is a good start.
I've run some scenarios with it and there are some edge cases I'm working on resolving. Some examples:
[>]
and "Food" could have [<]
). That should be handled gracefully.
No need to make any changes on your end, I think I've got it from here.
I expect to ship a revised version of this that handles all the edge cases I can think of in the next few weeks.
So, I kept finding interesting edge cases to deal with. Most are resolved now.
The ones that remain are too deep to deal with at this time (and do have practical workarounds), so I am close to merging this & then adding my followup changes.
Here are some samples of what it currently looks like in practice.
The simple case where only one flow is unknown:
Note - there will be a new 'Console' area underneath the diagram which will log each flow calculation. This will help in cases where you ended up with unexpected results.
Here's what it looks like for the simple case above:
Splitting a remainder into more than one calculated flow will divide it evenly. Here's a group of three distinct flows consuming the same remainder:
If you don't want the remainders to appear evenly divided, one approach is to actually add more unknowns and then group them as intended.
Here's a pair of remainder-flow groups divided 60/40 by using five flows (3 of which are flowing to 'Remainder A' and 2 to 'Remainder B'), plus their console output:
Notice that when an amount is divided, we sometimes have to increase the displayed precision in order to convey the result accurately.
All of the amounts in these inputs had no decimal places, but adding more decimal places for these divided flows was better than rounding them.
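To make the even split and the precision point concrete, here is a minimal Python sketch. It is not SankeyMATIC's actual code: `split_evenly` and `decimals_needed` are hypothetical names, and the remainder of 100 is an assumed figure chosen only for illustration. It divides a remainder equally among the calculated flows, regroups five flows 3/2 to get a 60/40 division, and finds how many decimal places a share needs to be shown without rounding:

```python
from fractions import Fraction


def split_evenly(remainder, n_flows):
    """Divide a remainder into n equal shares, kept exact as fractions."""
    return [Fraction(remainder, n_flows)] * n_flows


def decimals_needed(value, max_places=3):
    """Fewest decimal places (capped at max_places) that display value exactly."""
    for places in range(max_places + 1):
        if round(value, places) == value:
            return places
    return max_places


# Five equal flows from an assumed remainder of 100, grouped 3 and 2.
shares = split_evenly(100, 5)
remainder_a = sum(shares[:3])  # three flows into 'Remainder A'
remainder_b = sum(shares[3:])  # two flows into 'Remainder B'

# Whole-number inputs can still produce a share that needs a decimal place.
half_share = split_evenly(5, 2)[0]
places_for_half = decimals_needed(half_share)
```

With these assumed inputs the two groups come out at 60 and 40, while splitting 5 two ways gives 2.5, which needs one decimal place even though every input was a whole number.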
(Fun edge cases which convinced me that the displayed precision had to be increased when dividing:
These calculations do work in the reverse direction - you can have a flow which is made the right size to provide any missing amount for a node:
It is possible to legitimately have flows of zero in some cases, because there isn't always a remainder to be consumed.
Here's the logical endpoint of that path: what if nothing in the whole graph has an actual defined value?
(Making 'nothing' still render in a balanced sort of way was one of the interesting diversions here.)
I still have a conundrum to resolve about the syntax itself; I will post a separate comment about that.
Here's a use case I expect to be common:
A budget with one main unknown value, possibly being split into sub-values in a later stage.
In this first image, the remainder for Savings to consume is 300.
In the second, the only difference in the inputs is that 'All other expenses' was decreased by 300; the colorful flows all automatically adjusted to consume the new remainder:
The question is: Which of these two syntaxes will be more intuitive to the casual user to indicate this relation?
Wages [1500] Budget
Budget [400] Housing
Budget [500] All other expenses
Budget [<] Savings
Savings [<] 401K
Savings [<] Bank Account
Savings [<] Mattress

Wages [1500] Budget
Budget [400] Housing
Budget [500] All other expenses
Budget [>] Savings
Savings [>] 401K
Savings [>] Bank Account
Savings [>] Mattress
The argument for `<`: The target amount is drawn from the source.
The argument for `>`: The source node is sending out its remainder to one or more targets.
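Whichever symbol is chosen, the arithmetic behind the budget example is the same. Here is a minimal Python sketch of it (hypothetical helper name, not the project's code). The 800 for 'All other expenses' in the first image is inferred from the description above (the remainder was 300 before that amount was decreased by 300); the listings show the 500 version:

```python
def remainder_after(total_in, known_outflows):
    """Amount left at a node once its known outflows are subtracted."""
    return total_in - sum(known_outflows)


# First image: 'All other expenses' assumed to be 800.
first = remainder_after(1500, [400, 800])
# Second image: 'All other expenses' is 500, as in the listings.
second = remainder_after(1500, [400, 500])

# Savings passes its whole amount on, split evenly across its three targets.
first_each = first / 3
second_each = second / 3
```

Under these assumptions Savings receives 300 and then 600, so each of the three targets receives 100 and then 200.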
I think it's also useful to consider the less-frequent, opposite case of calculated inflows too.
The source for this "Income to add" flow would look like one of these two options:
Current Income [12] Budget
Income to add [>] Budget
Budget [10] Goods
Budget [10] Services

Current Income [12] Budget
Income to add [<] Budget
Budget [10] Goods
Budget [10] Services
The argument for representing a left-calculation with `>`: "Income to add" is getting its value from "Budget". The arrow is pointing to the data source.
The argument for representing a left-calculation with `<`: "Budget"'s size is what determines the amount for "Income to add". In a sense it is providing its amount to the source node. The arrow is pointing in the direction of the data flow.
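For reference, the value being calculated in this inflow case can be sketched in Python as follows. The function name is hypothetical, and clamping at zero is an assumption based on the earlier remark that zero-size flows are legitimate when there is nothing left to fill:

```python
def missing_inflow(known_inflows, outflows):
    """Amount a calculated inflow must supply so the node's inputs cover its outputs."""
    shortfall = sum(outflows) - sum(known_inflows)
    # Assumption: if the known inflows already cover the outflows, the flow is 0.
    return max(0, shortfall)


# 'Budget' sends out 10 + 10 but only receives 12 from 'Current Income'.
income_to_add = missing_inflow([12], [10, 10])
```

For the example above this gives 8 for "Income to add", regardless of which symbol ends up representing it.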
I've been round and round about this...
When I contemplate writing the documentation for this feature, that's what makes me lean toward swapping the two symbols from what was submitted in the pull request.
I believe it will be easier to explain (& easier for people to remember) if the convention is that the arrow represents the direction the data is flowing, namely:
- `>` means the source node is providing a value to the target.
- `<` means the target node is providing a value to the source.

I am not absolutely convinced this is right, but when working on some complicated sample diagrams I did find myself getting confused when the symbols were the other way around.
Feedback on this is welcome; my intuition may not match anyone else's on this.
> Feedback on this is welcome
Well here goes: I agree with your intuition.
The entire diagram description is about the flow.
Budget [...] Goods
If I put `300` in there it is clear that `300` goes from `Budget` to `Goods`. Equally, if I read
Budget [→] Goods # yes, I know it's '>'
my immediate interpretation is that `remainder` goes from `Budget` to `Goods`. In my LTR-mind this represents the simple case, reads naturally, and feels intuitive and right. Say there was only one outgoing flow from `Budget`. You wouldn't need to read the syntax documentation to understand what was going on.
Whereas, if I read
Budget [←] Goods # yes, I know it's '<'
I basically stumble over it when reading it. I know that something more complicated is happening. Something is going from `Goods` to `Budget` - in this case information - yet the diagram description tells me numbers are still going from `Budget` to `Goods`.
However, swapping the symbols doesn't help with that. In fact, for me, it makes both scenarios more difficult to reason about.
Another alternative could be to use a different symbol for `<`, but would that be any clearer? E.g.
Budget [*] Goods
Budget [~] Goods
Budget [:] Goods
Budget [=] Goods
The closest of those, I think, would be `=` for `equalize`, but I suspect one would like to read it as `equal`, which isn't helpful.
I think the use of angle brackets only works if they aren't considered arrows. For me they indicate a branching flow. So conceptually they would "point" towards the source, or collective element. This seemed to work well with:
Wages [1500] Budget
Budget [400] Housing
Budget [500] All other expenses
Budget [<] Savings
Savings [<] 401K
Savings [<] Bank Account
Savings [<] Mattress
Budget to savings doesn't seem quite as natural as there's only one flow, but the children of savings make a lot of intuitive sense to me.
That said, angle brackets can read as arrows, so I agree with Anthchirp that this can feel "backwards" to the flow of the diagram and cause a stumble.
Use of alternate symbols seems reasonable. Is a blank appropriate (`Budget [] Goods`) or will that be read as an omission?
My first thought for an alternate was the tilde. It has the "feel" of flow and the "sense" of the unknown which "seems" appropriate. Reaching for intuition here.
Obviously a symbol is brief but would a word work instead?
Budget [calc] Goods
Budget [split] Goods
Budget [other] Goods
A quick thought on the remainder splitting.
I like the thinking above, especially the provision to allow for weighted calculated pots. However, fractional remainders and the increasing of precision felt uncomfortable to me. It seems a simple solution and one that is far from wrong but left me glitching on things that just don't split below 1 well (people, for example).
I don't have a better solution but my first thought was to let the precision stand and assign remainder based on order - leaving unequal sets.
Employees [1] Owners
Employees [8] Staff
Staff [<] Sales
Staff [<] Production
Staff [<] Admin
Resulting in
Like I say, it's not a better solution but maybe a toggleable option at some point so you can choose 'accuracy' vs 'display precision' primacy.
I'm thinking of changing tack since (as just demonstrated) people's intuitive interpretations about `<` and `>` can differ.
My current thought is `?` and `!`... I'll walk through the thinking below.
Also, from another perspective, on software-based keyboards (like tablets and phones), `<` and `>` are sometimes two taps away from the default keyboard. It would be nice to pick more convenient characters to reach. (Unfortunately the brackets for `[amounts]` are in the same situation as `<>`, but that syntax is pretty established for now... I can at least try not to make things worse.)
In the old pull request https://github.com/nowthis/sankeymatic/pull/41, the symbol `*` was proposed, and that one had appealed to me a lot as the one to use for 'consume any remainder that the source node has'.
So if `*` = "Use Remainder", then what symbol would make sense as "Fill in Missing Input"?
I was thinking of `?` as the "Fill in Missing Input" operator for a while. (There's a vague parallel here where both `*` and `?` have meanings/usage that relate to each other in Regular Expressions, but those meanings don't really map to this usage well, so that's not particularly helpful.)
Then I took a step back and thought about it this way:
Imagine a person encountering `?` as an amount, say in a set of inputs like this:
Wages [1500] Budget
Budget [400] Housing
Budget [500] All other expenses
Budget [?] Savings
...I think that even if a person is brand new to these diagrams, they would fairly quickly be able to interpret that `?` as 'use anything left over from Budget in that flow'. And I think a `*` would not be as obvious to interpret that way.
If `?` = "Use Remainder", I would want something in the same semantic neighborhood to represent its opposite, "Fill in Missing Inputs".
- Not `.` or `,`, since those can be parts of actual numbers... also they're so small that it's hard to even parse visually which is which on a small display.
- `??`. But I think that could introduce other confusion. (Which version is for which direction? And how do you remember?)
- `*` might work...
- `!` has the virtue of being the semantic opposite of `?` in everyday life. If `?` represents one case, I think that it's not going to be too hard to remember (after learning it in a hint) that `!` represents the reverse case.

Here's how that would look in the above 'Fill in Missing Inputs' example:
Current Income [12] Budget
Income to add [!] Budget
Budget [10] Goods
Budget [10] Services
Also, design-wise, I am taking care to define these symbols in only one place in the code, so that if someone does wish to use different symbols in their own fork, it will be trivial to change either or both of them.
Side note – keying off of @duffry's comment about how a blank flow like `a [] b` might be interpreted – I'm trying out treating that kind of empty flow as a skippable line (basically like a comment) rather than objecting to it as a syntax error or showing it as a 0-size flow. (The new Console section will note that the line was skipped as an empty flow.)
The user experience I'm anticipating there is that one can then just cut/delete the amount from between the brackets to see the diagram without that flow, then Cmd-Z/Ctrl-Z to see it put back.
I think that'll be slightly more convenient than going to the start of the line and typing `//` (though that will also still work).
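The skip-empty-flows behavior described here can be sketched roughly as below. This is an illustrative Python approximation, not the project's parser: the regular expression, the `parse_flows` name, and the console message wording are all assumptions.

```python
import re

# Assumed line shape: "Source [amount] Target"
FLOW_LINE = re.compile(r"^(?P<source>.+?)\s*\[(?P<amount>[^\]]*)\]\s*(?P<target>.+)$")


def parse_flows(text):
    """Return (flows, console) where console notes any skipped empty flows."""
    flows, console = [], []
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("//"):
            continue  # blank lines and // comments are ignored silently
        match = FLOW_LINE.match(line)
        if match is None:
            console.append(f"line {number}: not a flow")
            continue
        amount = match["amount"].strip()
        if amount == "":
            # Treated like a comment, but reported so the user knows why.
            console.append(f"line {number}: skipped empty flow")
            continue
        flows.append((match["source"].strip(), amount, match["target"].strip()))
    return flows, console


flows, console = parse_flows("a [1] b\nb [] c\n// b [2] d\nb [?] e")
```

In this sketch, deleting the amount from `b [] c` removes that flow from the result and leaves a one-line note in `console`, while the `//` line is dropped without comment.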
On tweaking the splitting mechanism -
`[?]` per source node (and therefore not require any splitting). If someone does use 3 `[?]`s and they were expecting whole numbers, I think it may be a helpful feature to let them know it's not dividing evenly.
Employees [1] Owners
Employees [8] Staff
Staff [?.375] Sales
Staff [?.375] Production
Staff [?.25] Admin
...though I can recognize that approach might not be ideal because you could only decide those percentages when you **already** know the size of `Staff`...
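As a worked check of that weighted idea, here is a small Python sketch. The `?.375` notation is only the hypothetical syntax floated above, and `weighted_shares` is an illustrative name, not an existing function:

```python
from fractions import Fraction


def weighted_shares(remainder, weights):
    """Give each target its stated fraction of the remainder (exact arithmetic)."""
    return {target: remainder * Fraction(weight) for target, weight in weights.items()}


# 'Staff' has 8 to hand out; the weights come from the example lines above.
staff = weighted_shares(8, {"Sales": "0.375", "Production": "0.375", "Admin": "0.25"})
```

The weights only yield the whole numbers 3, 3 and 2 because the total of 8 was known when they were chosen, which is exactly the drawback noted above.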
I guess an alternative would be to add a toggle which basically requires that whole numbers in remainders would never be split.
In my theoretical example of a node of size 1 getting split into 3, that would mean that the first consumer of the remainder here (`c`) would get the 1 and the other two consumers (`d`,`e`) would both get 0:
a [1] b
b [?] c
b [?] d
b [?] e
That seems like it might be practical. I would likely have it off by default though.
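A minimal sketch of how such a toggle could behave, in Python. The `distribute` function and its `allow_split` parameter are hypothetical names for illustration; nothing here reflects the actual implementation:

```python
from fractions import Fraction


def distribute(remainder, consumers, allow_split=True):
    """Share a remainder among consumers, either evenly or first-takes-all."""
    if allow_split:
        share = Fraction(remainder, len(consumers))
        return {name: share for name in consumers}
    # No splitting: the first consumer takes everything, the rest get 0.
    amounts = {name: 0 for name in consumers}
    amounts[consumers[0]] = remainder
    return amounts


even = distribute(1, ["c", "d", "e"])
whole = distribute(1, ["c", "d", "e"], allow_split=False)
```

With splitting allowed each consumer gets one third; with it disallowed `c` gets 1 and `d` and `e` get 0, matching the example above.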
(P.S. If anyone is wondering why `%` has not come up as a syntax option this whole time, that's because I would also still like to implement the percentage feature mentioned in #32, which is different from everything discussed here. That issue is concerned with percentages of a node's entire total, not of an unused remainder.)
When actually merging this and marshaling the other 10+ commits that followed it (from 6282e9198711b8e78bc8ca3ff66072920ab3f592 to b1af9c2915ceff6202e8b433eff69382bf9cbf58), I came to a couple of revised conclusions in the cold light of day:
- `?` + `!` didn't feel right - `!` is an imperative mark, which doesn't make sense when expressing an unknown.
- `*`, and assigned `?` as the operator for "Fill in missing inputs". In practice that has felt just fine as a syntax.
- `*.3` to express "use 30% of the remainder".

What that means for now is that the first unknown that can consume an amount consumes all of it, leaving none for others. So in the case of a node with size = 1 having three consumers: the first one has a value of 1 and the other 2 get 0.

The changes are here on GitHub but are not yet promoted to sankeymatic.com. The site should get these changes applied in the next day or two.
> The changes are here on GitHub but are not yet promoted to sankeymatic.com. The site should get these changes applied in the next day or two.
Is this already live? I assume it would be available at https://www.sankeymatic.com/build-next/ ?
Thanks, Andre
@andre68723 I had found a couple more issues that I needed to resolve - some flows with colors were not included in calculations, and when a graph is reversed, the calculations were not flipped accordingly.
See the 4 newest commits from May 28 (84209af41881a1cadd05aa34c743c890b838f470) through June 2 (e81503a09138ec2783a82b347cabb5a6732cd07a) for those fixes.
All of these changes are live on sankeymatic.com/build/ as of right now.
There's more to do in terms of:
but yes, you can now use `*` and `?` as Amount values in diagrams.
P.S. /build-next/ is inactive for the moment, there's no major layout-change-in-progress to preview at this time.
Flows with "<" as an amount will fill the remaining space from their source. Flows with ">" as an amount will be the sum of their targets.