Extrude using multiple variables - aka stacked plots

ARSanderson commented 4 years ago

A while ago I had a need to extrude by a single variable and added that functionality to the extrude operator. More recently I wanted to extrude using multiple variables and create a stacked plot. I made the extension and used it for a paper for SC. Such a plot is a bit out of the normal paradigm for VisIt. As such, I want to open a discussion rather than push the code to the repo immediately. The code is on this branch:

https://github.com/visit-dav/visit/tree/feature/allen/ExtrudeWithMultipleVars

GUI (new and current)

GUI with values:

Resulting image with each color a variable:

For the above, one really wants a label for each variable rather than an index value in the legend. The other issue is that the stacking is part of the extrude operator, which is not obvious. It should be an operator unto itself that uses the extrude under the hood.

Thoughts?

markcmiller86 commented 4 years ago

@biagas and @markcmiller86 will have a telecon with Allen to get more understanding of how to fit this in.

biagas commented 4 years ago

Thoughts from telecon:

the multi-variable extrude should be its own operator, 'Stacked Plot' is the currently proposed name. -- should an operator name have the word 'plot' in it?
Stuffing the results into a Subset plot would get the discrete legend and labelling desired when the 'index' option is chosen -- would possibly require some mods to the Subset plot, and changes to the new operator to populate and pass on the correct avtSubsets array. (Whatever happened to the notion of 'generalized' subsets in VisIt, can they create subset plots?)
Pseudocolor plot would be more appropriate for the 'value' option, as a continuous color table would be needed.
There is also the possibility of creating stacked curves (instead of boxes as in the example above) -- Would either the Pseudocolor or Subset plot be sufficient for these?
If both Subset plot and Pseudocolor plot are supported, the operator would need some way of knowing which plot is being used, to know what information to pass along.

Perhaps it makes more sense to create a new Stacked Plot (instead of an operator)

Direct control of color table, legend, and labels
Can set up hooks to change axes labels accordingly.
Could handle curves or 2d/3d geometry
It might, at the outset, require code duplication in regards to avtMapper functionality. I'm working on ideas for reducing code duplication but they aren't fully fleshed out yet.

markcmiller86 commented 4 years ago

We had a chance to discuss more with @ARSanderson.

There are potentially multiple pieces here...

Enhancement to extrude operator to support extrusion by a mesh variable instead of a constant
- Seems like a great enhancement by itself apart from anything else here.
The idea of a stacked plot
- Which seems similar in spirit to a generalization of Multi-Curve
Currently, @ARSanderson has implemented all this functionality essentially into a single enhancement to the extrude operator.
- It performs a variable extrude for a set of scalar mesh variables defined over a point mesh and then essentially geometrically catenates (e.g. stacks) their results together.
- I think we should consider generalizing some of these notions into a new operator (or plot)
- Another similar notion is that of a Multi-Plot operator which basically loops over downstream plot (or operator) attributes generating a plot output for each iteration all of which are somehow combined into a final single plot object.
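The "extrude each variable, then stack the results" idea described above can be sketched in a few lines. This is only an illustration under assumptions, not the code on the branch: the function name `stack_extents` and the sample data are hypothetical, and the sketch assumes nonnegative scalar values at each point of a point mesh.

```python
from itertools import accumulate


def stack_extents(values_per_variable):
    """Return (base, top) extents, one per variable, for a single mesh point.

    values_per_variable holds one nonnegative scalar per stacked variable,
    in stacking order. Each variable's extrusion starts where the previous
    one ended, which is what "geometrically catenating" the results means.
    """
    tops = list(accumulate(values_per_variable))
    bases = [0.0] + tops[:-1]
    return list(zip(bases, tops))


# Hypothetical data: three variables sampled at two points of a point mesh.
point_values = {
    "rank0": [2.0, 1.0, 3.0],
    "rank1": [0.5, 4.0, 1.5],
}
stacked = {name: stack_extents(vals) for name, vals in point_values.items()}
```

Each point yields one bar per variable, and the bar for variable k sits on top of the bars for variables 0 through k-1.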

markcmiller86 commented 4 years ago

Hey @ARSanderson...

We met early today to discuss.

We wound up having some questions to ask to help better understand how/where this fits into VisIt?

At what scale do you think the techniques you are currently using to visualize performance data and communication maps begin to break down? 1000 tasks, 10,000 tasks, more?
- Reason for asking is that we worry the basic bar-stacks start to become hard to interpret as scale increases
- As an example, a subset plot in VisIt becomes fairly useless in terms of understanding which pieces are which at probably around 10² to 10³ subsets. If you are thinking just in terms of distinguishing pieces based upon color, it's probably more like 32-64. And, of course, the labels for the color legend eventually either get too small or fall off the bottom of the screen.
Do you have the ability to adjust how you write the performance data from the data producer?
- We were wondering if you wrote a UCD mesh of line segments with variables defined on those line segments, you could produce largely the same plot of "toothpicks" you showed Kathleen and Mark yesterday without having to use any special enhancements to VisIt.
- Alternatively, we were thinking that it might be possible to produce the same kind of plot in a different "graphing" or "charting" tool (e.g. Excel, matplotlib, etc.) if the data was formatted in such a way those tools could digest. Have you considered this?
As things are coded now, all the functionality is packaged into the extrude operator. Do you see any big issues if the multi-var aspect of this was instead handled in a separate, Stacked Extrude, operator?
You have two coloring/labeling options (Index and Value). The index colors discretely by a category (you used "task1" and "task2" I think) whereas the value coloring is supposed to color continuously by a variable's value. Do you happen to have a more elaborate example of that other than the image captured in the issue?

ARSanderson commented 4 years ago

For task based processes under 100, I have seen up to 50 tasks so far. So I do not think we would run into an issue - but agree this issue is the same for both when there is too much data.
Each performance data value is associated with two meshes that are in different visual domains (machine and communication). It would be possible to move the functionality into the database and thus create a line segment and assemble the values. But then a segment would be needed for each rank. That is, if I am running on 1000 ranks with 10 tasks, then 1000 line segments, each with 11 points, would need to be passed. The extrude does that for me by sending it 1000 points.

However, it is not the case that one would always want to see all of the variables at the same time; perhaps only a subset. So an interface is needed.

Using a secondary tool is not helpful because there is an inherent need to look at multiple simultaneous views.

At this point the extrude filter could be made into a separate filter that would be called by the current extrude operator and the "new" operator without any problem.
I could generate some additional images if needed that show both index vs value.
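The counting behind the 1000-rank example can be made explicit with a small sketch. The function names are hypothetical; the only inputs are the numbers quoted above (1000 ranks, 10 tasks), and the sketch assumes one polyline per rank in the database-side approach versus one point per rank fed to the operator.

```python
def database_side_counts(num_ranks, num_tasks):
    """Geometry the data producer would write: one polyline per rank.

    A polyline with num_tasks segments needs num_tasks + 1 points.
    Returns (points, segments).
    """
    points = num_ranks * (num_tasks + 1)
    segments = num_ranks * num_tasks
    return points, segments


def operator_side_counts(num_ranks, num_tasks):
    """Input to a stacking operator: one point per rank plus its scalars.

    Returns (points, scalar_values).
    """
    return num_ranks, num_ranks * num_tasks


db_points, db_segments = database_side_counts(1000, 10)
op_points, op_values = operator_side_counts(1000, 10)
```

Under these assumptions the database-side approach writes 11,000 points and 10,000 segments, while the operator-side approach reads 1,000 points carrying 10,000 scalar values and builds the geometry itself.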

Note: the following is already in the extrude operator as a feature (since 3.0):

Enhancement to extrude operator to support extrusion by a mesh variable instead of a constant

@biagas I would agree that as an operator it should not have the word plot in it. Stacked variables comes to mind.

As for coloring by value - to be honest I am not sure how useful that is overall because typically the goal of a stacked plot is to be able to see values relative to each other.

As such, my first line of thinking would be an operator that could feed into the subset plot.

markcmiller86 commented 4 years ago

@ARSanderson sorry for delays in responding...

For task based processes under 100, I have seen up to 50 tasks so far. So I do not think we would run into an issue - but agree this issue is the same for both when there is too much data.

So, this issue was discussed by the team and we reached a similar conclusion. That said, the reason for the question about scale was whether the particular approach really lends itself to the kinds of large scale parallelism that tends to be VisIt's wheelhouse? If not, it is harder to make arguments for explicit enhancements to handle this case as opposed to workable but maybe slightly less convenient alternatives.

Each performance data value is associated with two meshes that are in different visual domains (machine and communication). It would be possible to move the functionality into the database and thus create a line segment and assemble the values. But then a segment would be needed for each rank. That is if I am running on 1000 ranks with 10 tasks, then a 1000 line segments each with 11 points would need to be passed.

So, honestly, 1,000 ranks * 10 tasks makes for 10,000 line segments which I think kinda sorta adds to the idea of just handling this particular plotting idea by augmenting the data producer to create a suitable database instead of an approach requiring changes to VisIt internals or interfaces.

However, it is not the case that one would want to always see all of the variables at the same time perhaps only a subset. So an interface is needed.

Well, I think we can support subsetting if the data producer does the job of defining the relevant subsets in the database employing the baked-in notions of either domains, blocks or materials or perhaps with enumerated scalars.

So, all this really suggests to me that the better approach is to design and construct a database capable of supporting the visualization(s) you want using existing functionality in VisIt.

If this is not clear, we can set up some time to WebEx to discuss what the database should look like to satisfy your needs. Let us know if this doesn't seem like a reasonable course of action.

ARSanderson commented 4 years ago

Hi Mark,

I disagree with the final analysis. Say I have 5 values, and the database produces a line segment ABCDE for each rank. Each letter corresponds to some segment. The user wants to see A, C and E only, so the subsetting is going to remove B and D, and what is going to be displayed is “A C E”, which is not what the user will want to see. That is, there will be gaps between the segments. They want “ACE”, a contiguous line with three segments. That is, the subset can only remove geometry; it cannot move geometry, which is what is needed.

Cheers,

Allen
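The difference between “A C E” and “ACE” can be shown with a short sketch. The segment lengths are made up for illustration and both function names are hypothetical; the point is only that subsetting keeps each surviving segment at its original offset, whereas stacking after selection recomputes the offsets.

```python
def subset_only(segments, keep):
    """Drop unselected segments but leave the rest where they were."""
    out, base = [], 0.0
    for name, length in segments:
        if name in keep:
            out.append((name, base, base + length))
        base += length  # offset advances even past removed segments
    return out


def restack(segments, keep):
    """Drop unselected segments and close the gaps they leave behind."""
    out, base = [], 0.0
    for name, length in segments:
        if name in keep:
            out.append((name, base, base + length))
            base += length  # offset advances only past kept segments
    return out


# Hypothetical lengths for the five values A..E of one rank.
segments = [("A", 1.0), ("B", 2.0), ("C", 1.0), ("D", 3.0), ("E", 2.0)]
keep = {"A", "C", "E"}

gapped = subset_only(segments, keep)
contiguous = restack(segments, keep)
```

With these lengths, `gapped` places C at 3.0 to 4.0 and E at 7.0 to 9.0, leaving holes where B and D were, while `contiguous` places C at 1.0 to 2.0 and E at 2.0 to 4.0.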


markcmiller86 commented 4 years ago

That is there will be gaps between the segments

Yes, I see that now. But, I am not deterred in the belief that there is a better way to support this and am still working on some of those alternatives. As we may have considered in a conversation in the distant past, I think this is a minor swizzle on a Vector plot and am close to proving that to myself without any modification to VisIt.

markcmiller86 commented 4 years ago

I am attaching here a picture with basically the same idea which required no change to VisIt but was a bit kludgy to construct. I am plotting two quantities over a 2D mesh using a Vector plot combined with a Displace operator on the second quantity only to push it out to the ends of the first quantity's vectors.

I am not saying this is a solution. But, I am saying minor changes to Vector plot and Displace operator make this plot possible. These changes are...

When scaling vectors, have option to independently scale length and cylinder width.
When using Displace operator, have option to stack instead of just scale the displacement.

Again, I fully acknowledge in its present form, this is a bit kludgy. But, I also think it represents better use of existing functionality and extensions that have more potential for applicability beyond this single use case.
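The "stack instead of just scale" behavior suggested for the displacement can be sketched as follows. This is a plain-Python illustration of the geometry only, not VisIt code: `displaced_origins` is a hypothetical name, and the sketch assumes each quantity is drawn as a 2D vector glyph whose origin is pushed to the tip of the previous glyph.

```python
def displaced_origins(point, vectors):
    """Origin of each vector glyph when glyphs are chained tip to tail.

    point is the (x, y) mesh location; vectors is a list of (vx, vy)
    displacements, one per quantity, in stacking order.
    """
    origins = []
    x, y = point
    for vx, vy in vectors:
        origins.append((x, y))
        x, y = x + vx, y + vy  # next glyph starts at this glyph's tip
    return origins


# Hypothetical example: three quantities, all pointing in +y, at one point.
origins = displaced_origins((1.0, 2.0), [(0.0, 3.0), (0.0, 1.5), (0.0, 0.5)])
```

With two quantities this reduces to displacing the second by the first, as in the picture; with three or more, each origin is the running sum of all earlier vectors.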

Untitled

markcmiller86 commented 4 years ago

Crap...should have attached the session file for this plot and just exited VisIt. Sorry for not thinking.

ARSanderson commented 4 years ago

Hi Mark,

I see where you are going with this; the question is how does it scale to multiple variables? With two variables it is easy (?) but what about with 3 or more? At the end of the day, one really needs to balance reuse vs new and ease of usage to get what is needed. Personally, I see a clean operator interface to select variables to “stack”; how it actually gets done under the hood does not matter.

Cheers,

Allen


markcmiller86 commented 4 years ago

@biagas, @cyrush and @brugger1 ... do you have any thoughts on these most recent revelations?

ARSanderson commented 1 year ago

I am looking through my unfinished VisIt tasks and the notion of a stacked operator comes up. Though the topic was last discussed some time ago, I wanted to resurrect it so as to make a final decision on what to do. Perhaps the lack of further discussion is the decision. But I am about to embark on using it again on Aurora.

My suggestion, so as not to lose the effort, would be to take what I have done and turn it into a "Stacked Operator" and release it into the wild and see what users come back with.

biagas commented 1 year ago

@ARSanderson we discussed this in our project meeting. Yes, please, go ahead with adding a new operator. Please ensure there is documentation describing how to use it (and perhaps for which scenarios), and also add tests. Thanks!

ARSanderson commented 1 year ago

Though I have been calling this operator a stacked plot, a new name is required as it is an operator, not a plot. Also, "Stacked plot" has a specific meaning in MATLAB that is different. The correct name in 2D would be a "Stacked bar chart" or in 3D a "Stacked column chart."

Given there is already an extrusion operator and the stacking is an offshoot, I am proposing that it be named a "Stacked Extrusion" operator. Those that know the extrusion operator will already understand that operation, while the stacking is adding additional data on top.

Thoughts?

brugger1 commented 1 year ago

Sounds like a good name to me!

ARSanderson commented 1 year ago

Now part of https://github.com/visit-dav/visit/pull/18914

visit-dav / visit

Extrude using multiple variables - aka stacked plots #4540