zerothi / sisl

Electronic structure Python package for post analysis and large scale tight-binding DFT/NEGF calculations
https://zerothi.github.io/sisl
Mozilla Public License 2.0

WIP: Facilitate the creation of combined plots #777

Open pfebrer opened 1 month ago

pfebrer commented 1 month ago

Since #340, there has been discussion of how to make the API for multiple plots as convenient as possible. This PR attempts to solve it.

In nodify, I introduced the concept of a Batch. If you pass a Batch as an input of a workflow, the workflow will also return a Batch. The most important thing is that only the parts of the workflow that depend on the batch are computed for each item, the rest are shared, saving time and memory.
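The idea can be illustrated with a plain-Python sketch (this is only a conceptual analogy, not nodify's actual machinery; all names below are made up):

```python
# Conceptual sketch of the Batch idea in plain Python.
# nodify does this splitting automatically; everything here is illustrative.
calls = {"shared": 0}

def shared_step(data):
    """Expensive part that does not depend on the batched input."""
    calls["shared"] += 1
    return sum(data)

def run_batched(data, scales):
    base = shared_step(data)            # computed once for the whole batch
    return [base * s for s in scales]   # only this part runs per item

results = run_batched([1, 2, 3], scales=(1, 2, 3))
print(results)          # [6, 12, 18]
print(calls["shared"])  # 1: the expensive step ran a single time
```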

This is automatically applicable to the plots, since they are workflows:

from nodify import Batch
import sisl

graphene = sisl.geom.graphene()

# Create a batch
scales = Batch(1, 2, 3)
# Then call the plotting function, but passing the batch as an input on several arguments
g_plot = graphene.plot(axes="xy", show_cell=False, atoms_scale=scales, bonds_scale=scales)

g_plot.get() will now return a batch, which contains the three figures.

Now the question is what is the best interface for sisl users. In this PR I have changed slightly the functions that merge plots so that they can support batches. Now one can do for example:

sisl.viz.subplots(g_plot)

[figure: the batched plots combined as subplots]

One can also change the way in which batches interact with each other. This is defined by the node's context. In this case they were zipped, but one could ask to take the product as well:

from nodify import temporal_context

with temporal_context(batch_iter="product"):
    sisl.viz.subplots(g_plot).show()

[figure: subplots for the product of the batches]
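For reference, the two iteration modes behave like the built-in zip and itertools.product over the batch values (a plain-Python analogy, not the nodify internals):

```python
from itertools import product

atoms_scale = (1, 2, 3)
bonds_scale = (1, 2, 3)

# batch_iter="zip" (as in the first example): batches advance together -> 3 figures
zipped = list(zip(atoms_scale, bonds_scale))
# batch_iter="product": every combination of values -> 9 figures
combos = list(product(atoms_scale, bonds_scale))

print(len(zipped))  # 3
print(len(combos))  # 9
```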

I have decided that the user must use the merging functions, and that g_plot.show() does not show anything, since there is no indication of how the user wants to merge the plots. Also, automatically merging them would not work in cases like:

# Create a batch
backends = Batch("plotly", "matplotlib")

g_plot = graphene.plot(axes="xy", backend=backends)

Where the backends are different. In that case one can do:

for plot in g_plot.get():
    plot.show()

[screenshot: each plot shown separately with its own backend]

Do you think this is fine? Is there something that you would change to make it more convenient to use?

Thanks!

pfebrer commented 1 month ago

When we agree, I will change the docs that show how to combine plots.

codecov[bot] commented 1 month ago

Codecov Report

Attention: Patch coverage is 0%, with 8 lines in your changes missing coverage. Please review.

Project coverage is 87.29%. Comparing base (9c0317d) to head (ce82efb).

Files Patch % Lines
src/sisl/viz/plots/merged.py 0.00% 8 Missing :warning:
Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main     #777      +/-   ##
==========================================
- Coverage   87.30%   87.29%   -0.02%
==========================================
  Files         394      394
  Lines       50409    50417       +8
==========================================
  Hits        44011    44011
- Misses       6398     6406       +8
```

:umbrella: View full report in Codecov by Sentry.

zerothi commented 1 month ago

I think most people will use grouped plots to combine different types of plots (i.e. not just a variable).

For instance, geometry plots + PDOS scaling of atomic sizes, vs. a fat-bands plot to the right. Would the Batch method aid in this? I can, however, see that the batch is a nice way to figure out a parameter size.

I think batch.show() should raise an error of some sort, to signal that it can't accomplish it, possibly advocating the use of is_batch_plot and sisl.viz.subplots or similar.

Is there an easy way to query whether a plot is a batch plot (other than isinstance(plot, list))? At least there should be something like is_batch_plot (or similar), just to make it easy to distinguish; if that is a wrapper for isinstance(plot, list), that is fine.

pfebrer commented 1 month ago

I think most people will use grouped plots to combine different types of plots (i.e. not just a variable).

For instance, geometry plots + PDOS scaling of atomic sizes, vs. a fat-bands plot to the right. Would the Batch method aid in this? I can, however, see that the batch is a nice way to figure out a parameter size.

Yes, makes sense. In that case batches are not the right tool. You just build a list of plots and merge them (as it is done until now).

Batches just help in getting the result of a workflow for multiple inputs while sharing the common parts of the computation. I.e. they don't help if you want to execute different workflows.

There are some typical cases where one might want to batch an input. For example, plotting multiple wavefunctions:

from nodify import Batch

H.plot.wavefunction(i=Batch(4, 5, 6, 9))

will create plots for the 4 wavefunctions performing one single diagonalization.
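As a rough sketch of why this saves work (a toy numpy example, not sisl's actual code path), the eigenvalue problem is solved once and each batched item just selects its state:

```python
import numpy as np

# Toy 10x10 Hamiltonian; in practice this would come from sisl.
H = np.diag(np.arange(10.0))
eigvals, eigvecs = np.linalg.eigh(H)  # single diagonalization, shared

# One "plot" per batched index, each selecting its own eigenstate.
states = [eigvecs[:, i] for i in (4, 5, 6, 9)]
print(len(states))  # 4 wavefunctions from one diagonalization
```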

I think batch.show() should raise an error of some sort, to signal that it can't accomplish it, possibly advocating the use of is_batch_plot and sisl.viz.subplots or similar. Is there an easy way to query whether a plot is a batch plot (other than isinstance(plot, list))? At least there should be something like is_batch_plot (or similar), just to make it easy to distinguish; if that is a wrapper for isinstance(plot, list), that is fine.

Makes sense. Something like plot.is_batched could work?

pfebrer commented 1 month ago

Another example:

from nodify import Batch, temporal_context

with temporal_context(batch_iter="product"):
    H.plot.wavefunction(i=Batch(4, 5, 6, 9), represent=Batch("real", "imag"))

would generate 8 plots with 1 diagonalization and 4 grid projections.

zerothi commented 1 month ago

Another example:

from nodify import Batch, temporal_context

with temporal_context(batch_iter="product"):
    H.plot.wavefunction(i=Batch(4, 5, 6, 9), represent=Batch("real", "imag"))

would generate 8 plots with 1 diagonalization and 4 grid projections.

what does temporal_context mean? Isn't temporal implicit in contexts? I would think you should name this differently, nodify_context... or?

Also, could one create product batches and zip batches. I.e.

batch1 = Batch(1,2)
batch2 = Batch(2,3)
batch3 = Batch(*range(4))
with nodify_context(batch_iter={(batch1, batch2): "product", "*": "zip"}):
    ...

probably bad names here, something that binds them together?

pfebrer commented 1 month ago

what does temporal_context mean? Isn't temporal implicit in contexts? I would think you should name this differently, nodify_context... or?

Hmm interesting, so context here is referring to the nodes context, which determines how the computation is performed. Each node has a context attached to it and that is permanently bound to the node (although you can update it). But still I agree that probably temporal is not needed here, since the with statement already indicates "temporality" as you say. Maybe just context would suffice:

from nodify import context

with context(batch_iter="product"):
    ...

#or
import nodify

with nodify.context(batch_iter="product"):
    ...

Also, could one create product batches and zip batches. I.e.

batch1 = Batch(1,2)
batch2 = Batch(2,3)
batch3 = Batch(*range(4))
with nodify_context(batch_iter={(batch1, batch2): "product", "*": "zip"}):
    ...

probably bad names here, something that binds them together?

I don't want to complicate things for now. I always keep in mind that these things should be modifiable from the GUI :sweat_smile:

I thought about it, but then I realised that programmatically one could always create complicated combinations and then zip the batches. E.g.:

batch1 = Batch(1,1,2, 2)
batch2 = Batch(2,3,2,3)
batch3 = Batch(*range(4))

# Now when batches are zipped it will result in the same situation as what you were proposing
zerothi commented 4 weeks ago

I don't see the final comment in your post? Maybe it got truncated, or something?

pfebrer commented 3 weeks ago

You mean after the comment?

No I didn't write anything, it's just that if you zip those three batches you end up with the same as what you were proposing.
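To spell the equivalence out in plain Python (values taken from the example above, using lists in place of batches):

```python
from itertools import product

# Pre-expanded by hand, as in the earlier comment
batch1 = (1, 1, 2, 2)
batch2 = (2, 3, 2, 3)
batch3 = tuple(range(4))

# Zipping the expanded batches reproduces the product of (1, 2) x (2, 3),
# while batch3 simply runs alongside them.
zipped = list(zip(batch1, batch2))
assert zipped == list(product((1, 2), (2, 3)))
print(zipped)  # [(1, 2), (1, 3), (2, 2), (2, 3)]
```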