Next-gen AiiDA provenance explorer

materialscloud-org / issues

An issue tracker for Materials Cloud

Next-gen AiiDA provenance explorer #33

Open eimrek opened 3 months ago

eimrek commented 3 months ago

The current AiiDA explorer (https://github.com/materialscloud-org/aiida-explorer) is built in a deprecated JavaScript framework (AngularJS) and has areas where it could be improved.

Therefore, there is a strong need for a next-generation AiiDA explorer written in a modern JavaScript framework. This issue is meant to discuss this and to collect feedback and suggestions.

There is currently also a possible Google Summer of Code project related to this: https://github.com/aiidateam/aiida-core/wiki/GSoC-2024-Projects. Some of the information given there will be duplicated here.

Here's an initial list of points to keep in mind and potential improvements over the current implementation:

The primary use case of the new tool is the Materials Cloud Explore section, where it is used to explore AiiDA databases 1) running on the Materials Cloud servers; but also 2) users' local AiiDA databases. This is possible with the current tool, but it is a bit too tightly integrated with Materials Cloud. The new tool could be independent, so that you could easily start it locally and embed it in different websites (e.g. it might also be useful in AiiDAlab).
The JavaScript framework should ideally be React, as that integrates well with the rest of the new Materials Cloud implementation.
For visualizing the node graph, we could use an open-source library (e.g. Rete, react-flow or similar).
The new tool should use the new AiiDA REST API (aiida-restapi) rather than the old one built into aiida-core (see here and here), which the current tool uses. There are plans to eventually deprecate the old REST API.
Some improvements over the old node graph visualizer (see example)
- it's not easy to distinguish input and output nodes. One solution here is to keep inputs on one side, and outputs on other side of the selected node;
- the graph browser should visualize all the connecting nodes (the current maximum number shown is 10);
- when the user selects a new node, the page redirects to a new page, thus losing the smooth transition from one node to another. The new implementation should just update all the UI components when selecting a new node in the graph browser, instead of redirecting to a new page.
One point of decision w.r.t. the node browser is whether to only show a single node and its inputs & outputs (like in the current implementation) or to show the whole graph that can be panned/zoomed. @sphuber briefly mentioned this idea here. For the "global" graph, it might not be trivial to organize the placement of the nodes, so this requires some thought.
The default view of the current tool is the grid view (see e.g. https://www.materialscloud.org/explore/mc3d/). This could be potentially improved e.g. by offering a search functionality.
Node data view: raw vs rich. The current tool shows "rich views" of some specific node types (e.g. StructureData or CalcJobNode), while others show just a basic dictionary (e.g. Code). One possible idea here is to have a "raw view" that works for all node types, and then some (e.g. StructureData) will additionally have a "rich view". (Effectively, this also enables the "raw view" for the nodes that currently only have a "rich view".)
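
To make the "inputs on one side, outputs on the other" idea concrete, here is a minimal layout sketch. It is written in Python only for brevity (the actual tool would be JavaScript), and the `Link` tuple, the function names and the pixel gaps are all hypothetical, not the format returned by any AiiDA REST API. Note that it places every connected node, without a cap such as the current limit of 10.

```python
from typing import NamedTuple


class Link(NamedTuple):
    source: str  # node the link starts from (hypothetical identifier)
    target: str  # node the link points to
    label: str   # link label


def split_links(selected, links):
    """Return (incoming, outgoing) links of the selected node."""
    incoming = [link for link in links if link.target == selected]
    outgoing = [link for link in links if link.source == selected]
    return incoming, outgoing


def column_layout(selected, links, x_gap=200, y_gap=60):
    """Place inputs in a left column, outputs in a right column.

    Every connected node gets a position; each column is centred
    vertically around the selected node at (0, 0).
    """
    incoming, outgoing = split_links(selected, links)
    positions = {selected: (0, 0)}
    for column, x, end in ((incoming, -x_gap, "source"), (outgoing, x_gap, "target")):
        offset = (len(column) - 1) / 2
        for index, link in enumerate(column):
            positions[getattr(link, end)] = (x, (index - offset) * y_gap)
    return positions
```

A graph library such as react-flow would then only need to receive these positions; panning takes care of columns that are taller than the viewport.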

Feedback & further ideas/suggestions welcome.

Pinging @giovannipizzi @superstar54 @sphuber @unkcpz @mbercx

unkcpz commented 3 months ago

Pinging @ltalirz, who has kindly offered to help with the GSoC; I believe he can give inspiring opinions on this topic.

giovannipizzi commented 3 months ago

Great! I think this summarizes most of the points I had in mind. For the design of the graphical ui, as soon as we have a person ready to implement this (is there someone who's interested already?) we should just sit down in a meeting and "draw" on a board some concepts of what each of us has in mind, ideally converging to a UI design that is simple but with clear design suggestions. We can actually already meet now, even before having a person, and report back in a formalised way what the UI should look like and the reasons behind the choices. However, we need to have at least one person skilled in React/web page design involved, to ensure that we do not design something that is very hard to implement, and that we take inspiration from what existing libraries already provide: ideally, if we can just reuse an existing library rather than code the graph visualization from scratch, that would be fantastic.

giovannipizzi commented 3 months ago

As a technical comment (that can be tested already now), it would be great to check the performance of using the REST API (for now the current one on MC, for instance) when working directly on a sqlite_zip archive. Since archives on MC are read-only at the moment anyway, using the zip file directly would greatly simplify maintenance of the server (no rmq, psql, ...). But there might be performance improvements needed. E.g. it would be good to expand the internal sqlite db once and cache it for as long as the profile is used, rather than re-unzipping it every time the profile is loaded. I am not sure what happens now, but it would be good to check (@eimrek I mentioned this also to Ali and Julian, you can ping them for support in testing).

sphuber commented 3 months ago

For the design of the graphical ui, as soon as we have a person ready to implement this (is there someone who's interested already?)

I had been experimenting myself for a while already with this subject. Using the functionality I propose in this AEP and as implemented in this PR, I think the aiida-restapi implementation may be significantly improved/simplified since it no longer needs to duplicate the schemas of the various ORM entities.

I had also already started with a React application to consume the API just to test things and to get some experience with react, of which I currently have very little. I think a complete GUI application to browse the provenance graph, but also interact with AiiDA in general will be very very powerful, so I'd be happy to contribute to this project. If we can get a skilled React developer, I think we can make this happen.

mbercx commented 3 months ago

Thanks @eimrek! Also pinging @edan-bainglass since we talked about this recently and he has a passion for GUIs. :)

A couple of quick thoughts here:

What would the default "whole"/"global" graph of a selected node be? Perhaps the smallest graph that provides consistency according to the traversal rules for exporting data (similar to verdi node graph generate)? And we add the options for adapting these rules to the GUI?
One other idea I remember discussing was to allow the graph to be expanded (e.g. right click on a node and click "expand" to add linked nodes to the "global" graph). This might be useful, but I can already think of issues when e.g. applying such a selection to a UpfData node. ^^
Often the full provenance graph of a workflow is quite complicated, making it hard to figure out the performed steps. It's already possible to restrict the graph to only the "data" or "logical" provenance, but I was wondering if we could think of only showing the "process" provenance, where we only show Process nodes and connect them in case one process has an input that is an output of another. This idea also came up in a discussion recently with a user that found it difficult to understand the workflow based on the provenance graph.
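
The "process" provenance idea from the last point can be sketched as a projection of the full graph. The snippet below is a hypothetical illustration: it takes plain `(source, target)` pairs and a set of process identifiers, ignores AiiDA link types and call links between processes, and is not an existing aiida-core function.

```python
def process_graph(links, process_nodes):
    """Project a provenance graph onto its process nodes.

    `links` is an iterable of (source, target) pairs and `process_nodes`
    the set of nodes that are processes; everything else is treated as data.
    Returns the set of (producer, consumer) pairs in which an output of
    `producer` is an input of `consumer`.
    """
    producers = {}  # data node -> processes that output it
    consumers = {}  # data node -> processes that take it as input
    for source, target in links:
        if source in process_nodes and target not in process_nodes:
            producers.setdefault(target, set()).add(source)
        elif source not in process_nodes and target in process_nodes:
            consumers.setdefault(source, set()).add(target)
    edges = set()
    for data, made_by in producers.items():
        for producer in made_by:
            for consumer in consumers.get(data, ()):
                if producer != consumer:
                    edges.add((producer, consumer))
    return edges
```

For a chain `structure -> relax -> relaxed -> scf -> energy` with processes `relax` and `scf`, the projection contains the single edge `("relax", "scf")`.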

EDIT: Side note: maybe we can also post a question on the Discourse to see if we can get any user input on what they would like to see in the next-gen provenance explorer.

unkcpz commented 3 months ago

I also want to mention that with anywidget the explorer can be integrated as a widget and used in AiiDAlab.

But from the GSoC point of view, I think the overall goal described in this issue is a bit too much for a student working 350 h. We can get more input and feedback on the goals, but let's try to pick and set tangible milestones for GSoC. I think the restapi part should be independent of GSoC so as not to overwhelm the student.

sphuber commented 3 months ago

I think the restapi part should be independent of GSoC so as not to overwhelm the student.

I agree. This will anyway be easier for us to implement and this way the student can focus on the front-end application. It is probably easier to find a candidate that is strong in JSX/react than one that also has Python skills. So we can just make sure that we provide a web API that has the endpoints they need to extract the data needed

ltalirz commented 3 months ago

I just want to expand on some of the great points Marnik made.

In my experience, the current provenance graph browser is useful for visualizing the information of certain node types and for hopping from one node that you are pointed to (e.g. by a hyperlink) to an adjacent node, such as the input or output of a calculation. This is used in a number of discover sections, starting with the SSSP, and a very useful feature.

However, the current provenance browser is not suitable as a tool for exploring the execution of a workflow that you are not familiar with - you basically already need to know where to look. To put it another way: if I give you a deep link to some node in a complex workflow and then ask you to draw the hierarchy of the global workflow that contains it, you are going to have a very hard time figuring it out.

In order to make AiiDA graphs explorable, it is necessary to introduce hierarchy in the views of the workflow - for example by defaulting to a high-level view that only shows the top-level workflow, its inputs and outputs, but nothing about what goes on inside it. Then, a user could click 'unfold' on the workflow icon to expand the next level of the hierarchy.

In my view, having such a feature would increase the value of AiiDA provenance graphs tremendously, because it can transform the graph from a record that is useful mainly to the person running the calculation, to a record that is useful for everyone else.

The lowest hanging fruit is to implement such a functionality in the static verdi graph generate and have the user specify the necessary information for the unfolding via cli options [1]. However, if something like this could be implemented in the interactive Javascript viewer, it could of course be made even more intuitive and impactful.

[1] Some thoughts in https://github.com/aiidateam/aiida-core/issues/2282, copied main part below

The logic for collapsing a given workflow node is as follows:

start traversing the descendants of the Workflow Node until you hit a node that is directly returned by the workflow

all nodes visited this way, except for the final returned nodes are marked to be hidden and will not appear in the final graph
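
A minimal sketch of the two collapsing rules above, assuming the graph is available as a plain dictionary of direct descendants and that the set of nodes directly returned by the workflow is known. The function and argument names are hypothetical; this is not part of aiida-core.

```python
from collections import deque


def hidden_nodes(workflow, children, returned):
    """Return the nodes to hide when collapsing `workflow`.

    `children` maps a node to its direct descendants, `returned` is the set
    of nodes directly returned by the workflow. Traversal starts at the
    workflow and stops at returned nodes, which stay visible.
    """
    hidden = set()
    seen = {workflow}
    queue = deque([workflow])
    while queue:
        node = queue.popleft()
        for child in children.get(node, ()):
            if child in seen:
                continue
            seen.add(child)
            if child in returned:
                continue  # keep visible and do not traverse past it
            hidden.add(child)
            queue.append(child)
    return hidden
```

Everything in the returned set is dropped from the drawing, so the collapsed view shows only the workflow node, its inputs, and the nodes it returns.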

superstar54 commented 3 months ago

Here is a summary of the popular node graph libraries: https://github.com/xyflow/awesome-node-based-uis

These libraries' examples could serve as both a foundation and inspiration for developing a new provenance explorer. It's important to note that the choice of library will significantly influence the implementation details of this explorer tool.

I used rete.js in the GUI AiiDA-WorkTree for the node graph viewer and editor. It has the following features:

only show a Process as a node, and with the data inputs/outputs on each side.
one can edit the node graph (create a new node, link node) interactively
has a mini-map, thus making it easy to search a node in a large node graph
custom node appearance.
can create a node graph from a JSON file
can handle events when users select a node
can integrate with the react component.

These features fulfill the need for the AiiDA-WorkTree, especially the first two features. Treating the Process as a node is important in the worktree because the worktree focuses more on the logic of the provenance. In the long term, we also allow the user to edit the node graph directly to create their workflow.

However, I also see the difference between a node graph editor and a node graph explorer. An editor focuses more on data flow and precise control, while an explorer focuses more on appearance and interactivity. In principle, an explorer should work on all platforms (iPad, phone, etc.) with good performance for thousands of nodes. For a very large node graph, it should also support level-of-detail (LOD) rendering, possibly even with GPU support in the case of a large AiiDA database.

superstar54 commented 3 months ago

I strongly support @ltalirz's suggestion that introducing a hierarchical structure to the workflow views is essential for enhancing the explorability of AiiDA graphs. A practical approach to achieving this is to use node groups. This concept, widely adopted in node graph systems such as Unity and Blender, has proven to be effective in organizing and managing complex structures. I also implemented this strategy in the WorkTree GUI.

eimrek commented 3 months ago

Thanks everyone for great ideas!

So, just to summarize a bit:

The new REST API probably requires some testing, both functionality- and performance-wise. I'll try to figure out if the features needed for just browsing the graph are present. The docs do seem to report some necessary parts as "NOT IMPLEMENTED" (e.g. downloading files, listing process input files, ...)

@sphuber mentioned that it would be good to have a "general" GUI application to 1) browse the provenance; but also to 2) interact with AiiDA (I'm assuming to start workflows, etc.). But I'm thinking that for the MC explore, at least initially, it probably makes sense to just start with 1). Maybe in the future, some parts of this implementation could be reused for the "general GUI tool". What do you think?

As @mbercx and @ltalirz described, it would be great for the node browser/visualizer to accomplish two goals: 1) easily visualizing inputs and outputs of a selected node (as is done in the current implementation); but also to 2) understand the workflow in general; and to see where the currently selected node is located in the "global" provenance graph.

I am wondering if 1) and 2) should be merged together in the same visualization solution (e.g. via folding/unfolding, selectively hiding non-process nodes, or similar), or perhaps they should be separated somehow. We should have a brainstorming meeting on what an ideal solution could look like (e.g. somehow with @superstar54's node group or something else).

SharanRP commented 3 months ago

Based on my understanding, I propose that combining points 1 and 2 would be beneficial. By highlighting the selected node on a mini-map and clearly distinguishing it from the input and output nodes, users can better comprehend the graph. Additionally, providing a legend would further enhance user understanding.

edan-bainglass commented 3 months ago

As @mbercx mentioned, I am quite interested in graphical representations of data and their impact on tool usability. However, I am also on vacation, so I do not have much time to dedicate to the discussion. That said, let me jot down a few thoughts.

I strongly agree with @ltalirz's graph depth control idea. I brought up a similar idea in a brief discussion on the matter with @giovannipizzi last year. My only concern is performance at deeper levels - limit node count?
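
One way to combine depth control with a node-count limit is a bounded breadth-first expansion around the selected node. The sketch below is only an illustration under assumed inputs (a dictionary of neighbours per node); the function name and both limits are hypothetical.

```python
from collections import deque


def neighbourhood(start, neighbours, max_depth, max_nodes):
    """Collect nodes around `start`, limited in depth and in total count.

    `neighbours` maps a node to its linked nodes. Returns a dictionary
    node -> distance from `start`, never deeper than `max_depth` hops and
    never containing more than `max_nodes` nodes.
    """
    depth = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if depth[node] == max_depth:
            continue  # do not expand beyond the depth limit
        for other in neighbours.get(node, ()):
            if other in depth:
                continue
            if len(depth) >= max_nodes:
                return depth  # node budget exhausted
            depth[other] = depth[node] + 1
            queue.append(other)
    return depth
```

Because the expansion is breadth-first, hitting the node budget drops the most distant nodes first, which keeps the immediate surroundings of the selected node complete for as long as possible.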

Regarding some of the initial comments by @eimrek:

it's not easy to distinguish input and output nodes. One solution here is to keep inputs on one side, and outputs on other side of the selected node

Color/shape code might help, but it seems reasonable/feasible enough to section them apart.

the graph browser should visualize all the connecting nodes (the current maximum number shown is 10)

I think here the comments of @ltalirz are useful. As you "zoom" out, certain parts of the graph should gradually reduce from most detail to least detail (node-count/arbitrary-distance-dependent check points) or entirely vanish, eventually leaving behind only the so-called top-view (likely WorkChain nodes, or maybe even a new cartoonish representation).

when the user selects a new node, the page redirects to a new page, thus losing the smooth transition from one node to another. The new implementation should just update all the UI components when selecting a new node in the graph browser, instead of redirecting to a new page.

Here I had something more dynamic in mind.

https://github.com/materialscloud-org/issues/assets/45081142/56cfcb93-d5f5-432b-94e1-db151dec7563

One point of decision w.r.t. the node browser is whether to only show a single node and its inputs & outputs (like in the current implementation) or to show the whole graph that can be panned/zoomed. @sphuber briefly mentioned this idea in https://github.com/aiidateam/aiida-core/pull/6276#issuecomment-1993816678. For the "global" graph, it might not be trivial to organize the placement of the nodes, so this requires some thought.

From my comments above, I think this is doable, as long as certain rules are placed at each level. Performance is key to ensure a smooth user experience, which is an absolute MUST!

eimrek commented 3 months ago

Thanks for the great ideas @edan-bainglass!

I want to propose here a potential initial version for the next gen browser. This could act as a starting point, which can be extended later for more complex behavior. And also it could act as inspiration for an achievable GSoC project.

Here's what I have in mind:

(Here's also the Figma link, let me know if you want edit access to modify/propose your own ideas.)

So this version is similar to the current version: we have 1) the node preview on the left; and 2) the graph browser on the right.

The raw node preview could just directly display some data in JSON format obtained from the REST API. The current version does something similar, but formats the data in a way that is not easy to copy/paste. Any ideas/suggestions here on how to make this as usable as possible?
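
As a small illustration of a copy/paste-friendly raw view, the response could be rendered as indented JSON with stable key order, so that the text shown is itself valid JSON. The helper name and the example fields below are made up for this sketch and do not reflect the actual REST API response.

```python
import json


def raw_view(node):
    """Render a node dictionary as indented, key-sorted JSON text.

    Values that JSON cannot represent directly (e.g. datetimes) are
    converted with str() instead of raising an error.
    """
    return json.dumps(node, indent=2, sort_keys=True, default=str)


# Hypothetical node payload, for illustration only
example = {"uuid": "abc", "attributes": {"b": 1, "a": [1, 2]}}
print(raw_view(example))
```

Since the output is plain JSON, whatever the user copies from the preview can be pasted straight into a script or a JSON parser.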

The node browser proposed here only shows the 1st layer of inputs and outputs. The advantages over the old "circular" layout are that 1) it allows rendering any number of inputs/outputs, as opposed to the current implementation (panning might be needed); and 2) it's easy to understand which nodes are inputs and which are outputs. This design doesn't address the point of how to visualize the higher levels of provenance (raised by @mbercx, @ltalirz, ...), but that could be a possible extension, e.g. with zooming, some extra buttons to switch between hierarchies, or something else. Does anybody see issues with this design that would not allow for the future extensions we have in mind?