pm4py / pm4py-core

Public repository for the PM4Py (Process Mining for Python) project.
https://pm4py.fit.fraunhofer.de
GNU General Public License v3.0

Dotted Chart Issue - Unable to view output when using a large number of traces #330

Closed AdamBanham closed 2 years ago

AdamBanham commented 2 years ago

Hi,

I tried running the following snippet in a Jupyter notebook and noticed that the output from neato (outside of pm4py) was not viewable.

I am using the BPIC 2012 log, with 13087 traces. However, once finished, neato is unable to create a viewable png (perhaps because it is too large).

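import pm4py
from pm4py.objects.log.importer.xes import importer as xes_importer
from os.path import join

BPIC_LOG = join(".","BPI_Challenge_2012.xes.gz")
log = xes_importer.apply(BPIC_LOG)
# keep only the first 50 traces; drop this line to reproduce the issue on all 13087 traces
log._list = log._list[:50]
pm4py.view_dotted_chart(log, format="png")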

image

After reducing the log to 50 traces I can view a chart, but it is a bit hard to read (in Jupyter). Is there any chance that this function could be moved to a Matplotlib interface or implementation, where legend placement, sizing and the file format for saving could be customised by the user?

AdamBanham commented 2 years ago

With a bit of messing around, something like the code below is possible. It returns a matplotlib.figure.Figure, which allows for some user customisation if needed, e.g. dpi settings, labels or titles.

from pm4py.objects.log.importer.xes import importer as xes_importer
from pm4py.objects.log.obj import EventLog,Trace

import matplotlib.pyplot as plt
from matplotlib.figure import Figure

from typing import List, Tuple
from os.path import join

BPIC_LOG = join(".","BPI_Challenge_2012.xes.gz")

TIME_ATTR = "time:timestamp"

def get_log() -> EventLog:
    return xes_importer.apply(BPIC_LOG)

def convert_trace(trace:Trace, startingTime:float) -> List[float]:
    # offsets (in seconds) of each event from startingTime
    timepoints = []
    for event in trace:
        timepoints.append(event[TIME_ATTR].timestamp() - startingTime)
    return timepoints

def convert_log(log:EventLog) -> List[List[float]]:
    log_sequences = []
    startingTime = log[0][0]["time:timestamp"].timestamp()
    for trace in log:
        log_sequences.append(convert_trace(trace,startingTime))
    return log_sequences

def find_scale(seconds:float) -> Tuple[str,float]:
    if seconds < (60 * 3):
        return ("min" , 60)
    elif seconds < (  60 * 60 * 20):
        return ("hr", ( 60 * 60))
    elif seconds < (  60 * 60 * 24 * 100):
        return ("d", ( 60 * 60 * 24))
    else: 
        return ("yr", ( 60 * 60 * 24 * 365))

def matplotlib_dotted_chart(log:EventLog,dpi=300,figsize=(10,10)) -> Figure:
    fig = plt.figure(figsize=figsize,dpi=dpi)
    ax = fig.subplots(1,1)
    # plt.cm.get_cmap is deprecated in newer Matplotlib releases; plt.get_cmap is the equivalent call
    colormap = plt.get_cmap("Accent")
    sequences = convert_log(log)
    for y,sequence in enumerate(sequences):
        color = colormap(y % len(colormap.colors) / len(colormap.colors))
        ax.plot(
            sequence,
            [ y for _ in  range(len(sequence)) ],
            "o",
            color=color,
            markerfacecolor="None",
            markersize = 1,
        )
    #clean up plot
    ax.set_ylim([0,y])
    min_x = min([min(s) for s in sequences])
    max_x = max([max(s)for s in sequences])
    ax.set_xlim([min_x, max_x ])
    ax.set_yticks([])
    # add suitable xticks 
    diff_x = max_x - min_x 
    tickers = [ min_x] + \
                   [ min_x + (portion/100) * diff_x 
                    for portion in range(10,100,10) ] + \
                   [ max_x ]
    suffix, scale = find_scale(diff_x)
    ax.set_xticks(
        tickers
    )
    ax.set_xticklabels(
        [ 
            f"{(tick - min_x) / scale:.2f}{suffix}"
            for tick 
            in ax.get_xticks()
        ],
        rotation=-90
    )    
    #add labels
    ax.set_ylabel("Trace")
    ax.set_xlabel("Time")
    # fall back to a generic name if the log has no concept:name attribute
    ax.set_title(f"Dotted Chart of\n {log.attributes.get('concept:name', 'event log')}")
    ax.grid(True,color="grey",alpha=0.33)

    return fig

def run():
    log = get_log()
    fig = matplotlib_dotted_chart(log,dpi=300,figsize=(10,10))
    fig.tight_layout()
    fig.savefig("demo.png")

if __name__ == "__main__":
    run()

demo

Javert899 commented 2 years ago

Dear Adam,

Thanks for reporting this. We could consider it as a "new" dotted chart visualizer in the future (because it's completely different from the Neato version currently available).

Have a nice day

fit-alessandro-berti commented 2 years ago

Dear Adam, after reviewing the existing code base, the problem with "neato" was the automatic layout, which was still being applied even though the .dot file provided the coordinates of all the nodes. In the next release of PM4Py, the dotted chart and the performance spectrum will see a significant increase in performance, due to the removal of this automatic layout step.
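To illustrate the point about coordinates in the .dot file, here is a minimal sketch of a DOT graph in which every node carries an explicit position. It is not the PM4Py implementation; `pinned_dot` is a hypothetical helper, and the node styling is an assumption. Graphviz documents two ways to keep supplied coordinates: a trailing `!` on the `pos` attribute pins a node, and the `-n` flag tells neato that the nodes are already positioned.

```python
from typing import Iterable, Tuple


def pinned_dot(points: Iterable[Tuple[float, float]]) -> str:
    """Build DOT source whose nodes all carry explicit, pinned coordinates."""
    lines = [
        "graph dotted_chart {",
        '  node [shape=point, width=0.05, label=""];',
    ]
    for index, (x, y) in enumerate(points):
        # The trailing "!" asks neato to keep the node at this position.
        lines.append(f'  n{index} [pos="{x:g},{y:g}!"];')
    lines.append("}")
    return "\n".join(lines)


dot_source = pinned_dot([(0, 0), (72, 0), (36, 72)])
print(dot_source)
# Rendering is left to Graphviz (assumed installed), for example:
#   neato -n -Tpng chart.dot -o chart.png
```

With positions fixed up front, the renderer only has to draw the points rather than search for a layout, which is where the time was going according to the comment above.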

AdamBanham commented 2 years ago

Thanks for the quick update,

Alongside these performance improvements, does the dotted chart function produce a viewable chart, though? In particular, for the BPIC2012 log?

Cheers

fit-alessandro-berti commented 2 years ago

Apart from the enormous amount of points, yes

image

fit-alessandro-berti commented 2 years ago

If you need to visualize huge logs, consider using PMTk (https://pmtk.fit.fraunhofer.de/), which implements the dotted chart with sampling and GPU acceleration.
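How PMTk samples is not described in this thread. As a rough sketch of the general idea only, the snippet below draws a uniform random subset of traces before plotting. `sample_traces`, `max_traces` and `seed` are hypothetical names, and the synthetic `traces` list merely stands in for a log of 13087 traces.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def sample_traces(traces: Sequence[T], max_traces: int, seed: int = 0) -> List[T]:
    """Return at most max_traces traces, chosen uniformly, in their original order."""
    if len(traces) <= max_traces:
        return list(traces)
    rng = random.Random(seed)  # fixed seed so the chart is reproducible
    keep = sorted(rng.sample(range(len(traces)), max_traces))
    return [traces[i] for i in keep]


# Synthetic stand-in: one list of event times per trace.
traces = [[float(i), float(i) + 1.0] for i in range(13087)]
subset = sample_traces(traces, max_traces=500)
print(len(subset))
```

Keeping the sampled traces in their original order preserves the vertical ordering of the chart, so the overall shape stays comparable to the full plot.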

AdamBanham commented 2 years ago

If I were to use PMTk, are there any issues with research ethics or copyright? My understanding is that it is a closed-source implementation from FIT. The idea of GPU acceleration and sampling does sound interesting and useful, but would I be able to export the visualisation, or would I need to screen capture from the application?

I was hoping for customisability over efficiency for my use case, which may not be useful for a general implementation.

I am hoping to make some reactive GIFs/animations for event logs and to be a bit over the top. For example, the animation below (with plans to add more) was made with just Matplotlib in Python, although it took 15 minutes to render.

003_running_dotted_demo

However, it seems that the issue is resolved with the current implementation. Should I close the issue?
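The animation mentioned above was not posted as code. One possible way to drive such an animation, sketched here under that caveat, is to precompute which events are visible at each frame and then redraw per frame (for instance with `matplotlib.animation.FuncAnimation`). `visible_per_frame` is a hypothetical helper that takes per-trace time offsets like those returned by `convert_log`.

```python
from typing import List


def visible_per_frame(
    sequences: List[List[float]], n_frames: int
) -> List[List[List[float]]]:
    """For each frame, keep only events at or before that frame's cut-off time."""
    all_times = [t for seq in sequences for t in seq]
    start, end = min(all_times), max(all_times)
    frames = []
    for frame in range(1, n_frames + 1):
        # Cut-off moves linearly from the first to the last event time.
        cutoff = start + (end - start) * frame / n_frames
        frames.append([[t for t in seq if t <= cutoff] for seq in sequences])
    return frames


frames = visible_per_frame([[0.0, 5.0, 10.0], [2.0, 8.0]], n_frames=2)
print(frames[0])
print(frames[1])
```

Precomputing the frames keeps the per-frame drawing callback trivial, at the cost of holding every frame in memory; for a large log it may be preferable to compute each frame on demand.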

fit-alessandro-berti commented 2 years ago

In the current version of PMTk, you have the possibility to export the dotted chart representation as SVG/PNG. Still, it is not an animation like the one that you show in your post.

AdamBanham commented 2 years ago

I checked out the visualisation in PMTk. It does look very nice and is a big improvement over the neato version, imo.

But it seems like the initial reason for the issue will be resolved in the next version. So I will close the issue.

However, if you wanted some native Matplotlib visualisations, I would be happy to help.