rgerum / pylustrator

Visualisations of data are at the core of every publication of scientific research results. They have to be as clear as possible to facilitate the communication of research. As data can have different formats and shapes, the visualisations often have to be adapted to reflect the data as well as possible. We developed Pylustrator, an interface to directly edit Python-generated matplotlib graphs to finalize them for publication. To this end, subplots can be resized and dragged around with the mouse, and text and annotations can be added. The changes can be saved to the initial plot file as Python code.
GNU General Public License v3.0

add support for writing output to different file #31

Open JacksonBurns opened 3 years ago

JacksonBurns commented 3 years ago

Hello, and thanks for the great tool!

I am working on a package which generates plots for the user, and as implemented, pylustrator would be editing the source code of my package when save... is used. This won't be useful when the package is distributed on PyPI, as the edits will be buried in an environment folder and not runnable due to the structure of the package.

This PR contains a proof of concept for having pylustrator create a separate and standalone file which can be run to regenerate the plot separate from the source code.

Run temp.py, click save... in the pylustrator UI, and then you should be able to run thisisatest.py to regenerate the plot.

Thoughts on building up this implementation or continuing at all? I have some doubts as to whether or not this would work on plots which are not line plots, but this might still be workable.

rgerum commented 3 years ago

Hmm, being able to save the whole plot to a new, clean Python file seems interesting. It might even be worth creating a minimal Python file that condenses all the descriptions. For example, the line width could be written directly into the plt.plot call instead of a separate pylustrator-generated code section, since the whole file is pylustrator-generated anyway. The problem, I think, is that this would need to support all, or at least a great deal, of the matplotlib features. And I think some functions, like plt.errorbar, do not create matplotlib artists that still know about the data. So pylustrator would need to inject some tracing code into the plt.errorbar function.
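The "tracing code" idea can be sketched roughly as follows. This is only an illustration, not pylustrator's implementation: `trace_calls` is a hypothetical helper, and `fake_plt` is a stand-in for matplotlib.pyplot so the sketch runs without matplotlib installed.

```python
import functools
from types import SimpleNamespace


def trace_calls(namespace, name, log):
    """Replace namespace.<name> with a wrapper that records every call (hypothetical helper)."""
    original = getattr(namespace, name)

    @functools.wraps(original)
    def wrapper(*args, **kwargs):
        # Remember the raw inputs before the plotting call consumes them.
        log.append((name, args, kwargs))
        return original(*args, **kwargs)

    setattr(namespace, name, wrapper)
    return original  # returned so the caller can restore it later


# Stand-in for matplotlib.pyplot; an assumption made to keep the sketch dependency-free.
fake_plt = SimpleNamespace(errorbar=lambda x, y, yerr=None: "artists")

calls = []
trace_calls(fake_plt, "errorbar", calls)
result = fake_plt.errorbar([1, 2, 3], [4, 5, 6], yerr=[0.1, 0.2, 0.1])
```

The recorded entries in `calls` keep the original data and keyword arguments, which is the information an exporter would need to write the call back out.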

JacksonBurns commented 3 years ago

> So there e.g. the line width could be edited directly into the plt.plot function and does not need to be in a separate pylustrator generated code part, as all is pylustrator generated.

I actually think the easiest way forward would be to leave all plot customization in the pylustrator -- it already does an excellent job tracking plot changes, so I don't think we need to reinvent the wheel. The ideal workflow would look like this:

  1. User writes code to generate the X and y data that they want to plot.
  2. User does no formatting of any kind and only calls plt.plot(X,y).
  3. In the pylustrator window they make all changes of interest.
  4. The exported code would then create a standalone file.

> But the problem here is, I think, that it would need to support all or at least a great deal of the matplotlib features.

Agreed that this would be difficult. If the above method were to work, though, all we would need to do is get the "data" out of all the conceivable plot types. For this simple example, it is pretty much fully functional. For examples like the one you mentioned with error bars, though, I am not sure how to go about it, since they don't seem to leave any easily accessible record of the data plotted.
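For the simple line-plot case, pulling the data back out could look roughly like this. It is a sketch under assumptions: `lines_to_script` is a hypothetical name, and `FakeLine` stands in for matplotlib's Line2D, which provides the same `get_xdata()` and `get_ydata()` methods.

```python
def lines_to_script(lines):
    """Build plotting code from objects that offer get_xdata() and get_ydata()."""
    parts = ["import matplotlib.pyplot as plt", ""]
    for i, line in enumerate(lines):
        # Convert to plain floats so repr() produces literals that can be pasted into a script.
        x = [float(v) for v in line.get_xdata()]
        y = [float(v) for v in line.get_ydata()]
        parts.append(f"x{i} = {x!r}")
        parts.append(f"y{i} = {y!r}")
        parts.append(f"plt.plot(x{i}, y{i})")
    parts.append("plt.show()")
    return "\n".join(parts) + "\n"


class FakeLine:
    """Stand-in for matplotlib.lines.Line2D, used only to keep the sketch dependency-free."""

    def __init__(self, x, y):
        self._x, self._y = x, y

    def get_xdata(self):
        return self._x

    def get_ydata(self):
        return self._y


script = lines_to_script([FakeLine([0, 1, 2], [0.0, 1.0, 4.0])])
print(script)
```

With a real figure, the list returned by `ax.get_lines()` could be passed in instead. Non-numeric data such as dates or categories would need extra handling, since `float()` would fail on it.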

Alternate approach -- with my proposed layout, the user is presumably importing a single library (like seaborn, etc.) and calling a plotting tool (e.g. seaborn.heatmap). If the signature of plt.figure() included an optional argument for specifying the required imports and the call to the plotting function, pylustrator could simply write these to the output file. See the latest commit for an example of this in complex_example.py.

rgerum commented 3 years ago

Hmm, your example looks rather bulky, as the user essentially has to write their code twice. But maybe the real question is which use case we actually want to optimize for.

I thought it might be nice to have a way to "serialize" a whole matplotlib plot, to be used either from the pylustrator interface (e.g. save to a different file) or just as a function call: pylustrator.export("new_script.py"). That could also be quite interesting if you just want a simple script file that reproduces your figure without the preprocessing the user presumably has in their original script.

JacksonBurns commented 3 years ago

I can refactor this into a new method to export the changes script to a Python file, sure. That sounds quite helpful. Will open a new PR, though, to keep things separated.

The use cases I am envisioning are fully reproducible plots where the source code that generates the data is (a) too slow to re-run constantly or (b) 'hidden' in a package, as in my case. For both of these cases, I agree that bulk is bad, but I think some bulk might be OK. Because the code is either (a) only going to be run once or (b) going to be written by someone else and 'hidden', the implementation specifics shouldn't be too much of a pain. Two ideas:

  1. To serialize the entire interface, could we just pickle the interface + figure and load it from there? I have not done this before so pardon a potentially naive question.
  2. Instead of requiring the user to copy their code, we can use decorators or pragmas. This would look something like this:
    import numpy as np; np.random.seed(0)
    uniform_data = np.random.rand(10, 12)
    import pylustrator
    pylustrator.start()
    plt.figure(
    output_file="thisisatest.py"
    )
    @pylustrator_start __or__ # pragma: pylustrator start
    import seaborn as sns; sns.set_theme()
    import matplotlib.pyplot as plt
    ax = sns.heatmap(uniform_data)
    @pylustrator_end __ or__ # pragma: pylustrator end
    plt.show()

    I think this second approach could be quite useful.
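To make the pragma variant concrete, here is a rough sketch of how the marked region could be pulled out of a script's source text. The marker strings follow the pseudocode above; `extract_pragma_region` is a hypothetical helper and not something pylustrator provides.

```python
START = "# pragma: pylustrator start"
END = "# pragma: pylustrator end"


def extract_pragma_region(source):
    """Return the lines between the start and end markers, without the markers."""
    kept, inside = [], False
    for line in source.splitlines():
        stripped = line.strip()
        if stripped == START:
            inside = True
        elif stripped == END:
            inside = False
        elif inside:
            kept.append(line)
    return "\n".join(kept) + "\n"


example = """\
import numpy as np
uniform_data = np.random.rand(10, 12)
# pragma: pylustrator start
import seaborn as sns; sns.set_theme()
ax = sns.heatmap(uniform_data)
# pragma: pylustrator end
plt.show()
"""

region = extract_pragma_region(example)
print(region)
```

The extracted region still refers to `uniform_data`, so the data would have to be written to the output file separately for the result to be standalone.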

rgerum commented 3 years ago

For user interfaces that should export plotting code, I have wrapped the plot script inside a function. Since Python introspection can return the source code of a function, this should be the best solution for your use case, as it would even let you export the code with comments. And I think that might be cleaner than adding these pragma comments.

import numpy as np; np.random.seed(0)
import matplotlib.pyplot as plt  # needed for the plt.figure() and plt.show() calls below
uniform_data = np.random.rand(10, 12)
import pylustrator
pylustrator.start()

def do_plot(uniform_data):
    import seaborn as sns; sns.set_theme()
    import matplotlib.pyplot as plt
    ax = sns.heatmap(uniform_data)

plt.figure(
    output_file="thisisatest.py",
    code_function=[do_plot, uniform_data],
)
do_plot(uniform_data)
plt.show()

rgerum commented 3 years ago

But I think in general these are two slightly different use cases:

  1. You want the user to export some plot from your user interface/package. This code could then be prepared in advance by the package author and maybe nicely formatted with comments etc.
  2. You just want to be able to dump the output of an arbitrary script that generated a matplotlib plot into a script that creates this plot.

rgerum commented 3 years ago

Code creation for a plot function could look like this (I have used a similar function once):

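def value_create(key, value):
    """Return Python source that recreates `value` under the name `key`."""
    import numpy as np
    import pandas as pd
    if isinstance(value, np.ndarray):
        # tolist() keeps full precision; repr() of an array rounds values and truncates large arrays
        return f"import numpy as np\n{key} = np.array({value.tolist()!r}, dtype={str(value.dtype)!r})\n"
    elif isinstance(value, pd.DataFrame):
        # index_col=0 restores the index that to_csv() writes as the first column
        return f"import pandas as pd\nimport io\n{key} = pd.read_csv(io.StringIO('''{value.to_csv()}'''), index_col=0)\n"
    # strings, numbers, lists etc.: repr() quotes and escapes them correctly
    return f"{key} = {value!r}\n"

def execute(func, *args, **kwargs):
    """Run `func`, then return its body as source, preceded by the keyword-argument definitions."""
    import inspect
    import textwrap
    func(*args, **kwargs)
    # drop the "def ..." line (assumes a one-line signature without decorators) and dedent the body
    body = inspect.getsource(func).split("\n")[1:]
    code = textwrap.dedent("\n".join(body))
    # only keyword arguments are written out, because their names are known
    for key, value in kwargs.items():
        code = value_create(key, value) + code
    return code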

def plot(uniform_data, data_frame, color):
    import seaborn as sns
    sns.set_theme()
    import matplotlib.pyplot as plt
    plt.subplot(121)
    ax = sns.heatmap(uniform_data)
    plt.subplot(122)
    plt.plot(data_frame["a"], data_frame["b"], color=color)

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
uniform_data = np.random.rand(10, 12)
data_frame = pd.DataFrame([[1,2],[3,4]], columns=["a", "b"])
code = execute(plot, uniform_data=uniform_data, data_frame=data_frame, color="red")
print(code)
plt.show()

JacksonBurns commented 3 years ago

I will get this going into a working example asap

JacksonBurns commented 3 years ago

Hi @rgerum!

I have completed a working version of the standalone file writing. Run temp.py and take a look when you can -- it should generate sample_pylustrator_output.py from scratch, which you can then run on its own to recreate the plot.

The way this works is that if output_file is given a name and reqd_code is provided, all those objects will be included in the new output and it will be a standalone, runnable file. If reqd_code is not provided but output_file is still specified, the output to the file will only contain the change code written by pylustrator, i.e. the first part of my earlier comment.

This implementation also assumes that the reqd_code is provided as: [function, argument_1, argument_2, ...]
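Because the arguments arrive positionally, the exported file needs a variable name for each value. One way to recover the names, sketched here as an assumption rather than as what the PR actually does, is to bind the values to the function signature with `inspect.signature`. The helper `reqd_code_assignments` is hypothetical.

```python
import inspect


def reqd_code_assignments(reqd_code):
    """Turn [function, argument_1, argument_2, ...] into 'name = value' source lines."""
    func, *args = reqd_code
    # bind() matches positional values to parameter names and raises TypeError on a mismatch.
    bound = inspect.signature(func).bind(*args)
    return "".join(f"{name} = {value!r}\n" for name, value in bound.arguments.items())


def do_plot(data, color):
    pass  # plotting body omitted; only the signature matters for this sketch


header = reqd_code_assignments([do_plot, [1, 2, 3], "red"])
print(header)
```

Using `repr()` only round-trips simple values; arrays and data frames would need dedicated handling along the lines of the value_create function earlier in this thread.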

Sorry for the massive diffs now. I had my auto-formatter running, and it changed a bunch of docstrings, indentation styles etc. I do think it would be a good idea to introduce some more uniform formatting across the repo, though. Black is my personal favorite.

JacksonBurns commented 3 years ago

Hi @rgerum just checking in on this again -- how does the PR look?