Status: open. JacksonBurns opened this 3 years ago.
Hmm, it seems interesting to be able to save the whole plot to a new, clean Python file. It might even be interesting to then create a minimal Python file that condenses all the descriptions. For example, the line width could then be written directly into the plt.plot call instead of living in a separate pylustrator-generated code section, since everything would be pylustrator-generated. The problem, I think, is that this would need to support all, or at least a large part, of matplotlib's features. And I think some functions, like plt.errorbar, do not create matplotlib artists that still know about the data, so pylustrator would need to inject some tracing code into plt.errorbar.
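A rough sketch of what "injecting tracing code" could mean: wrap `plt.errorbar` so each call's arguments are recorded before the real function runs, so the original data stays available for export. The names `traced` and `CALL_LOG` and the monkeypatching approach are hypothetical illustrations, not part of pylustrator.

```python
import functools

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical log of (function name, positional args, keyword args) per traced call
CALL_LOG = []


def traced(func):
    """Wrap a plotting function so its call arguments are recorded."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        CALL_LOG.append((func.__name__, args, kwargs))
        return func(*args, **kwargs)
    return wrapper


# Illustrative monkeypatch: replace pyplot's errorbar with the traced version
plt.errorbar = traced(plt.errorbar)

plt.errorbar([1, 2, 3], [2, 4, 6], yerr=[0.1, 0.2, 0.3])
name, args, kwargs = CALL_LOG[0]
print(name, args, kwargs)
```

Note that this only catches calls made through `plt.errorbar`; calls to `ax.errorbar` on an Axes object would need the same treatment on the Axes method.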
> So there, e.g., the line width could be edited directly into the plt.plot function and would not need to be in a separate pylustrator-generated code part, as all of it is pylustrator-generated.
I actually think the easiest way forward would be to leave all plot customization in pylustrator; it already does an excellent job of tracking plot changes, so I don't think we need to reinvent the wheel. The ideal workflow would look like this:
plt.plot(X, y)
… pylustrator window, they make all changes of interest.
> But the problem here is, I think, that it would need to support all or at least a great deal of the matplotlib features.
Agreed that this would be difficult. If the above method worked, though, all we would need to do is get the "data" out of all the conceivable plot types. For this simple example, it is pretty much fully functional. For cases like the error bars you mentioned, though, I am not sure how to go about it, since they don't seem to leave an easily accessible record of the data plotted.
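For plain line plots, getting the data back out is straightforward because each `Line2D` keeps its x and y data. The sketch below turns every line on an Axes back into a `plt.plot(...)` call; `lines_to_code` is a hypothetical helper, and it covers only lines (color and width), not error bars, heatmaps, or other artists.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np


def lines_to_code(ax):
    """Return plt.plot(...) calls that redraw every Line2D on `ax` (hypothetical helper)."""
    calls = []
    for line in ax.get_lines():
        # Line2D keeps the plotted data, so it can be read straight back
        x = np.asarray(line.get_xdata()).tolist()
        y = np.asarray(line.get_ydata()).tolist()
        calls.append(
            f"plt.plot({x!r}, {y!r}, color={line.get_color()!r}, "
            f"linewidth={line.get_linewidth()!r})"
        )
    return "\n".join(calls)


fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 4], color="red", linewidth=2)
code = lines_to_code(ax)
print(code)

# Running the generated code on a fresh figure redraws the same line
plt.figure()
exec(code, {"plt": plt})
redrawn = plt.gca().get_lines()[0]
```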
Alternate approach: with my proposed layout, the user is presumably importing a single library (such as seaborn) and calling a plotting function (e.g. seaborn.heatmap). If the signature of plt.figure() included an optional argument for specifying the required imports and the call to the plotting function, pylustrator could simply write these to the output file. See the latest commit for an example of this in complex_example.py.
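One way to accept such an extra argument without changing matplotlib itself is a thin wrapper that forwards everything else to `plt.figure()` and stores the export settings on the figure for a later save step. This is a sketch under that assumption: `figure_with_export`, its `output_file`/`reqd_code` options, and the `export_info` attribute are hypothetical, not an existing matplotlib or pylustrator API.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt


def figure_with_export(*args, output_file=None, reqd_code=None, **kwargs):
    """Create a figure and remember how a standalone script should be written.

    output_file and reqd_code are hypothetical options; every other argument
    is forwarded unchanged to plt.figure().
    """
    fig = plt.figure(*args, **kwargs)
    # Attach the export settings to the figure so a save step can find them later
    fig.export_info = {"output_file": output_file, "reqd_code": reqd_code}
    return fig


fig = figure_with_export(figsize=(4, 3), output_file="thisisatest.py")
print(fig.export_info)
```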
Hmm, your example looks quite bulky, as the user essentially has to write their code twice. But maybe the real question is which use case we should optimize for.
I thought it might be nice to have a way to "serialize" a whole matplotlib plot, to be used either from the pylustrator interface (e.g. save to a different file) or as a plain function call: pylustrator.export("new_script.py"). This could also be quite interesting if you just want a simple script file that reproduces your figure without the preprocessing the user presumably has in their original script.
Sure, I can refactor this into a new method that exports the change script to a Python file. That sounds quite helpful. I will open a new PR for it, though, to keep things separate.
The use cases I am envisioning are fully reproducible plots where the source code that generates the data is (a) too slow to re-run constantly or (b) 'hidden' in a package, as in my case. For both of these cases, I agree that bulk is bad, but I think some bulk might be OK. Because the code is either (a) only going to be run once or (b) going to be written by someone else and 'hidden', the implementation specifics shouldn't be too much of a pain. Two ideas:
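import numpy as np; np.random.seed(0)
import matplotlib.pyplot as plt
uniform_data = np.random.rand(10, 12)
import pylustrator
pylustrator.start()
# idea 1: output_file is a proposed argument, not part of matplotlib's plt.figure()
plt.figure(
    output_file="thisisatest.py"
)
# idea 2: marker comments around the code to copy into the output file
# pragma: pylustrator start   (alternative syntax: @pylustrator_start)
import seaborn as sns; sns.set_theme()
import matplotlib.pyplot as plt
ax = sns.heatmap(uniform_data)
# pragma: pylustrator end   (alternative syntax: @pylustrator_end)
plt.show()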
I think this second approach could be quite useful.
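If the marker-comment idea were adopted, extracting the marked region from a script's source is simple. This sketch assumes the exact comment text `# pragma: pylustrator start` / `# pragma: pylustrator end`; the helper name `extract_marked_code` is hypothetical, and in practice the source would be read from the user's script file rather than a string.

```python
START = "# pragma: pylustrator start"
END = "# pragma: pylustrator end"


def extract_marked_code(source):
    """Return the lines between the start and end markers, markers excluded."""
    inside = False
    selected = []
    for line in source.splitlines():
        stripped = line.strip()
        if stripped.startswith(START):
            inside = True
        elif stripped.startswith(END):
            inside = False
        elif inside:
            selected.append(line)
    return "\n".join(selected)


script = """import numpy as np
uniform_data = np.random.rand(10, 12)
# pragma: pylustrator start
import seaborn as sns; sns.set_theme()
ax = sns.heatmap(uniform_data)
# pragma: pylustrator end
plt.show()"""

print(extract_marked_code(script))
```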
For user interfaces that should export plotting code, what I have done is wrap the plotting script inside a function. Since Python introspection can return the source code of a function, this should be the best solution for your use case, as it would even let you export the code with its comments. And I think it might be cleaner than adding these pragma comments.
import numpy as np; np.random.seed(0)
import matplotlib.pyplot as plt
uniform_data = np.random.rand(10, 12)
import pylustrator
pylustrator.start()

def do_plot(uniform_data):
    import seaborn as sns; sns.set_theme()
    import matplotlib.pyplot as plt
    ax = sns.heatmap(uniform_data)

# output_file and code_function are proposed arguments, not part of matplotlib's plt.figure()
plt.figure(
    output_file="thisisatest.py",
    code_function=[do_plot, uniform_data],
)
do_plot(uniform_data)
plt.show()
But I think in general these are two slightly different use cases:
Code generation from a plot function could look like this (I have used a similar function before):
import inspect

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def value_create(key, value):
    """Return Python source that recreates `value` under the name `key`."""
    if isinstance(value, np.ndarray):
        # tolist() avoids the "..." that repr() inserts for large arrays
        return f"import numpy as np\n{key} = np.array({value.tolist()!r})\n"
    elif isinstance(value, pd.DataFrame):
        # index_col=0 restores the index that to_csv() writes as the first column
        return (f"import pandas as pd\nimport io\n"
                f"{key} = pd.read_csv(io.StringIO('''{value.to_csv()}'''), index_col=0)\n")
    # strings and other literals: repr() handles the quoting and escaping
    return f"{key} = {value!r}\n"


def execute(func, *args, **kwargs):
    """Run `func` and return a standalone script that reproduces the call.

    Positional args are passed to func but not written to the script,
    so pass the data as keyword arguments.
    """
    func(*args, **kwargs)
    # drop the "def ..." line (assumes a one-line signature) and dedent the body
    code_lines = inspect.getsource(func).split("\n")[1:]
    indent = len(code_lines[0]) - len(code_lines[0].lstrip())
    code = "\n".join(line[indent:] for line in code_lines)
    # prepend the data definitions so the script runs on its own
    for key, value in kwargs.items():
        code = value_create(key, value) + code
    return code


def plot(uniform_data, data_frame, color):
    import seaborn as sns
    sns.set_theme()
    import matplotlib.pyplot as plt
    plt.subplot(121)
    ax = sns.heatmap(uniform_data)
    plt.subplot(122)
    plt.plot(data_frame["a"], data_frame["b"], color=color)


uniform_data = np.random.rand(10, 12)
data_frame = pd.DataFrame([[1, 2], [3, 4]], columns=["a", "b"])
code = execute(plot, uniform_data=uniform_data, data_frame=data_frame, color="red")
print(code)
plt.show()
I will turn this into a working example ASAP.
Hi @rgerum!
I have completed a working version of the standalone file writing. Run temp.py and take a look when you can; it should generate sample_pylustrator_output.py from scratch, which you can then run on its own to recreate the plot.
The way this works: if output_file is given a name and reqd_code is provided, all of those objects are included in the new output, and it becomes a standalone, runnable file. If reqd_code is not provided but output_file is still specified, the output file will only contain the change code written by pylustrator, i.e. the first part of my earlier comment.
This implementation also assumes that reqd_code is provided as [function, argument_1, argument_2, ...].
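The behavior described above can be sketched roughly as follows, assuming reqd_code holds [function, argument_1, ...] and that the function's source is available to `inspect.getsource`. The name `write_standalone_file` is hypothetical and this is not the PR's actual implementation; arguments are written with `repr()`, so only simple literal values round-trip (arrays or DataFrames would need something like the value_create helper from earlier in the thread).

```python
import inspect
import os
import tempfile
import textwrap


def write_standalone_file(output_file, change_code, reqd_code=None):
    """Write pylustrator-style change code, optionally preceded by the plot call."""
    parts = ["import matplotlib.pyplot as plt"]
    if reqd_code is not None:
        # reqd_code = [function, argument_1, argument_2, ...]
        func, *arguments = reqd_code
        parts.append(textwrap.dedent(inspect.getsource(func)))
        parts.append(f"{func.__name__}({', '.join(repr(a) for a in arguments)})")
    # Without reqd_code, only the change code ends up in the file
    parts.append(change_code)
    with open(output_file, "w") as fh:
        fh.write("\n".join(parts) + "\n")


def do_plot(values):
    import matplotlib.pyplot as plt
    plt.plot(values)


out_path = os.path.join(tempfile.mkdtemp(), "sample_pylustrator_output.py")
write_standalone_file(out_path, "plt.gcf().set_size_inches(4, 3)", [do_plot, [1, 2, 3]])
with open(out_path) as fh:
    script = fh.read()
print(script)
```

Copying the function source (rather than importing it) is what makes the result runnable on its own, which matters for the case where the plotting function lives inside an installed package.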
Sorry for the massive diffs. My auto-formatter was running, and it changed a bunch of docstrings, indentation styles, etc. I do think it would be a good idea to introduce more uniform formatting across the repo, though; Black is my personal favorite.
Hi @rgerum, just checking in on this again: how does the PR look?
Hello, and thanks for the great tool!
I am working on a package which generates plots for the user, and as currently implemented, pylustrator would be editing the source code of my package when save... is used. This won't be useful once the package is distributed on PyPI, as the edits will be buried in an environment folder and will not be runnable due to the structure of the package.
This PR contains a proof of concept for having pylustrator create a separate, standalone file which can be run to regenerate the plot independently of the source code.
Run temp.py, click save... in the pylustrator UI, and then you should be able to run thisisatest.py to regenerate the plot.
Thoughts on building up this implementation, or on continuing at all? I have some doubts as to whether this would work on plots that are not line plots, but it might still be workable.