has2k1 / plotnine

A Grammar of Graphics for Python
https://plotnine.org
MIT License

Rasterized colorbars in PDFs are not recognized by Adobe Illustrator #422

Closed: gokceneraslan closed this issue 2 years ago.

gokceneraslan commented 4 years ago

When I try to edit a PDF generated by plotnine in Illustrator, I get an "An unknown imaging construct was encountered." warning if the plot contains a continuous color scale (e.g. a gradient). This also makes the color legend uneditable. I wondered why this happens, so I ran some experiments with matplotlib and with ggplot2 in R.

This is a simple example to generate such a plot:

from plotnine import *
import numpy as np

x = np.linspace(0, 1, 10)
g = qplot(x, x, color=x, geom='point')
ggsave(g, 'ggplot-python.pdf')

Here is how it looks in Illustrator. The big X marks the "unknown imaging construct" (for some reason it covers the entire image, but it is actually caused only by the color legend):

[Screenshot: the plotnine PDF in Illustrator, covered by the X placeholder]

When I move the legend around:

[Screenshot: the same PDF with the legend moved aside]

However, a simple gradient produced directly by matplotlib renders fine and is editable:

import matplotlib.pyplot as plt
import numpy as np

f, ax = plt.subplots(figsize=(4, 0.5))
gradient = np.linspace(0, 1, 256)
gradient = np.vstack((gradient, gradient))  # 2 x 256 image
ax.imshow(gradient, aspect='auto', cmap='viridis')
f.savefig('matplotlib.pdf')

[Screenshot: the matplotlib gradient in Illustrator]
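One way to see why this gradient behaves differently is to look at how matplotlib stores it in the file. The sketch below is a rough byte search, not a proper PDF parser: it saves the same gradient to an in-memory PDF and checks for an embedded image XObject (`/Subtype /Image`) versus a PDF smooth-shading dictionary (`/ShadingType`). That `imshow` output is written as a plain embedded bitmap is my understanding of matplotlib's PDF backend, not something stated in this thread, and `gradient_pdf_bytes` is a hypothetical helper.

```python
import io

import numpy as np
from matplotlib.figure import Figure


def gradient_pdf_bytes():
    """Save the imshow gradient from above to an in-memory PDF and return the bytes."""
    fig = Figure(figsize=(4, 0.5))
    ax = fig.add_subplot()
    gradient = np.vstack([np.linspace(0, 1, 256)] * 2)  # 2 x 256 array
    ax.imshow(gradient, aspect='auto', cmap='viridis')
    buf = io.BytesIO()
    fig.savefig(buf, format='pdf')
    return buf.getvalue()


pdf = gradient_pdf_bytes()
# Rough checks on the raw bytes (object dictionaries are not compressed)
print('embedded image:', b'/Subtype /Image' in pdf)
print('smooth shading:', b'/ShadingType' in pdf)
```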

Finally, I tried the R equivalent of the first example:

library(ggplot2)

x = seq(0, 1, by=0.1)
g = qplot(x, x, color=x)

ggsave('ggplot-R.pdf', plot = g)

The colorbar in this PDF looks perfectly fine, and there is no warning at all:

[Screenshot: the ggplot2 PDF in Illustrator]

I know that asking for a fix for something that only affects (and can only be reproduced with) Illustrator, which is proprietary software, is not very meaningful, given that plotnine is great free software. But I wanted to ask whether there is anything unusual about the PDFs with gradients produced by plotnine, or whether we can make them more "R-like" so that they are easier to edit.

Cheers.

gokceneraslan commented 4 years ago

Apparently this is about the rasterization of the colorbar. But I'm still confused.

df = pd.DataFrame({'x': np.linspace(0, 1, 10)})  # assumes pandas as pd, numpy as np
ggplot(aes(x='x', y='x', color='x'), data=df) + geom_point() + guides(color=guide_colorbar(raster=False, nbin=100))

This plot looks fine and the colorbar is editable; it is just a bunch of rectangles, after all. However, I still do not understand why the rasterized colorbar is not recognized by Illustrator. Interestingly, R also rasterizes the colorbar by default, but its output looks perfectly fine in Illustrator.
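The `raster=False` colorbar stays editable because each bin is an ordinary filled shape. A minimal sketch of that idea in plain matplotlib follows; `rectangle_colorbar` and `to_pdf_bytes` are hypothetical helpers for illustration, not plotnine or matplotlib APIs, and this is not claimed to be how plotnine draws its colorbar. It assumes matplotlib 3.5 or later for the `matplotlib.colormaps` registry.

```python
import io

import numpy as np
from matplotlib import colormaps  # colormap registry, matplotlib >= 3.5
from matplotlib.collections import PolyCollection
from matplotlib.figure import Figure


def rectangle_colorbar(ax, nbin=100, cmap_name='viridis'):
    """Hypothetical helper: draw a vertical colorbar as `nbin` solid rectangles."""
    edges = np.linspace(0, 1, nbin + 1)
    # One unit-wide rectangle per bin, stacked from bottom to top
    verts = [
        [(0, lo), (1, lo), (1, hi), (0, hi)]
        for lo, hi in zip(edges[:-1], edges[1:])
    ]
    # Fill each rectangle with the colormap color at the bin midpoint
    centers = (edges[:-1] + edges[1:]) / 2
    colors = colormaps[cmap_name](centers)
    coll = PolyCollection(verts, facecolors=colors, edgecolors='none')
    ax.add_collection(coll)
    ax.set_xlim(0, 1)
    ax.set_ylim(0, 1)
    return coll


def to_pdf_bytes(fig):
    """Save a figure to an in-memory PDF and return the raw bytes."""
    buf = io.BytesIO()
    fig.savefig(buf, format='pdf')
    return buf.getvalue()


fig = Figure(figsize=(0.5, 3))
ax = fig.add_subplot()
coll = rectangle_colorbar(ax, nbin=100)
pdf = to_pdf_bytes(fig)
print(len(coll.get_paths()), b'/ShadingType' in pdf)
```

The trade-off is that a larger `nbin` gives a smoother-looking bar but puts more objects in the file; in exchange, every rectangle is a plain vector shape that an editor can select individually.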

@has2k1 Could this be about the shading in QuadMesh? I'm sure you have looked into it before.

gokceneraslan commented 4 years ago
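import matplotlib.pyplot as plt
import numpy as np

x1 = np.random.randn(100)
x2 = np.random.randn(100)
x3 = np.random.randn(100, 100)  # one color value per vertex

fig, ax = plt.subplots()

# Gouraud shading interpolates colors smoothly between vertices
quadMeshCol = ax.pcolormesh(x1, x2, x3, shading='gouraud')

fig.savefig('gouraud.pdf')  # open this file in Illustrator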

gives the same error, whereas

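import matplotlib.pyplot as plt
import numpy as np

x1 = np.random.randn(100)
x2 = np.random.randn(100)
# 'flat' needs one value per quad, i.e. C one smaller than X and Y in each
# dimension (recent matplotlib versions reject equal shapes with 'flat')
x3 = np.random.randn(99, 99)

fig, ax = plt.subplots()

# Flat shading fills each quad with a single solid color
quadMeshCol = ax.pcolormesh(x1, x2, x3, shading='flat')

fig.savefig('flat.pdf')  # open this file in Illustrator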

is perfectly fine. It therefore seems that the gouraud shading is the problem.
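The two experiments can also be compared without Illustrator by inspecting the saved bytes. The sketch below is a rough byte search, not a PDF parser, and it rests on my understanding of matplotlib's PDF backend: gouraud meshes are written as a PDF smooth shading (a dictionary containing `/ShadingType`), flat meshes as ordinary filled paths, and artists with `rasterized=True` as an embedded image (`/Subtype /Image`). The helper name `mesh_pdf_features` is hypothetical, and whether the shading object is exactly what Illustrator rejects is an assumption the thread does not confirm.

```python
import io

import numpy as np
from matplotlib.figure import Figure


def mesh_pdf_features(shading, rasterized=False):
    """Hypothetical helper: draw a small pcolormesh, save it as PDF, and
    report which constructs appear in the raw bytes (rough byte search)."""
    x = np.linspace(0, 1, 11)
    y = np.linspace(0, 1, 11)
    rng = np.random.default_rng(0)
    # 'flat' takes one color per quad (C one smaller than X and Y);
    # 'gouraud' takes one color per vertex and interpolates between them.
    shape = (10, 10) if shading == 'flat' else (11, 11)
    c = rng.standard_normal(shape)

    fig = Figure(figsize=(3, 3))
    ax = fig.add_subplot()
    ax.pcolormesh(x, y, c, shading=shading, rasterized=rasterized)
    buf = io.BytesIO()
    fig.savefig(buf, format='pdf')
    pdf = buf.getvalue()
    return {
        'smooth_shading': b'/ShadingType' in pdf,  # PDF shading dictionary
        'image': b'/Subtype /Image' in pdf,         # embedded bitmap
    }


for shading in ('flat', 'gouraud'):
    for rasterized in (False, True):
        print(shading, rasterized, mesh_pdf_features(shading, rasterized))
```

If these expectations hold, the vector gouraud mesh is the only variant that carries a shading dictionary, and rasterizing it replaces that dictionary with an embedded image, the same kind of object the working `imshow` gradient uses.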

gokceneraslan commented 4 years ago

I was wondering whether we could simply pass rasterized=True to the PolyCollection, so I made the following modification:

[Screenshot: the modification passing rasterized=True to the PolyCollection]

But then the colorbar is misplaced in the plot:

[Screenshot: the plot with the misplaced colorbar]

gokceneraslan commented 4 years ago

I also wonder whether there is anything to borrow from mpl's colorbar, which renders fine, is editable, and is not rasterized at all.

[Screenshot: a matplotlib colorbar]

has2k1 commented 4 years ago

Thanks for trying to get to the bottom of this. This issue is like bad XML/HTML markup where some tag or attribute is malformed. I think I will wait and see what comes of the matplotlib bug report.

gokceneraslan commented 4 years ago

Thanks, @has2k1. Do you remember why you did not simply rasterize the PolyCollection and use QuadMesh without rasterized=True (i.e., without any rasterization) instead?

has2k1 commented 4 years ago

I recall having looked at how the colorbar was implemented in Matplotlib. It was quite involved, but I noticed that QuadMesh was used, so I built a solution using QuadMesh. I do not recall even thinking about PolyCollection!