jheinen / GR.jl

Plotting for Julia based on GR, a framework for visualisation applications

nonuniformcellarray very slow for pdf output #275

Open yha opened 4 years ago

yha commented 4 years ago

Saving a heatmap to pdf through Plots takes several seconds since JuliaPlots/Plots.jl#2234. Most of the time is spent in nonuniformcellarray. Example:

using Plots
heatmap(rand(10,10))
@time savefig("tmp.png") # fast
@time savefig("tmp.pdf") # slow

heliosdrm commented 4 years ago

With GR both outputs are fast, and PDF even faster:

julia> using GR

julia> heatmap(randn(10,10))

julia> @time savefig("tmp.png") # first time, before compilation
  1.558997 seconds (499.30 k allocations: 24.844 MiB, 0.56% gc time)

julia> @time savefig("tmp.png") # second time
  1.210075 seconds (1.01 k allocations: 100.547 KiB)

julia> @time savefig("tmp.pdf") # now in pdf
  0.257403 seconds (1.01 k allocations: 100.547 KiB)

Maybe this should be reported as an issue to Plots, rather than to GR?

yha commented 4 years ago

The only problem with Plots seems to be that it started calling GR.nonuniformcellarray. A pure GR example:

using GR
heatmap( 0:10, 0:10, randn(10,10) ) # Specifying x,y to trigger non-uniform path

@time savefig("tmp.png")  # 0.802986 seconds (1.21 k allocations: 94.266 KiB)
@time savefig("tmp.pdf")  # 5.876382 seconds (1.21 k allocations: 94.266 KiB)

Again, profiling shows almost all the time is spent in GR.nonuniformcellarray (for the pdf case).

jheinen commented 4 years ago

GR.nonuniformcellarray() is "device-independent", so the only explanation (for me) is, that the input arguments vary significantly. We will have to check the length of the vectors provided by heatmap_edges() in the Plots part.

jheinen commented 4 years ago

The point is that Plots always calls GR.nonuniformheatmap() even if the data is on a uniform grid. Internally the heatmap is then treated as a 2000 x 2000 image array, which is more complex (regarding the I/O). For uniform heatmaps, GR.drawimage() should be used.

yha commented 4 years ago

GR.nonuniformcellarray() is "device-independent", so the only explanation (for me) is, that the input arguments vary significantly.

When used through the Plots interface, the input arguments to GR.nonuniformcellarray() are identical in the png and pdf case (which I verified by adding @show x y color as the first line of GR.nonuniformcellarray), yet somehow GR.nonuniformcellarray() is about 70 times slower for pdf:

import Plots

Plots.heatmap(0:2,0:2,randn(2,2))

@time Plots.savefig("tmp.png")  # 0.082064 seconds (598.82 k allocations: 11.824 MiB)
@time Plots.savefig("tmp.pdf")  # 6.010560 seconds (598.82 k allocations: 35.068 MiB, 0.18% gc time)
@profiler Plots.savefig("tmp.pdf")  # Almost all the time in nonuniformcellarray

heliosdrm commented 4 years ago

I'd say it has to do with the size of the files. Look at the generated files: a PDF generated from a non-uniform heatmap always takes about 24 MB no matter how "big" the heatmap is, because, as Josef explained, it is always treated as a 2000 x 2000 image array. On the other hand, the size of PDFs generated from uniform heatmaps (with GR, no Plots) is proportional to the actual size of the heatmap (around 1 KB for every 10 cells).

If PNGs do not suffer from oversized heatmaps, it may be because they are compressed files, while PDFs written by GR are text files without compression (you can open them with a text editor).
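
As a rough sanity check of the numbers above, the arithmetic can be sketched in a few lines of Julia. The 24 MB figure is taken as 24e6 bytes, and the "hex-encoded RGB" reading is only a guess that happens to match the resulting bytes-per-pixel value; it has not been verified against GR's PDF driver.

```julia
# Back-of-the-envelope check of the file sizes quoted in this thread.
npixels   = 2000 * 2000      # raster size of the non-uniform heatmap image
pdf_bytes = 24 * 10^6        # "about 24 MB", taken as 24e6 bytes (assumption)

bytes_per_pixel = pdf_bytes / npixels
println("bytes per pixel in the non-uniform PDF: ", bytes_per_pixel)

# Assumption (not verified): 3 colour channels, 2 hexadecimal characters each.
hex_rgb_bytes = 3 * 2
println("hex-encoded RGB would need ", hex_rgb_bytes, " bytes per pixel")

# Uniform case: roughly 1 KB for every 10 cells, i.e. about 100 bytes per cell.
uniform_bytes(ncells) = ncells / 10 * 1000
println("estimated PDF size for a 10 x 10 uniform heatmap: ",
        uniform_bytes(10 * 10), " bytes")
```

Under these assumptions, a 10 x 10 heatmap written through the non-uniform path is three orders of magnitude larger than the same data written as a uniform image, regardless of how few cells it has.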

I think that the issue should be reported on Plots - which should not call nonuniformcellarray if the heatmap is not really non-uniform.

yha commented 4 years ago

I'd say it has to do with the size of the files. Look at the generated files: a PDF generated from a non-uniform heatmap always takes about 24 MB no matter how "big" the heatmap is, because, as Josef explained, it is always treated as a 2000 x 2000 image array.

I see, that does make sense now.

I think that the issue should be reported on Plots - which should not call nonuniformcellarray if the heatmap is not really non-uniform.

I would say it's still an issue with GR if nonuniformcellarray is slow and creates very large files. Anyway, I will submit a PR for the Plots GR backend to avoid non-uniform heatmaps when possible.

heliosdrm commented 4 years ago

I would say it's still an issue with GR if nonuniformcellarray is slow and creates very large files.

Well, that issue is beyond the Julia implementation of GR. I think that the best place to report it is the original GR framework, since the only thing that GR.nonuniformcellarray does is call the C function gr_nonuniformcellarray.

jheinen commented 4 years ago

A solution to this problem only makes sense in the Plots/gr backend. I will take a look at that.

It actually only needs a check in Plots if the heatmap is "uniform". If so, GR.drawimage() should be used (like before).
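
A minimal sketch of such a uniformity check, in plain Julia, might look like the following. The helper name `is_uniform` and the tolerance are hypothetical and are not taken from Plots or GR; the actual check in the Plots backend may differ.

```julia
# Hypothetical helper (not part of Plots or GR): returns true if the values
# in `v` are equally spaced, up to a relative tolerance.
function is_uniform(v::AbstractVector{<:Real}; rtol = 1e-8)
    length(v) < 3 && return true   # fewer than two intervals: trivially uniform
    d = diff(v)                    # spacing between consecutive values
    return all(x -> isapprox(x, first(d); rtol = rtol), d)
end

x_edges = 0:10                     # equally spaced edges
y_edges = [0.0, 1.0, 10.0, 100.0]  # log-like spacing, not uniform

# The backend could then pick the cheap image path only when both axes
# are uniform, and fall back to the non-uniform path otherwise.
use_image_path = is_uniform(x_edges) && is_uniform(y_edges)
println(use_image_path ? "uniform: use GR.drawimage()" :
                         "non-uniform: use GR.nonuniformcellarray()")
```

A relative tolerance is used so that floating-point ranges such as `0:0.1:1` still count as uniform despite rounding in the individual steps.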

yha commented 4 years ago

While working on a PR for Plots, I noticed that GR.drawimage() with log axes saves quickly to a small pdf although it (correctly) renders non-uniform rectangles. Can whatever method is used in this case not also be used for nonuniformcellarray?

jheinen commented 4 years ago

I made a PR which hopefully fixes the problem.