jfree / jfreesvg

A fast, lightweight Java library for creating Scalable Vector Graphics (SVG) output.
http://www.jfree.org/jfreesvg
GNU General Public License v3.0

Question: How can I reduce the "size" of the generated SVG? #26

Open davidvarga opened 4 years ago

davidvarga commented 4 years ago

Hi,

I am using JFreeSVG in combination with JFreeChart to produce SVG images to be embedded later on in HTML.

My problem is that the generated SVG file is huge. For example, plotting a single measurement sampled every 1 ms over 270 seconds of time (X), i.e. 270k data points, produces an SVG file of 180 MB. The picture is perfect, but it simply kills any browser I try to open it with.

Question: How can I reduce (drastically) the generated SVG size?

The code generating the SVG (Kotlin):

val chart = createChart(sigData) ?: return@action
val g2 = SVGGraphics2D(600, 400).apply {
    shapeRendering = "optimizeSpeed"
}
chart.draw(g2,  Rectangle(0, 0, 600, 400))
SVGUtils.writeToSVG(calculateExportFile(), g2.svgElement)

I also tried using SVGHints with no luck.

Thanks a lot, David

mhschmieder commented 4 years ago

At least we know the character encoding isn't the reason, as that function writes to UTF-8 vs. UTF-16.

Do you pre-filter the chart's data? In one of my applications, I have a collection of overlapping IFFTs (at different impulse measurement distances from the point source) with up to 30 million overall data points, but the relevant time window for any given measurement is usually only a small portion of that, so I choose some heuristics for a reasonable epsilon and throw out the "noise" so that there are fewer overall line segments in the rendered data.

davidvarga commented 4 years ago

No, I don't do any prefiltering.

I could try to decimate the data: say, take every second data point as a first try, then tune the factor to find how many data points produce an SVG of acceptable size without losing visual detail in the picture. Unfortunately, I already tried to normalize the data by removing every redundant data point (same value at the next timestamp), but the raw data really does change every millisecond.
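
As a rough illustration of the stride decimation described above, the following Java sketch keeps every n-th sample and always retains the final one so that the trace does not end early. The names `StrideDecimation` and `decimate` are hypothetical and not part of JFreeSVG or JFreeChart; the same stride would have to be applied to the x and y arrays alike.

```java
final class StrideDecimation {

    /**
     * Keeps every stride-th value (indices 0, stride, 2*stride, ...) and
     * additionally retains the last value if the stride would skip it.
     */
    static double[] decimate(final double[] values, final int stride) {
        if (stride < 1) {
            throw new IllegalArgumentException("stride must be >= 1");
        }
        final int n = values.length;
        if (n == 0) {
            return new double[0];
        }

        // Number of samples hit by the stride, rounded up.
        final int strided = (n + stride - 1) / stride;
        final boolean lastIncluded = ((n - 1) % stride) == 0;
        final double[] out = new double[lastIncluded ? strided : strided + 1];

        int j = 0;
        for (int i = 0; i < n; i += stride) {
            out[j++] = values[i];
        }
        if (!lastIncluded) {
            out[j] = values[n - 1];
        }
        return out;
    }
}
```

With 270,000 samples and a stride of 27, this returns 10,001 values: 10,000 strided samples plus the retained last one.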

I'll run some filtering experiments and then comment again.

mhschmieder commented 4 years ago

Well, best of luck, and I look forward to seeing what you find out. I can see how you wouldn't have the luxury of pre-filtering the data even with a reasonable epsilon for redundancies or noise. In my own context, the measurements are domain-specific and thus have known bounds and dynamic range of possible values. That makes a big difference.

davidvarga commented 4 years ago

So, I tried it: a decimation factor of 27 (resulting in 10k data points) produces a small-enough image, but now the data shown is different, as I'm really decimating and not rescaling/resampling. This would be the first case where I have to implement a rescaling algorithm (or find a library) before exporting to an image. The problem also remains that if multiple signals are plotted, the number of plotted data points grows linearly, and I have no control over that.

It would be much, much better if there were some setting in the library to "compress" the created image.
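
The distortion reported above is a known side effect of plain stride decimation, which can skip over peaks. One common alternative, which is not proposed anywhere in this thread, is min/max bucketing: split the samples into a fixed number of buckets and keep only the lowest and highest sample of each bucket, in their original order. The sketch below is a minimal version under that assumption; `MinMaxDownsampler` and `selectIndices` are hypothetical names, and the bucket count would need tuning (for example, roughly one bucket per horizontal pixel).

```java
import java.util.Arrays;

final class MinMaxDownsampler {

    /**
     * Returns the indices of the samples to keep: for each bucket, the index
     * of its minimum and of its maximum y-value, in ascending index order.
     * If the input is already small enough, all indices are returned.
     */
    static int[] selectIndices(final double[] y, final int bucketCount) {
        if (bucketCount < 1) {
            throw new IllegalArgumentException("bucketCount must be >= 1");
        }
        final int n = y.length;
        if (n <= 2 * bucketCount) {
            final int[] all = new int[n];
            for (int i = 0; i < n; i++) {
                all[i] = i;
            }
            return all;
        }

        final int[] kept = new int[2 * bucketCount];
        int count = 0;
        for (int b = 0; b < bucketCount; b++) {
            // Bucket covers [start, end); long math avoids int overflow.
            final int start = (int) ((long) b * n / bucketCount);
            final int end = (int) ((long) (b + 1) * n / bucketCount);
            int minIdx = start;
            int maxIdx = start;
            for (int i = start + 1; i < end; i++) {
                if (y[i] < y[minIdx]) {
                    minIdx = i;
                }
                if (y[i] > y[maxIdx]) {
                    maxIdx = i;
                }
            }
            kept[count++] = Math.min(minIdx, maxIdx);
            if (minIdx != maxIdx) {
                kept[count++] = Math.max(minIdx, maxIdx);
            }
        }
        return Arrays.copyOf(kept, count);
    }
}
```

The returned indices can be used to pick matching x and y values before building the chart dataset. Note that this sketch does not force the very first or last sample to be kept, so a caller who needs exact endpoints would have to add them.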

mhschmieder commented 4 years ago

I may have dealt with this issue previously. I'm in the midst of pulling archived code to add to my BasicGraphics2D library that I am preparing as a general purpose utilities library so that it isn't part of my EpsGraphics2D library when it doesn't need to be. I was using UC Berkeley's PtPlot for years before switching to JavaFX, and may have had to do data rescaling there as well -- I definitely have to do it in FX Charts due to the Scene Graph having significant limits on Node count. I may find this older code a bit later tonight; at the very least, I think I already had it in place for the AWT context when exporting to Excel (CSV at the time; I now use Apache POI for XLSX export). There's no guarantee the archive has copies of the older code; we'll see.

mhschmieder commented 3 years ago

I found the AWT version of my resampling algorithm just now, and am copying it verbatim below, hoping that the code, variable and function and class names, and comments, are clear enough in isolation from the overall library context, to be understandable and potentially usable with JFreeChart or whatever.

You may not have enough discardable noise in your data; in my prior case the data domain was fixed, specific, and known in its bounds. Be aware that this particular IFFT is normalized to { -1, +1 } as the actual units aren't terribly useful in the context I was addressing. Finding the Peak Impulse Time was the more critical factor. But even so, one of the techniques in this algorithm may help you, if you're not already doing it, and that is to combine each data set plot into a single Polyline. This especially made a big difference in producing EPS output, so likely would have a similar impact on your SVG size.

/**
 * Returns a double-precision polyline represented as a {@link GeneralPath},
 * compensating for this being a missing geometric primitive in AWT. Returns
 * {@code null} if the input arguments are not consistent, as AWT does not
 * provide a mechanism for determining an empty Path.
 * <p>
 * This method provides a {@link GeneralPath} implementation of polylines
 * using double-precision floating-point, for efficient rendering of large
 * collections of contiguous line segments.
 * <p>
 * The xPoints and yPoints arrays must be the same effective size, as
 * indicated by the supplied number of points. It is OK for either array to
 * be larger, but neither can be smaller than this size.
 *
 * @param xPoints
 *            A double-precision array of x coordinates for the polyline
 * @param yPoints
 *            A double-precision array of y coordinates for the polyline
 * @param numberOfPoints
 *            The number of double-precision points to make for this
 *            polyline
 * @return A {@link GeneralPath} that represents a double-precision
 *         polyline, or {@code null} to indicate that no valid polyline
 *         could be created due to inconsistent input arguments
 *
 * @since 1.0
 */
public static GeneralPath makePolyline( final double[] xPoints,
                                        final double[] yPoints,
                                        final int numberOfPoints ) {
    // If no points are requested, or either coordinate array is smaller
    // than the expected number of points, return null (no valid polyline).
    if ( ( numberOfPoints <= 0 ) || ( xPoints.length < numberOfPoints )
            || ( yPoints.length < numberOfPoints ) ) {
        return null;
    }

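    // Note: GeneralPath extends Path2D.Float, so coordinates are stored in
    // single precision; Path2D.Double retains full double precision.
    final GeneralPath path = new GeneralPath();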
    path.moveTo( xPoints[ 0 ], yPoints[ 0 ] );

    for ( int i = 1; i < numberOfPoints; i++ ) {
        path.lineTo( xPoints[ i ], yPoints[ i ] );
    }

    return path;
}

Of course I check data vector size first, and do three versions for int, float, and double. AWT doesn't have a Polyline shape, whereas JavaFX does.
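
The `int` variant mentioned above is not shown in the thread. A minimal sketch of what such an overload might look like, assuming it simply mirrors the `double` version (the class name `PolylineSketch` is hypothetical):

```java
import java.awt.geom.GeneralPath;

final class PolylineSketch {

    /**
     * Builds an open polyline from integer coordinates, or returns null if
     * the arguments are inconsistent (same contract as the double version).
     */
    static GeneralPath makePolyline(final int[] xPoints,
                                    final int[] yPoints,
                                    final int numberOfPoints) {
        if ((numberOfPoints <= 0) || (xPoints.length < numberOfPoints)
                || (yPoints.length < numberOfPoints)) {
            return null;
        }

        final GeneralPath path = new GeneralPath();
        path.moveTo(xPoints[0], yPoints[0]);
        for (int i = 1; i < numberOfPoints; i++) {
            path.lineTo(xPoints[i], yPoints[i]);
        }
        return path;
    }
}
```

Because the whole trace is one `Shape`, a single `Graphics2D.draw()` call is enough to render it, rather than one call per line segment.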

For purposes of legibility, I am attaching the data reduction code snippet in a separate post.

I have now refactored it into a reusable utility function, stripped of specifics to the class I implemented it on (e.g. pass parameters for tolerance and dynamic range vs. built-in assumptions of the data domain).

I also eliminated dependencies on EpsGraphics2D and have now tested this code with both OrsonPDF and jFreeSVG, as I abstracted my EPS support to cover all three formats generically.

If your context is a scene graph with data points as nodes, rather than a canvas with graphics primitives drawn in the repaint loop, I can also copy/paste my JavaFX data reduction solution, which has some differences with this AWT algorithm.

mhschmieder commented 3 years ago

/**
 * This method transforms a set of data points to screen coordinates for
 * purposes of rendering or export.
 * <p>
 * The goal here is to write efficient code that avoids auto-boxing and
 * unboxing, class instancing, etc. Thus we bypass some nice programming
 * paradigms, as the data vectors could have millions of points. This also
 * means that the target arrays have to be pre-constructed by the invoker,
 * and should match the original coordinate arrays in size. This is somewhat
 * wasteful, but less so than other approaches (until a better way is
 * found). The actual number of transformed data points is returned so that
 * client code can then make more efficient storage structures downstream.
 *
 * @param xCoordinates
 *            The original x-coordinates in domain/model space
 * @param yCoordinates
 *            The original y-coordinates in domain/model space
 * @param numberOfCoordinates
 *            The number of original coordinates
 * @param xMin
 *            The minimum x-axis value for the data window
 * @param yMin
 *            The minimum y-axis value for data normalization
 * @param xMax
 *            The maximum x-axis value for the data window
 * @param yMax
 *            The maximum y-axis value for data normalization
 * @param xScale
 *            The x-axis scale factor to apply from domain/model space to
 *            screen space
 * @param yScale
 *            The y-axis scale factor to apply from domain/model space to
 *            screen space
 * @param ulx
 *            The x-coordinate of the upper-left corner of the chart, in
 *            screen space
 * @param lry
 *            The y-coordinate of the lower-right corner of the chart, in
 *            screen space
 * @param applyDataReduction
 *            Flag for whether to apply data reduction techniques
 * @param dataVarianceFactor
 *            The amount of variance between neighboring data values
 *            (y-axis) to use for determining redundancy during data
 *            reduction
 * @param xCoordinatesTransformed
 *            The (potentially data-reduced) transformed x-coordinates in
 *            screen space
 * @param yCoordinatesTransformed
 *            The (potentially data-reduced) transformed y-coordinates in
 *            screen space
 * @return The number of data points transformed to screen coordinates
 */
public static int transformDataVectorToScreenCoordinates( final double[] xCoordinates,
                                                          final double[] yCoordinates,
                                                          final int numberOfCoordinates,
                                                          final double xMin,
                                                          final double yMin,
                                                          final double xMax,
                                                          final double yMax,
                                                          final double xScale,
                                                          final double yScale,
                                                          final double ulx,
                                                          final double lry,
                                                          final boolean applyDataReduction,
                                                          final double dataVarianceFactor,
                                                          final double[] xCoordinatesTransformed,
                                                          final double[] yCoordinatesTransformed ) {
    // Cache the first in-range data point, so we can do a move followed by
    // any number of contiguous line segments.
    double prevX = 0d;
    double prevY = 0d;
    int transformedCoordinateIndex = -1;
    int firstDataPointIndex = 0;
    final int finalCoordinateIndex = numberOfCoordinates - 1;
    for ( int i = firstDataPointIndex; i <= finalCoordinateIndex; i++ ) {
        final double xValue = xCoordinates[ i ];
        if ( xValue >= xMin ) {
            prevX = ulx + ( ( xValue - xMin ) * xScale );
            prevY = lry - ( ( yCoordinates[ i ] - yMin ) * yScale );

            transformedCoordinateIndex++;
            xCoordinatesTransformed[ transformedCoordinateIndex ] = prevX;
            yCoordinatesTransformed[ transformedCoordinateIndex ] = prevY;

            firstDataPointIndex = i;

            break;
        }
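    }

    // If no data point lies at or beyond xMin, there is nothing in range.
    if ( transformedCoordinateIndex < 0 ) {
        return 0;
    }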

    // Loop through all of the in-range points in the current data set.
    boolean dataValueInvariant = false;
    for ( int i = firstDataPointIndex + 1; i <= finalCoordinateIndex; i++ ) {
        final double xValue = xCoordinates[ i ];
        if ( xValue > xMax ) {
            // If we are past the valid data range, ensure that there are at
            // least two retained data points so that we don't end up with a
            // blank trace. In this case, it means grabbing the previous.
            if ( transformedCoordinateIndex < 1 ) {
                final double xPosPrev = ulx + ( ( xCoordinates[ i - 1 ] - xMin ) * xScale );
                final double yPosPrev = lry - ( ( yCoordinates[ i - 1 ] - yMin ) * yScale );

                transformedCoordinateIndex++;
                xCoordinatesTransformed[ transformedCoordinateIndex ] = xPosPrev;
                yCoordinatesTransformed[ transformedCoordinateIndex ] = yPosPrev;
            }

            break;
        }

        final double xPos = ulx + ( ( xValue - xMin ) * xScale );
        final double yPos = lry - ( ( yCoordinates[ i ] - yMin ) * yScale );

        if ( i == finalCoordinateIndex ) {
            // Once we get to the final valid data point and it is in-range,
            // automatically include it so that we don't have a truncated
            // trace or a strange trajectory to the top or bottom of chart.
            transformedCoordinateIndex++;
            xCoordinatesTransformed[ transformedCoordinateIndex ] = xPos;
            yCoordinatesTransformed[ transformedCoordinateIndex ] = yPos;

            break;
        }

        // If the magnitude is the same as before, even zooming won't show
        // more resolution, so it is wasteful to plot this point.
        //
        // Due to the pixel values being coarse integer values, we can get
        // noise in double precision floating-point math, so we apply a data
        // variance factor to the comparison as otherwise an all-zero data
        // vector (or flat data range subset) can have misleading variance.
        if ( applyDataReduction && ( yPos >= ( prevY - dataVarianceFactor ) )
                && ( yPos <= ( prevY + dataVarianceFactor ) ) ) {
            dataValueInvariant = true;
            continue;
        }
        else if ( dataValueInvariant ) {
            // Once we get to variant data, ensure that there is at least
            // one flat line for the invariant data or else we slope towards
            // the first changed data value.
            final double xPosPrev = ulx + ( ( xCoordinates[ i - 1 ] - xMin ) * xScale );
            final double yPosPrev = lry - ( ( yCoordinates[ i - 1 ] - yMin ) * yScale );

            transformedCoordinateIndex++;
            xCoordinatesTransformed[ transformedCoordinateIndex ] = xPosPrev;
            yCoordinatesTransformed[ transformedCoordinateIndex ] = yPosPrev;

            dataValueInvariant = false;
        }

        // Cache the scaled current point as a unique data point.
        transformedCoordinateIndex++;
        xCoordinatesTransformed[ transformedCoordinateIndex ] = xPos;
        yCoordinatesTransformed[ transformedCoordinateIndex ] = yPos;

        // Save the current point as the previous point for the next line.
        prevX = xPos;
        prevY = yPos;
    }

    // As data reduction may have been applied, we cannot use the allocated
    // array size to determine the number of transformed coordinates in
    // screen space, and must instead convert the last array index to a
    // size/count/length.
    final int numberOfCoordinatesTransformed = transformedCoordinateIndex + 1;

    return numberOfCoordinatesTransformed;
}
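
To see the flat-run reduction in isolation, here is a simplified sketch of the redundancy test only, without the windowing or the screen-space transform. It is a restatement under assumptions, not the code above: `FlatRunReduction` and `keptIndices` are hypothetical names, and the tolerance is applied to raw y-values rather than to pixel positions.

```java
import java.util.Arrays;

final class FlatRunReduction {

    /**
     * Returns the indices of the samples that survive the reduction: the
     * first and last samples, every sample that differs from the last kept
     * value by more than the tolerance, and the final sample of each flat
     * run (so the run is drawn as a horizontal line, not a slope).
     */
    static int[] keptIndices(final double[] y, final double tolerance) {
        final int n = y.length;
        if (n == 0) {
            return new int[0];
        }

        final int[] kept = new int[n];
        int count = 0;
        kept[count++] = 0;
        double prevY = y[0];
        boolean inFlatRun = false;

        for (int i = 1; i < n; i++) {
            if (i == n - 1) {
                // Always keep the final sample so the trace is not truncated.
                kept[count++] = i;
                break;
            }
            if (Math.abs(y[i] - prevY) <= tolerance) {
                inFlatRun = true;
                continue;
            }
            if (inFlatRun) {
                // Close the flat run at the sample just before the change.
                kept[count++] = i - 1;
                inFlatRun = false;
            }
            kept[count++] = i;
            prevY = y[i];
        }
        return Arrays.copyOf(kept, count);
    }
}
```

For y = {0, 0, 0, 0, 1, 1, 1, 2} and a tolerance of 0.001, the kept indices are 0, 3, 4 and 7: the flat run at 0 is reduced to its two end points, and the run at 1 is represented by its first sample and the final point. Data that changes on every sample, as in the original question, would see little or no reduction from this test alone.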

mhschmieder commented 3 years ago

Here is a typical usage context within a chart panel that derived ultimately from a JPanel, though the critical factor is just having an available Graphics2D canvas at hand.

/**
 * This method draws the supplied data vector on the screen, within the
 * established chart bounds, after mapping from model/domain space to screen
 * space and potentially applying data reduction techniques.
 *
 * @param graphicsContext
 *            The {@link Graphics2D} Graphics Context to use for drawing the
 *            resulting lines between the data points
 * @param xCoordinates
 *            The original x-coordinates in domain/model space
 * @param yCoordinates
 *            The original y-coordinates in domain/model space
 * @param numberOfCoordinates
 *            The number of original coordinates
 */
protected final void drawDataVector( final Graphics2D graphicsContext,
                                     final double[] xCoordinates,
                                     final double[] yCoordinates,
                                     final int numberOfCoordinates ) {
    // Make copies of the coordinate arrays, for transformed values.
    // (These are freshly allocated target arrays, sized like the originals.)
    final double[] xCoordinatesTransformed = new double[ numberOfCoordinates ];
    final double[] yCoordinatesTransformed = new double[ numberOfCoordinates ];

    // Transform the data points to screen space, using data reduction with
    // a data variance factor that accounts for AWT pixel values being
    // effectively integer-based.
    final int numberOfCoordinatesTransformed = ChartUtilities
            .transformDataVectorToScreenCoordinates( xCoordinates,
                                                     yCoordinates,
                                                     numberOfCoordinates,
                                                     _xMin,
                                                     _yMin,
                                                     _xMax,
                                                     _yMax,
                                                     _xScale,
                                                     _yScale,
                                                     _ulx,
                                                     _lry,
                                                     true,
                                                     0.001d,
                                                     xCoordinatesTransformed,
                                                     yCoordinatesTransformed );

    // Fortunately, we can be efficient by combining the entire trace into a
    // single open polyline, having stripped any redundant data points. This
    // improves AWT performance, and reduces file size for Graphics Exports.
    if ( isVectorizationActive() ) {
        // For vectorization to various vector graphics file formats, we can
        // invoke the double-precision polyline method. This greatly
        // reduces the number of PostScript moveto's and newpath's (and
        // their equivalents) in the format-specific overrides of the draw()
        // method from each format's specialized version of Graphics2D.
        final GeneralPath path =
                               GeometryUtilities.makePolyline( xCoordinatesTransformed,
                                                               yCoordinatesTransformed,
                                                               numberOfCoordinatesTransformed );
        // makePolyline() returns null when there are no points to draw.
        if ( path != null ) {
            graphicsContext.draw( path );
        }
    }
    else {
        // For AWT, we have to first round all coordinates to integers.
        final int[] xCoordinatesRounded = new int[ numberOfCoordinatesTransformed ];
        final int[] yCoordinatesRounded = new int[ numberOfCoordinatesTransformed ];
        for ( int i = 0; i < numberOfCoordinatesTransformed; i++ ) {
            xCoordinatesRounded[ i ] = ( int ) Math.round( xCoordinatesTransformed[ i ] );
            yCoordinatesRounded[ i ] = ( int ) Math.round( yCoordinatesTransformed[ i ] );
        }
        final GeneralPath path =
                               GeometryUtilities.makePolyline( xCoordinatesRounded,
                                                               yCoordinatesRounded,
                                                               numberOfCoordinatesTransformed );
        // makePolyline() returns null when there are no points to draw.
        if ( path != null ) {
            graphicsContext.draw( path );
        }
    }
}
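
The fields `_xScale`, `_yScale`, `_ulx` and `_lry` are not defined in the snippets above. One plausible way to derive them, inferred only from the formulas `ulx + (x - xMin) * xScale` and `lry - (y - yMin) * yScale` used in the transform method, is sketched below. The `ChartMapping` class and its member names are hypothetical.

```java
import java.awt.geom.Rectangle2D;

final class ChartMapping {
    final double xMin;
    final double yMin;
    final double xScale;
    final double yScale;
    final double ulx;
    final double lry;

    /**
     * Derives scale factors and anchor coordinates so that the data window
     * [xMin, xMax] x [yMin, yMax] exactly fills the given chart bounds.
     */
    ChartMapping(final Rectangle2D chartBounds,
                 final double xMin, final double xMax,
                 final double yMin, final double yMax) {
        if ((xMax <= xMin) || (yMax <= yMin)) {
            throw new IllegalArgumentException("empty data window");
        }
        this.xMin = xMin;
        this.yMin = yMin;
        this.xScale = chartBounds.getWidth() / (xMax - xMin);
        this.yScale = chartBounds.getHeight() / (yMax - yMin);
        // Left edge of the chart; screen x grows to the right.
        this.ulx = chartBounds.getX();
        // Bottom edge of the chart; screen y grows downward, so data y
        // values are subtracted from it.
        this.lry = chartBounds.getY() + chartBounds.getHeight();
    }

    double toScreenX(final double x) {
        return ulx + ((x - xMin) * xScale);
    }

    double toScreenY(final double y) {
        return lry - ((y - yMin) * yScale);
    }
}
```

Under this mapping, `xMin` lands on the left edge of the chart and `yMax` on the top edge, which matches the subtraction from `lry` in the transform method.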

davidvarga commented 3 years ago

@mhschmieder : Thanks a lot for the effort! I really appreciate it!

I will give it a try soon - I'm just working on some proof-of-concept thing which has a higher prio for the management at the moment. I will give you feedback soon.

Thanks again!

mhschmieder commented 3 years ago

Cool; no problem. I just finalized my final code review and API documentation revisions for five open source libraries that hopefully I will get published tomorrow (finally), one of which contains a lot of utilities that are useful for different vector graphics export targets, such as the data reduction algorithm above.

Although it's a super-skeletal library at the moment (and I haven't yet had time to learn how to publish artifacts and Javadocs freely to Maven Central), I did make a placeholder project for it on my GitHub page, after restructuring the code a bit more to be as generic as possible (the original code was specific to the needs of a particular application and problem domain).

https://github.com/mhschmieder/charttoolkit

My other GitHub-posted toolkits are graphics oriented as well, and are growing rather quickly, beyond just supporting my just-published EPS vector graphics export library.

Even though I got mostly away from AWT/Swing (after two painful years of dealing with multi-threading in hybrid code) towards JavaFX, I'm still seeing more Swing in the job market (although very little Java front-end desktop work at all anymore), so I'm trying to package up all of my legacy code into more generic nuggets that might help others, as that also gets me back into the "swing" of things with those toolkits (which I only retained for a few legacy needs, these past few years).