timbod7 / haskell-chart

A 2D charting library for haskell

diagrams back end is very slow #105

Open bscarlet opened 8 years ago

bscarlet commented 8 years ago

Rendering the example chart from the docs for Graphics.Rendering.Chart.Easy with cairo and diagrams respectively, the diagrams version is very slow - more than 8 seconds (on my oldish hardware). The cairo version seems reasonably fast (less than 0.2 seconds). Do you have any idea what would make the diagrams backend so slow?

bergey commented 8 years ago

Last time I looked, the biggest contribution was loading several font files with SVGFonts.

timbod7 commented 8 years ago

Can you post the actual code you are using? @bergey is correct that the SVG font processing is slow, and some changes were made in the last 12 months to allow loading fewer fonts at startup.

bollu commented 8 years ago

I'd like to look into this, could someone point a newbie in the right direction?

bergey commented 8 years ago

@bollu My basic process is:

I go back and forth on whether to compile everything with profiling or just the libraries of interest. At least you'll want diagrams-core, diagrams-lib, diagrams-svg, Chart, Chart-diagrams, SVGFonts.

With a bit more detail, assuming my test program has a .cabal file, I run:

cabal sandbox init
cabal install -p --ghc-options=' -auto-all -caf-all -rtsopts -threaded'
cabal exec PROGRAM -- +RTS -p

The .cabal file for the test program should have ghc-options: -prof

Then look at ./PROGRAM.prof.

bscarlet commented 8 years ago

Test.hs.txt

Here's the simple code I'm using. Just replacing diagrams with cairo makes it much faster. If the problem is the font loading, then I think there may be two problems: (1) a small issue: the simple toFile API seems to reload the fonts on every call when it is reused; (2) the font loading itself might be unnecessarily slow.

bscarlet commented 8 years ago

I tried using loadSansSerifFonts, then renderToSVGString' instead of toFile, so that I could separate the loading of the fonts out of the path with which I'm concerned. Although less so, renderToSVGString' is still notably slower than using cairo.
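The difference between reloading fonts on every call and loading them once can be sketched as follows. This is only an illustration: FontSet, Env, loadFonts, render, renderOneShot and renderMany are hypothetical stand-ins written for this example, not the Chart-diagrams functions named above, and the "expensive" load is simulated with a counter.

```haskell
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)

-- Hypothetical stand-ins; these are not the Chart-diagrams types.
newtype FontSet = FontSet [String]
newtype Env = Env FontSet

-- Stands in for the expensive step (reading and parsing SVG fonts).
-- The counter records how many times it runs.
loadFonts :: IORef Int -> IO FontSet
loadFonts loads = do
  modifyIORef' loads (+ 1)
  pure (FontSet ["sans-regular", "sans-bold"])

-- Pure rendering against an already-built environment.
render :: Env -> String -> String
render (Env (FontSet fs)) title =
  "<svg fonts=" ++ show (length fs) ++ ">" ++ title ++ "</svg>"

-- One-shot path: loads the fonts on every call.
renderOneShot :: IORef Int -> String -> IO String
renderOneShot loads title = do
  fonts <- loadFonts loads
  pure (render (Env fonts) title)

-- Reuse path: load once, then render any number of charts purely.
renderMany :: IORef Int -> [String] -> IO [String]
renderMany loads titles = do
  fonts <- loadFonts loads
  let env = Env fonts
  pure (map (render env) titles)

main :: IO ()
main = do
  oneShotLoads <- newIORef 0
  mapM_ (renderOneShot oneShotLoads) ["a", "b", "c"]
  reusedLoads <- newIORef 0
  _ <- renderMany reusedLoads ["a", "b", "c"]
  n1 <- readIORef oneShotLoads
  n2 <- readIORef reusedLoads
  putStrLn ("one-shot loads: " ++ show n1 ++ ", reused loads: " ++ show n2)
```

With this shape, the cost of font loading is paid once per environment rather than once per chart, which is what separating the load from the render is meant to measure.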

timbod7 commented 8 years ago

@bscarlet Did you have any success profiling to see where the time was being spent?

bscarlet commented 8 years ago

No, I haven't had time to pursue it. Thanks for checking in. I remain interested in improvement in this area, but it's unlikely I'll be able to get to doing it myself.

sid-kap commented 8 years ago

I started trying to optimize the font loading. Font loading is definitely the slowest part of the diagrams backend.

I investigated a few ways to make this faster. For a simple dotplot program that loads only the sans-serif fonts, the program takes 1.8s. By adding print statements, I discovered that most of the time is taken by loading and parsing the SVG fonts.
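A slightly more structured version of the print-statement approach is a small timing helper. The sketch below uses only base; timed, parseFonts and renderChart are hypothetical names for this example, and the two phases are dummy computations. Note that getCPUTime measures CPU time, so time spent blocked on disk is not counted, and evaluate only forces the result to weak head normal form, so a lazy result may need deeper forcing before the numbers mean anything.

```haskell
import Control.Exception (evaluate)
import System.CPUTime (getCPUTime)

-- Run an action, force its result to weak head normal form,
-- print the elapsed CPU time, and return both.
timed :: String -> IO a -> IO (a, Double)
timed label action = do
  start <- getCPUTime
  result <- action >>= evaluate
  end <- getCPUTime
  -- getCPUTime reports picoseconds.
  let secs = fromIntegral (end - start) / 1e12 :: Double
  putStrLn (label ++ ": " ++ show secs ++ "s")
  pure (result, secs)

-- Hypothetical stand-ins for the two phases being compared.
parseFonts :: IO Int
parseFonts = pure (length (filter even [1 .. 2000000 :: Int]))

renderChart :: Int -> IO Int
renderChart glyphs = pure (glyphs `mod` 7)

main :: IO ()
main = do
  (glyphs, _) <- timed "load fonts" parseFonts
  _ <- timed "render" (renderChart glyphs)
  pure ()
```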

I experimented with file-embed to embed the SVG font files directly in the library. I assumed that this might make it faster, since the mmap that the OS performs to load a binary into memory is probably faster than LazyByteString.readFile. (This involved adding a function loadFont' :: ByteString -> PreparedFont to SVGFonts, since the existing loadFont function takes a FilePath and thus doesn't accept embedded ByteStrings. I've just sent a PR for this change.) This change brought my example program down from 1.8s to 0.7s.

0.7s is still quite slow, and I think it would be beneficial to now profile the XML parsing step. Ideally, we should pre-parse the XMLs and serialize the font data to a file (or embed it using file-embed, as above). This would eliminate the parsing step completely. Unfortunately, many of the important datatypes in the Font type are diagrams types that don't have Binary/Serial/Serializable instances. I wonder if it would be possible to make Binary/Serial/Serializable instances for these types.
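As a rough illustration of what pre-parsing and serializing could look like, here is a minimal sketch using Generic-derived Binary instances from the binary package. The Glyph record and FontData type are hypothetical simplifications invented for this example; the real SVGFonts font data contains diagrams types, which is exactly where instances would be missing.

```haskell
{-# LANGUAGE DeriveGeneric #-}

import Data.Binary (Binary, decode, encode)
import qualified Data.ByteString.Lazy as BL
import GHC.Generics (Generic)

-- Hypothetical, simplified glyph record (not the SVGFonts representation).
data Glyph = Glyph
  { glyphName :: String
  , advance   :: Double
  , outline   :: [(Double, Double)]
  } deriving (Eq, Show, Generic)

-- The instance body is filled in generically.
instance Binary Glyph

type FontData = [(Char, Glyph)]

-- Done once, ahead of time, after parsing the SVG font.
serializeFont :: FontData -> BL.ByteString
serializeFont = encode

-- Done at startup instead of XML parsing.
deserializeFont :: BL.ByteString -> FontData
deserializeFont = decode

main :: IO ()
main = do
  let font = [('A', Glyph "A" 0.6 [(0, 0), (0.3, 1), (0.6, 0)])]
      bytes = serializeFont font
  print (BL.length bytes)
  print (deserializeFont bytes == font)
```

The serialized bytes could then be written to a file or embedded in the binary, so that startup only pays for decoding.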

timbod7 commented 8 years ago

Nice work.

Though you have got me thinking.. I wonder if the problem could best be addressed with an architectural change.

The original backend was the cairo one, which does rendering in the IO monad. This means that it is free to load fonts on demand. Given that constructing a diagram is a pure operation, we put a fair bit of work into producing charts purely with diagrams. A consequence of this is that fonts must be preloaded.

The pure operation to render a chart with diagrams is:

renderableToSVG' :: Renderable a -> DEnv Double -> (Svg.Svg (), PickFn a)

data DEnv n = DEnv
  { envAlignmentFns :: AlignmentFns     -- ^ The used alignment functions.
  , envFontStyle :: FontStyle           -- ^ The current/initial font style.
  , envSelectFont :: FontStyle -> F.PreparedFont n -- ^ The font selection function.
  , envOutputSize :: (n,n)              -- ^ The size of the rendered output.
  , envUsedGlyphs :: M.Map (String, FontSlant, FontWeight) (S.Set String)
  }

Perhaps this should instead be monadic after all, something like:

renderableToSVG' :: (Monad m) => Renderable a -> DEnv m Double -> m (Svg.Svg (), PickFn a)

data DEnv m n = DEnv
  { envAlignmentFns :: AlignmentFns     -- ^ The used alignment functions.
  , envFontStyle :: FontStyle           -- ^ The current/initial font style.
  , envSelectFont :: FontStyle ->  m (F.PreparedFont n) -- ^ The font selection function.
  , envOutputSize :: (n,n)              -- ^ The size of the rendered output.
  , envUsedGlyphs :: M.Map (String, FontSlant, FontWeight) (S.Set String)
  }

That way the API could still be used purely, with preloaded fonts, but could also use the IO monad, and load fonts on demand?
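A minimal sketch of that idea, using hypothetical stand-in types rather than the real DEnv, FontStyle and PreparedFont: the renderer is written once against any Monad, then instantiated at Identity for the pure, preloaded case and at IO for on-demand loading.

```haskell
import Data.Functor.Identity (Identity (..))
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)

-- Hypothetical stand-ins for FontStyle and PreparedFont.
data FontStyle = Sans | Serif deriving (Eq, Show)
newtype PreparedFont = PreparedFont String deriving (Eq, Show)

fontName :: PreparedFont -> String
fontName (PreparedFont n) = n

-- An environment whose font selector runs in an arbitrary monad m.
newtype Env m = Env { envSelectFont :: FontStyle -> m PreparedFont }

-- Rendering is written once, for any Monad.
renderLabel :: Monad m => Env m -> FontStyle -> String -> m String
renderLabel env style text = do
  font <- envSelectFont env style
  pure (text ++ " [" ++ fontName font ++ "]")

-- Pure use: fonts are preloaded, so selection is just a lookup.
pureEnv :: Env Identity
pureEnv = Env (Identity . preloaded)
  where
    preloaded Sans  = PreparedFont "sans (preloaded)"
    preloaded Serif = PreparedFont "serif (preloaded)"

-- IO use: each selection may touch the outside world; the log records
-- which styles were actually requested.
ioEnv :: IORef [FontStyle] -> Env IO
ioEnv loadLog = Env $ \style -> do
  modifyIORef' loadLog (style :)
  pure (PreparedFont (show style ++ " (loaded on demand)"))

main :: IO ()
main = do
  putStrLn (runIdentity (renderLabel pureEnv Sans "title"))
  loadLog <- newIORef []
  renderLabel (ioEnv loadLog) Serif "axis" >>= putStrLn
  readIORef loadLog >>= print
```

Callers that want a pure result unwrap it with runIdentity, so the pure API survives as a special case of the monadic one.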

sid-kap commented 8 years ago

Is polluting the function with IO really necessary here? I imagine there should be a way to define a pure function envSelectFont that contains thunks to FontData objects that are defined in terms of lazy IO, so that the fonts aren't loaded unless they are needed. (In fact, why doesn't this already happen? loadFont' is already defined in terms of LazyByteString.readIO.) The current implementation, though, clearly doesn't do this; I observed that reducing the number of fonts to load dramatically decreases the time that the function takes.
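For illustration, deferred loading behind a pure selector could be sketched with unsafeInterleaveIO from base. This is an assumption about how such a design might look, not how SVGFonts or Chart-diagrams work; Font, deferredLoad and loadCountAfterUsingOne are hypothetical names, and the load is simulated with a counter. The load only happens when the font value is first demanded, which is also why the effect is invisible in the types.

```haskell
import Control.Exception (evaluate)
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)
import System.IO.Unsafe (unsafeInterleaveIO)

-- Hypothetical stand-in for a parsed font.
data Font = Font String deriving (Eq, Show)

-- Returns immediately; the body runs only when the Font is first forced.
deferredLoad :: IORef Int -> String -> IO Font
deferredLoad counter name = unsafeInterleaveIO $ do
  modifyIORef' counter (+ 1)
  pure (Font name)

-- Set up three deferred fonts, use only one, and report how many loads ran.
loadCountAfterUsingOne :: IO Int
loadCountAfterUsingOne = do
  counter <- newIORef 0
  fonts <- mapM (deferredLoad counter) ["sans", "serif", "mono"]
  -- A pure selector over the deferred values.
  let selectFont i = fonts !! i
  _ <- evaluate (selectFont 0)
  readIORef counter

main :: IO ()
main = loadCountAfterUsingOne >>= print
```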

Also, while this is an improvement, I feel like this is essentially penalizing people for using more fonts. In my opinion, using extra fonts should be a core functionality of the library, and there shouldn't be a 0.5s punishment for every extra font the user wants to load. There should be a better solution. I think it would be great if we could somehow store the actual Diagrams internal representation of a font inside the binary, as I described above. (Maybe @bergey might have ideas about whether this is possible/a good idea?) Maybe we could create a new package on Hackage that contains Diagrams representations of lots of fonts from around the web, so if someone wants to use a font they can simply import that package. I wonder how matplotlib, for example, deals with this. Maybe their SVG parser runs faster, or maybe their internal representation of fonts is more SVG-like and thus takes less work to parse.

timbod7 commented 8 years ago

> Is polluting the function with IO really necessary here?

I just think it's being honest :-) I'm not much of a fan of lazy IO. The user may have hundreds of fonts on their system, any of which may be referenced. I don't think it's reasonable that this font resolution process be hidden behind lazy IO.

With my suggestion the diagrams backend could work two ways: purely, where you provide preloaded fonts (embedded in your client program as you suggest), or via IO, where it loads fonts on demand from those available on the machine.
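The on-demand mode could be backed by a small cache so that each font is loaded at most once. The sketch below is an assumption about one possible shape, not existing Chart-diagrams code; Font, FontCache, loadFromDisk and selectFont are hypothetical names, and the disk load is simulated.

```haskell
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)
import qualified Data.Map.Strict as M

-- Hypothetical stand-in for a parsed font.
newtype Font = Font String deriving (Eq, Show)

data FontCache = FontCache
  { cacheRef  :: IORef (M.Map String Font)  -- fonts loaded so far
  , loadCount :: IORef Int                  -- number of real loads performed
  }

newFontCache :: IO FontCache
newFontCache = FontCache <$> newIORef M.empty <*> newIORef 0

-- Stands in for reading and parsing a font file.
loadFromDisk :: String -> IO Font
loadFromDisk name = pure (Font ("parsed:" ++ name))

-- Return the cached font if present; otherwise load it and remember it.
selectFont :: FontCache -> String -> IO Font
selectFont cache name = do
  known <- readIORef (cacheRef cache)
  case M.lookup name known of
    Just font -> pure font
    Nothing -> do
      font <- loadFromDisk name
      modifyIORef' (loadCount cache) (+ 1)
      modifyIORef' (cacheRef cache) (M.insert name font)
      pure font

main :: IO ()
main = do
  cache <- newFontCache
  mapM_ (selectFont cache) ["sans", "sans", "serif", "sans"]
  readIORef (loadCount cache) >>= print
```

Only fonts that a chart actually references are loaded, so users with many installed fonts pay nothing for the ones they never use.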

> I think it would be great if we could somehow store the actual Diagrams internal representation of a font inside the binary, as I described above.

Yes - though different users are going to want different sets of fonts. Hence two modes would be desirable.

> I wonder how matplotlib, for example, deals with this.

It definitely seems like it loads on demand

bergey commented 8 years ago

I think making IO visible in the types of font loading functions would be helpful.