rust-random / rand

A Rust library for random number generation.
https://crates.io/crates/rand

Generate or include graphs for distributions in docs #131

Open abonander opened 7 years ago

abonander commented 7 years ago

Those of us who aren't as well versed in probability theory (and that includes me) may not be able to immediately intuit the main properties of the distributions from their density functions.

I'm talking about something like the graphs Wikipedia has in its normal and gamma distribution articles, either linked or inlined in the docs for the relevant types in the distributions module.

Graphs like these are probably already available online under compatible licenses, possibly in the whitepapers linked under the various types in this module. I haven't looked yet, but I doubt we can reuse graphs from the whitepapers anyway, as they are probably unlicensed or under copyright terms incompatible with MIT/Apache-2.0.

jacwah commented 7 years ago

I think this is a great idea and am currently looking into possible solutions.

jacwah commented 7 years ago

First, I tried searching for graphs available online. I quickly ran into a few problems.

  1. Most graphs available online don't have a compatible license (Creative Commons etc).
  2. I couldn't find a complete set of probability density graphs in the same style.
  3. They aren't tailored to this project's needs.

I think the graphs should be

Therefore I decided to try writing a script myself using Python. Matplotlib has great graphing support and Scipy has a wide range of probability density functions out of the box.

[Figure: Chi-squared probability density]

This was generated from the following code:

import numpy as np
from scipy.stats import chi2
from graphbutler import recipe, save_all, Graph, Parameterized

@recipe
def chi_squared_pdf():
    g = Graph()
    g.title = "Chi-squared probability density"
    g.x = np.arange(0.0, 9.0, 0.01)

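    def y(k):
        # Density of the chi-squared distribution with k degrees of freedom
        density = chi2.pdf(g.x, k)
        # Mask values above 0.5: the k=1 density is unbounded near x=0
        density[density > 0.5] = np.nan
        return density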

    g.y = Parameterized("k", y, (1, 2, 3, 5, 9))
    return g

if __name__ == "__main__":
    save_all(format="png")

Graphbutler is a simple wrapper around matplotlib I wrote to reduce boilerplate code.
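
For anyone who wants to check the numbers without SciPy installed, the density in the recipe above can be evaluated with the standard library alone. This is only a minimal sketch: the names `chi2_pdf` and `clipped_curve` are made up for illustration, and the 0.5 cut-off simply mirrors the threshold in the recipe.

```python
import math


def chi2_pdf(x: float, k: int) -> float:
    """Chi-squared density with k degrees of freedom, evaluated at x."""
    if x < 0.0:
        return 0.0
    if x == 0.0:
        # Limit as x -> 0+: infinite for k < 2, 1/2 for k == 2, zero otherwise.
        if k < 2:
            return math.inf
        return 0.5 if k == 2 else 0.0
    half_k = k / 2.0
    # Work in log space so a large k does not overflow the gamma function.
    log_pdf = (
        (half_k - 1.0) * math.log(x)
        - x / 2.0
        - half_k * math.log(2.0)
        - math.lgamma(half_k)
    )
    return math.exp(log_pdf)


def clipped_curve(k: int, xs, threshold: float = 0.5):
    """Densities for xs, with values above the threshold replaced by NaN."""
    values = (chi2_pdf(x, k) for x in xs)
    return [y if y <= threshold else math.nan for y in values]


# Roughly the same grid as np.arange(0.0, 9.0, 0.01) in the recipe.
xs = [i * 0.01 for i in range(900)]
curves = {k: clipped_curve(k, xs) for k in (1, 2, 3, 5, 9)}
```

The NaN entries are what keep the k=1 curve from stretching the y-axis: plotting libraries such as matplotlib leave gaps where the data is NaN.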

I could write a recipe for each of the distributions in this crate. It should then be easy to tweak and extend with potential new distributions.

It's currently hard to include local images in rustdoc (issue). Therefore, I suggest the rendered graphs are hosted on the internet instead of in the crate itself.

What do you think?

jacwah commented 7 years ago

Then there are cumulative distribution functions. Are those interesting as well? Personally, I find them harder to grasp.
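
As a reminder of how the two relate, the cumulative distribution function is the integral of the density, so a CDF plot could be derived from the same data as a density plot. Below is a small sketch for the chi-squared case with k=2, where the density is 0.5 * exp(-x / 2) and the exact CDF is 1 - exp(-x / 2). The function names and the step count are assumptions chosen for illustration.

```python
import math


def pdf(x: float) -> float:
    """Chi-squared density for k = 2, i.e. 0.5 * exp(-x / 2) for x >= 0."""
    return 0.5 * math.exp(-x / 2.0) if x >= 0.0 else 0.0


def cdf_numeric(x: float, steps: int = 10_000) -> float:
    """Approximate P(X <= x) by trapezoidal integration of pdf over [0, x]."""
    if x <= 0.0:
        return 0.0
    h = x / steps
    total = 0.5 * (pdf(0.0) + pdf(x))
    for i in range(1, steps):
        total += pdf(i * h)
    return total * h


def cdf_exact(x: float) -> float:
    """Closed-form CDF of the chi-squared distribution with k = 2."""
    return 1.0 - math.exp(-x / 2.0) if x > 0.0 else 0.0


for x in (1.0, 2.0, 5.0):
    print(f"x={x}: numeric={cdf_numeric(x):.6f} exact={cdf_exact(x):.6f}")
```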

erickt commented 7 years ago

@steveklabnik: this sounds like a docs-related thing. Do you have a good approach to getting these kinds of generated graphs in our docs?

MichaelOwenDyer commented 2 months ago

@dhardy Do you have any opinions regarding the best place to put these graphs? From reading the conversation above, these options are immediately apparent to me:

  1. Store them in a folder in the git repo (which would significantly increase clone size)
  2. Host them somewhere else and just put web links in the documentation. This wouldn't affect clone size, but it would couple the docs to an external host that would need to be maintained, and viewing the graphs would always require an internet connection.
  3. Embed them directly into the docs (either Base64 encoded or in SVG format, which I just learned is valid HTML 🤠). This might be a good solution depending on just how much bloat it would add to the docs (even if it rendered nicely, hundreds of lines of gobbledygook in nearly every file would be sad indeed).
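
To get a feel for the bloat in option 3, here is a rough sketch of how a Base64 data URI grows relative to the raw image. The `data_uri` helper and the stand-in bytes are hypothetical, since no real plot is available here. Base64 encodes every 3 input bytes as 4 output characters, so the embedded text is about a third larger than the file.

```python
import base64


def data_uri(image_bytes: bytes, mime: str = "image/png") -> str:
    """Wrap raw image bytes in a data URI suitable for an <img src=...>."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{encoded}"


# Stand-in for a rendered plot: 30,720 arbitrary bytes, not a real PNG.
fake_png = bytes(range(256)) * 120
uri = data_uri(fake_png)

print(f"raw bytes:      {len(fake_png)}")
print(f"data URI chars: {len(uri)}")
print(f"growth factor:  {len(uri) / len(fake_png):.3f}")
```

Inline SVG avoids the Base64 step, but the markup itself still ends up in the doc comment.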

Do you know of any best practices here, or of any other crates that have done something like this which we could take inspiration from?

Also, any opinions about storing the (presumably Python) code which generates these diagrams, perhaps for future use or reproducibility?

dhardy commented 2 months ago

In my opinion, documentation should be stored within the repository, and these are documentation. The exception is things like tutorials and books which are more prose.

So, I can see a few possible options:

  1. We just link to external documentation such as Wikipedia. We already do this in some cases. I've nothing against the links, but I guess this issue is about having our own resource.
  2. We expand the book with more information on our distributions (like GSL or Python), also including graphs. This lets us group related distributions and include more prose than in-line docs while giving us a uniform look at distributions.
  3. We add plots to API documentation. More scrolling will be needed to see the API, but as long as it's short I guess that's fine.

If we go for API docs, we should add a sub-dir in the repo like rand_distr/res/plots. Either way, I personally think scalable (SVG) graphics are preferable for this type of thing; example.
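
To illustrate how little is needed for a scalable plot, here is a sketch that turns a sampled curve into a bare SVG polyline using only the standard library. It is not a proposal for the actual tooling: `curve_to_svg`, the canvas size, and the styling are assumptions, and a real plot would also need axes and labels.

```python
import math
import xml.etree.ElementTree as ET


def normal_pdf(x: float, mean: float = 0.0, std_dev: float = 1.0) -> float:
    """Density of the normal distribution at x."""
    z = (x - mean) / std_dev
    return math.exp(-0.5 * z * z) / (std_dev * math.sqrt(2.0 * math.pi))


def curve_to_svg(xs, ys, width: int = 400, height: int = 200, pad: int = 10) -> str:
    """Render the points (xs, ys) as a single SVG polyline and return the markup."""
    x_min, x_max = min(xs), max(xs)
    y_max = max(ys)

    def sx(x: float) -> float:
        return pad + (x - x_min) / (x_max - x_min) * (width - 2 * pad)

    def sy(y: float) -> float:
        # The SVG y-axis points down, so flip the curve vertically.
        return height - pad - y / y_max * (height - 2 * pad)

    points = " ".join(f"{sx(x):.2f},{sy(y):.2f}" for x, y in zip(xs, ys))
    svg = ET.Element(
        "svg",
        {"xmlns": "http://www.w3.org/2000/svg", "viewBox": f"0 0 {width} {height}"},
    )
    ET.SubElement(
        svg,
        "polyline",
        {"points": points, "fill": "none", "stroke": "black", "stroke-width": "1.5"},
    )
    return ET.tostring(svg, encoding="unicode")


xs = [-4.0 + 0.05 * i for i in range(161)]
svg_text = curve_to_svg(xs, [normal_pdf(x) for x in xs])
```

Writing `svg_text` to a file under a directory such as rand_distr/res/plots would give a plain-text asset that version control can diff.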

dhardy commented 2 months ago

> Embed them directly into the docs (either Base64 encoded or in SVG format, which I just learned is valid HTML 🤠). This might be a good solution depending on just how much bloat it would add to the docs (even if it rendered nicely, hundreds of lines of gobbledygook in nearly every file would be sad indeed).

Also horrible for diffs any time a plot is edited. No thanks!

dhardy commented 2 months ago

> Also, any opinions about storing the (presumably Python) code which generates these diagrams, perhaps for future use or reproducibility?

If it's a tiny amount of code, then in the same repository (we probably also want to store the output rather than require the build job to regenerate it, although minimalism says otherwise).

If it's a lot of code, we can use a new repository under https://github.com/rust-random/

MichaelOwenDyer commented 2 months ago

I'm working on a PR right now. I wrote some Python code with numpy, scipy, and matplotlib to generate diagrams for different distributions (roughly 20 lines of code per distribution so far), and now I'm trying to use the embed-doc-image crate to inject these images as Base64 into the documentation during compilation. This last step is proving somewhat tricky: it seems the macro doesn't expand to include a div like it should. Still investigating; I might have to open a PR in that crate first.

dhardy commented 2 months ago

I would suggest simply linking the images instead of embedding. The catch is that building docs will then require copying these into target/doc in local builds and in .github/workflows/gh-pages.yml.
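
The copy step could be as small as the sketch below. This is an assumption about how it might look, not what the workflow does today: `copy_plots` is a made-up helper, the directory layout under target/doc is a guess, and the demo runs in a temporary directory so it touches no real files.

```python
import shutil
import tempfile
from pathlib import Path


def copy_plots(src_dir: Path, doc_dir: Path) -> list:
    """Copy every .svg in src_dir into doc_dir and return the copied paths."""
    doc_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for plot in sorted(src_dir.glob("*.svg")):
        copied.append(Path(shutil.copy2(plot, doc_dir / plot.name)))
    return copied


# Demo in a throwaway directory; the layout below is hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    plots = root / "rand_distr" / "res" / "plots"
    plots.mkdir(parents=True)
    (plots / "normal.svg").write_text("<svg/>")
    (plots / "README.txt").write_text("not a plot")

    copied = copy_plots(plots, root / "target" / "doc" / "plots")
    copied_names = [p.name for p in copied]
    print(copied_names)
```

The same function could be called after `cargo doc` locally and from the gh-pages workflow, with the real source and destination paths substituted in.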

MichaelOwenDyer commented 2 months ago

So, do you mean something like this https://github.com/rust-lang/rust/issues/32104#issuecomment-440664615? I think it would be really nice if the diagrams would be viewable on docs.rs as well, and if I understand correctly then that would not be possible without embedding. But to be honest I'm open to just putting the images into a directory, referencing them in the docs, and calling it a day. This certainly isn't the most streamlined thing to do with rustdoc.

dhardy commented 2 months ago

You're right; it's not quite so simple for docs.rs. We could possibly get around this by copying image resources into $OUT_DIR/doc using build.rs and use relative links (assuming docs.rs packages everything in doc; I don't know).

The issue you linked discusses some other options. embed_doc_image may be a good choice? Sorry, you already mentioned this...