rust-random / charts

Graphical representations of distributions
Apache License 2.0
0 stars 0 forks source link

Add diagrams for `rand_distr` #1

Closed MichaelOwenDyer closed 1 month ago

MichaelOwenDyer commented 3 months ago

Summary

This PR adds some plots to the documentation of distributions in rand_distr to illustrate their behavior.

Motivation

Make the distributions in rand easier to understand via visual aids.

Details

The distributions folder contains a separate Python file for each plot being created. main.py is a script which saves all of the distribution plots into the charts folder.

Used by https://github.com/rust-random/rand/pull/1434.

Charts ready for review:

newpavlov commented 3 months ago

Unfortunately, it does not look like browsers can open SVGZ files (tested in Firefox and Chromium) without messing with Content-Encoding, so we probably have to use plain SVGs. Fortunately, compression should be applied by default while transferring the images over HTTP(S), so the only drawback will be larger diffs and size of the repository.

newpavlov commented 3 months ago

BTW it may be worth to change the license from Apache 2.0 to CC-BY or CC-0.

dhardy commented 3 months ago

For the most part these plots look good.

Can you increase the default size of the SVGs? At lest, opening these stand-alone, they are small.

dhardy commented 3 months ago

@newpavlov can you open a new issue regarding the licence? Also, we need a README, but not in this PR.

MichaelOwenDyer commented 3 months ago

I have a few diagrams I know I still need to do / fix, which I'll work on today. I suggest the following: I will tick the checkboxes of any diagrams which are ready for review, and you (or I) can uncheck them again if there is potential for improvement 🤠

MichaelOwenDyer commented 3 months ago

I also want to be sure, since these charts will be in the documentation, that they are consistent with the terminology used in the documentation with respect to naming (and potentially also examples). For instance, I noticed that the documentation for the Chi squared distribution calls the degrees of freedom parameter k and not df.

MichaelOwenDyer commented 3 months ago

Also, I should ask: do we want plots of cumulative distribution functions as well, or would this be excessive? The workload would not be much more, and I'd be willing.

dhardy commented 3 months ago

Wikipedia does often show CDFs as well as PDFs, but since one is a simple translation of the other I don't think there's much value in both. @vks thoughts?

MichaelOwenDyer commented 2 months ago

Thanks for the review. I've just started my Master's so I haven't had quite as much time to work on this but I'll implement your feedback and hopefully finally figure out how to get the Dirichlet code to work.

MichaelOwenDyer commented 2 months ago

@dhardy I've updated the diagrams which you gave feedback on, have a look:

Also, I added the Dirichlet distribution. I couldn't figure out how to do it on my own, so I eventually gave up and used some code I found on the internet, which I cited in dirichlet.py. The plot looks a little different from the others, but I think it's not too bad. Also, I uploaded it in png format because the svg file was 22MB.

MichaelOwenDyer commented 2 months ago

Thanks for the review @newpavlov. I will add grid lines. With your second point you are referring to the white margin between the axes and the actual plot right? I also considered removing this at one point. I recall that it made a plot a little bit harder to read, but I'll give it another try for all of them and we can decide together what looks nicest. I'll see about adding axes and a legend to the Dirichlet plot. What do you mean with your last point - that the Rust implementation should be generating the plots, not separate Python code? I do agree with that, but this is at least a start.

dhardy commented 2 months ago
dhardy commented 2 months ago

What do you mean with your last point - that the Rust implementation should be generating the plots, not separate Python code? I do agree with that, but this is at least a start.

Unlike scipy and statrs, our implementations do not implement PDF functions so this is not possible (aside from stochastically). We could do something like this later, but it falls under the title of "testing distributions", not "generating nice plots" (in my opinion).

MichaelOwenDyer commented 1 month ago

Sorry for the inactivity, I am still working on getting proper axes on the Dirichlet plot but have been so busy lately... will get it done soon

MichaelOwenDyer commented 1 month ago

Thanks for your patience @dhardy, I fixed the Dirichlet plot and I think it is ready to merge now :)

MichaelOwenDyer commented 1 month ago

I will merge it now, this PR has been open for long enough. Of course if we find any issues I will fix them in a follow-up

dhardy commented 1 month ago

Excellent! Thank you for these plots @MichaelOwenDyer.