Open MichaelOwenDyer opened 2 months ago
Need to look into it more, but maybe whatever solution we decide on here could also be used for https://github.com/rust-random/rand/issues/278
However, this also adds a transitive dependency to syn, which is not great.
Not an issue: you already feature-gated it.
The diagrams are generated with Python code, which is included in the PR for reference.
Good to have in the same repo, but it doesn't need to be in the generated crate package. Could you try investigating why this happens? We already have an explicit include
rule in the Cargo.toml
, which seems like it should be enough.
We should also talk about the best place to put the diagrams (currently there is one assets folder for rand and another for rand_distr, but this might not work).
We don't package any of these files except by embedding, so we don't need an assets
dir and we don't need these separated by crate, but I prefer that the generated resources are next to the generating script. Having the assets in crate-specific dirs but not the script is especially weird. Maybe just move diagrams/src
to assets/src
, also splitting by crate?
Or, simpler: forget Bernoulli (doesn't really need it) and put all the rest under rand_distr/plots
. (Also solves my nit below.)
X-axis values should be false, true; not 0, 1.
Can we add a second parameterisation, e.g. n=6, p=0.5?
:+1: (straight copy from WP, I see)
I think SVG would be a better fit (separate files, not embedded into HTML). You should be able to generate the same plots with Python without issues.
@newpavlov you missed the discussions in #131 (specifically, how to make this work on docs.rs).
SVG appears larger in some cases, smaller in others. It does scale better.
Ah, I misread the discussion and thought it was about embedding SVG into HTML. I don't think the terrible diff objection is important, plus we probably should use compression either way (i.e. use SVGZ instead of plain SVG). Though there may be some issues with browser support of SVGZ files.
In RustCrypto we use links to images which reside in the RustCrypto/media repository, e.g. see docs for the ctr
crate. It means that we rely on raw.githubusercontent.com instead of packaging the image together with crate, but since the images are purely for illustrative purposes, I think it's a reasonable risk to take.
Good to have in the same repo, but it doesn't need to be in the generated crate package. Could you try investigating why this happens? We already have an explicit include rule in the Cargo.toml, which seems like it should be enough.
I think it may be because the folder was named src
. I've moved it and renamed it to py
, could you please check again?
forget Bernoulli (doesn't really need it) and put all the rest under rand_distr/plots
Got it. That also removes the need for the added dependency on rand
, now it only remains in rand_distr
.
Binomial Can we add a second parameterisation, e.g. n=6, p=0.5?
I went ahead and added n=10, p=0.6 because I think it looks better next to n=10, p=0.2. If you think having a demonstration of a different n
is important, then I could experiment some more. Maybe something like n=20, p=0.5 would look better.
I think SVG would be a better fit (separate files, not embedded into HTML). You should be able to generate the same plots with Python without issues.
I prefer SVG as well. However, I just did the switch and the diff count exploded to +11,000. Wowza.
The Python script folder should probably be explicitly excluded from packaged crate (check if it gets included by running cargo package
).
It also may be worth to consider the separate repo approach described above. It will be a simpler solution than relying on embed-doc-image
and will result in a smaller package size. I will leave the final decision to @dhardy.
The Python script folder should probably be explicitly excluded from packaged crate (check if it gets included by running cargo package).
Ah, thanks for the tip! According to this include
and exclude
should not be used at the same time. But I found the problem, had to change src/
to /src
:)
It also may be worth to consider the separate repo approach described above. It will be a simpler solution than relying on embed-doc-image and will result in a smaller package size. I will leave the final decision to @dhardy.
Certainly. I'm not attached to embedding them if we find a better solution.
I prefer SVG as well. However, I just did the switch and the diff count exploded to +11,000. Wowza.
Diffs normally ignore binary files; this is just because SVG is text based? If so, switching to SVGZ may avoid this?
had to change
src/
to/src
Aha!
It also may be worth to consider the separate repo approach described above.
The only real issue is versioning. I see usage of commit hashes in the URLs that ctr
uses; this or tags works.
There is also the minor issue that repo clones don't include all docs, but we already have that with the book.
The other issue is that anyone trying to make themselves a copy of the docs for offline usage likely won't have the plots, but that's not critical.
Considering these plots aren't directly tied to the Rust code in any way, a separate repo probably is the way to go.
Diffs normally ignore binary files; this is just because SVG is text based? If so, switching to SVGZ may avoid this?
Yes, SVGZ is better in this regard. However embed-doc-image doesn't support it.
a separate repo probably is the way to go.
Okay then! Want to create it?
Okay then! Want to create it?
Can you authorize me to push in that repo, or should I make a fork?
@MichaelOwenDyer you should already have admin privileges on that repo.
Hm, strange, I am getting a 403 whenever I try to push.
Edit: Nevermind, I logged in using a GitHub token and then it worked.
Diffs normally ignore binary files; this is just because SVG is text based?
We can tell Git via attributes to treat the files as binary if necessary.
Hi @dhardy @newpavlov @vks
Sorry for the huge delay in getting this PR done, I've had a lot on my plate at uni. This is ready for review now, please take a look through the docs and see if you can spot anything that needs further improvement. Looking forward to finally getting this merged!
@dhardy, could you take a look at the docs for the Geometric distribution and tell me if you think the plot is consistent? I need a sanity check. The docs describe "the number of failures before the first success in a series of Bernoulli trials", but the x-axis on the plot starts at 1, which I think represents the number of trials before the first success (also counting the success itself). I think the plot should start at 0 for consistency with the docs and the actual implementation, no?
Also, is there a reason we use σ for the scale parameter in Gumbel instead of β which I see in the literature?
@dhardy, could you take a look at the docs for the Geometric distribution and tell me if you think the plot is consistent?
The way Wikipedia puts it is that there are two definitions of the distribution. The code and the PMF count the number of failures, while your plot and description count until the first success.
Also, is there a reason we use σ for the scale parameter in Gumbel instead of β which I see in the literature?
That's presumably an error in the documentation since the code just uses scale
internally.
The way Wikipedia puts it is that there are two definitions of the distribution.
Okay, yes I saw that too. I will adjust the x-axis of the plot to start at 0.
That's presumably an error in the documentation since the code just uses scale internally.
I'll fix it then 🤠
Another thing I noticed: We are using a
in the documentation and implementation for the Zeta distribution, but s
is what I am seeing online. Any reason for this?
Also, should I make an effort to embed some more links to Wikipedia across the documentation? Some already exist.
Another thing I noticed: We are using a in the documentation and implementation for the Zeta distribution, but s is what I am seeing online. Any reason for this?
@vks implemented this (#1136). @vks?
Also, should I make an effort to embed some more links to Wikipedia across the documentation? Some already exist.
WP is a useful resource; I guess this would be convenient for some people but not important.
@MichaelOwenDyer Can you be more specific? The notation and naming are not very consistent across the literature. For example, what we call Zeta is sometimes called Zipf, or vice versa.
@vks I'm referring to the struct at zipf::Zeta
. zipf::Zipf
uses s
already, but zipf::Zeta
uses a
. I didn't see anything online which lists the Zeta parameter as a
, see wikipedia: https://en.wikipedia.org/wiki/Zeta_distribution
I guess the a
comes from the reference, but I no longer have access to it to check. I think I did not want to change it in the implementation to avoid getting confused when implementing the algorithm from the paper.
I'm fine with changing it to s
now, as you say it's more consistent with our Zipf
docs, and the usual parameter name chosen for the Riemann zeta function.
Okay. We may want to do that in a different PR since ZetaError::ATooSmall is pub, and I believe renaming that would be a breaking change.
CHANGELOG.md
entrySummary
This PR adds some graphs to the documentation of different distributions in
rand_distr
to illustrate their behavior.Motivation
Make the distributions in this crate easier to understand via visual aids.
Details
The diagrams will be hosted in https://github.com/rust-random/charts (see this PR) and links will be inserted into the documentation.
Closes https://github.com/rust-random/rand/issues/131.
Documentation Progress:
binomial::Binomial
cauchy::Cauchy
dirichlet::Dirichlet
exponential::Exp
exponential::Exp1
frechet::Frechet
gamma::Gamma
gamma::ChiSquared
gamma::FisherF
gamma::StudentT
gamma::Beta
geometric::Geometric
geometric::StandardGeometric
gumbel::Gumbel
hypergeometric::Hypergeometric
inverse_gaussian::InverseGaussian
normal::Normal
normal::StandardNormal
normal::LogNormal
normal_inverse_gaussian::NormalInverseGaussian
pareto::Pareto
pert::Pert
poisson::Poisson
skew_normal::SkewNormal
triangular::Triangular
unit_ball::UnitBall
unit_circle::UnitCircle
unit_disc::UnitDisc
unit_sphere::UnitSphere
weibull::Weibull
zipf::Zeta
zipf::Zipf