sagemath / sage

Main repository of SageMath
https://www.sagemath.org
Other
1.44k stars 480 forks source link

add discrete Gaussian samplers to Sage #15915

Closed malb closed 10 years ago

malb commented 10 years ago

This ticket adds samplers for discrete Gaussian distributions to Sage.

Component: statistics

Keywords: sd59

Author: Martin Albrecht

Branch: a1b711f

Reviewer: Julian Rueth

Issue created by migration from https://trac.sagemath.org/ticket/15915

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 10 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

8ebe5c7expanded documentation
malb commented 10 years ago
comment:33

Based on discussions locally at SD59 and the latest check in I am setting this back to needs review.

saraedum commented 10 years ago

Changed branch from u/malb/t15915_discrete_gaussians to u/saraedum/ticket/15915

saraedum commented 10 years ago
comment:35

A few questions (line numbers are all in the version I pushed):

saraedum commented 10 years ago

Changed branch from u/saraedum/ticket/15915 to u/malb/t15915_discrete_gaussians

saraedum commented 10 years ago

Author: Martin Albrecht

saraedum commented 10 years ago

Reviewer: Julian Rueth

saraedum commented 10 years ago

Changed branch from u/malb/t15915_discrete_gaussians to u/saraedum/ticket/15915

saraedum commented 10 years ago

Changed commit from 8ebe5c7 to 1939239

malb commented 10 years ago
comment:38

Replying to @saraedum:

A few questions (line numbers are all in the version I pushed):

  • Should it really be x-2 in line 518 of dgs_gauss.h?

It should read x-c, fixed.

  • What is the TODO in dgs_gauss_dp.c?

This was a left-over, removed.

  • self->c_r = 0.0; after an abort() in line 114 and 130 of dgs_gaus_dp.c?

Removed.

  • What is "this B" in line 186 of discrete_gaussian_lattice.py?

self.B, fixed this.


New commits:

da3efcffixed issues identified by Julian Rüth
malb commented 10 years ago

Changed branch from u/saraedum/ticket/15915 to u/malb/t15915_discrete_gaussians

malb commented 10 years ago

Changed commit from 1939239 to da3efcf

vbraun commented 10 years ago
comment:40

PDF docs don't build

! Missing $ inserted.
<inserted text> 
                $
l.2746 

? 
! Emergency stop.
<inserted text> 
                $
l.2746 

in ./src/doc/output/latex/en/reference/stats/stats.log

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 10 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

025ac61the future isn't now: removed some utf-8 stuff + fixed latex error
7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 10 years ago

Changed commit from da3efcf to 025ac61

malb commented 10 years ago
comment:42

Fixed. Setting it back to "positive review" (?)

vbraun commented 10 years ago

Changed branch from u/malb/t15915_discrete_gaussians to 025ac61

vbraun commented 10 years ago
comment:44

Various builbots report:

sage -t --long src/sage/stats/distributions/discrete_gaussian_integer.pyx
**********************************************************************
File "src/sage/stats/distributions/discrete_gaussian_integer.pyx", line 263, in sage.stats.distributions.discrete_gaussian_integer.DiscreteGaussianIntegerSampler.__init__
Failed example:
    min(l), max(l), abs(mean(l)-2.5) < 0.01
Expected:
    (-2, 7, True)
Got:
    (-3, 7, True)

Also, on 32-bit linux:


sage -t --long src/sage/stats/distributions/discrete_gaussian_integer.pyx
**********************************************************************
File "src/sage/stats/distributions/discrete_gaussian_integer.pyx", line 38, in sage.stats.distributions.discrete_gaussian_integer
Failed example:
    x=0; l.count(x), ZZ(round(n*exp(-x^2/(2*sigma^2))/norm_factor))
Expected:
    (13350, 13298)
Got:
    (13355, 13298)
**********************************************************************
File "src/sage/stats/distributions/discrete_gaussian_integer.pyx", line 40, in sage.stats.distributions.discrete_gaussian_integer
Failed example:
    x=4; l.count(x), ZZ(round(n*exp(-x^2/(2*sigma^2))/norm_factor))
Expected:
    (5543, 5467)
Got:
    (5479, 5467)
**********************************************************************
File "src/sage/stats/distributions/discrete_gaussian_integer.pyx", line 42, in sage.stats.distributions.discrete_gaussian_integer
Failed example:
    x=-10; l.count(x), ZZ(round(n*exp(-x^2/(2*sigma^2))/norm_factor))
Expected:
    (46, 51)
Got:
    (53, 51)
**********************************************************************
File "src/sage/stats/distributions/discrete_gaussian_integer.pyx", line 68, in sage.stats.distributions.discrete_gaussian_integer
Failed example:
    mean(l).n() # long time
Expected:
    0.48678000000...
Got:
    0.486650000000000
**********************************************************************
1 item had failures:
   4 of  21 in sage.stats.distributions.discrete_gaussian_integer
    [88 tests, 4 failures, 7.06 s]
sage -t --long src/sage/schemes/toric/variety.py
    [499 tests, 9.23 s]
sage -t --long src/sage/stats/distributions/discrete_gaussian_polynomial.py
**********************************************************************
File "src/sage/stats/distributions/discrete_gaussian_polynomial.py", line 70, in sage.stats.distributions.discrete_gaussian_polynomial.DiscreteGaussianPolynomialSampler
Failed example:
    [gs() for _ in xrange(3)]
Expected:
    [4*x^7 + 4*x^6 - 2*x^5 + x^4 + 4*x^2 - 2*x + 7, 5*x^7 - 4*x^6 + 3*x^4 - 4*x^3 + x^2 - 4, 2*x^7 + 2*x^6 + x^5 + 2*x^3 - 3*x^2 + x]
Got:
    [4*x^7 + 4*x^6 - 4*x^5 + 2*x^4 + x^3 - 4*x + 7, -5*x^6 + 4*x^5 - 3*x^3 + 4*x^2 + x, 2*x^7 + 2*x^6 + 2*x^5 - x^4 - 2*x^2 + 3*x + 1]
**********************************************************************
File "src/sage/stats/distributions/discrete_gaussian_polynomial.py", line 97, in sage.stats.distributions.discrete_gaussian_polynomial.DiscreteGaussianPolynomialSampler.__init__
Failed example:
    [gs() for _ in xrange(3)]
Expected:
    [4*x^7 + 4*x^6 - 2*x^5 + x^4 + 4*x^2 - 2*x + 7, 5*x^7 - 4*x^6 + 3*x^4 - 4*x^3 + x^2 - 4, 2*x^7 + 2*x^6 + x^5 + 2*x^3 - 3*x^2 + x]
Got:
    [4*x^7 + 4*x^6 - 4*x^5 + 2*x^4 + x^3 - 4*x + 7, -5*x^6 + 4*x^5 - 3*x^3 + 4*x^2 + x, 2*x^7 + 2*x^6 + 2*x^5 - x^4 - 2*x^2 + 3*x + 1]
**********************************************************************
2 items had failures:
   1 of   5 in sage.stats.distributions.discrete_gaussian_polynomial.DiscreteGaussianPolynomialSampler
   1 of   5 in sage.stats.distributions.discrete_gaussian_polynomial.DiscreteGaussianPolynomialSampler.__init__
    [18 tests, 2 failures, 0.76 s]

Also some crypto stuff failed, though I don't know if it is related

4 items had failures:
   4 of  15 in sage.crypto.lwe
   1 of   7 in sage.crypto.lwe.RingLWE.__call__
   2 of   7 in sage.crypto.lwe.balance_sample
   3 of   7 in sage.crypto.lwe.samples
    [100 tests, 10 failures, 0.60 s]

See http://build.sagemath.org/sage/builders/%20%20fast%20Oxford%20arando%20%28Ubuntu%2013.04%20i686%29%20incremental/builds/172/steps/shell_4/logs/stdio

vbraun commented 10 years ago

Changed commit from 025ac61 to none

malb commented 10 years ago
comment:45

Replying to @vbraun:

Various builbots report:

sage -t --long src/sage/stats/distributions/discrete_gaussian_integer.pyx
**********************************************************************
File "src/sage/stats/distributions/discrete_gaussian_integer.pyx", line 263, in sage.stats.distributions.discrete_gaussian_integer.DiscreteGaussianIntegerSampler.__init__
Failed example:
    min(l), max(l), abs(mean(l)-2.5) < 0.01
Expected:
    (-2, 7, True)
Got:
    (-3, 7, True)

On what kind of machine is this on? I cannot reproduce it on either 64 or 32 bit Linux.

Also, on 32-bit linux:

I can reproduce those and am currenly testing a fix (some default value was machine dependent).

Also some crypto stuff failed, though I don't know if it is related

4 items had failures:
   4 of  15 in sage.crypto.lwe
   1 of   7 in sage.crypto.lwe.RingLWE.__call__
   2 of   7 in sage.crypto.lwe.balance_sample
   3 of   7 in sage.crypto.lwe.samples
    [100 tests, 10 failures, 0.60 s]

These are related, thanks.

malb commented 10 years ago
comment:46

I should have fixed the 32-bit issues.

I would guess these additional patches need review again as they are not mere documentation/doctest fixes, but changes to the code.

I set the ticket to needs info because the first doctest failure is still unaccounted for (see above).


Last 10 new commits:

a17b818follow sage conventions for the author list
7842276added long time to long running doctests
1939239added sig_on()/sig_off() to dgs calls
da3efcffixed issues identified by Julian Rüth
025ac61the future isn't now: removed some utf-8 stuff + fixed latex error
85fd777discrete_gaussian_integer depends on header files
bae27bddefault choice is should be machine independent
e791536all bern functions should return long only
61359b0C documentation fix
2c11c5edoctest fixes due to changed code
malb commented 10 years ago

Commit: 2c11c5e

malb commented 10 years ago

Changed branch from 025ac61 to u/malb/t15915_discrete_gaussians

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 10 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

bc732a2fixed silly typo
7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 10 years ago

Changed commit from 2c11c5e to bc732a2

malb commented 10 years ago
comment:48

Hi Volker, any idea how I could reproduce that first doctest failure? I don't get it on the 64-bit and 32-bit Linux systems I tried.

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 10 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

27af749Merge branch 'develop' of trac.sagemath.org:sage into u/malb/t15915_discrete_gaussians
7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 10 years ago

Changed commit from bc732a2 to 27af749

malb commented 10 years ago
comment:50

Replying to @malb:

Hi Volker, any idea how I could reproduce that first doctest failure? I don't get it on the 64-bit and 32-bit Linux systems I tried.

It seems my question slipped through the cracks.

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 10 years ago

Changed commit from 27af749 to caa8ccf

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 10 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

92e61fdMerge branch 'develop' of trac.sagemath.org:sage into u/malb/t15915_discrete_gaussians
caa8ccfMerge branch 'develop' of trac.sagemath.org:sage into u/malb/t15915_discrete_gaussians
vbraun commented 10 years ago

Changed branch from u/malb/t15915_discrete_gaussians to caa8ccf

vbraun commented 10 years ago
comment:54

I got this failure now on arando (32-bit linux). It worked on the previous test run, so the failure is non-deterministic. Try running tests in a loop to reproduce.

sage -t --long src/sage/stats/distributions/discrete_gaussian_integer.pyx
**********************************************************************
File "src/sage/stats/distributions/discrete_gaussian_integer.pyx", line 251, in sage.stats.distributions.discrete_gaussian_integer.DiscreteGaussianIntegerSampler.__init__
Failed example:
    min(l) == 0-2*1.0, max(l) == 0+2*1.0, abs(mean(l)) < 0.01
Expected:
    (True, True, True)
Got:
    (True, True, False)
**********************************************************************
1 item had failures:
   1 of  48 in sage.stats.distributions.discrete_gaussian_integer.DiscreteGaussianIntegerSampler.__init__
    [88 tests, 1 failure, 7.14 s]
vbraun commented 10 years ago

Changed commit from caa8ccf to none

malb commented 10 years ago
comment:55

Hi, thanks for providing this. I've just committed a "fix". With precision='dp' results are not reproducible (our random seed does not control C's random()). I clarified this a bit more in the documentation and increased the allowed bound. Unfortunately, on the only 32-bit machine I have access to, I cannot reproduce the doctest failure, so I can ask you to run it again?

nbruin commented 10 years ago
comment:56

The name DiscreteGaussianIntegerSampler might be a bit confusing to some people: at first reading it seems you're sampling from the Gaussian integers (which are discrete).

Wouldn't DiscreteGaussianDistributionSampler be a more descriptive name? I get the impression that Discrete Gaussian Distribution is normally used for the distributions on the integers you're referring to.

The fact that you truncate at tau*sigma also seems to disqualify this from legitimately being called "sampling from a discrete gaussian distribution", but I hope this is standard abuse of terminology in the field (and the difference is hard to see).

It also suggests that if the rejection rate is high (it would be for larger values of tau) it might be worth sampling from a binomial distribution with adapted rejection rates. The shape should be much closer to your desired distribution and hence should lead to a much lower rejection rate. It depends a bit on whether the new rejection probabilities are sufficiently easy to compute to be worth it.

malb commented 10 years ago
comment:57

Replying to @nbruin:

The name DiscreteGaussianIntegerSampler might be a bit confusing to some people: at first reading it seems you're sampling from the Gaussian integers (which are discrete). Wouldn't DiscreteGaussianDistributionSampler be a more descriptive name? I get the impression that Discrete Gaussian Distribution is normally used for the distributions on the integers you're referring to.

The name was chosen to allow for samplers over lattices and polynomial rings as well: DiscreteGaussianLatticeSampler and DiscreteGaussianPolynomialSampler.

I can change it to DiscreteGaussianDistributionIntegerSampler or something if I get a commitment that this change is reviewed in a timely fashion. This patch has had a hard time getting reviewed so I am not opening it up again lightly.

The fact that you truncate at tau*sigma also seems to disqualify this from legitimately being called "sampling from a discrete gaussian distribution", but I hope this is standard abuse of terminology in the field (and the difference is hard to see).

The difference is hard to see due to the tail bound. Roughly: Pr[x <- D | |x| > Csigma] <= 2/(C sqrt(2pi)) * exp(-C2/2). This is also the standard approach in the literature.

It also suggests that if the rejection rate is high (it would be for larger values of tau) it might be worth sampling from a binomial distribution with adapted rejection rates. The shape should be much closer to your desired distribution and hence should lead to a much lower rejection rate. It depends a bit on whether the new rejection probabilities are sufficiently easy to compute to be worth it.

Sampling from other distributions has been considered in the literature. The "sigma2+logtable" algorithm is a result of such an investigation in:

[DDLL13] Léo Ducas, Alain Durmus, Tancrède Lepoint and Vadim Lyubashevsky. Lattice Signatures and Bimodal Gaussians; in Advances in Cryptology – CRYPTO 2013; Lecture Notes in Computer Science Volume 8042, 2013, pp 40-56 http://www.di.ens.fr/~lyubash/papers/bimodal.pdf

8383c1ad-6c4a-47ff-af65-59b1e04c9793 commented 10 years ago
comment:58

Running sage -t --global-iterations 100 src/sage/stats/distributions/discrete_gaussian_integer.pyx on 32-bit Archlinux passes.

Here are the last circa 210 lines of output in abbreviated form:

...
sage -t src/sage/stats/distributions/discrete_gaussian_integer.pyx
   [82 tests, 16.02 s]
----------------------------------------------------------------------
Doctests interrupted: 1/1 files tested
Doctests interrupted: 2/1 files tested
...
Doctests interrupted: 98/1 files tested
Doctests interrupted: 99/1 files tested
----------------------------------------------------------------------
Total time for all tests: 16.1 seconds
    cpu time: 1625.9 seconds
    cumulative wall time: 1627.1 seconds
sage -t src/sage/stats/distributions/discrete_gaussian_integer.pyx
    [82 tests, 15.99 s]
----------------------------------------------------------------------
Doctests interrupted: 1/1 files tested
Doctests interrupted: 2/1 files tested
...
Doctests interrupted: 98/1 files tested
Doctests interrupted: 99/1 files tested
----------------------------------------------------------------------
Total time for all tests: 16.1 seconds
    cpu time: 1641.9 seconds
    cumulative wall time: 1643.1 seconds

Apart from this ticket's branch, I have also merged those of tickets #15316 and #16480, as I need those for building.

malb commented 10 years ago
comment:59

Thanks for running those. As far as I can tell, this means for you, too, doctests pass.

malb commented 10 years ago
comment:60

I renamed DiscreteGaussianIntegerSampler to DiscreteGaussianDistributionIntegerSampler. Nils, can you review this commit?

malb commented 10 years ago

Commit: a1b711f

malb commented 10 years ago

Last 10 new commits:

e791536all bern functions should return long only
61359b0C documentation fix
2c11c5edoctest fixes due to changed code
bc732a2fixed silly typo
27af749Merge branch 'develop' of trac.sagemath.org:sage into u/malb/t15915_discrete_gaussians
92e61fdMerge branch 'develop' of trac.sagemath.org:sage into u/malb/t15915_discrete_gaussians
caa8ccfMerge branch 'develop' of trac.sagemath.org:sage into u/malb/t15915_discrete_gaussians
2051027increase allowed bound in testcase and clarified documentation
c9d3c9cDiscreteGaussianIntegerSampler -> DiscreteGaussianDistributionIntegerSampler
a1b711ffixed bug in iter_vectors and wrong doctests which accepted its results
malb commented 10 years ago

Changed branch from caa8ccf to u/malb/t15915_discrete_gaussians

vbraun commented 10 years ago

Changed branch from u/malb/t15915_discrete_gaussians to a1b711f

vbraun commented 10 years ago

Changed commit from a1b711f to none

vbraun commented 10 years ago
comment:64

See #16968 for a follow-up