roed314 / OMS

Converting Rob Pollack's overconvergent modular symbols .sage code to .py and putting it in Sage

Unintuitive behaviour of relative precision in distributions/bug? #46

Open rharron opened 11 years ago

rharron commented 11 years ago

My understanding was that distributions were meant to carry relative precision (namely, their base ring is capped absolute, but the ordp member variable takes care of the relative precision). But here's some code that bothers me, because adding mu and mu2 gives one less digit of relative precision:

sage: p = 691
sage: D = Distributions(0, p, 7, ZpCA(p, 7))
sage: v = [p, p-1] + [0]*5
sage: v2 = [p, 1] + [0]*5
sage: mu = D(v)
sage: mu2 = D(v2)
sage: mu
(691 + O(691^7), 690 + O(691^6), O(691^5), O(691^4), O(691^3), O(691^2), O(691))
sage: mu2
(691 + O(691^7), 1 + O(691^6), O(691^5), O(691^4), O(691^3), O(691^2), O(691))
sage: nu = mu + mu2
sage: nu
691 * (2 + O(691^6), 1 + O(691^5), O(691^4), O(691^3), O(691^2), O(691))
sage: mu.precision_relative()
7
sage: mu2.precision_relative()
7
sage: nu.precision_relative()
6

What freaks me out more is the magic:

sage: nu - mu2
(691 + O(691^7), 690 + O(691^6), O(691^5), O(691^4), O(691^3), O(691^2), O(691))

i.e. it found its precision back (because it uses the absolute precision to figure out the number of moments).

But maybe I'm missing the point.

One thing that worries me more is when it comes to solving the difference equation: one introduces denominators. Then, if you add two distributions and the denominators cancel, you've just completely lost some relative precision.

(And of course, if you create a distribution but don't print it (or normalize it; note many functions create new distributions without normalizing), then precision_relative can give a different answer before and after normalizing.)
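The ordp bookkeeping above can be mimicked with a toy Python model (a sketch with invented names -- ToyDist is not the actual OMS code): normalizing the sum mu + mu2 factors a common p out of the moments, which costs one digit of absolute precision, so the last moment is dropped and the relative precision falls from 7 to 6.

```python
def padic_val(n, p):
    """p-adic valuation of a nonzero integer."""
    v = 0
    while n % p == 0:
        n //= p
        v += 1
    return v

class ToyDist:
    """Toy model of a distribution stored as p^ordp times a tuple of
    moments, where moment i is known modulo p^(M - i), M = #moments."""
    def __init__(self, p, moments, ordp=0):
        self.p = p
        self.ordp = ordp
        self.moments = list(moments)

    def precision_relative(self):
        return len(self.moments)

    def __add__(self, other):
        # moment-wise addition; for simplicity assume equal ordp
        assert self.p == other.p and self.ordp == other.ordp
        n = min(len(self.moments), len(other.moments))
        return ToyDist(self.p,
                       [a + b for a, b in zip(self.moments[:n],
                                              other.moments[:n])],
                       self.ordp)

    def normalize(self):
        nonzero = [x for x in self.moments if x != 0]
        if nonzero:
            e = min(padic_val(x, self.p) for x in nonzero)
            if e > 0:
                # factoring out p^e costs e digits of absolute precision,
                # so the last e moments become meaningless and are dropped
                self.ordp += e
                self.moments = [x // self.p**e
                                for x in self.moments[:len(self.moments) - e]]
        M = len(self.moments)
        self.moments = [x % self.p**(M - i)
                        for i, x in enumerate(self.moments)]
        return self

p = 691
mu  = ToyDist(p, [p, p - 1] + [0] * 5)
mu2 = ToyDist(p, [p, 1] + [0] * 5)
nu = (mu + mu2).normalize()
print(mu.precision_relative(), nu.precision_relative())  # 7 6
```

This also illustrates the point about normalization: precision_relative reads off the number of stored moments, so it only reflects the true precision after normalize has been called.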

rpollack9974 commented 11 years ago

I think the part about the relative precision is okay -- at least this is also how sage treats p-adics. For example,

sage: R = Qp(3,5)
sage: a = R(1)
sage: b = R(2)
sage: a.precision_relative()
5
sage: b.precision_relative()
5
sage: c = a+b
sage: c.precision_relative()
4
sage: (c-a).precision_relative()
5

The questions about the difference equation and its denominators are less clear to me...
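The Qp behaviour above can be reproduced with a toy capped-absolute model in plain Python (a sketch, not Sage's actual implementation): relative precision is just absolute precision minus valuation, so a sum that gains valuation loses a digit of relative precision, and a difference whose valuation drops back gets it back.

```python
def padic_val(n, p):
    """p-adic valuation of a nonzero integer."""
    v = 0
    while n % p == 0:
        n //= p
        v += 1
    return v

class ToyPadic:
    """Toy capped-absolute p-adic integer: a residue known mod p^absprec."""
    def __init__(self, p, value, absprec):
        self.p = p
        self.absprec = absprec
        self.value = value % p**absprec

    def precision_relative(self):
        if self.value == 0:
            return 0  # indistinguishable from 0 at this precision
        return self.absprec - padic_val(self.value, self.p)

    def __add__(self, other):
        # the sum is only known modulo p^min(absprecs)
        return ToyPadic(self.p, self.value + other.value,
                        min(self.absprec, other.absprec))

    def __sub__(self, other):
        return ToyPadic(self.p, self.value - other.value,
                        min(self.absprec, other.absprec))

a = ToyPadic(3, 1, 5)
b = ToyPadic(3, 2, 5)
c = a + b
print(c.precision_relative())        # 4: 1 + 2 = 3 has valuation 1
print((c - a).precision_relative())  # 5: the valuation drops back to 0
```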

rharron commented 11 years ago

One issue I've been dealing with is verifying the output of the code. One can "lose" even more precision applying (Delta - 1). Here's an example:

sage: p = 3
sage: D = Distributions(0, p, 7, ZpCA(p, 7))
sage: mu = D.random_element()
sage: mu
(3 + 3^4 + O(3^7), 2 + 2*3 + 3^2 + 3^3 + 2*3^4 + 3^5 + O(3^6), 2 + 2*3^2 + 2*3^3 + 2*3^4 + O(3^5), 2 + 3 + 3^3 + O(3^4), 2 + 3 + 3^2 + O(3^3), O(3^2), O(3))
sage: mu._moments[0] = 0
sage: mu
(O(3^7), 2 + 2*3 + 3^2 + 3^3 + 2*3^4 + 3^5 + O(3^6), 2 + 2*3^2 + 2*3^3 + 2*3^4 + O(3^5), 2 + 3 + 3^3 + O(3^4), 2 + 3 + 3^2 + O(3^3), O(3^2), O(3))
sage: nu = mu.solve_diff_eqn()
sage: nu
3^-1 * (2*3 + 2*3^2 + 3^3 + 3^4 + 2*3^5 + 3^6 + O(3^7), 2*3^2 + 3^3 + O(3^6), 2*3 + 2*3^2 + O(3^4), 2*3^2 + O(3^4), 3 + 3^2 + O(3^3), 2 + O(3), O(3))
sage: S0 = nu.parent()._act.actor()
sage: Delta_mat = S0([1,1,0,1])
sage: nu * Delta_mat - nu
(O(3^6), 2 + 2*3 + 3^2 + 3^3 + 2*3^4 + O(3^5), 2 + 2*3^2 + O(3^3), 2 + 3 + O(3^3), 2 + 3 + O(3^2), O(3^0))

Notice we started with mu having 7 moments. Then, nu here has 7 moments; the 6th one should be O(3^2), but it's only O(3), so really nu should have 6 moments. Then, nu * Delta_mat - nu actually has 6 moments, but the 3rd and 6th moments both have one digit of precision too few, so really this should have 5 moments. The problem with nu is just what I was talking about in issue #45. With nu * Delta_mat - nu, what's happening is that you start with nu having say 6 moments and valuation -1. You act by Delta_mat and subtract nu, and there will be cancellation of the non-integral part. This seems to mess things up. This makes checking that the code works properly a bit of a pain (it's even more complicated to check for the family code!). (But I have checked that for a bunch of randomly selected distributions the answers in this code agree, up to the appropriate precision, with the answers spit out by your old code.)

rpollack9974 commented 11 years ago

OK. Try again now and see if it's any better. One really does lose precision in solve_diff_eqn because (a) the moments are shifted -- this causes a loss of 1 digit; (b) the Bernoulli numbers have denominators; and (c) the formulas actually involve the m-th moment over m, which causes tons of denominators to potentially appear. So I adjusted the code to drop the tail moments where we don't have enough accuracy.
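Point (c) can be quantified with a quick check (function name is ours, just for illustration): dividing the m-th moment by m, for m up to M, costs at most floor(log_p(M)) digits, which is exactly the kind of log_p constant that appears later in this thread.

```python
def padic_val(n, p):
    """p-adic valuation of a nonzero integer."""
    v = 0
    while n % p == 0:
        n //= p
        v += 1
    return v

def worst_loss_from_division(M, p):
    """Worst-case digits lost to the denominator in 'the m-th moment
    over m' for 1 <= m <= M; this equals floor(log_p(M))."""
    return max(padic_val(m, p) for m in range(1, M + 1))

print(worst_loss_from_division(9, 3))   # 2 (from m = 9)
print(worst_loss_from_division(8, 3))   # 1
print(worst_loss_from_division(26, 3))  # 2
print(worst_loss_from_division(27, 3))  # 3
```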

rpollack9974 commented 11 years ago

Hold on. That wasn't right. I think I sorted this out. I'll upload new code soon.

rharron commented 11 years ago

What I would suggest is to only create v = [K(0) for i in range(M-1)] (instead of up to M). This ensures that only the theoretical limit on the precision is considered. Any further loss of precision, I think, is just a figment of how relative precision is dealt with and so will be dealt with naturally in the last part of the function where ans.ordp is addressed. Agree? (One should also begin this function by treating the cases mu[0] != 0 and M = 0, 1, and 2.)

rpollack9974 commented 11 years ago

Ugh. It's more complicated. Here's a weird example that illustrates a problem. Take p=3 and nu to be the distribution which is 1 on z^9 and 0 elsewhere. Then if I think of nu in the 10-th approximation module and solve the difference equation I get: (here I'm using the code which just drops the last moment)

sage: D = Distributions(0,3,base=Qp(3,10))
sage: nu = D(9*[0]+[1])
sage: nu
(O(3^10), O(3^9), O(3^8), O(3^7), O(3^6), O(3^5), O(3^4), O(3^3), O(3^2), 1 + O(3))
sage: mu = nu.solve_diff_eqn()
sage: mu
3^-2 * (O(3^9), O(3^8), O(3^7), O(3^6), O(3^5), O(3^4), O(3^3), O(3^2), 1 + O(3))

OK. Now instead of thinking of nu as in the 10-th approx module, think of it in the 9-th approx module -- there it is 0. Thus, when solving the difference equation we would get 0 in the 8-th approximation module.

Now here's the problem. Let's take mu from above (which is in the 9-th approximation module) and act on it by [4,1,3,1].

sage: mua = mu.act_right([4,1,3,1])
sage: mua
3^-2 * (O(3^9), 2*3^7 + O(3^8), 3^6 + O(3^7), O(3^6), 2*3^4 + O(3^5), 3^3 + O(3^4), O(3^3), 2*3 + O(3^2), 1 + O(3))
sage: mua.diagonal_valuation()
6

The result when projected to the 8-th approximation module isn't zero. In fact, it's not even zero when projected to the 7-th approximation module. It only becomes 0 in the 6-th approximation module.

This means our solution to the trivial difference equation taking 0 in the 9-th approximation module must be wrong. The result is only 0 in the 6-th approximation module! And there is no way the relative precision code could have predicted this!

The upshot: when solving the difference equation in the M-th approximation module, the result needs to be considered in the (M-m-2)-th approximation module, where m is the ceiling of log_p(M+1). These constants come from Lemma 7.5 in my paper with Glenn. In this case M=9, m = 2, and so we should be in 9 - 4 = 5. (Probably 6 works in this case, as I think the Bernoulli numbers don't interfere here.)

rharron commented 11 years ago

Right, this is the type of problem I was alluding to above (https://github.com/roed314/OMS/issues/46#issuecomment-14676885): basically, the answer has relative precision 9, but may have negative valuation. When it has negative valuation, doing any arithmetic can cause it to collapse its precision. This is why I was sad about the behaviour of the relative precision.

Let Fil^M denote the Mth level of the filtration of D^0. Then we are starting with (what you call) nu in D^0 and getting an answer mu in what I'll try calling p^(-v) D^0 / Fil^M, which is the confusion. We think of M as the precision, which absolutely it is, but relatively there are two precisions: M, the number of moments; and r, the relative p-adic precision. The conflation of these two precisions for elements of valuation 0 is what is confusing. I think you're right and we do need to kill off those m-1 (or so) extra moments. I think the best way may be to start by dropping 1 moment, compute the solution, take its valuation, and based on the valuation kill more moments.

rpollack9974 commented 11 years ago

No. I think these are two separate issues. For instance, if we take the 0 distribution in the 9-th approximation module and "solve the difference equation", we get the 0 distribution in the 8-th approximation module. Looking at the valuation doesn't tell you that this answer really should have been in the 6-th approximation module. This problem is being caused by the potential moments after the 9-th one.

Here's an example. I now take nu to take the value 1 on z and on z^9, and 0 on the other monomials, and view it in the 10-th approximation module. Let mu be the solution to the difference equation, which currently puts it in the 8-th approximation module. But in fact this should be in the 6-th approximation module (as in the example above). So we project mu there and see that it now has valuation -1. Then if you compute mu * [1,1;0,1] - mu you see that it matches nu only in the 5-th approximation module -- this second loss of accuracy is the relative precision issue that you are writing about.

Here's the code:

sage: nu = D([0,1]+7*[0]+[1])
sage: nu
(O(3^10), 1 + O(3^9), O(3^8), O(3^7), O(3^6), O(3^5), O(3^4), O(3^3), O(3^2), 1 + O(3))
sage: mu = nu.solve_diff_eqn()
sage: mu
3^-2 * (3^2 + O(3^9), 3^2 + 3^3 + 3^4 + 3^5 + 3^6 + 3^7 + O(3^8), 2*3 + 3^2 + 3^3 + 3^4 + 3^5 + 3^6 + O(3^7), O(3^6), 2*3 + 2*3^2 + O(3^5), O(3^4), 2*3 + O(3^3), O(3^2), 1 + O(3))
sage: mu6 = mu.reduce_precision(7)
sage: mu6
3^-1 * (3 + O(3^6), 3 + 3^2 + 3^3 + 3^4 + O(3^5), 2 + 3 + 3^2 + 3^3 + O(3^4), O(3^3), 2 + 2*3 + O(3^2), O(3))
sage: mu6.act_right(Matrix(2,2,[1,1,0,1]))-mu6
(O(3^5), 1 + O(3^4), O(3^3), O(3^2), O(3))

rpollack9974 commented 11 years ago

I should also say that I don't think the relative precision issue is really a problem. Continuing with the last example, let me write mu as p^(-1) * mu' where mu' lives in D/Fil^6. We then have an equation:

p^(-1) mu' | Delta = nu

Where is this equality taking place? Well the RHS naturally lives in D/Fil^10. The LHS lives in p^(-1) D / p^(-1) Fil^6. But since we have equality, the moments on the left are actually integral and so really the LHS lives in D / (p^(-1) Fil^6 cap D) which is just D/Fil^5. And so we should view this equation as taking place in D / Fil^5 which is exactly what the computer is showing!
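The intersection step "p^(-1) Fil^6 cap D which is just Fil^5" can be written out explicitly; here is a sketch, assuming the standard moment-wise description of the filtration, Fil^N(D^0) = {mu in D^0 : v_p(mu(z^j)) >= N - j for all j} (that description is our assumption, not stated in the thread):

```latex
p^{-1}\,\mathrm{Fil}^{6}
  = \{\mu : v_p(\mu(z^j)) \ge (6 - j) - 1 = 5 - j\},
\qquad\text{so}\qquad
D^0 \cap p^{-1}\,\mathrm{Fil}^{6}
  = \{\mu \in D^0 : v_p(\mu(z^j)) \ge 5 - j\}
  = \mathrm{Fil}^{5}(D^0).
```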

rharron commented 11 years ago

I see. But shouldn't m be ceiling(log_p(M+1) - valuation(nu)), in view of Corollary 7.6? Or does that corollary not take into account the issues you're raising now?

rpollack9974 commented 11 years ago

I don't think so. I think the valuation of nu is kept track of by the whole relative precision thing. Let's say that nu has valuation -1. Then if you write nu = p^(-1) nu' -- solve the difference equation with nu' -- get a corresponding mu with the appropriate accuracy (i.e. M-m-2) and then just stick a p^(-1) in front of this mu. This must be what the code does, yes?

rharron commented 11 years ago

I was wondering if one could save some precision if nu has positive valuation.

rpollack9974 commented 11 years ago

Ugh. I see what you mean now. So Corollary 7.6 I think is true, but not really relevant for what we are doing. We need a statement like this -- if nu is in p^m F(M) then there exists a unique mu in F(M') such that mu | Delta = nu.

No no. That's not right either. The problem is that being divisible by p^m in F(M) is a funny thing -- the last moment is only defined mod p, so it's weird to say that it is divisible by some higher power of p.

OK -- I'll try to make a better statement.

rpollack9974 commented 11 years ago

OK. Let F(M) denote the M-th approximation module -- that is D^0 / Fil^M (D^0). If I write p^e * F(M) this is kind of ambiguous in the same way that p (Z/p^mZ) is ambiguous. It could simply mean the submodule of size p^(m-1) inside of Z/p^mZ gotten by scaling every element by p. But it could also mean the submodule of size p^m in Z/p^{m+1}Z given by pZ / p^{m+1}Z. The first kind of multiplication is an internal multiplication and the second is an external one. I guess this is the heart of the relative precision issues -- we are always using the external kind of multiplication.

So when I write p^e * F(M) I always mean the external multiplication given by p^e D^0 / p^e Fil^M (D^0) -- and this is fine if e is positive or negative.

With that said, let me make a statement which I think is true:

Claim: Let m be 1 + the ceiling of log_p(M+1). If \nubar is any element of F(M), then there exists a unique \mubar in p^{-m} * F(M-1) such that \mubar | Delta = \nubar. Moreover, if \nu is any lifting of \nubar to D^0 with total measure zero and \mu is the unique distribution such that \mu | Delta = \nu, then the projection of \mu to p^{-m} * F(M-1) equals \mubar.

One comment -- to make sense of the "the projection of \mu to p^{-m} * F(M-1)" I am using Lemma 7.5 and the description of F(M-1) as D^0 + p^{M-1} K_0 / p^{M-1} K_0.

Accepting this as true, let me say what I think it means for our computations. Elements of p^{-m} * F(M-1) have absolute precision M-1-m (i.e. they are known modulo p^{M-1-m}). So if we solve the difference equation by just writing down the first M-1 moments of mu, and then normalize, we will get some distribution in p^(-e) * F(N). If e >= 0, then N = M-1. But if e is negative (i.e. the moments we wrote down are all divisible by some power of p), then N will be smaller than M-1. Elements in this space have absolute precision N-e. To get the right accuracy, we then need to reduce precision (i.e. drop moments) so that the absolute accuracy is M-1-m.

For example,

1) if e = m, then we are done and we do nothing.

2) if e = m-1 and is non-negative, then N = M-1 and we need to reduce precision by 1.

3) If the solution mu turns out to be the 0 distribution, then e = M-1 and N = 1. We then need to reduce precision by m to get absolute precision M-m-1 -- that is, we get the 0 distribution in F(M-m-1) as we want.

Note -- we probably can do a little better than this, especially for small M. To do so, we would have to improve Lemma 7.5 which I think is possible. Lemma 7.5 takes the absolute worst case -- but if M is far from a power of p you can probably improve this, and you can probably improve things in particular examples by looking at Bernoulli numbers and seeing that no bad denominators are there.
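The trimming recipe above can be sketched in Python (hedged: trim_to_accuracy and the convention ordp = -e are our own names, not the repo's code). A distribution p^ordp * (moments) in p^ordp * F(N) has absolute precision N + ordp, and we drop trailing moments until that equals the trusted accuracy M - 1 - m:

```python
def trim_to_accuracy(ordp, moments, M, m):
    """Hypothetical sketch of the reduction step: keep just enough
    moments that the absolute precision (#moments + ordp) equals
    the trusted accuracy M - 1 - m."""
    target = M - 1 - m
    keep = target - ordp                       # moments needed for target
    keep = max(min(keep, len(moments)), 0)     # clamp to what we have
    return ordp, moments[:keep]

# With M = 10 and (say) m = 2, the trusted absolute precision is 7.
# A solution with ordp = -2 and 9 computed moments already sits at
# absolute precision 9 - 2 = 7, so nothing is dropped:
ordp, mom = trim_to_accuracy(-2, list(range(1, 10)), 10, 2)
print(len(mom))  # 9
# An integral solution (ordp = 0) with 9 moments must drop two of them:
ordp, mom = trim_to_accuracy(0, list(range(1, 10)), 10, 2)
print(len(mom))  # 7
```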

rpollack9974 commented 11 years ago

OK...one improvement to Lemma 7.5 comes from actually reading and applying the statement (I failed to do that). In the above comment, the integer m can be taken to be 1 + the floor of log_p(M).

One can still do a little better by thinking about Bernoulli numbers -- that would get rid of the +1 from above.

Hmm...ok, I sat and thought about Bernoulli numbers and I think you can always get rid of that +1. I now think m can just be taken as the floor of log_p(M) -- recall we are projecting to F(M-1). We should write down a careful proof of this -- maybe in the families paper? -- but the idea is that in Lemma 7.5 we are dealing with terms of the form (r choose j) B_{r-j} / (j+1) -- these give p's in the denominator when j is of the form p^e u - 1. But for such j, the binomial coefficient tends to be divisible by p, thus canceling out the denominator in the Bernoulli number. If the binomial coefficient is a p-adic unit, then hopefully the Bernoulli number won't have any p's in its denominator.
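For experimenting with these Bernoulli denominators, one can compute them exactly in plain Python with fractions (a sketch; the function name is ours). By von Staudt-Clausen, p divides the denominator of B_n (n > 0 even) exactly when (p - 1) divides n, so for p = 3 every even-index Bernoulli number contributes a factor of 3 to the denominator:

```python
from fractions import Fraction
from math import comb

def bernoulli_list(n):
    """B_0, ..., B_n (with the convention B_1 = -1/2), via the standard
    recurrence sum_{j=0}^{m} C(m+1, j) B_j = 0 for m >= 1."""
    B = [Fraction(1)]
    for m in range(1, n + 1):
        s = sum(comb(m + 1, j) * B[j] for j in range(m))
        B.append(-s / (m + 1))
    return B

B = bernoulli_list(12)
print(B[2], B[4], B[12])  # 1/6 -1/30 -691/2730
# For p = 3: (p - 1) = 2 divides every even n, so 3 divides every
# even-index denominator, as von Staudt-Clausen predicts:
print(all(B[n].denominator % 3 == 0 for n in range(2, 13, 2)))  # True
```

(Note the 691 in B_12's numerator -- the same prime as in the first example of this thread.)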