km-git-acc / dbn_upper_bound

Computational effort to upper bound the de Bruijn-Newman constant as part of a Polymath project

Conditional dbn bounds #81

Open km-git-acc opened 6 years ago

km-git-acc commented 6 years ago

@rudolph-git-acc I have added the sawtooth and approximate bounds to the repo here https://github.com/km-git-acc/dbn_upper_bound/blob/master/dbn_upper_bound/pari/abbeff_largex_bounds.txt

Also, the full mollifier has been changed back to mollifier, and the mollifier with prime terms has been changed to mollifier_prime. There are also multiple functions now for the approximate bounds: ep_tbound and ep_lbound, which are exact and use the full mollifier; ep_tbound_mollp and ep_lbound_mollp, which are exact and use the prime mollifier; and ep_tbound_approx and ep_lbound_approx, which currently use the prime mollifier but can use the full one too.

Both mollifier functions needed the t parameter too, which I have added. It worked earlier when t was assigned a value at the beginning, but that could have been prone to errors.

Added the ball script for the sawtooth bounds to the repo here https://github.com/km-git-acc/dbn_upper_bound/blob/master/dbn_upper_bound/arb/Mollifier.3.and.5.sawtooth.bound.script.txt You may want to update it with a generalized mollifier.

Also, added an approx function for (A+B)/B0 using N0 terms in the barrier multieval script. I have tried it at x=10^100 and it works (though the precision \p may have to be increased first to around log(x)). Of course, when y and t are small the quality of the approximation is not clear. The approximation gets better the larger the B exponents are and should be good enough at real(exponents) >> 1 (similar to how the zeta function behaves). Of course, when real(exponents) >> 1, (A+B)/B0 moves close to 1, so the area of interest is small t and y, where the approximation may not be that great.

km-git-acc commented 6 years ago

@rudolph-git-acc I have started running the bound calculations from N=892063 to 30 mil, y=0.2, t=0.16, mollifier 5. It's running through 35000 N values in a minute, so I guess it will take around 13 hours.

Have also added a function barrier_strip_search to the barrier_multieval script which can be used to generate euler product data near a chosen X (which then has to be filtered and searched for good X candidates, maybe in a spreadsheet or an adhoc script).

EDIT1: Since it is appending output to an already large file, the speed has reduced to 25000 N vals a minute, and probably will keep decreasing gradually. Even though the bounds allow us to complete things in a single run, it may be a good idea practically to have multiple 'tooths' instead of a single one for large runs.

rudolph-git-acc commented 6 years ago

Great! I will start the same batch in Sage Balls when I am back online tonight, so we can compare output.

As soon as we have found a good spot for the barrier, happy to run the barrier clearing as well.

P.S. I was checking the general wiki page on the de Bruijn-Newman constant and was surprised to see that the page was immediately updated after we announced the Lambda <= 0.22 result. Somebody is closely monitoring our progress :-)


km-git-acc commented 6 years ago

@rudolph-git-acc Great. I think the results are being updated by Prof. Tao himself.

The below analysis can be treated as an exercise in overcaution, but it could be useful as a backup.

Our formula for bound_(N+1) is bound_N - (Extra D terms at N+1) (- also a small error term; ignoring that for now)
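Written out schematically (a hedged restatement, with c the common term discussed below and eps_(N+1) the small error term):

    bound_(N+1) = bound_N - sum over the new terms n appearing at N+1 of max(|bn - an|, c*|bn + an|) - eps_(N+1)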

As N increases, modgamma decreases so we are nicely conservative with the 1-a1 = 1-modgamma term

The exponent sig also increases as N increases, so we are conservative here as well.

The common term in the lemma bound sum (which we want as large as possible to be conservative) |(1-a1)/(1+a1)| = |(1-gamma)/(1+gamma)| is actually slightly different from (1-modgamma)/(1+modgamma). |1-gamma| >= 1-modgamma and |1+gamma| <= 1+modgamma, so the actual term is slightly larger than what we are using (also, the modgamma we are using is actually an upper bound on gamma, making our current sum even smaller). But it's not clear whether we can take gamma as gamma(xN), since it may or may not be monotone decreasing in the same N interval.

Secondly, (1-modgamma)/(1+modgamma) increases as N increases, so we are not conservative if we use a smaller common term from the prevbound calculations while calculating the bound at N+1 (since then max((b-a), c*(b+a)) would often, if not always, be smaller than what would have happened if the N+1 bound was calculated normally).

The maximum possible value of (1-modgamma)/(1+modgamma) should be 1, and more strictly that of |1-gamma|/|1+gamma| should be (1+modgamma)/(1-modgamma). The latter actually decreases towards 1 as N increases which is its ideal behavior. So while calculating both the base bound and then the sawtooth bounds, we should take the common term to be either 1 or more conservatively (1+modgamma)/(1-modgamma).

Since the current full bound is used for normal calculations as well, it's better to create another version of the full bound which would be used only with the sawtooth bounds function.

Of course, currently some of the quantities work in our favor and the common term does not, so (non-rigorously) their effect likely cancels out and even the current bounds may be fine, but I will write two additional functions (one each for the base and sawtooth bounds) which can calculate according to the above.

EDIT: Actually, on thinking further it seems that modgamma enters the derivation early, at bound >= |B sum| - modgamma*|A sum|, so we need not worry about |1-gamma|/|1+gamma| or about modgamma being an upper bound for gamma at a given N. So we could take commonterm=1 (which gives a comfortably positive base bound (0.0741104187469) for N=892062 Euler 5 but takes things negative even for N=69098 Euler 7), or more optimally commonterm = (1-modgamma(N_end))/(1+modgamma(N_end)), which is the highest value of commonterm in the chosen N verification range.

With the more optimal latter setting:

N=69098, y=0.2, t=0.2, mtype=7 gives a base bound of 0.0338943569372946644
N=892062, y=0.2, t=0.16, mtype=5 gives a base bound around 0.11

Also I have updated the script https://github.com/km-git-acc/dbn_upper_bound/blob/master/dbn_upper_bound/pari/abbeff_largex_bounds.txt with 4 functions

rudolph-git-acc commented 6 years ago

@km-git-acc

Thanks for the additions and good that you went through the sawtooth method once more to verify we hadn't missed a 'progressive' variable when jumping from N to N+1.

Just kicked off the run for: N = 892062 to 30000000, step 1, t=0.16, y=0.2 euler5 (although we probably could have tried euler3 as well). Will spend tomorrow on trying to update P15's ARB script to accommodate y < 1/3. He has already developed a generalised mollifier formula, which is something I unfortunately can't build in Sage Balls (no arrays, lists, vectors, etc. supported for Sage ARB's or Sage ACB's yet).

km-git-acc commented 6 years ago

@rudolph-git-acc Great. I have reached around 27 mil (with the non-conservative version) and the bound value looks like it will stay above 0.1772. So hardly any change from 0.178!

Incidentally, the output file currently is around 3.6 GB (although compression should reduce it quite a bit). One thing we can do for large runs is to print the output for only every 100th or 1000th or some 10^kth N value. Since the formula is recursive and relies on the previous bound, the bounds for the non-printed N lie in between the printed bounds. Another redundancy is that t and y are printed on every line, when they could just go into the filename. Will make these changes later.

rudolph-git-acc commented 6 years ago

@km-git-acc

Yeah, we could indeed simplify the output a bit further. To my surprise my run just completed already... ...and I discovered that I had run till 3 mln instead of to 30 mln :-) Running again!

km-git-acc commented 6 years ago

@rudolph-git-acc

Some more analysis of our approach while verifying the lemmabounds between [Nmin,Nmax]:

|bn-an| = |1-modgamma*n^y||f(n,mollifier)| and |bn+an| = |1+modgamma*n^y||f(n,mollifier)|

Hence |bn+an| >= |bn-an|. Also, |bn+an| decreases with increasing N, hence is conservative, while |bn-an| increases with increasing N, hence may not be conservative.

If common = |1-a1|/|1+a1| = |1-modgamma|/|1+modgamma| is replaced by 1, then max(|bn-an|, common*|bn+an|) = |bn+an|, hence our approach becomes conservative again.

If common = |1-a1|/|1+a1| = |1-modgamma|/|1+modgamma| is instead replaced, more optimally, by |1-modgamma(Nmax)|/|1+modgamma(Nmax)| = c, then the analysis is a bit more delicate. We then want |1-modgamma*n^y|/|1+modgamma*n^y| <= c. Taking modgamma = 1/N^y (the two are very close), we want |N^y-n^y|/|N^y+n^y| <= c. So for the case n <= N we want n >= N*[(1-c)/(1+c)]^(1/y) = N/Nmax, and for the case n >= N we want n <= N*[(1+c)/(1-c)]^(1/y) = N*Nmax.

For Nmin = 69098, Nmax = 1.5 mil, y = 0.2 we get c = 0.89 (approx), hence approximately n >= N/(1.5 mil) and n < N*(1.5 mil), or conservatively n >= Nmax/(1.5 mil) and n < Nmin*(1.5 mil), i.e. n >= 1 and n < 10 billion (approx). This should be valid up to a mollifier with D < (10 bil)/(1.5 mil) = 6666, i.e. Euler mollifier 11.
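A quick numeric check of this range derivation, as a hedged pari/gp sketch using the approximation modgamma ~ N^(-y) mentioned above:

    \\ hedged sketch: the optimal common term and the admissible n-range derived above
    c(Nmax, y) = (1 - Nmax^(-y)) / (1 + Nmax^(-y));
    nlower(N, cc, y) = N * ((1-cc)/(1+cc))^(1/y);
    nupper(N, cc, y) = N * ((1+cc)/(1-cc))^(1/y);
    \\ with cc = c(Nmax, y) these simplify to N/Nmax and N*Nmax respectively;
    \\ c(1.5*10^6, 0.2) is about 0.89, matching the value quoted above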

So if we use the optimal setting for commonterm, and if any of our bounds are positive enough till mollifier 11 (and I saw yesterday that with Euler 7 we were getting a conservative base bound of around 0.03389), we should be good.

And when commonterm=1 itself gives a positive bound, we can use that since then we are not restricted in either our mollifier usage or our choice of Nmax.

rudolph-git-acc commented 6 years ago

@km-git-acc

Your reassuring analysis looks sound to me.

In the mean time my run up to 30mln has completed as well:

https://drive.google.com/open?id=1ILU67z84X8akrS4MfFSOrLQLlAQ1Wp-1

Have spent most of the day on changing the ARB code. Have completed most of the coding and the script compiles without any errors. Currently in the test phase but struggling with an unexpected 'segmentation error: 11' issue, which seems to have something to do with the bt and at arrays. Will plough forward step by step :-)

km-git-acc commented 6 years ago

@rudolph-git-acc Great. I am currently working on the tex derivation for the writeup.

rudolph-git-acc commented 6 years ago

@km-git-acc

I now have a fully working ARB-version of the Lemma bound (so, valid for all y) :-)

Need to make a small final tweak (related to the max-function that is not supported in acb-poly language). It is super fast and supports generic choice of the Euler mollifiers (thanks to P15, whose module I could fully reuse).

Will share the script tomorrow.

rudolph-git-acc commented 6 years ago

@km-git-acc

Here is the ARB-script for the Lemma bound (fully written in acb-poly functions). It was a bit of a learning curve, but I feel now quite comfortable coding other functions as well.

https://github.com/km-git-acc/dbn_upper_bound/blob/master/dbn_upper_bound/arb/main.c

Grateful if you could try it out. Is there anything you would recommend me to code next?

P.S.: t=0.19, y=0.2 at 69098 with Euler 11 is indeed negative: -0.08732291084 (it ran in 1.5 hours).

km-git-acc commented 6 years ago

@rudolph-git-acc Awesome. I will try it out. Well, one thing we had discussed in the past is that although the barrier calculations using the Taylor expansion method are quite fast, they may work even better in Arb. That is something we could try out, and it would also make a meaningful difference at x=10^13.

rudolph-git-acc commented 6 years ago

@km-git-acc

Will take a look at the Barrier script.

I now also could compute the bound for a few higher 'conditional' N:

   N        xN       moll       t       y      bound
282094   (x=10^12), euler 3, t=0.16, y=0.2, -0.4786244069
282094   (x=10^12), euler 3, t=0.17, y=0.2, -0.1228575095
282094   (x=10^12), euler 3, t=0.18, y=0.2,  0.1326973081
892962   (x=10^13), euler 3, t=0.15, y=0.2, -0.2543316258
892962   (x=10^13), euler 3, t=0.16, y=0.2,  0.08506987434
2820947  (x=10^14), euler 3, t=0.14, y=0.2, -0.1053221945
2820947  (x=10^14), euler 3, t=0.15, y=0.2,  0.227463558
8920621  (x=10^15), euler 3, t=0.13, y=0.2, -0.02371181138
8920621  (x=10^15), euler 3, t=0.14, y=0.2,  0.3166296087
28209479 (x=10^16), euler 3, t=0.12, y=0.2, -0.009041863662
28209479 (x=10^16), euler 3, t=0.13, y=0.2,  0.3611995469

It seems that with every extra power of 10, the bound becomes positive for a t that is 0.01 lower. What surprises me is that this process seems to accelerate (i.e. the bound becoming positive sooner). Do you see the same pattern? Now running 10^17.

km-git-acc commented 6 years ago

@rudolph-git-acc Well, increasing N should have a positive effect on the bound, but I don't have a firm opinion on the speed of that effect yet and will have to do some such exercises myself to understand it better.

Have been busy with some personal stuff so haven't been able to give enough time to the project this week, and decided to first complete the pending writeup part with whatever time I had. I should be able to get back to normal mode during the weekend and after.

I have edited the writeup (pg 35 and 36) with the incremental approach (conservative). Two things I realized during the derivation are that |bn-an|/|bn+an| probably can't be written simply as |1-modgamma*n^y|/|1+modgamma*n^y|, due to the additional d^y factors in the mollification of an, but has to be written as |g(n,moll)-modgamma*n^y|/|g(n,moll)+modgamma*n^y|, which I then computationally verified meets the desired properties. Secondly, when going from N to N+1, it is not just the end D terms which have to be accounted for. For each divisor d of D, the numerators of the terms n=dN+[1,d] should also change a bit, due to the n <= dN part.

Due to the multiple minor changes in this approach, I feel it's better to first get feedback and close any remaining possible loopholes, before making definitive changes in our bound scripts.

rudolph-git-acc commented 6 years ago

@km-git-acc

Great. Managing the project/life balance is a delicate art and needs to be watched carefully :-)

Agree with the approach. Better to move slower, yet surer.

I am progressing well on the Barrier script in ARB-language. Have coded the (A+B)/B0 effective formula (but not yet with 'multieval' or with 'Taylor terms') as well as the ddx and the ddt-bounds. Now coming to the stage of building the x-y rectangle 'walk' and the t-direction with the 'arg' to compute the winding number. I plan to use a single parameter X as input and then take X-1/2 and X+1/2 as the locations of the two barrier 'plates' (and will give an error when these lower and upper x just happen to lie on a jump from N to N+1). This method apparently should give a 'symmetry' benefit, but I am not sure how this manifests itself in optimised code. Grateful for a short explanation of how this symmetry could help.

km-git-acc commented 6 years ago

@rudolph-git-acc I think the symmetry with respect to X and/or y helps in keeping 'x' smaller in the Taylor expansion of exp('x'), hence for the same accuracy level a smaller number of Taylor terms is enough. The way it should probably show up in a script is that, for a given target error magnitude, the overall computational effort is smaller than with a non-symmetric approach, so the symmetric approach should run faster.
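As a small hedged illustration of that effect in pari/gp (a crude count of how many Taylor terms of exp are needed for a given accuracy, not the script's actual criterion):

    \\ count terms until the next Taylor term of exp(x) drops below eps
    {terms_needed(x, eps=1e-15) =
      my(k = 0, term = 1.0);
      while(abs(term) > eps, k++; term *= x/k);
      return(k);
    }
    \\ terms_needed(0.5) is noticeably smaller than terms_needed(5.0)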

km-git-acc commented 6 years ago

@rudolph-git-acc I was able to run the Arb script for computing lemma bounds (main.c). Have renamed it to mollified_lemma_bounds.c

The output matched that of the parigp script. Speed is quite good at around 2 seconds per N (tested it for the batch 69098 to 69197), hence about a 30x gain compared to the parigp script (running for a single N) (which was already quite vectorized).

rudolph-git-acc commented 6 years ago

@km-git-acc

Great! I am almost done with the ARBarrier script and will share as soon as it has been fully tested.

Each night I run a very high value of the Lemma bound. Got a memory error at 10^17 for Euler 3 (purely because of the at/bt-arrays) and therefore continued with Euler 2, which ran well. These are the results:

89206205 (x=10^17), euler 2, t=0.11, y=0.2, -0.4031854932
89206205 (x=10^17), euler 2, t=0.12, y=0.2, 0.1008420387
89206205 (x=10^17), euler 2, t=0.13, y=0.2, 0.3732703597

First run of 10^18 tonight... :)

Also, on the symmetry benefit for the Taylor terms of the exp-function: it appears that the acb-poly exp function is already implemented with Taylor terms that are used up to the point where the desired precision is reached. Hence my software is not protesting at the exp-function at 10^17 (like pari/gp would).

It took me a couple of days to get to grips with arb, arb-poly, acb and acb-poly (the latter I use most), but I am really impressed by its power. It feels a bit like the good old days of 'assembler' language. There are a few tricks to learn though and it definitely is not as intuitive as f.i. pari/gp. However, this is all compensated by the incredible speed you get out of it. When I have the Barrier script working, I will capture some of my key learnings.

km-git-acc commented 6 years ago

@rudolph-git-acc

Interesting that numbers like 10^17 are not as intimidating as they once seemed to be. The assembler analogy is apt (I have read that matrix multiplication which is the bulk operation for ANNs is still being written in assembly for the maximum gain!).

I was able to run the script for N=282094, but had to stop midway for 2820947. On the behavior of the bounds as we increase N, a rough rule of thumb could be that for a fixed y (e.g. y=0.2), if we keep t*log(N) constant, we should get the same or a slightly more positive bound (more positive since |gamma| decreases as well). So if N is increased by 10X, we should be able to decrease t by a factor of 1+log(10)/log(N). This suggests the percentage gain in t should start getting smaller after a certain point (till now the percentage gain has been increasing), as any positive effects from |gamma| and/or mollifier usage are exhausted. To keep the bounds positive with the current formulas, N in any case tends to infinity as t tends to zero. The KKL paper indicated there should be an x, hence N, beyond which all zeroes are real irrespective of y (by showing that for any t > 0 there can only be a finite number of complex zeroes), although how to put a definite number on it is probably a hard theory problem right now.
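A hedged pari/gp illustration of that rule of thumb (keeping t*log(N) constant while N grows by some factor):

    \\ t after scaling N by 'factor', holding t*log(N) fixed
    tscale(t, N, factor) = t * log(N) / log(factor*N);
    \\ e.g. tscale(0.16, 892062, 10) is about 0.137, a roughly 14% reduction in t
    \\ for a tenfold increase in N at this height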

km-git-acc commented 6 years ago

@rudolph-git-acc I am using the lemmabound arb script on multiple machines in parallel (currently manually). So for example I was able to calculate the euler 5 bounds for all N between 69098 and 80000 in about 7-8 hours on one of the machines, and am halfway through the euler 3 bounds between 80k and 150k on another machine. The main bottleneck I see is the long verification between 150k and 1.5 mil, which despite the higher relative speed of euler 2 would still take a lot of time. That has had me wondering: is there a readymade framework for grid computing where one can embed these scripts and allow anyone with a computer to download and run it (similar to Folding@home)? People will not know the details of what they are running, just that they are providing computational power to something interesting.

rudolph-git-acc commented 6 years ago

@km-git-acc

Yes, massive parallel processing is a great idea and our problem seems ideal for it since it could be easily split into ranges. I believe the GIMPS (Great Internet Mersenne Prime Search) project also uses massive parallel computing from personal computers that are otherwise idle anyway. There should be crowd-grid computing software available, but I haven't searched for this yet.

Two years ago I considered building my own super computer based on 32 (or more) Raspberry Pi's. I saw a nice youtube video on that (if I recall correctly it was built by a university professor from the UK and his 6 year old son had constructed the Lego-rack with all the modules in it). It produced 1 teraflops (!) for 32 parallel processes and consumed 15 Amps (so, just below blowing a household fuse).

I tried x=10^18 (N=282094791) with both euler 2 and no-mollifiers, but both ended up in a kill 9 error given by my Mac OS. It apparently consumes too much of the available resources, but I believe with better hardware this constraint could be removed. So far, there are at least no exp-functions that start to protest.

Made some further progress today on the barrier script and now ran it successfully in the 10^10 domain. It is clear that the bottleneck is the computation of abbeff at these heights (the ddx and ddtbound both remain very fast). So, I think I do need to consider also building the abbeff multi-eval mechanism into ARB when we want to tackle the 10^13 and higher ranges. Also noted that the precision setting makes quite a difference for the computation speed (e.g. a 10 digit vs a 20 digit evaluation of abbeff).

km-git-acc commented 6 years ago

@rudolph-git-acc Great. I too will search for grid computing tools and see what I can find. Raspberry Pis are something cool which I have yet to try out. I guess at some point exact methods have to yield and approximate methods with suboptimal bounds come into play. Here this wiki is very useful: http://michaelnielsen.org/polymath1/index.php?title=Estimating_a_sum Using this, one could get at least some bound for even something like N=10^100 using just the first N0 terms (say 10^5). This works only for non-mollified sums, but I remember in the eighth thread we were also trying something similar for mollified sums. By the way, if the intent is to see how far up we can go with an exact calculation, a powerful machine on a cloud could help (I checked yesterday: Google Cloud offers $300 in credits, so that is an option). Yeah, abbeff exact at these heights can be painful to compute. Also, since abbeff is expected to be of the order of 1, hence not close to 0, we may be fine even with 10 digits of precision.
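As a crude, hedged illustration of why the first N0 terms can suffice once the real parts of the exponents exceed 1 (an integral-comparison bound only, much weaker than the estimates on that wiki page):

    \\ for sigma > 1 the neglected tail sum_{n > N0} n^(-sigma) is at most
    \\ N0^(1-sigma)/(sigma-1), no matter how huge N is
    tailbound(N0, sigma) = N0^(1 - sigma) / (sigma - 1);
    \\ e.g. tailbound(10^5, 2) = 10^(-5)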

rudolph-git-acc commented 6 years ago

@km-git-acc

Today I had an interesting learning curve on using the Ball Arithmetic. I did encounter this problem:

In certain situations, however, overestimation leads to exponential slowdown or even failure of an algorithm to converge. For example, root-finding algorithms that refine the result iteratively may fail to converge in ball arithmetic, even if they do converge in plain floating-point arithmetic. Therefore, ball arithmetic is not a silver bullet: there will always be situations where some amount of numerical or mathematical analysis is required. Some experimentation may be required to find whether (and how) it can be used effectively for a given problem.

It occurred in all 4 routines that take a 'walk along a side of the rectangle'. In pari/gp terms, for instance:

    x = x2; y = y1;
    while(x >= x1,
      abb = abbeff(x,y,t);
      row_out = row_output(x,y,t);
      wind_num = wind_num + arg(abb/abbprev);
      x = x - (abs(abb)-c)/Dxabb;  \\ adaptive step: larger |abbeff| allows a bigger jump
      abbprev = abb;
    );

What happened in ARB is that abs(abb) slightly increases the error term (radius) of the abb-ball. However, this value is used to decrement x and x is then used as a parameter for abbeff. This induced a loop with an exponential degradation of the error term of the x-ball and ended up in a NaN (Not a Number). I managed to simply 'kill' the error term back to zero (it started in the 10^-20 domain anyway) to prevent the error term of the ball from exploding. A great learning!

Attached is a first working version of the Barrier script. It should be self-explanatory. Inputs are X (barrier is automatically placed at X and X+1), ya...yb, ta...tb, c (I chose 0.05 in my tests). I have fixed the precision at 12 digits by simply calibrating the abbeff results up to x=10^14 (plus a safety margin). It should also be possible to dynamically establish the precision for a certain number of digits output, but this will take quite some extra time at the higher x. During the 'wind' it prints: x, y, t and abbeff for all mesh steps. I have not yet built in an additional halting test when abs(abbeff) goes below c, but that is easy to add.

When you make ya=yb and ta=tb the script doesn't 'wind' but outputs: N, abbeff, ddxbound, ddtbound for X, ya, ta. This is an easy way to start testing some values.

https://github.com/km-git-acc/dbn_upper_bound/blob/master/dbn_upper_bound/arb/main.c

(feel free to rename it, I have still to figure out how to properly upload into GIT).

km-git-acc commented 6 years ago

@rudolph-git-acc I have renamed the file to barrier_calc_arb.c. As an initial test I tried ./a.out 200000 0.4 1 0.0 0.4 0.05 and ./a.out 200000000000 0.4 0.4 0.4 0.4 0.05. The winding number came out fine (near 10^-30) for the first run, and the second run gave this output:

N = : 126156
(A+B)/B0 eff Re: 0.891005907488
(A+B)/B0 eff Im: -0.0497324088123
ddxbound : 0.283453951694
ddtbound : 1.52695041219

It runs quite fast.

One thing I noticed in the first run is that sometimes y moved beyond 1.0 or x moved beyond the rectangle (e.g. the 15th line in the output of the first run: 200000, 1.02850266801, 0, 1.94954145328, or the last line: 199999.806497, 0.4, 0.398521051739, 1.71310014215). It seems that when the rectangle box is crossed, it's better to jump to the starting point of the adjacent side. Also, the first run printed a single winding number at the end; it seems there should be a winding number computed for each t in the t mesh.

rudolph-git-acc commented 6 years ago

@km-git-acc

Thanks for the feedback. These are easy changes to accommodate. I also fixed the winding number not being displayed as zero (didn't like the e-30 output). I also noted that the number of digits for printing x at e.g. 10^10 is too small (it looks like it doesn't run, whilst it does). I will start up the following run tonight: ./Barrier 60000083952 0.2 1 0 0.2 0.05. The first rectangle at ta=0 ran in 7 minutes, however a rectangle at ta=0.15 took only 15 seconds. Let's see if it completes (i.e. still with exact abbeff computations).

Also have done some tests with negative y and t (now blocked in the script since I am not sure whether the ddx and ddt-bounds work for negative values) and checked whether it finds zeros, and it is amazing to see how the argument principle yields exactly 2*pi when following the path around the rectangle and finding a zero :-)

The key bottleneck is clearly in the speed of evaluating abbeff. Today I tried to better understand the multi-eval approach, however I got stuck in trying to reverse engineer your latest pari/gp code. I would therefore be grateful for some steers.

Rereading the discussion on the blog, it started off with the idea of storing subsums h=1 to H that are only dependent on X and N0 (i.e. the new step size to use to jump from 1 to N). Then for the rectangle mesh I only have to use the parameters theta, y, t (with some extra code) and 'call' the stored sums. I also understand that the Taylor expansion for log(1+h/n0) could be done up to a chosen number of orders to achieve the desired accuracy. But then you mentioned setting H=N and basically replacing all the subsums by a single stored sum that is dependent on X and N only. The potential loss of accuracy is then compensated by a relatively high number of Taylor terms. This gave an enormous speed boost. There are also some symmetry optimisations that could be done on top of this. Is my understanding correct so far?

If so, could you maybe share the precise sum (in pari/gp language, but without the vectors/matrices) that needs to be stored (i.e. the one depending on N, X) and how this sum is wrapped up in theta, y, t dependent code to be able to calculate the abbeff-proxy ? My first aim is to write ARB code that computes abbeff-approx in this fast way (no rectangle meshing, etc.). Thanks!

rudolph-git-acc commented 6 years ago

@km-git-acc

In the mean time, I isolated the Abbeff script into a separate program. It outputs the complex and absolute values for (A+B)/B0 (eff). It might be useful for verification with the approximate versions or for finding good candidate locations for the Barrier.

https://github.com/km-git-acc/dbn_upper_bound/blob/master/dbn_upper_bound/arb/abbeff.c

Here is the output for x=10^18, y=0.3, t=0.2 at 20 digits (the absolute value was added to the output later):

N = : 282094791

(A+B)/B0 eff Re: 0.85323352190157886397
(A+B)/B0 eff Im: 0.081805921226414355316

That took approx 15 minutes to compute.

Maybe if you have time, you could run a quick check on this script. Do you think it would be useful to add the C-eff component as an extra switch?

km-git-acc commented 6 years ago

@rudolph-git-acc I was able to run abbeff.c and matched the output with that of parigp:

x=10^13 in 4 sec
x=10^14 in 14 sec
x=10^15 in 44 sec
x=10^16 in 139 sec

The timings increase by roughly 3x for each order of magnitude in x (consistent with the number of summation terms N growing like sqrt(x)). Adding the C component (maybe as a separate function) will give a much better approximation to H_t, although at these heights even abbeff is pretty good (as we have seen through the error bounds).

The latest function for abbeff multieval is abbeff_multieval_symmetric_rectangle (the last function) in https://github.com/km-git-acc/dbn_upper_bound/blob/master/dbn_upper_bound/pari/abbeff_multieval.txt It is symmetric in both X and y, and the stored sums are calculated using n0-H/2+1 to n0+H/2, so the stored sums are symmetric too.

The Taylor expansion of log(1+h/n0) is actually not performed, since the log operation is quite fast and this too is used in stored form. Instead, the Taylor expansion of the large exp term is used. abbeff_multieval_symmetric_rectangle calls abbapproxarr and passes it the stored sums for this step. This step is also described in this comment https://terrytao.wordpress.com/2018/04/17/polymath15-eighth-thread-going-below-0-28/#comment-497034
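A minimal pari/gp sketch of that expansion step (hypothetical names; the real abbapproxarr additionally carries h-dependent b/a coefficients, the mollifier and mesh-dependent exponents — this only shows the stored-sum mechanism):

    \\ precompute once per X: powers of the stored logs, summed over h
    storedsums(n0, H, K) = vector(K, k, sum(h = 1, H, log(1 + h/n0)^(k-1)));
    \\ each mesh point then only supplies w (its exponent shift) and reuses S:
    \\ sum_h exp(w*log(1+h/n0)) ~ sum_k w^(k-1)/(k-1)! * S[k]
    evalsum(w, S) = sum(k = 1, #S, w^(k-1) / (k-1)! * S[k]);
    \\ usage: S = storedsums(n0, H, K); then evalsum(w, S) for each mesh point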

When H is set to N and npre to H/2, a single stored block covers the whole sum.

rudolph-git-acc commented 6 years ago

@km-git-acc

My first ever real calculation in the x=10^20 domain :-)

./Abbeff 100000000000000000000 0.3 0.2 30

N = : 2820947917

(A+B)/B0 eff Re: 1.20222897589660903770704377757
(A+B)/B0 eff Im: -0.114479185373513195194943297306
|(A+B)/B0|     : 1.20766717036159948436560667537

Experimented with parts of the multi-eval script. Noted that the abbeff_xconst(X,y,t,num=5) and abbeff_yconst(X,y,t,num=5) functions don't really deliver a major speed gain. I tried for instance:

X=6*10^10+2099;y=0.3;t=0.0;

abb = abbeff_xconst(X,y,t,num=10);for(i=1,10,print(abb[i]));

for(i=1,10,print(abbeff(X,y+(1-y)*(i-1)/9,t))); 

and both resulted in roughly the same total processing time where I would have expected the first one to be equally slow for the first calculation and then beating abbeff for the next 9. Is this indeed correct?

Also tried

X=6*10^10+2099;y=0.3;t=0.0;

abb = abbeff_multieval_general_xconst(X,y,t,num=10, H=100);for(i=1,10,print(abb[i]));

and this gives the expected speed boost. When I increase H further, the speed increases even more (a factor 10 in total), however accuracy then goes down. To compensate for this I changed expterms to 10, and this indeed yields accurate outcomes again. Very nice algorithm!

Then worked line by line through the abbeff_multieval_general_xconst script and think I now understand most of what's happening. Will first try some experiments with ARB-matrices of complex numbers (a matrix with zero columns or rows is also allowed so that could replace a vector).

km-git-acc commented 6 years ago

@rudolph-git-acc The xconst and yconst functions calculate abbeff exactly (hence should take time to compute) for many z points. They were used for verification of the approximate output.

10^20 is quite up there, mindblowing.

Also attaching the output of the lemmabound script for N=69098 to 260k, divided between euler 5, euler3 and euler2 mollifiers. lemmabound_exact_N_69098_260000_y_0.2_t_0.2.txt

Also, attaching the latest version of the sawtooth bound script according to the recent discussion on the blog. I think this should be now close to the potential final version. sawtooth_bounds_v8.txt

rudolph-git-acc commented 6 years ago

@km-git-acc

Great work! I will include the 'sawtooth' element in the ARB-lemma bound script as soon as I am finished with the abbeff_multieval_general_x/yconst functions. I have the x-version now working in ARB and will share it later today (need to finish the parameter inputs etc). Runs pretty fast.

We are getting closer to realising this exciting formula:

ARB(Multieval-Barrier) + ARB(Sawtooth-Lemmabound) = 2 * tools to attack > 10^10 domain

(^^^ sorry, did a bit too much coding today :-) )

rudolph-git-acc commented 6 years ago

@km-git-acc

Here is the test version of the ARB abbeff_multieval_general_xconst (adding the yconst is very easy now).

time ./Multieval 60000002099 0.3 0 5 10 4 1000 20

means: x, y, t, num, H, exp terms, npre, digits.

https://github.com/km-git-acc/dbn_upper_bound/blob/master/dbn_upper_bound/pari/Testversiongeneralxconst.c

I accidentally uploaded the script in the paricode folder on GIT, but since this is a test version it will be replaced by the symmetric version soon anyway.

I have used a lot of matrices. There is however one nice finite exp-Taylor expansion function for matrices (listed in the ARB user manual), that I feel we definitely could apply, but haven't figured out yet how to do it.

arb_mat_exp_taylor_sum(arb_mat_t S, const arb_mat_t A, slong N, slong prec): sets S to the truncated exponential Taylor series S = sum_{k=0}^{N-1} A^k/k!.

Let me know how the script works at your end.

km-git-acc commented 6 years ago

@rudolph-git-acc Well, prose formulas can be entertaining :) I tried out the xconst script and it is giving accurate results (I compared the output with that of abbeff.c). What is surprising is that the results are exact even with just 1 expterm, even at high x like 10^12.

One simplification (maybe as part of a specialized version of the multieval script) could be where H and npre are not allowed as inputs and automatically set to N and N/2 (also I just realized these values will work only for even N. For odd N, H and npre will have to be set to N-1 and (N-1)/2 and the last term in abbeff will have to be calculated separately). EDIT: on thinking further, maybe a separate treatment for odd N is not necessary, just that npre will be fractional.

I ran the euler 5 bounds from N=80k to 1.5 mil,y=0.2,t=0.2 according to the v8 script and got these results. Also attaching an image which shows the sawtooth pattern with the tooths getting higher and longer as N increases. incremental_lemmabounds_sawtooth_pattern_N_80k_1500k_y_0.2_t_0.2_euler5moll.txt

incremental lemmabounds_n_80k_1500k_y_0 2_t_0 2_euler5moll
rudolph-git-acc commented 6 years ago

@km-git-acc

Wow! That's a nice sawtooth. I recall that in our previous version the bound didn't drop as much and never reached a 'recalc' threshold. I guess the last version uses a more careful approach and it seems that for higher N the recalc happens increasingly less often :-)

I would like to put my 'teeth' into the Barrier software again. Which would be the best function to code for that? Is it the abbeff_multieval_symmetric_rectangle that uses abbapproxarr? Or is it the abbeff_multieval_xsymmetric_rectangle script? Also, fixing the setting of H to N and npre to N/2 does give the fastest result right? (with only the expterms left as the optimisation parameter). Grateful for some steers, since I have the coding-momentum now!

km-git-acc commented 6 years ago

@rudolph-git-acc Logically the only difference between abbeff_multieval_xsymmetric_rectangle and abbeff_multieval_symmetric_rectangle should be the anchor points used, as in the picture below.

barrier_anchor_points

The second one should ideally lead to more consistent accuracy around the rectangle, since it won't be the case that some (theta,z) values (with the rectangle points being (X+theta, y0+z)) are larger and some smaller, which would potentially require a different number of expterms in the Taylor expansion.

The abbapproxarr function was created because I realized that I was just changing the sarr, bexpo, aexpo arrays for each side and then running the same algo four times. So, either without modification or with some modification, abbapproxarr should be equally usable with abbeff_multieval_xsymmetric_rectangle.

Also, while going along the rectangle, it may be a good idea to print the perimeter coordinates as well and not just the abbeff values. While dealing with just one side as in xconst and yconst that wasn't so necessary.

As H is increased (max possible is N), the results should get faster and faster, with the overhead getting larger: instead of O(NM) operations we would be doing O(NM/H) operations (and with H=N just O(M) operations), with different implied constants in the O notation. The implied constants for higher H should generally be higher due to the potentially larger time required to calculate the stored sums and/or the extra expterms. For e.g. with H=10 I was getting a speedup of around 7x, with H=50 around 20x, and with H=N at N=69098 a speedup of above 1000x.
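As a rough, hedged cost model of this tradeoff (K stands for the number of expterms; this is not the script's exact accounting):

    \\ one-off stored-sum work over all N terms, plus per-mesh-point work over N/H blocks
    cost(N, M, H, K) = N*K + M*(N/H)*K;
    \\ with H=N the per-point part collapses to O(M*K) while the one-off overhead remains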

The lemmabounds indeed reach the threshold a few times since we are subtracting d_1 + d_2 + ... + D terms from the previous bound instead of just D terms, where d is the divisor set of D. This can be mitigated somewhat with a more complex setting, for eg. for the above sum(d) terms, adding back the nth term with N=prev_N and then subtracting the nth term for N=curr_N, but I haven't tried it out yet. While dealing with higher N (for the conditional dbn values), this behavior is anyways expected to happen much less often.

rudolph-git-acc commented 6 years ago

@km-git-acc

Thanks for a very clear answer. I now much better understand what is going on.

Have worked in detail through your latest Barrier script and can see how it could be done in ARB. Unfortunately I do not have the nice concat-function for matrices in ARB, but think I can quite easily program around that.

Regarding speed, I noted that in my current script the tradeoff between increasing H and the expterms is unfavourable. Maybe this has to do with my use of the gamma-function for the factorial in the Taylor expansion. Will try to also 'vectorise' these factors first and see what happens.

km-git-acc commented 6 years ago

@rudolph-git-acc Yeah, I had noticed that in parigp there is both a gamma function and a factorial function; the latter works only for integers and is much faster (maybe the smaller factorials are also precalculated). Creating a vector of expterms length is also a good idea.

rudolph-git-acc commented 6 years ago

@km-git-acc

Vectorising the factorial didn't help much, but then I saw that in your pari/gp abbeff_multieval_general_xconst script the same phenomenon occurs. I guess this improves when we have a fully symmetrical rectangle, hence I started to code the new Barrier.

Now have a working version of abbeff_multieval_symmetric_rectangle (hence also abbapproxarr) and will finish the arg-function and the t-steps tomorrow. I see no 'barriers' to finish the Barrier :-)

Wondered about this piece in your code:

    ests = abbapproxarr(X,y,t,4*num,H,expterms,bexpo,aexpo,sarr);
    for(idx=1,num-1,listput(output,List([t,y+zarr[idx]*1.0,X-1/2.0,ests[idx]])));
    for(idx=1,num-1,listput(output,List([t,y+(1-y),1.0*X+thtarr[idx]*1.0,ests[num+idx]])));
    for(idx=1,num-1,listput(output,List([t,y+zarr[num-idx+1]*1.0,X+1/2.0,ests[3*num-idx+1]])));
    for(idx=1,num-1,listput(output,List([t,y-(1-y),1.0*X+thtarr[num-idx+1]*1.0,ests[4*num-idx+1]])));

Shouldn't the ests[num+idx]] on the third line be ests[2*num-idx+1]], or is this the way we need to 'circle' the rectangle in the output?

For a test run with: X=6*10^10, y0=0.2, t=0.1 (fixed), sidemesh=4, expterms=50, H=N, the 4th and the 5th line are exactly the same. Is this explainable (it happens in both our scripts)?

0.87524038925614816211 + 0.40712134142812282938*I
0.79900137493839171557 + 0.27554021685907936980*I
0.79278461797212248636 + 0.24413840876667147560*I
0.79944014223056187889 + 0.22503141811023204883*I
0.79944014223056187889 + 0.22503141811023204883*I
0.77625281672003038087 + 0.20948457589044145594*I
0.75255930935280297299 + 0.18606052586107960484*I
0.73273900316665637773 + 0.15624829089693730449*I
0.69450750586069787574 + 0.18635806430035775613*I
0.67486905871906783557 + 0.18775705404931481567*I
0.70506425053214957881 + 0.17117968373841511414*I
0.73273900316665637773 + 0.15624829089693730449*I
0.87524038925614816211 + 0.40712134142812282938*I
0.71032243179994172787 + 0.23242914166036003617*I
0.62667771470401409220 + 0.32227667735045571319*I
0.69450750586069787574 + 0.18635806430035775613*I
km-git-acc commented 6 years ago

@rudolph-git-acc While giving the (x,y) coordinates of each side to abbapproxarr, we had always given them low to high, but some sides have to be traversed high to low. Assuming num = 4, abbapproxarr will generate estimates for 4 points for each side, including the side's endpoints. But of the 16 points, only 12 are unique, since two adjacent sides share an endpoint.

distinct rectangle points

Assuming x is vertical and y is horizontal, and following a counterclockwise direction, this would be the order in which the abbapproxarr output would have to be rearranged. 1,2,3 -> 5,6,7 -> 12,11,10 -> 16,15,14

(1 and 13) are the same point. Similarly, (4 and 5), (8 and 12), (9 and 16).

Then these lines result in:

first line (idx=1 to num-1) -> (est[1] to est[3])
second line (idx=1 to num-1) -> (est[5] to est[7])
third line (idx=1 to num-1) -> (est[12] to est[10])
fourth line (idx=1 to num-1) -> (est[16] to est[14])

which is the expected behavior. So we should not be getting repeated output.
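A minimal pari/gp sketch of this reordering (hypothetical helper name, same indexing as the four listput lines quoted earlier):

    {reorder_boundary(ests, num) =
      concat([vector(num-1, i, ests[i]),
              vector(num-1, i, ests[num+i]),
              vector(num-1, i, ests[3*num-i+1]),
              vector(num-1, i, ests[4*num-i+1])]);
    }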

This is the output I am getting (using the function from the barrier_multieval file):

    X=6*10^10; y0=0.2; t=0.1; sidemesh=4; expterms=50;
    N = floor(sqrt((X+Pi*t/4)/(4*Pi))); H=N;
    out = abbeff_multieval_symmetric_rectangle(X,y0,t,sidemesh,H,expterms);
    for(i=1,length(out),print(out[i][4]));

0.875240389256148162112553510562 + 0.407121341428122829376897812879I
0.799001374938391715569847242551 + 0.275540216859079369803125303524I
0.792784617972122486359526116587 + 0.244138408766671475598657968284I
0.799440142230561878890003929265 + 0.225031418110232048833210903233I
0.776252816720030380873487802706 + 0.209484575890441455944282402357I
0.752559309352802972994441372202 + 0.186060525861079604840369477816I
0.732739003166656377730520745520 + 0.156248290896937304490333026955I
0.705064250532149578813826817615 + 0.171179683738415114143511802630I
0.674869058719067835574670385372 + 0.187757054049314815673758114665I
0.694507505860697875739371193931 + 0.186358064300357756128584173298I
0.626677714704014092198904651762 + 0.322276677350455713192322691052I
0.710322431799941727872909847796 + 0.232429141660360036166317971339I

Perhaps another way to go about this would be to give abbapproxarr only the 4*num - 4 coordinates in the desired order, shifting the above complexity to the earlier part where sarr, bexpo, aexpo vectors are being formed.

km-git-acc commented 6 years ago

@rudolph-git-acc Some time benchmarks for the multieval parigp script with y=0.2, t=0 (ddxbound calculated through the arb script). Arb is likely to give much better timings.

X=10^13, ddxbound ~ 10k, multieval time ~ 1.8 min (stored sums 1.5 min)
X=10^14, ddxbound ~ 16k, multieval time ~ 5.5 min (stored sums 5 min)
X=10^15, ddxbound ~ 28k
X=10^16, ddxbound ~ 47k

The ddxbound doesn't increase dramatically (although ddtbound starts higher and increases more rapidly), and calculating the stored sums takes the majority of the time.

One intriguing possibility we could explore later when aiming for high X is to reuse the stored sums across t steps.

So instead of using n0htmat[v,h] while estimating n0sumsbmat and n0sumsamat, we could expand exp((t/4)*log(1+(h-H/2)/n0[v])^2) = exp(t*g(h,v)) as 1 + t*g(h,v) + (t^2)*(g(h,v)^2/2!) + (t^3)*(g(h,v)^3/3!) + ... up to, let's say, expterms_2 terms, which would then require expterms_2 matrices [n0htmat0, n0htmat1, n0htmat2, ...] and correspondingly expterms_2 matrices each of [n0sumsbmat0, n0sumsbmat1, ...] and [n0sumsamat0, n0sumsamat1, ...]

So the number of final stored sums would increase by expterms_2 times. This is however not a bottleneck since with numn0 =1, n0sumsbmat and n0sumsamat are small matrices of dimension (1,expterms).

Then for a particular t step we would calculate n0sumsbmat = n0sumsbmat0 + t*n0sumsbmat1 + (t^2)*n0sumsbmat2 + .. and similarly for n0sumsamat in real time which would then be used for the main Taylor expansion.
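A minimal pari/gp sketch of that reuse (hypothetical helpers: c(h) stands in for the h-dependent b/a coefficients and g(h) for (1/4)*log(1+(h-H/2)/n0)^2; the real script keeps these as the matrices named above):

    \\ precompute once per X: the t-Taylor coefficients of the block sum
    tcoeffs(H, J) = vector(J, j, sum(h = 1, H, c(h) * g(h)^(j-1) / (j-1)!));
    \\ every t in the mesh then only needs a short polynomial evaluation:
    \\ sum_h c(h)*exp(t*g(h)) ~ sum_j t^(j-1) * S[j]
    blocksum(t, S) = sum(j = 1, #S, t^(j-1) * S[j]);
    \\ usage: S = tcoeffs(H, J); then blocksum(t, S) for each t step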

One additional complexity here is that we will have to choose and pass both the expterms and expterms_2 parameters to the function, since both will affect the quality of the approximation.

Also, one bottleneck which is becoming apparent at high X is the large intermediate matrices of dimension (numn0,H): n0hbmat, n0hamat, n0hlogmat and n0htmat (the last one multiple times if trying to reuse across t steps). These are not needed once n0sumsbmat and n0sumsamat (or their multiple counterparts) are computed and can be cleared, but we may run out of memory even before that. There seem to be inefficient ways to tackle this, but I can't think of an efficient way right now.

rudolph-git-acc commented 6 years ago

@km-git-acc

Thanks for the clear explanations. Very helpful!

As you suggested, I put the ordering of the output already inside the 4 sections with sarr, afac, bexpo, aexpo, etc. So, no need for the output-list anymore and the ests-vector immediately contains the correct sequence (and is smaller).

Have made good progress today, but also have encountered a few nasty bugs. Also once again ran into the trap of a 'drifting' error term. But hey, steep learning curve, and now I knew how to solve it! You have to keep your eyes on the Ball all the time!

I now have a fully working Barrier version that gives exactly the same output as your pari/gp script. To get an exact match I changed your xNp1's in the ddx/ddt-bounds back to xN, and I also feed the X and the N into the procedure (so avoiding the xN-function). Some output below, same parameters as before and fields the same as the report summary. It performs quite fast :-)

Rectangle (1) : 0.000000, 12878.265656, 2768.535973, 0.000000, 0.406844, 11072
Rectangle (2) : 0.000016, 12872.257799, 2767.240582, 0.000000, 0.406957, 11068
Rectangle (3) : 0.000032, 12866.248303, 2765.944839, 0.000000, 0.407070, 11060
Rectangle (4) : 0.000047, 12860.237166, 2764.648745, 0.000000, 0.407183, 11056
Rectangle (5) : 0.000063, 12854.224389, 2763.352299, 0.000000, 0.407295, 11052
Rectangle (6) : 0.000079, 12848.209974, 2762.055501, 0.000000, 0.407408, 11048
Rectangle (7) : 0.000095, 12842.193919, 2760.758351, 0.000000, 0.407521, 11040
Rectangle (8) : 0.000111, 12836.176226, 2759.460851, 0.000000, 0.407634, 11036
Rectangle (9) : 0.000127, 12830.156894, 2758.162998, 0.000000, 0.407746, 11032

Now need to put some final validations in and also make a selection switch to print the full details by choice.

I also have the feeling we haven't reached the final speed yet. There definitely is room in the ARB code to get rid of for instance the expterms series and to use the standard ARB-matrix Taylor function for this. I do however run into issues with that, so let's discuss further optimisations later (like your ideas!).

Expect to share Barrier version 1 tomorrow.

rudolph-git-acc commented 6 years ago

@km-git-acc

Just did a full test run for X=60000002099, y0=t0=0.2 and it completed in 44 minutes (440 rectangles processed, and I recompute the ddx and ddt-bounds for each rectangle).

Now working on the final print steps and wondered about this one:

    X=X+1/2;y=(1+y)/2;
    (...)
    for(idx=1,num-1,listput(output,List([t,y+zarr[idx]*1.0,X-1/2.0,ests[idx]])));
    for(idx=1,num-1,listput(output,List([t,y+(1-y),1.0*X+thtarr[idx]*1.0,ests[num+idx]])));
    for(idx=1,num-1,listput(output,List([t,y+zarr[num-idx+1]*1.0,X+1/2.0,ests[3*num-idx+1]])));
    for(idx=1,num-1,listput(output,List([t,y-(1-y),1.0*X+thtarr[num-idx+1]*1.0,ests[4*num-idx+1]])));
    X=X-1/2;y=2*y-1;

To print the exact abbeff coordinates, shouldn't the X-1/2.0 and X+1/2.0 be X-1.0 and X since you already added a 1/2 to X in the beginning?

EDIT: Ignore my comment! I see now. We are still calculating the Barrier from X to X+1 and only the main calculations are centered at X+1/2. The 'ensembles' (thtarr for the X) move us 'cheaply' to the rectangle's sides. All seems indeed correct.

rudolph-git-acc commented 6 years ago

@km-git-acc

Here is the ARB-version of the Barrier version 1. Grateful if you could remove the previous one from GitHub since it is now obsolete.

https://github.com/km-git-acc/dbn_upper_bound/blob/master/dbn_upper_bound/arb/BarrierV1.c

There are definitely further optimisations to be made to the code. I will better frame up the opportunity of using the ARB-exp finite Taylor expansion later tonight. Maybe you'll immediately see how it could be applied.

If it all works, I guess we now need to find a good X-location in the 10^13 domain to tackle the conditional Lambda =< 0.18 (t=0.16, y=0.2). Did you already search for promising 'spots' in that realm?

km-git-acc commented 6 years ago

@rudolph-git-acc Great. I will test it out. Have removed the test version from the parigp folder. I haven't yet started on searching for good barrier locations, and should be doing that tomorrow. Instead I ended up spending time on the tweaked approach which can reuse the stored sums across t steps.

Attaching the script (for lack of a better name I am currently calling it t agnostic) https://github.com/km-git-acc/dbn_upper_bound/blob/master/dbn_upper_bound/pari/barrier_multieval_t_agnostic.txt

This ran through the X=60000002099, y0=t0=0.2 barrier in about 16 minutes (comparable time with the barrier_multieval (non t agnostic) script was around 2 hours, so this is around 8x faster at this height (the gains should be higher as X increases)). About 5 min of this was the initial overhead since it is calculating a lot more stored sums.

Some other optimizations I came across (which are reusable even in the non t agnostic version)

rudolph-git-acc commented 6 years ago

@km-git-acc

Wow! That is an amazing speed improvement and if I understand it all correctly you have now built a script that does all recalculations required for an optimised 3D (x,y and t) ensemble :-)

Pari/gp beats ARB here since I don't think it will be easy to store a matrix in another matrix in ARB. Maybe there is a way around it.

And I was so proud of my 44 minutes... ;-)

km-git-acc commented 6 years ago

@rudolph-git-acc While trying the BarrierV1.c script (for eg. ./a.out 1000000000 0.2 0.2 50 0), I have been getting a Segmentation Fault error which I haven't been able to solve. Does it have to be built using some special options? I tried gcc BarrierV1.c -larb -lflint and gcc BarrierV1.c -larb -lflint -lm

Regarding the Arb matrix, is it possible to use three dimensional arrays or tensors? That may be a way to solve the problem. Or given that by fixing H=N, there is only one n0, we could turn the associated n0sumsprebmat and n0sumspreamat vectors of matrices into a ttaylorterms*expterms matrix, and the resulting n0sumsbmat and n0sumsamat matrices into vectors of expterms length.

rudolph-git-acc commented 6 years ago

@km-git-acc

Just tried my script with the same parameters and it works on my machine. I use the three options:
-larb -lflint -lgmp. The Segmentation Fault error (I got the error 11 variant of it a few times during debugging) does have something to do with memory allocation, but could also have other causes.

I did note though that I had forgotten to divide the winding number by 2*pi at the end. Will prepare a v1.1 after you got the script running.

About the vectors and matrices in ARB: it doesn't seem to support 3D- matrices nor tensors, so that isn't an option. The ways around it that you suggest are certainly worth trying with the numn0=1 in particular!

The ARB function I am really keen to apply is a fast finite Taylor expansion on all cells of a matrix A as follows: matrix S = sum k=0...taylorterms-1 (matrix A)^k/k!.

I would really like to use it, to replace this part:

factorial_divide(sum(h=1,H,n0htmat[m][v,h]*n0hlogmat[v,h]^(expo-1)*n0hbmat[v,h]),expo-1)))

I could easily set A = n0hlogmat[v,h] (and remove the divide by factorial), but then couldn't find a way to get it multiplied by n0htmat[m][v,h] or n0hbmat[v,h], since these terms also need to be divided by the factorials and/or are dependent on m (the Taylor term counter). Do you see an opportunity here?

EDIT: Using that numn0=1 simplifies the required code considerably and avoids the need for 3D matrices. Will commence coding later today!

km-git-acc commented 6 years ago

@rudolph-git-acc Using matrix multiplication for the stored sums is a great idea. While it wasn't clear whether calculating the sum matrix S above can be used directly, the matrix A' = A^k/k! would be quite useful.

For e.g., assuming numn0=1 (so eliminating v), H=2, expterms=3, ttaylorterms=3:

A' (derived from Taylor terms of A) =

1 1
n0hlogmat[h1] n0hlogmat[h2]
n0hlogmat[h1]^2/2! n0hlogmat[h2]^2/2!

B =

n0hbmat[h1] 0
0 n0hbmat[h2]

C =

n0htmat0[h1] n0htmat1[h1] n0htmat2[h1]
n0htmat0[h2] n0htmat1[h2] n0htmat2[h2]

then we get BCA' =

n0htmat0[h1]*n0hbmat[h1] n0htmat1[h1]*n0hbmat[h1] n0htmat2[h1]*n0hbmat[h1]
n0htmat0[h2]*n0hbmat[h2] n0htmat1[h2]*n0hbmat[h2] n0htmat2[h2]*n0hbmat[h2]

multiply

1 1
n0hlogmat[h1] n0hlogmat[h2]
n0hlogmat[h1]^2/2! n0hlogmat[h2]^2/2!

= n0sumsprebmat matrix (dimension expterms*ttaylorterms) (similarly for n0sumspreamat)

I will try this out as well.

On the existing script, I haven't been able to resolve the error yet, but have been able to do some diagnosis. On running a.out with the -v option, e.g. ./a.out -v 1000000000 0.2 0.2 50 0, I get: Error in `./a.out': munmap_chunk(): invalid pointer. Googling for this error threw up results suggesting it is related to the freeing of pointers.

rudolph-git-acc commented 6 years ago

@km-git-acc

Great idea and again a very nice potential further simplification! Will try to code this in ARB.

Just to be clear: are n0htmat0, n0htmat1, n0htmat2 indexed by exp/Taylorterm (i.e. in their last character)?

Is it ok to make the Taylor and exp terms just the same?

Let's work on the error in the previous Barrier script later (the problem can be very deep). I would really like to see the new algorithm in action first :-)

rudolph-git-acc commented 6 years ago

@km-git-acc

I have coded the three matrices A, B and C. They have the following dimensions: A: (Taylor terms, H), B: (H, H), C: (H, Taylor terms).

The script keeps complaining about ' incompatible dimensions' that can't be multiplied. Maybe this is ARB_mat related. Does such multiplication work in pari/gp? (I could of course transpose C and multiply it with A, but then still B is causing trouble).

km-git-acc commented 6 years ago

@rudolph-git-acc I guess it should be A'BC then, as that allows the multiplication to happen. Also, A' is expterms*H. (I have also been wondering whether expterms and ttaylorterms could be the same but haven't tried this out; the log component is the same, but the ttaylorterms are raised to double the power, and after those terms are multiplied with (t/4)^k they should get smaller faster.)

With A': (expterms, H) B: (H, H) C: (H, ttaylorterms)

A'BC should go through and give (expterms,ttaylorterms).
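A minimal pari/gp sketch of that product, with hypothetical element helpers n0hlog(h), n0hb(h) and n0ht(h,m) standing in for the entries of n0hlogmat, n0hbmat and the t-Taylor matrices (B is just the diagonal matrix shown above):

    {storedsums_matrix(H, expterms, ttaylorterms) =
      my(Ap, B, C);
      Ap = matrix(expterms, H, e, h, n0hlog(h)^(e-1) / (e-1)!);
      B  = matrix(H, H, i, j, if(i == j, n0hb(i), 0));
      C  = matrix(H, ttaylorterms, h, m, n0ht(h, m));
      Ap * B * C;  \\ (expterms x H)(H x H)(H x ttaylorterms) = expterms x ttaylorterms
    }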