@km-git-acc
Great. Have done extensive testing on the tloop script today and found that at heights > 10^19 and with a larger number of threads (>12) it started to hit memory issues. The root cause was that for sarr, zarr, bexpo, afac, bsums etc., I used matrices that were initialised for each thread with range 1..num (i.e. the number of mesh points on one side of the rectangle). Then I realised that matrices weren't actually required (except for ests, which is of size 4*num-4 and to which the argument principle is applied at the end) and replaced these by simple variables. All memory issues are gone and tloop now runs another factor 1.5 faster than before :) Will upload the new version to Github.
Even in case ests becomes too large for memory (no issues yet), I do have a routine to apply the argument principle directly to each pair of subsequent mesh points (except for 4 'gaps' that I need to evaluate at the end of the rectangle process). So, I might be able to do without matrices altogether if push comes to shove.
You are correct that the tloop process could potentially be 'Boinc-ed', however there are many connections between the process steps that have to be handled with caution. Only after a rectangle has been completely processed do we know the minmodabb, and that in turn is required to establish the next t-step, hence determines the t for the next rectangle. Just splitting the range 0..t_0 into smaller steps might therefore cause problems, right?
Will also search for routines to numerically verify the RH. I'd thought that these all boil down to counting the zeros on the critical line (using the arg-function to count sign changes) and then applying the argument principle (plus Rouché) to the rectangle 0..1/2, 0..iT, and comparing the two outcomes. But there are probably different and smarter ways to do this for larger heights.
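In symbols, the comparison I have in mind is the standard one: with R a rectangle slightly wider than the critical strip, reaching from just below the real axis up to height T, the argument-principle count is
\[
N(T) \;=\; \frac{1}{2\pi i}\oint_{\partial R}\frac{\zeta'(s)}{\zeta(s)}\,ds \;+\; 1,
\]
(the +1 compensating for the pole of zeta at s=1 inside R), and this agreeing with the number of sign changes on the critical line (e.g. of Hardy's Z(t) on 0 < t <= T) verifies the RH up to height T.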
@rudolph-git-acc Indeed, I checked the new script even with a single thread and it was slightly faster.
In the tloop step, for a given t range [tstart,tend], tstart does not rely on minmodabb values, which are used only for determining subsequent t's. For example, this is the output generated by splitting the 6x10e10 tloop into 4 parts.
docker run dbnupperbound/arbcalc:v9 bash -c "wget http://anthgrid.com/dbnupperbound/storedsums/SingleStoredSums_x60000155019_dig_20.txt -O sss.txt; ./tloopsingle 0.00 0.02 0.2 0 sss.txt";
docker run dbnupperbound/arbcalc:v9 bash -c "wget http://anthgrid.com/dbnupperbound/storedsums/SingleStoredSums_x60000155019_dig_20.txt -O sss.txt; ./tloopsingle 0.02 0.05 0.2 0 sss.txt";
docker run dbnupperbound/arbcalc:v9 bash -c "wget http://anthgrid.com/dbnupperbound/storedsums/SingleStoredSums_x60000155019_dig_20.txt -O sss.txt; ./tloopsingle 0.05 0.10 0.2 0 sss.txt";
docker run dbnupperbound/arbcalc:v9 bash -c "wget http://anthgrid.com/dbnupperbound/storedsums/SingleStoredSums_x60000155019_dig_20.txt -O sss.txt; ./tloopsingle 0.10 0.20 0.2 0 sss.txt";
Rectangle(1) : 0, 12880, 2770, 0.000000, 4.6295791526613631141, 11076
Rectangle(2) : 0.00032061949943023005544, 12758, 2744, 0.000000, 4.614940433284710646, 10972
Rectangle(3) : 0.0006431575487549447949, 12637, 2717, 0.000000, 4.6002924173972816219, 10864
Rectangle(4) : 0.00096762478127827166219, 12517, 2691, 0.000000, 4.5856355203733802906, 10760
Rectangle(5) : 0.0012940317094857798743, 12397, 2666, 0.000000, 4.5709701672512488026, 10660
Rectangle(6) : 0.0016224152028512109304, 12277, 2640, 0.000000, 4.5562956124780297, 10556
Rectangle(7) : 0.0019528131512488674996, 12158, 2614, 0.000000, 4.5416110947562938455, 10452
Rectangle(8) : 0.0022852371597006107012, 12039, 2589, 0.000000, 4.5269170418175539788, 10352
Rectangle(9) : 0.0026197264895301276028, 11921, 2563, 0.000000, 4.5122126737156098699, 10248
Rectangle(10) : 0.0029562932770240970576, 11803, 2538, 0.000000, 4.4974984245255042342, 10148
Rectangle(11) : 0.0032949782236076354999, 11686, 2512, 0.000000, 4.4827734949074217703, 10044
Rectangle(12) : 0.003635794028408886721, 11569, 2487, 0.000000, 4.468038325386630336, 9944
Rectangle(13) : 0.0039787829060462478011, 11452, 2462, 0.000000, 4.4532920964433414579, 9844
Rectangle(14) : 0.0043239882934408811803, 11336, 2437, 0.000000, 4.4385339702129262533, 9744
Rectangle(15) : 0.0046714242470588175117, 11221, 2412, 0.000000, 4.4237643890564660553, 9644
Rectangle(16) : 0.0050211047023708633236, 11105, 2387, 0.000000, 4.408983806720157697, 9544
Rectangle(17) : 0.0053731068461547586588, 10991, 2363, 0.000000, 4.394190032321037396, 9448
Rectangle(18) : 0.0057274140094994076804, 10877, 2338, 0.000000, 4.3793848524889713264, 9348
Rectangle(19) : 0.0060840734608636598939, 10763, 2313, 0.000000, 4.3645673815044667182, 9248
Rectangle(20) : 0.0064431338883935741109, 10649, 2289, 0.000000, 4.349736712950571079, 9152
Rectangle(21) : 0.0068046454587711279732, 10536, 2265, 0.000000, 4.3348919188494100758, 9056
Rectangle(22) : 0.0071686253295806771451, 10424, 2240, 0.000000, 4.3200334549891288049, 8956
Rectangle(23) : 0.0075350905497446380819, 10312, 2216, 0.000000, 4.3051617898216623318, 8860
Rectangle(24) : 0.0079040938264922779512, 10200, 2192, 0.000000, 4.2902759656198110106, 8764
Rectangle(25) : 0.0082756895093961809914, 10089, 2168, 0.000000, 4.2753750011789116258, 8668
Rectangle(26) : 0.0086498965667040322775, 9979, 2144, 0.000000, 4.2604593649000756862, 8572
Rectangle(27) : 0.0090267338615131389702, 9869, 2121, 0.000000, 4.2455295386920883992, 8480
Rectangle(28) : 0.0094062585893165727932, 9759, 2097, 0.000000, 4.2305845086894495099, 8384
Rectangle(29) : 0.0097885297757792687159, 9650, 2074, 0.000000, 4.215007805813597432, 8292
Rectangle(30) : 0.010173504677936118191, 9541, 2050, 0.000000, 4.1982387825648442214, 8196
Rectangle(31) : 0.010561120104260910584, 9432, 2027, 0.000000, 4.1814734530114327098, 8104
Rectangle(32) : 0.010951437476293505231, 9325, 2004, 0.000000, 4.164710611339387204, 8012
Rectangle(33) : 0.011344435933273600373, 9217, 1980, 0.000000, 4.1479526075106240624, 7916
Rectangle(34) : 0.011740221178745079603, 9111, 1957, 0.000000, 4.1311964114021889107, 7824
Rectangle(35) : 0.012138771986713709711, 9004, 1935, 0.000000, 4.1144444216005460327, 7736
Rectangle(36) : 0.012540198510658683726, 8899, 1912, 0.000000, 4.0976935323640538815, 7644
Rectangle(37) : 0.012944479163806683937, 8793, 1889, 0.000000, 4.0809461925901332627, 7552
Rectangle(38) : 0.013351728816097157412, 8689, 1867, 0.000000, 4.0641992174345833147, 7464
Rectangle(39) : 0.013761925526585658196, 8584, 1844, 0.000000, 4.0474551078718644905, 7372
Rectangle(40) : 0.014175189169161597676, 8481, 1822, 0.000000, 4.0307105962423514738, 7284
Rectangle(41) : 0.014591497457835380422, 8377, 1800, 0.000000, 4.0139682374478265404, 7196
Rectangle(42) : 0.015010975580963926028, 8275, 1777, 0.000000, 3.9972246770515417588, 7104
Rectangle(43) : 0.015433600919580426543, 8172, 1755, 0.000000, 3.9804825258218744988, 7016
Rectangle(44) : 0.015859504312363328464, 8071, 1733, 0.000000, 3.9637383383124382356, 6928
Rectangle(45) : 0.016288662822871622137, 7970, 1712, 0.000000, 3.94699478319153939, 6844
Rectangle(46) : 0.016721159031553120178, 7869, 1690, 0.000000, 3.9302504125623896801, 6756
Rectangle(47) : 0.017157078514659282293, 7769, 1668, 0.000000, 3.9135037425705781281, 6668
Rectangle(48) : 0.01759645343325505757, 7669, 1647, 0.000000, 3.8967553991424493332, 6584
Rectangle(49) : 0.018039373683501823685, 7570, 1626, 0.000000, 3.8800038505658406127, 6500
Rectangle(50) : 0.018485873531661115708, 7471, 1604, 0.000000, 3.8632497298611131823, 6412
Rectangle(51) : 0.018936047501659926199, 7373, 1583, 0.000000, 3.8464914546481039986, 6328
Rectangle(52) : 0.01938993214219269495, 7275, 1562, 0.000000, 3.8297296647830023536, 6244
Rectangle(53) : 0.019847626941475581871, 7178, 1541, 0.000000, 3.8129627241766841528, 6160
------------------------------------------------------------------------------------------------------------
Rectangle(1) : 0.02, 7146, 1534, 0.000000, 3.8074107470722755211, 6132
Rectangle(2) : 0.02046283385769273377, 7050, 1514, 0.000000, 3.790637706543950258, 6052
Rectangle(3) : 0.020929590979188329551, 6954, 1493, 0.000000, 3.7738602682577860282, 5968
Rectangle(4) : 0.021400379053427297919, 6858, 1473, 0.000000, 3.7570767113976133805, 5888
Rectangle(5) : 0.021875310040799361696, 6763, 1452, 0.000000, 3.7402852664502645934, 5804
Rectangle(6) : 0.022354429553803984584, 6669, 1432, 0.000000, 3.7234865863117914924, 5724
Rectangle(7) : 0.022837783367915814167, 6575, 1412, 0.000000, 3.7066813453853006423, 5644
Rectangle(8) : 0.023325491557328027193, 6482, 1392, 0.000000, 3.6898676942278805828, 5564
Rectangle(9) : 0.023817603204077160266, 6389, 1372, 0.000000, 3.6730463127283116076, 5484
Rectangle(10) : 0.024314245294033070676, 6296, 1352, 0.000000, 3.6562152811298696591, 5404
Rectangle(11) : 0.024815550135381525196, 6205, 1332, 0.000000, 3.6393726210105860333, 5324
Rectangle(12) : 0.025321492540056881527, 6113, 1312, 0.000000, 3.6225217011948328452, 5244
Rectangle(13) : 0.025832292752913880193, 6022, 1293, 0.000000, 3.6056577634379342173, 5168
Rectangle(14) : 0.026348011411737848014, 5932, 1273, 0.000000, 3.5887814744038446653, 5088
Rectangle(15) : 0.026868709569931348463, 5842, 1254, 0.000000, 3.5718935256877936742, 5012
Rectangle(16) : 0.027394538656834428517, 5753, 1235, 0.000000, 3.5549917550062233717, 4936
Rectangle(17) : 0.027925564513779713303, 5664, 1216, 0.000000, 3.5380768558167503903, 4860
Rectangle(18) : 0.028461948139453574601, 5576, 1197, 0.000000, 3.5211465734047317206, 4784
Rectangle(19) : 0.029003760652617981294, 5488, 1178, 0.000000, 3.5042016017305627769, 4708
Rectangle(20) : 0.029551173480921655269, 5401, 1159, 0.000000, 3.4872395866818605919, 4632
Rectangle(21) : 0.030104263572882747769, 5314, 1140, 0.000000, 3.4702612211139438986, 4556
Rectangle(22) : 0.030663213746219959643, 5228, 1122, 0.000000, 3.4532640438139307031, 4484
Rectangle(23) : 0.031228107404179778064, 5142, 1103, 0.000000, 3.4362487446321530652, 4408
Rectangle(24) : 0.031799139832151803164, 5057, 1085, 0.000000, 3.4192127467155946915, 4336
Rectangle(25) : 0.032376401597371418487, 4973, 1067, 0.000000, 3.4021567346652582991, 4264
Rectangle(26) : 0.03295998429084925044, 4889, 1049, 0.000000, 3.3850814239875312095, 4192
Rectangle(27) : 0.033550101170372165394, 4805, 1031, 0.000000, 3.3679840831599296477, 4120
Rectangle(28) : 0.034146976109635418182, 4722, 1013, 0.000000, 3.3508618762058697134, 4048
Rectangle(29) : 0.034750716447671392285, 4639, 995, 0.000000, 3.3337154691257773319, 3976
Rectangle(30) : 0.035361562636316742001, 4557, 978, 0.000000, 3.3165418785168438326, 3908
Rectangle(31) : 0.035979631953524739331, 4476, 960, 0.000000, 3.2993417556940263842, 3836
Rectangle(32) : 0.036605043427093556664, 4395, 943, 0.000000, 3.2821157860383463611, 3768
Rectangle(33) : 0.037238061808444716245, 4314, 926, 0.000000, 3.2648607858067105606, 3700
Rectangle(34) : 0.037878966023977101633, 4234, 908, 0.000000, 3.2475734374977726163, 3628
Rectangle(35) : 0.038527896925606240182, 4155, 891, 0.000000, 3.2302543495994149576, 3560
Rectangle(36) : 0.039184997852104294323, 4076, 874, 0.000000, 3.2129041665502009853, 3492
Rectangle(37) : 0.039850577873338396626, 3998, 858, 0.000000, 3.1955193294807797031, 3428
Rectangle(38) : 0.040524794814179011859, 3920, 841, 0.000000, 3.1781004538383664706, 3360
Rectangle(39) : 0.041207983705464309428, 3843, 824, 0.000000, 3.1606437662229190217, 3292
Rectangle(40) : 0.041900318799459344302, 3766, 808, 0.000000, 3.1431498457654896447, 3228
Rectangle(41) : 0.042602164217878221, 3690, 791, 0.000000, 3.1256146823394191948, 3160
Rectangle(42) : 0.043313712912279147611, 3614, 775, 0.000000, 3.1080388099852412452, 3096
Rectangle(43) : 0.044035361725224704126, 3539, 759, 0.000000, 3.0904179564319201859, 3032
Rectangle(44) : 0.04476732497937331113, 3465, 743, 0.000000, 3.0727526000857139908, 2968
Rectangle(45) : 0.04550982212225519107, 3391, 727, 0.000000, 3.0550432594079790701, 2904
Rectangle(46) : 0.046263299933935515184, 3317, 711, 0.000000, 3.0372852913654193306, 2840
Rectangle(47) : 0.04702823369678309412, 3244, 696, 0.000000, 3.0194738113833471357, 2780
Rectangle(48) : 0.047804890235433940955, 3172, 680, 0.000000, 3.0016091471252901391, 2716
Rectangle(49) : 0.048593543812711775173, 3100, 665, 0.000000, 2.9836916652765423449, 2656
Rectangle(50) : 0.049394734672478401736, 3029, 649, 0.000000, 2.9657160138628231997, 2592
------------------------------------------------------------------------------------------------------------
Rectangle(1) : 0.05, 2976, 638, 0.000000, 2.9522854869097730149, 2548
Rectangle(2) : 0.050824020660923982868, 2906, 623, 0.000000, 2.9342044506715830815, 2488
Rectangle(3) : 0.05166166844160931772, 2836, 608, 0.000000, 2.9160614186774491472, 2428
Rectangle(4) : 0.052513594188674708816, 2767, 593, 0.000000, 2.8978502361952538818, 2368
Rectangle(5) : 0.053380182636884052467, 2698, 579, 0.000000, 2.8795709396590789615, 2312
Rectangle(6) : 0.054262158522599055788, 2630, 564, 0.000000, 2.8612168467459935979, 2252
Rectangle(7) : 0.055159959605011981111, 2563, 549, 0.000000, 2.8427878135224122781, 2192
Rectangle(8) : 0.056074039906815497411, 2496, 535, 0.000000, 2.8242837259674491114, 2136
Rectangle(9) : 0.057005243322667840805, 2430, 521, 0.000000, 2.8056971120346269447, 2080
Rectangle(10) : 0.057954089870830238725, 2364, 507, 0.000000, 2.787027630540792851, 2024
Rectangle(11) : 0.05892152964686272301, 2299, 493, 0.000000, 2.7682670902709444841, 1968
Rectangle(12) : 0.059908161700047126875, 2234, 479, 0.000000, 2.7494148680444493168, 1912
Rectangle(13) : 0.060915061820031213409, 2170, 465, 0.000000, 2.7304619372591331505, 1856
Rectangle(14) : 0.06194292446392943145, 2107, 452, 0.000000, 2.7114073231610684564, 1804
Rectangle(15) : 0.062992477061537911971, 2044, 438, 0.000000, 2.6922500466767215669, 1748
Rectangle(16) : 0.064065006438581317825, 1982, 425, 0.000000, 2.6729797774086914542, 1696
Rectangle(17) : 0.065161363541209315532, 1920, 412, 0.000000, 2.6535950781737507564, 1644
Rectangle(18) : 0.066283027644424810718, 1859, 399, 0.000000, 2.6340844103956137285, 1592
Rectangle(19) : 0.06743100204485279013, 1799, 386, 0.000000, 2.6144457568995753746, 1540
Rectangle(20) : 0.068606347101495133307, 1739, 373, 0.000000, 2.5946770359366546042, 1488
Rectangle(21) : 0.069810876736881363672, 1680, 360, 0.000000, 2.5747647538781468191, 1436
Rectangle(22) : 0.071045855757046927254, 1621, 348, 0.000000, 2.5547060485509722846, 1388
Rectangle(23) : 0.072313410382926614043, 1564, 335, 0.000000, 2.5344855811097456159, 1336
Rectangle(24) : 0.073614232365733356764, 1506, 323, 0.000000, 2.5141123937629420219, 1288
Rectangle(25) : 0.074951624393464393963, 1450, 311, 0.000000, 2.4935562093242084437, 1240
Rectangle(26) : 0.076326490744722468752, 1394, 299, 0.000000, 2.4728260224807039901, 1192
Rectangle(27) : 0.077741717446645498874, 1339, 287, 0.000000, 2.451902447643588012, 1144
Rectangle(28) : 0.079199448923601128457, 1284, 276, 0.000000, 2.430779665076989285, 1100
Rectangle(29) : 0.080703171404190683975, 1230, 264, 0.000000, 2.4094350352041349598, 1052
Rectangle(30) : 0.082255557611673720528, 1177, 253, 0.000000, 2.38786069368305136, 1008
Rectangle(31) : 0.08385951741939084148, 1125, 242, 0.000000, 2.3660481794414819034, 964
Rectangle(32) : 0.085518226912227714283, 1073, 230, 0.000000, 2.343988366430657, 916
Rectangle(33) : 0.087236762202470637861, 1022, 220, 0.000000, 2.3216508269056636167, 876
Rectangle(34) : 0.089019199410793205001, 971, 209, 0.000000, 2.2990230407708591685, 832
Rectangle(35) : 0.090871952284913554299, 922, 198, 0.000000, 2.2760679883587006346, 788
Rectangle(36) : 0.092798273313502166701, 873, 188, 0.000000, 2.2527932834293598402, 748
Rectangle(37) : 0.094806054852367412795, 825, 178, 0.000000, 2.2291555917080474073, 708
Rectangle(38) : 0.096902001024134742986, 778, 167, 0.000000, 2.205133652159038079, 664
Rectangle(39) : 0.099093689523053815067, 731, 158, 0.000000, 2.1807036973877882761, 628
-------------------------------------------------------------------------------------------------------
Rectangle(1) : 0.1, 713, 154, 0.000000, 2.1708014191620375438, 612
Rectangle(2) : 0.10234333999882473709, 668, 144, 0.000000, 2.1457246061758069186, 572
Rectangle(3) : 0.10480699958890828038, 623, 134, 0.000000, 2.1201533731715991913, 532
Rectangle(4) : 0.10740756680106173012, 580, 125, 0.000000, 2.0940080337280439738, 496
Rectangle(5) : 0.11015585651438594387, 537, 116, 0.000000, 2.0672835444069747591, 460
Rectangle(6) : 0.11307444784475274976, 495, 107, 0.000000, 2.0398780066455815685, 424
Rectangle(7) : 0.11618531250464281353, 454, 99, 0.000000, 2.0117221742296555833, 392
Rectangle(8) : 0.11951509702937773772, 415, 90, 0.000000, 1.9827332721171080344, 356
Rectangle(9) : 0.12308794828749125106, 376, 82, 0.000000, 1.952881624089661062, 324
Rectangle(10) : 0.12695199516007013686, 338, 74, 0.000000, 1.9219786885750103792, 292
Rectangle(11) : 0.13115903270023288946, 302, 66, 0.000000, 1.8898714959262929079, 260
Rectangle(12) : 0.13576125619667756797, 267, 59, 0.000000, 1.8564726636159636002, 232
Rectangle(13) : 0.14084167815778604587, 233, 51, 0.000000, 1.8215572003906827267, 200
Rectangle(14) : 0.1465135974727675168, 200, 45, 0.000000, 1.7848250115559063391, 176
Rectangle(15) : 0.15293772253054704849, 169, 38, 0.000000, 1.7458613952661560903, 148
Rectangle(16) : 0.16030968344928169992, 140, 32, 0.000000, 1.7043045866414617366, 124
Rectangle(17) : 0.16891185906814928375, 113, 26, 0.000000, 1.6596632676841007873, 100
Rectangle(18) : 0.17917436586181389249, 88, 21, 0.000000, 1.6112351926148756764, 80
Rectangle(19) : 0.19180203850516475245, 65, 16, 0.000000, 1.557945213295379675, 60
I have now submitted 4 Boinc jobs to test this out in the grid as well
bin/boinc2docker_create_work.py --min_quorum 1 --target_nresults 1 dbnupperbound/arbcalc:v9 bash -c "wget http://anthgrid.com/dbnupperbound/storedsums/SingleStoredSums_x60000155019_dig_20.txt -O sss.txt; ./tloopsingle 0.00 0.02 0.2 0 sss.txt > /root/shared/results/tloopout_X_6x10e10_t_0.00_0.02_y_0.2.txt";
bin/boinc2docker_create_work.py --min_quorum 1 --target_nresults 1 dbnupperbound/arbcalc:v9 bash -c "wget http://anthgrid.com/dbnupperbound/storedsums/SingleStoredSums_x60000155019_dig_20.txt -O sss.txt; ./tloopsingle 0.02 0.05 0.2 0 sss.txt > /root/shared/results/tloopout_X_6x10e10_t_0.02_0.05_y_0.2.txt";
bin/boinc2docker_create_work.py --min_quorum 1 --target_nresults 1 dbnupperbound/arbcalc:v9 bash -c "wget http://anthgrid.com/dbnupperbound/storedsums/SingleStoredSums_x60000155019_dig_20.txt -O sss.txt; ./tloopsingle 0.05 0.10 0.2 0 sss.txt > /root/shared/results/tloopout_X_6x10e10_t_0.05_0.10_y_0.2.txt";
bin/boinc2docker_create_work.py --min_quorum 1 --target_nresults 1 dbnupperbound/arbcalc:v9 bash -c "wget http://anthgrid.com/dbnupperbound/storedsums/SingleStoredSums_x60000155019_dig_20.txt -O sss.txt; ./tloopsingle 0.10 0.20 0.2 0 sss.txt > /root/shared/results/tloopout_X_6x10e10_t_0.10_0.20_y_0.2.txt";
One thing I have noticed is that it is now quite difficult for a non-aggressive client to get assigned jobs, since the aggressive clients likely search for new jobs very frequently (maybe every few seconds), and maintain a healthy queue on their machines.
5000 subtotal matrices for the 9x10e21 storedsums are already complete, so this shouldn't take long now. 2x10e20 storedsums has about 300 subtotals pending, which should also likely complete by today or tomorrow.
I have the same impression about RH verification. While the algorithm used by Gourdon-Demichel has been described in detail in http://www.dtc.umn.edu/~odlyzko/unpublished/zeta.10to20.1992.pdf and zetazeros1e13-1e24.pdf, what is missing are links to the code used. It also seems that Platt (dave.platt@bris.ac.uk) (in https://research-information.bristol.ac.uk/files/78836669/platt_zeta_submitted.pdf) used interval arithmetic, and his method is more rigorous. It's possible the second method is slower than the first. Once we have decided which method to use, we could contact them and request the code.
@km-git-acc
Great. You could indeed split up the rectangle processes this way, however the trick will be to allocate the work to clients in an exponential way. The first t-steps are very small and the process speeds up greatly towards the end. We therefore need some smart way to 'cut the cake' efficiently. The table below lists all data from the last t-loop runs:
X | Barrier offset | t0 | y0 | Λ | Starting 'num' | Number of rectangles | Time (secs) |
---|---|---|---|---|---|---|---|
6.00E+11 | 183290 | 0.2000 | 0.20000 | 0.22 | 19388 | 242 | 25 |
2.00E+12 | 129093 | 0.1980 | 0.15492 | 0.21 | 34980 | 384 | 74 |
5.00E+12 | 194858 | 0.1860 | 0.16733 | 0.20 | 40344 | 435 | 99 |
2.00E+13 | 131252 | 0.1800 | 0.14142 | 0.19 | 68376 | 689 | 280 |
6.00E+13 | 123375 | 0.1680 | 0.15492 | 0.18 | 81236 | 803 | 401 |
3.00E+14 | 188911 | 0.1610 | 0.13416 | 0.17 | 142012 | 1374 | 1241 |
2.00E+15 | 122014 | 0.1530 | 0.11832 | 0.16 | 259748 | 2239 | 3915 |
7.00E+15 | 68886 | 0.1390 | 0.14832 | 0.15 | 273548 | 2174 | 3993 |
6.00E+16 | 156984 | 0.1320 | 0.12649 | 0.14 | 562360 | 3949 | 16213 |
6.00E+17 | 88525 | 0.1220 | 0.12649 | 0.13 | 988908 | 6066 | 44170 |
9.00E+18 | 35785 | 0.1130 | 0.11832 | 0.12 | 2083348 | 12947 | 241003 |
2.00E+20 | 66447 | 0.1020 | 0.12649 | 0.11 | ? | ? | ? |
9.00E+21 | 70686 | 0.0930 | 0.11832 | 0.10 | ? | ? | ? |
Have not yet been able to spot a predictive pattern for what will happen in the next round, but a safe bet is to expect the number of rectangles to at least double. We could also calculate the 'starting num' as an additional data point once the stored sums are ready and use that as an indicator. What is it that drives the number of rectangles needed? I guess this has something to do with the quality of the Barrier location choice (i.e. the height of the 'peak' of ABB_eff at that point), however I went back to my selection data (attached) and this doesn't explain the drop in rectangles at 2x10^15 and 7x10^15 (EDIT: but the higher y might!).
BarrierlocationselectionNoTriangle.txt
I also noticed that I got kicked out of Boinc, since there were no tasks left 'ready to send'. Guess some of the high performing clients indeed grabbed a substantial chunk of the workload to create a solid buffer. The speed of progress is impressive though.
Also read the Gourdon paper and there are quite a few nuances to consider that could substantially impact speed (hence having the (pseudo) code would be great). Surprised to see that Odlyzko actually already calculated a couple of billion zeros in the 10^23 domain in the early 2000s (though not published).
@rudolph-git-acc Yeah, the number of rectangles required depends on the ddtbound (a function of X, y0, t) and minmodabb (a function of X, t) values, and their values at t=0 have a significant impact on the overall proceedings.
It seems there are a few ways to simulate the minmodabb values, and hence the incremental t-steps. The tloop code on a single core processes anywhere between 6000 and 10000 mesh points per second (the exact value depends on the number of Taylor terms used), so if we assume an average of 8000 points per second we can estimate the time required at each t-step (better to slightly overestimate); since a rectangle has about 4*Dzabb mesh points, that works out to roughly Dzabb/2000 seconds, which is the estimate used in the script below. This can help in identifying how to split the t range between jobs.
To simulate the minmodabb values, we can either use an approximate abbeff (with say 100k terms), or eulerprod(x,t), or just assume a constant like 3.5. The first method is the most accurate (but slow, at least in gp), the second less so but fast, and the trend is alright. A good thing about the third one, apart from being fast, is that it predicts smaller-than-actual t-steps in the region where most of the computation resides (since minmodabb generally starts above 4), so the critical jobs will take less time than estimated.
On trying this out with X=3x10^14+188911,
eulerprodt(x,t)=abs(prod(v=1,30,1/(1-1/prime(v)^(1-I*x/2+(t/4)*log((1-I*x/2)/prime(v))))));
X=3*10^14+188911;
y0=0.13416;
tstart=0;tend=0.161;
filename="tsplit.txt";
\\minmodabbstart=vecmin([abs(abbeff_approx(X,1,tstart)),abs(abbeff_approx(X+0.5,1,tstart)),abs(abbeff_approx(X+1,1,tstart))]);
\\minmodabbstart=vecmin([eulerprodt(X,tstart),eulerprodt(X+0.5,tstart),eulerprodt(X+1,tstart)]);
minmodabbstart=3.5;
t=tstart;
minmodabb=minmodabbstart;
tottimest=0;
counter=1;
while(t<=tend,{
Dtabb = ceil(ddtboundint(X+1,y0,t));
Dzabb = ceil(ddzboundint(X+1,y0,t));
timest=Dzabb/2000.0;
print([counter,t,Dtabb,4*Dzabb,minmodabb,timest]);
t = t + (minmodabb-0.5)/Dtabb;
\\minmodabb = vecmin([abs(abbeff_approx(X,1,t)),abs(abbeff_approx(X+0.5,1,t)),abs(abbeff_approx(X+1,1,t))]);
\\minmodabb = vecmin([eulerprodt(X,t),eulerprodt(X+0.5,t),eulerprodt(X+1,t)]);
tottimest=tottimest+timest;
if(tottimest>3600,tottimest=0;write(filename,t));
counter=counter+1;
});
we get in the tsplit file (excluding tstart and tend),
0.0033938467544100506907145464893504019839
0.0090426139257163597884666889062147590431
0.030102294503472653902564881426563175383
Have submitted these 4 jobs in Boinc to see whether they actually take less than an hour (the last one of course will be much faster),
bin/boinc2docker_create_work.py --min_quorum 1 --target_nresults 1 dbnupperbound/arbcalc:v9 bash -c "wget http://anthgrid.com/dbnupperbound/storedsums/storedsum_nolemma_300000000188911_dig20.txt -O sss.txt; ./tloopsingle 0 0.0033938467544100506907145464893504019839 0.13416 0 sss.txt > /root/shared/results/tloopout_X_300000000188911_part1.txt";
bin/boinc2docker_create_work.py --min_quorum 1 --target_nresults 1 dbnupperbound/arbcalc:v9 bash -c "wget http://anthgrid.com/dbnupperbound/storedsums/storedsum_nolemma_300000000188911_dig20.txt -O sss.txt; ./tloopsingle 0.0033938467544100506907145464893504019839 0.0090426139257163597884666889062147590431 0.13416 0 sss.txt > /root/shared/results/tloopout_X_300000000188911_part2.txt";
bin/boinc2docker_create_work.py --min_quorum 1 --target_nresults 1 dbnupperbound/arbcalc:v9 bash -c "wget http://anthgrid.com/dbnupperbound/storedsums/storedsum_nolemma_300000000188911_dig20.txt -O sss.txt; ./tloopsingle 0.0090426139257163597884666889062147590431 0.030102294503472653902564881426563175383 0.13416 0 sss.txt > /root/shared/results/tloopout_X_300000000188911_part3.txt";
bin/boinc2docker_create_work.py --min_quorum 1 --target_nresults 1 dbnupperbound/arbcalc:v9 bash -c "wget http://anthgrid.com/dbnupperbound/storedsums/storedsum_nolemma_300000000188911_dig20.txt -O sss.txt; ./tloopsingle 0.030102294503472653902564881426563175383 0.161 0.13416 0 sss.txt > /root/shared/results/tloopout_X_300000000188911_part4.txt";
EDIT: The 4 boinc jobs completed, and attaching the concatenated output tloop_boincout_X_3x10e14+188911.txt
These were the timings of the 4 jobs: part1 - 2.27 hours, part2 - 0.97 hours, part3 - 1.08 hours, part4 - 0.4 hours.
As expected part2 took under an hour, part3 slightly more than an hour, and part4 was anyways the residual part. Part1 unexpectedly took 2.27 hours, which I am assuming was due to the peculiarities of the host machine that job was assigned to (that host has been taking more time on the other jobs too).
@km-git-acc
Great work and nice results! Will try the same run once again on 32 cores and see how fast it is (EDIT: using 64 virtual threads, the task completed in 332 secs). Did a quick check on the output from my earlier run and that was 1241 secs (on a 6-core machine).
Did some further work on the contour integral for Ht to isolate the zeros. Using the derivative as an integral instead of the H'-function gives the correct results (after a while). It won't work for t=0, but hey, we won't need the complex integral to evaluate that level. Will now try this in ARB to see how fast a rectangle could be processed. Splitting the area of the contour into smaller parts is easily done, hence a root finding approach like this would also be very suitable for grid computing.
Xi(s)=(s/2)*(s-1)*Pi^(-s/2)*gamma(s/2)*zeta(s);
Ht(x, y, t)=1/(8*sqrt(Pi))*intnum(v=-8,8,Xi((1+I*(x+y*I))/2+sqrt(t)*v)*exp(-v^2));
Htd(x, y, t)=I/(8*sqrt(t*Pi))*intnum(v=-8,8,Xi((1+I*(x+y*I))/2+sqrt(t)*v)*exp(-v^2)*v);
f(z) = Ht(real(z),imag(z),0.1);
fd(z) = Htd(real(z),imag(z),0.1);
contint(func,vertices) = round(abs((1/2/Pi/I)*(intnum(v=vertices[1],vertices[2],fd(v)/func(v))+intnum(v=vertices[2],vertices[3],fd(v)/func(v))+intnum(v=vertices[3],vertices[4],fd(v)/func(v))+intnum(v=vertices[4],vertices[1],fd(v)/func(v)))));
contint(f,[25-I,25+I,30+I,30-I]) --> 1
@rudolph-git-acc Interesting. Having a direct expression for the derivative is a great development, and should give more accurate results too.
332 seconds is quite impressive. I was analyzing the Boinc users, and these are some of the stats,
+--------+------+
| userid | #open tasks
+--------+------+
| 40     | 6009 |
| 68     | 2149 |
| 36     | 357  |
| 43     | 352  |
| 89     | 296  |
| 7      | 219  |
| 114    | 191  |
| 31     | 100  |
| 92     | 31   |
| 51     | 20   |
| 84     | 17   |
| 62     | 17   |
| 77     | 10   |
+--------+------+
| userid | #completed tasks
+--------+------+
| 40 | 8810 |
| 2 | 1476 |
| 36 | 1061 |
| 29 | 757 |
| 7 | 682 |
| 1 | 379 |
| 86 | 268 |
| 8 | 252 |
| 48 | 170 |
| 9 | 138 |
| 89 | 116 |
| 122 | 110 |
| 68 | 106 |
| 106 | 79 |
| 114 | 73 |
| 92 | 54 |
| 34 | 46 |
| 58 | 31 |
| 55 | 14 |
| 6 | 11 |
+--------+-----+
| userid | #host machines
+--------+-----+
| 36     | 47  |
| 40     | 28  |
| 84     | 12  |
| 8      | 6   |
| 9      | 5   |
(not showing the tail in any of the three tables). Assuming all the hosts are low cost single core machines (which may not always be true, but is messy to find out), your single machine is quite up there in terms of config. Also, I have now tried to restrict the number of open tasks per host to 3*number of cores, so we should hopefully see a more equitable distribution of future jobs to users.
The 9x10e21 storedsums jobs are halfway complete, while the 2x10e20 ones are going through the tail effect (maybe they will both complete at roughly the same time).
@km-git-acc
Great, a bit of load balancing is smart (we otherwise could end up with only one client spending ages on the last 'cell').
Could you do a quick sanity check on whether the returned outputs still have decent accuracy?
@rudolph-git-acc The 9x10e21 outputs seem to have 10 to 12 digits of accuracy.
Subtotalindex: 0, start: 0, end: 1338093
9000000000000000070686.00000000, 114, 114, 10
50.2343729045, 6.5382939665, 3494.06987472, 448.99437777, 140382.1897443, 17649.7162562, 4201525.330523, 516012.659033, 102538664.80792, 12284069.86469, 2132637430.2851, 248305233.4740, 38794458594.213, 4366006626.541, 628004007388.80, 67896079323.83, 9161342100985.4, 945794755479.0, 1.21613597610789e+14, 1.1924458391587e+13, 1.4806....
Subtotalindex: 100, start: 133809300, end: 135147393
9000000000000000070686.00000000, 114, 114, 10
0.03304019628, 0.06834751427, 0.1746325774, 0.3612212935, 0.4615068731, 0.9545398146, 0.813092741, 1.68160.....
Subtotalindex: 997, start: 1334078721, end: 1335416814
9000000000000000070686.00000000, 114, 114, 10
0.01920383224, -0.00799040265, 0.02551016990, -0.01059822934, 0.01694372127, -0.00702857.....
It seems some of the allocated tasks have come back to the pool, and my boinc client has started receiving tasks again on updating the project. You may see similar behavior on your side.
EDIT: Links to the partial output generated so far, https://storage.googleapis.com/dbnupperbound/subt2x10e20.zip https://storage.googleapis.com/dbnupperbound/subt9x10e21.zip
@km-git-acc
Great. Accuracy is in line with expectations! I asked since I did a run to calculate a single cell (to serve as a checksum). It ran 2 days and came back with 0... Have now started a new one with higher accuracy. Will check Boinc!
The contour integration runs pretty fast in pari/gp (e.g. 3 mins 20 for the rectangle 25-4i, 25+4i, 2000+4i, 2000-4i at t=-12 on a single core machine), however the evaluation is still way too slow in ARB. When running along the imaginary side of the rectangle all is fast, but when I try to integrate along the real-axis side of the rectangle it becomes unworkably slow. Very strange, and it seems to be related to the integral evaluation itself. Need to do some further digging. All in all this seems to be a promising route for root finding, esp. when we could parallelise the work and reduce time by a significant factor. What is actually the best approach to properly isolate/certify the individual roots and then to find them up to a certain accuracy? Seems some recursive method is needed here; I did read about triangulation.
This is our formula to evaluate. Do you maybe see any opportunity to further simplify it? I have considered writing xi(s) as a polynomial (which is easily done with readily available polynomials for zeta, Gamma and exp), since I'd thought integration and differentiation would then become much easier (i.e. just another polynomial). This obviously doesn't work, since the integration variable v is not the polynomial variable...
EDIT: this is the pari/gp code:
default(realprecision, 20)
Xi(s)=(s/2)*(s-1)*Pi^(-s/2)*gamma(s/2)*zeta(s);
Ht(x, y, t) =1/(8*sqrt(Pi)) *intnum(v=-8,8,Xi((1+I*(x+y*I))/2+sqrt(t)*v)*exp(-v^2));
Htd(x, y, t)=I/(8*sqrt(t*Pi))*intnum(v=-8,8,Xi((1+I*(x+y*I))/2+sqrt(t)*v)*exp(-v^2)*v);
f(z) = Ht(real(z),imag(z),-12);
fd(z) = Htd(real(z),imag(z),-12);
contint(vertices) = round(abs((1/2/Pi/I)*(intnum(v=vertices[1],vertices[2],fd(v)/f(v))+intnum(v=vertices[2],vertices[3],fd(v)/f(v))+intnum(v=vertices[3],vertices[4],fd(v)/f(v))+intnum(v=vertices[4],vertices[1],fd(v)/f(v)))));
#
contint([25-4*I,25+4*I,2000+4*I,2000-4*I])
@rudolph-git-acc Well, I don't know whether something like a vectorized integration can be made to work, but that can be one way to exploit the similarity of the numerator and denominator integrands. So instead of calculating the two integrals separately, we could calculate something like
Const = I/sqrt(t);
h(v,z,t) = Xi((1+I*z)/2+sqrt(t)*v)*exp(-v^2);
g(z,t) = {
[D,F] = Integral{[Const*v,1]*h(v,z,t)};
return(D/F);
}
If this function can then be used within the contour integral, it should provide a 2x speedup.
I think the recursive approach you mentioned is generally the way to zero in on the roots. At each step the contour area should decrease by half, so the number of steps required should ideally be logarithmic in the number of digits of accuracy.
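A minimal pari/gp sketch of that recursive idea (isolate, nzeros and tol are illustrative names rather than anything from our scripts; nzeros stands for any argument-principle zero counter for a rectangle, and edge cases such as a zero landing exactly on a subdivision line are ignored):
\\ isolate(nzeros, x1, y1, x2, y2, tol): return a list of boxes [x1, y1, x2, y2, n],
\\ each with sides below tol and containing n > 0 zeros, obtained by bisecting the
\\ longer side of any box that turns out to contain zeros.
isolate(nzeros, x1, y1, x2, y2, tol) = {
  my(n, xm, ym);
  n = nzeros(x1, y1, x2, y2);
  if(n == 0, return([]));                                          \\ no zeros: discard this box
  if(max(x2 - x1, y2 - y1) < tol, return([[x1, y1, x2, y2, n]]));  \\ small enough: report it
  if(x2 - x1 >= y2 - y1,
     xm = (x1 + x2)/2;
     return(concat(isolate(nzeros, x1, y1, xm, y2, tol), isolate(nzeros, xm, y1, x2, y2, tol))),
     ym = (y1 + y2)/2;
     return(concat(isolate(nzeros, x1, y1, x2, ym, tol), isolate(nzeros, x1, ym, x2, y2, tol))));
}
\\ example call, wrapping the contint(f, vertices) routine from the previous comment:
\\ isolate((x1,y1,x2,y2)->contint(f, [x1+y1*I, x2+y1*I, x2+y2*I, x1+y2*I]), 25, -1, 30, 1, 0.01)
The number of bisection levels then grows only logarithmically with the required box size.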
On the Boinc front, I have enabled the forums (http://anthgrid.com/dbnupperbound/forum_index.php), after a suggestion by one of the users.
I have been thinking of mailing Prof. Platt about the work done so far, and whether he can share the RH verification scripts used in his paper. What do you think?
EDIT: Well, it does work. parigp can be neat sometimes.
t=-12;
const = I/sqrt(t);
h(v,z) = Xi((1+I*z)/2+sqrt(t)*v)*exp(-v^2);
g(z) = {[D,F] = intnum(v=-8,8,[const*v,1]*h(v,z)); return(D/F);}
contint(vertices) = round(abs((1/2/Pi/I)*(intnum(v=vertices[1],vertices[2],g(v))+intnum(v=vertices[2],vertices[3],g(v))+intnum(v=vertices[3],vertices[4],g(v))+intnum(v=vertices[4],vertices[1],g(v)))));
contint([25-4*I,25+4*I,2000+4*I,2000-4*I])
This took around 3 minutes on my machine, compared to 9 minutes earlier. A more compact version,
m4(j)=1+(j-1)%4;
h(v,z,t) = Xi((1+I*z)/2+sqrt(t)*v)*exp(-v^2);
g(z,t) = {[D,F] = intnum(v=-8,8,[v,1]*h(v,z,t)); return(D/F);}
contint(p,t) = round(abs((1/2/Pi/sqrt(t))*sum(j=1,4,intnum(v=p[m4(j)],p[m4(j+1)],g(v,t)))));
contint([25-4*I,25+4*I,2000+4*I,2000-4*I],-12)
@rudolph-git-acc I have been quite intrigued by the formula for the derivative, since it seems to be quite elegant and a deeper result, not just a standard application of the Leibniz rule. In fact, it seems there are two different integrands that on integrating give the same answer, which is amazing. How did you hit upon this formula?
h(z,t,v) = Xi((1+I*z)/2+sqrt(t)*v)*exp(-v^2);
k(z,t,v) = (I/sqrt(t))*v*h(z,t,v);
r(z,t) = intnum(v=-8,8,h(z,t,v));
f(z,t) = intnum(v=-8,8,h'(z,t,v));
g(z,t) = intnum(v=-8,8,k(z,t,v));
r'(30+8*I,-8) -> 0.0038989093364183130987 + 0.0021132950015132515725*I
f(30+8*I,-8) -> 0.0038989093364357291900 + 0.0021132950016088488043*I
g(30+8*I,-8) -> 0.0038989093365395572872 + 0.0021132950011405614916*I
h'(2+3*I,6,-5) -> -4.7516483118896075523 E-11 - 7.3920052300069603964 E-11*I
k(2+3*I,6,-5) -> -3.4209137327030026867 E-10 - 5.9845755513689015352 E-10*I
@km-git-acc
I actually found the derivative formula with a bit of luck. Was playing with the derivatives listed on the 'integral 3rd approach' page, changed a few parameters and there it suddenly was... The only intuitive guess I made was: when t->0 in the Ht-integral, you'll end up with xi(s). This obviously can't happen for Ht', so I assumed there should at least be a division by t somewhere (i.e. to prevent it from being valid at t=0, since xi(s) is in it). Did try to derive the result through a variable change, but didn't completely get there. Also got intrigued by its surprising beauty and the fact that it still works for -t. It makes t=0 even more special, and maybe the fact that this derivative integral breaks down there is a sign that all complex zeros have to have collided before t=0 (i.e. a collision without a derivative can't occur).
And yes, also tried your proposed contour integral routine and also got a 100% speed boost. Nice idea, KM :)
On the RH-zeros front, yes, it would be a good idea to contact prof. Platt and explain what we have been doing here. He actually might be interested. BTW, I stumbled upon this arxiv paper by Fredrik (ARB): https://arxiv.org/pdf/1802.07942.pdf in which he explains a routine that does the contour for the logarithmic integral of zeta over a rectangle (i.e. a bit more than the critical strip) all the way up to T=10^9 (x=2*10^9) in only 1600 secs(!). I mean, if 10^9 is feasible this way (and still unthreaded/ungridded), then why not just count the zeros in the rectangle and compare that with the number of sign changes on the critical line. Wouldn't that be sufficient?
@rudolph-git-acc If this is a new formula, i.e. not yet published in the literature, you should definitely go ahead and do it, or at least post about it. The xi function is pretty important, and anything new regarding it is always looked forward to. Also, this may have an interesting proof.
Yeah, we should use Fredrik's routine to establish the number of roots in the rectangle. The Odlyzko/Platt papers should help us with efficient ways to count the sign changes. On the latter it seems we have to decide whether we count the sign changes as in Odlyzko's work, which is fast, or find the exact locations of the zeros as done by Platt, which would be comparatively slower but more rigorous.
@km-git-acc
Okay, will put a question on the blog. The formula actually does look quite close to this one on: http://michaelnielsen.org/polymath1/index.php?title=Bounding_the_derivative_of_H_t_-_third_approach, so still wondering whether just a smart change of variables will do the trick:
[image] https://user-images.githubusercontent.com/36714969/46367894-1fb67e80-c67f-11e8-8a53-990b28734ba3.png
@rudolph-git-acc
Actually, I was now going through that wiki page myself, and it makes sense.
I guess the key trick was that although s and z are related, s becomes a formal variable under integration, and then differentiation wrt z leaves the xi function untouched. Didn't know earlier that this is allowed, but this pdf is quite illuminating http://www.math.uconn.edu/~kconrad/blurbs/analysis/diffunderint.pdf
@km-git-acc
That makes sense, although I still haven't been able to fully derive the new formula. I'll hold off on posting since the answer is probably pretty straightforward. EDIT 1: Aaah. It was very easy after all...
Take this formula:
and do the variable change v = (sig + IT - (1-y+Ix)/2)/sqrt(t) so that d sig = sqrt(t) d v. This yields our simpler form.
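Spelled out (a sketch of the same computation, with z = x+iy so that (1-y+Ix)/2 = (1+iz)/2):
\[
H_t(z)=\frac{1}{8\sqrt{\pi t}}\int \Xi(\sigma)\,e^{-\left(\sigma-\frac{1+iz}{2}\right)^{2}/t}\,d\sigma
=\frac{1}{8\sqrt{\pi}}\int_{-\infty}^{\infty}\Xi\!\left(\frac{1+iz}{2}+\sqrt{t}\,v\right)e^{-v^{2}}\,dv,
\]
and differentiating the left-hand integrand with respect to z before substituting back produces exactly the extra factor (I/sqrt(t))*v that appears in the Htd routine above.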
With this 'trick' I could also establish the higher derivatives that form a nice pattern:
EDIT 2: Here is the generic formula for all derivatives:
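Carrying the differentiation under the integral out n times suggests the pattern (a derived form, with H_n the n-th physicists' Hermite polynomial; the case n = 1 recovers the Htd formula above):
\[
H_t^{(n)}(z)=\frac{1}{8\sqrt{\pi}}\left(\frac{i}{2\sqrt{t}}\right)^{n}\int_{-\infty}^{\infty}\Xi\!\left(\frac{1+iz}{2}+\sqrt{t}\,v\right)H_n(v)\,e^{-v^{2}}\,dv .
\]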
EDIT 3: I found on the web that 'integration under the integral sign' is also an allowed procedure, and when I inject a simple finite integral running over z (e.g. one side of the rectangle), it does return a closed form using the erf-function (and xi(s) of course). This in turn then needs to be integrated from -8..8. Will try this out.
For some reason, I can't get this integral to work in pari/gp:
Code used (do you see any error?):
default(realprecision, 20)
Xi(s)=(s/2)*(s-1)*Pi^(-s/2)*gamma(s/2)*zeta(s);
Ht(x, y, t) =1/(8*sqrt(Pi*t))*intnum(v=-8+3*I,8+3*I,Xi(v)*exp(-((v-(1-y+I*x)/2)^2)/t));
Ht(100, 0.4, 0.4)
1.1397844575876168265 E2396 + 1.3490085027501526933 E2396*I
Although... your link brings me back to an earlier comment I made on writing xi(s) and exp as a polynomial with two parameters: the integration variable v and a parameter z. Doing partial differentiation under the integral sign (which is very fast on a polynomial) would then give a new polynomial, and I could simply integrate that, which yields another polynomial that gives the result (after evaluation).
@rudolph-git-acc Cool work on the Ht derivatives. I can't find a problem in the Ht definition. On changing the T value from 3 to something else, the answer keeps changing. Does the integrand function become too oscillatory after the s substitution? That could be one reason parigp is facing issues here. exp(-v^2) was earlier acting as a dampener, but post substitution it is taking very large values.
I think as long as the polynomials meet our target accuracy, we should be fine. We can benchmark those results against the actuals to see if everything works fine. The only thing we may have to be careful about (similar to the tloop procedure) is that we are using enough terms in the polynomials when X increases.
@km-git-acc
Tried the '(finite) integration under the integral sign' and managed to derive:
The good news is that it is very fast; the bad news is that it is extremely unstable. I only got it to work reasonably well around -150 < t < -50. For positive t it doesn't seem to work at all...
Anyway, I'll stop exploring this direction for now since it wouldn't help us for the root finding where the contour integral requires f'(x)/f(x), which is not within reach of this approach.
Code used:
default(realprecision, 60)
erf(x)=1-erfc(x)
Xi(s)=(s/2)*(s-1)*Pi^(-s/2)*gamma(s/2)*zeta(s);
Hti(z1, z2, t)=sqrt(t)*I/8*(intnum(k=-8,8,Xi(sqrt(t)*k+1/2+I*z1/2)*erf(-k)-Xi(sqrt(t)*k+1/2+I*z2/2)*erf(-k)))
Ht(z, t) =1/(8*sqrt(Pi))*intnum(v=-8,8,Xi((1+I*z)/2+sqrt(t)*v)*exp(-v^2));
Htc(z1, z2, t)=intnum(v=z1,z2,Ht(v,t));
print(Hti(25+12*I,25-12*I,-100))
print(Htc(25+12*I,25-12*I,-100))
@rudolph-git-acc Well, talking about unstable integrals in a slightly different context, here is something that struck me in the morning, though now I am confused at not getting the expected answer.
After your comment that fast routines exist for the contour integral of log zeta, I have been wondering whether, for the dbn portion, we even require the full strength of RH verification. If we are creating a barrier from y0 to 1, and if we can prove that at t=0 all the Ht roots, i.e. the critical strip zeta roots, have y << y0, then shouldn't that be good enough for the barrier?
And this could be done by first counting the zeta roots to height T in the entire critical strip, and then counting them within a region closely hugging the half-line. If the number of roots turns out to be the same in both cases, we know there are no zeta roots beyond that region. No expensive sign change counting would be required!!
If this method is allowed and works as intended, we should be able to make a lot of the conditional bounds unconditional. Contour integrals can be calculated for different T ranges (0,10^9), (10^9,2x10^9), etc., thus easily parallelizing the computation.
So tried,
m4(j)=1+(j-1)%4;
contint(p) = round(abs((1/2/Pi/I)*sum(j=1,4,intnum(v=p[m4(j)],p[m4(j+1)],zeta'(v)/zeta(v)))));
eps = 0.05;
T = 50;
1+contint([1+eps,1+eps+I*T,-eps+I*T,-eps])
contint([0.5+eps,0.5+eps+I*T,0.5-eps+I*T,0.5-eps])
(the 1 in the first formula since the zeta pole at s=1 has to be taken care of.) But I am getting the first answer as 10 (which is correct), while the second answer keeps varying depending on eps. Any idea what may be going wrong here? I have a suspicion about the approximate zeta derivative, but haven't found a closed form formula for the derivative.
We could also use the xi'/xi instead of zeta'/zeta if that is more stable.
EDIT: Also one thing I found interesting in the Arb paper you had linked to is that he mentions that adaptive subdivision is used while integrating, which may not be the case with other math packages, so the Arb routines may be better able to handle this.
@km-git-acc
That sounds like a really smart idea!
We know for sure that there is a minimal speed by which the complex zeros are attracted to their conjugates, so they can never 'bend back' in the y-direction. Could this therefore be made even simpler as follows: just verify that at t=0 the rectangle 0+y_0i, 0+1i, X+y_0i, X+1i is zero-free (hence only do one fast contour for that area and prove the answer is zero)?
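Something like the following is what I have in mind (an illustrative pari/gp sketch only, reusing the m4 and Xi helpers from the earlier comments; H0 and strip_count are made-up names, and the integration will likely need a high working precision this close to the strip):
H0(z) = (1/8)*Xi((1+I*z)/2);
\\ argument-principle count of the zeros of H0 inside the rectangle with corners
\\ 0+y0*I, X+y0*I, X+I, 0+I; the claim is verified if this returns 0
strip_count(X, y0) = {
  my(p = [y0*I, X + y0*I, X + I, I]);
  return(round(abs((1/2/Pi/I)*sum(j=1,4, intnum(v=p[m4(j)], p[m4(j+1)], H0'(v)/H0(v))))));
}
strip_count(50, 0.2)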
@rudolph-git-acc Yeah, that's an even better idea. By the way, I am noticing that xi is more stable than zeta, although even it runs into some issues as we move closer to the half-line.
eps=0.05;
T=50;
Xi(s)=(s/2)*(s-1)*Pi^(-s/2)*gamma(s/2)*zeta(s);
contint2(p) = round(abs((1/2/Pi/I)*sum(j=1,4,intnum(v=p[m4(j)],p[m4(j+1)],Xi'(v)/Xi(v)))));
contint([0.8,0.8+I*T,1+eps+I*T,1+eps]) -> 13728058140034382
contint2([0.8,0.8+I*T,1+eps+I*T,1+eps]) -> 0
contint2([0.7,0.7+I*T,1+eps+I*T,1+eps]) -> 0
contint2([0.6,0.6+I*T,1+eps+I*T,1+eps]) -> 2
@km-git-acc
Great! I will start some ARB-code on this tonight (there is a new ARB release out that has some nice new functions).
Don't see anything wrong with the pari code; have you tried increasing the default precision? I noticed this was quite important with the erf-integral. Also, Fredrik's code in the link seems to be using a 'closed' form for the log zeta.
@rudolph-git-acc Increasing the default precision does help, although the required increase is pretty significant.
\p 100
contint2([0.6,0.6+I*T,1+eps+I*T,1+eps]) -> 1
\p 200
contint2([0.6,0.6+I*T,1+eps+I*T,1+eps]) -> 0
@km-git-acc
Have the following zero counting function for zeta(s) working in ARB and it is indeed very fast. Will try to run a 10^11 count overnight.
There are two 'snags' though that prevent us from easily 'morphing' it into a differently sized rectangle. Firstly, it uses the functional equation (see the eps), so that only two sides of the rectangle have to be evaluated through the log zeta. This balances the equation around 1/2, and changing that parameter so that we obtain a y=0.4 (what we need in our world) produces incorrect results. Secondly, it requires the error term to be sizeable (e.g. 100), which always translates in our Ht-world into a y > 1 (which is useless). I therefore wonder whether a similar formula exists for xi(s) (or better xi(z)), esp. since the functional equation for that is even simpler.
I used this function to count the zeros on the critical line and it all matches:
f(t)=imag(lngamma(1/4+I*t/2))/Pi-t/(2*Pi)*log(Pi)+imag(log(zeta(1/2+I*t)))/Pi+1
print(f(100000))
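For reference, the f(t) above is just the standard Riemann-von Mangoldt counting function written out, the last two terms being S(T) + 1:
\[
N(T)=\frac{1}{\pi}\,\Im\log\Gamma\!\left(\tfrac14+\tfrac{iT}{2}\right)-\frac{T}{2\pi}\log\pi+\frac{1}{\pi}\,\Im\log\zeta\!\left(\tfrac12+iT\right)+1 .
\]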
@km-git-acc
Have now done quite a bit of reading on counting zeros of xi(s) and believe we could be running into some problems with the new idea. Here is a picture to illustrate why:
Picture A outlines what the contour needs to be up to a height T, and picture B shows the commonly used optimisation based on xi(s) = xi(1-s). One only has to scan through a single 'hook' and the functional equation automatically takes care of the complementary sides, thereby speeding up the counting process by a factor 2 (of course the line Re(z)=0.5 also still needs to be counted).
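In the usual Backlund form (stated here for reference, with xi as in the Xi routine above and assuming no zeros on the contour), picture B amounts to
\[
N(T)=\frac{1}{\pi}\,\Delta_{\mathcal{C}}\arg\xi(s),\qquad
\mathcal{C}:\ 1+\varepsilon \,\to\, 1+\varepsilon+iT \,\to\, \tfrac12+iT,
\]
the rest of the rectangle boundary being supplied by xi(s) = xi(1-s) together with the real-axis symmetry.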
Pictures C and D illustrate what we are aiming to do. Testing B-D=0 is the approach you described and C=0 is the model I added. Unfortunately a serious challenge emerges as soon as a line is placed inside the critical strip: the argument principle only works when the outer edges are free of zeros, and in the critical strip we can no longer guarantee that (unlike in model A/B). Also, in figure C we lose the symmetry, so that model would be at least a factor 2 slower than B. Model D will also be slower than B, since the higher the e, the faster the contour can be processed (I actually tried it and e=-0.4 gets extremely slow). So, could we still fix model D by for instance using the Riemann-Siegel approximation that the line 0.6 is zero-free? This might work, however it would always remain slower than proving the full RH up to T. The difference is: (model B + scanning line Re(z)=0.5) vs (model B + model D + scanning line Re(z)=0.6). Also, the existence of an effective closed form for counting sign changes at Re(z)=0.5 favours model B, unless such an effective formula exists for e.g. Re(z)=0.6.
Does this reasoning make sense? Grateful for your views.
@rudolph-git-acc
I think I understand why A is so reliable to compute and C not so much. A uses a vertical contour far away from the critical strip, where the function decays rapidly, and while the horizontal contour traverses into the critical strip, zeta is much better behaved in the horizontal direction than in the critical strip's vertical direction, where it is highly oscillatory, so numerical integrals find it hard to do parts of C.
(http://mathworld.wolfram.com/RiemannZetaFunction.html)
B seems to be a quite optimized way of doing A, which uses several results from theory (which is why the Hardy theta function also comes up), so modifying the B equation for D may require going back to how B's equation was derived and then making probably subtle changes.
Suppose we now focus on C or D (but D with the full rectangle contour). I agree both are difficult to compute numerically due to the almost random fluctuations in the vertical direction. A standard way of decreasing the fluctuations without affecting the roots is to use mollifiers, which we have used in various settings. I have been wondering whether, if we use enough mollifier terms with zeta or Xi or H0, we can get something useful.
Xi(s)=(s/2)*(s-1)*Pi^(-s/2)*gamma(s/2)*zeta(s);
H0(z) = (1/8)*Xi((1+I*z)/2);
H01(s) = (1/2)*s*(s-1)*Pi^(-s/2)*sqrt(2*Pi)*exp((s/2-1/2)*log(s/2)-s/2);
B0(z) = (1/8)*H01((1-I*z)/2);
moll(z) = 1/(1-1/2^z)/(1-1/3^z)/(1-1/5^z)/(1-1/7^z)
moll2(z) = 1/(1-1/2^((1+I*z)/2))/(1-1/3^((1+I*z)/2))/(1-1/5^((1+I*z)/2))/(1-1/7^((1+I*z)/2))
ploth(T=0,50,abs(zeta(0.5+I*T)))
ploth(z=0,100,abs(H0(z)/B0(z)))
ploth(T=0,50,abs(zeta(0.5+I*T)/moll(0.5+I*T)))
ploth(z=0,100,abs(H0(z)/B0(z)/moll2(z)))
While the mollifier as used is decreasing the average height of the zeta function and clearing out the area near the zeroes, it is also inducing a lot of micro oscillations, so I am not sure whether there is a net gain or loss. Of course there are several mollifier papers, so far better mollifiers may be available which do the first well without doing the second.
@km-git-acc
I noticed that suddenly many of the 'stuck' Boinc tasks were handed back yesterday and since then progress is very fast. Was this a conscious action from your side or does this happen automatically? Not sure whether my interpretation is correct, but we might end up with ~4000 'permanently stuck' tasks that won't progress, but which have in the meantime been completed. Or is something else happening here?
Good analysis of the fluctuation pattern. Wonder how far the contour in A (or B) could be stretched in the horizontal direction while still having a positive impact on computation. Guess there is a trade-off here, since the horizontal edge gets longer and probably needs more time to evaluate (although it remains smooth when traveling on one of the horizontal 'iso-curves' through the wrinkled landscape). The mollifiers do seem to work and in the second graph there appears to be a much stronger 'signal to noise' ratio. Although I don't know how a contour integral would perceive this pattern ;-)
After doing quite a few computations today, I start to wonder about the following counterintuitive conjecture: using the argument principle to show that the RH is true up to T (i.e. the full strip is zero-free except the critical line) will computationally always be faster than showing that a smaller region (vertical "lane") of the strip is zero-free...
Another thought: when we calculated the lemma-bound for the area after the Barrier, we only needed to show a line segment was zero-free (due to the bound monotonically increasing with y). The ABC_eff proxy is not designed for t=0, however does seem to continue to work. Is that a possible route? Of course, we could also use the Riemann-Siegel equivalent (that is certainly valid at t=0), but not sure if that is effectively bounded in a similar fashion (i.e. with a decreasing bound on error terms etc) and if so, whether this bound is monotonically increasing with y as well. It would be nice though, since with the N-ranges and the "sawtooth"-mechanism we have already shown to numerically reach up quite high.
@rudolph-git-acc Will have to read more on the RS formulas to see whether they have similar properties to AB or ABC, but I think they should have at least some similarities. Using ABC instead of the actual H0 seems analogous to using RS instead of zeta, so it should be an option. But we have to check how big the error estimate is as we move closer to y=0.
You may be right about a narrow vertical lane being slower to evaluate, since it has 2 vertical segments within the critical strip, which seem to be the main bottleneck in the computations, compared to the first alternative which requires only 1 vertical segment in the strip.
In Fredrik's paper he used an eps of 100, so that the rightward vertical segment is at Re(s)=101, where the zeta function must be extremely damped (maybe this is done to make the answer very reliable). If the horizontal segment has smooth behavior, its length may not be a major concern for integration purposes, as we saw with the ddtbound and ddzbound in the barrier computations, where even X=10^40 or 10^80 computed in the same time as 10^10.
Actually, I checked which indexes in the 9x10e21 computation were still left, and submitted those jobs again, so effectively they have a replication number of 2. Boinc by default suggests a replication of 2 or more for all jobs to prevent such bottlenecks, but we have been using a replication of 1 so that the same thing is not unnecessarily computed more than once. This manual replication, applied only to the remaining tasks, seems to me a more efficient strategy even if it is more work. Boinc also has a default behavior where, once an assigned job is not completed within the deadline (7 days), the task is reallocated to another machine, so that has also happened with some of the stuck tasks.
For 9x10e21, about 17000 jobs have been completed and 3000 are left, so looking good. For 2x10e20, only 107 jobs out of 2500 are left. For dealing with this residual it may be better we just split it between ourselves on our local machines and combine with the rest of the grid output instead of waiting for the grid jobs to complete. These are the indexes left,
1763,1764,1765,1767,1768,1770,1771,1772,1773,1774,1775,1776,1777,1778,1779,1846,1854,1856,1857,1859,1869,1907,1909,1911,1916,1929,1988,1990,1995,1997,2000,2002,2004,2006,2009,2010,2296,2314,2317,2320,2321,2322,2323,2326,2332,2334,2337,2345,2347,2348,2349,2350,2351,2352,2356,2357,2358,2361,2364,2366,2367,2368,2369,2370,2372,2374,2377,2378,2379,2380,2382,2386,2388,2391,2392,2393,2394,2397,2398,2399,2400,2403,2406,2407,2422,2424,2425,2427,2436,2448,2449,2452,2454,2455,2457,2462,2463,2465,2468,2469,2470,2471,2472,2473,2474,2476,2478
I will begin processing indexes 1763 to 2366.
EDIT: Also, some users have reported that the tasks do not run on processors with avx instruction set, but run successfully on those with the avx2 instruction set. Is this a known dependency of Arb or its precursor libraries?
@km-git-acc
That is great progress indeed. Happy to run the 2366+ numbers, but are you really sure this is faster than using Boinc? As I type this there are only 1428 tasks remaining to send, and at the current rate we should be at 0 by tomorrow evening (I currently have a Macbook joined with 4 threads). Then adding the missing 107 would be done in < 2 hours. On the other hand, I think I could try to open multiple terminal windows and run these on the Threadripper machine in parallel (e.g. 30 of those). Then I would also be done in 2 hours. Let me try that first.
Please find below the checksums I computed for both runs:
x=200000000000000066447, N=3989422804, tay=100:
finmat cell 0: 27.4518696946861646394681138 - 10.1008141160323447449331573*I
finmat cell 9999: -469789140771537030524527.545389 + 724708896010606855.61147069767*I
x=9000000000000000070686, N=26761861742, tay=114:
finmat cell 0: 34.518368582221532709059313 + 6.651233245833691426369034*I
finmat cell 12995: -816211939719584033442733684.304 - 2175865661494337925769.0643676*I
EDIT:
Here are the files for 2x10e20 range 2367 - 2476:
https://drive.google.com/open?id=1IfoX_6pBIfMtRQNeo5REALF-3Dx2riMI
@rudolph-git-acc Great. You were right that the grid jobs are progressing fast, and even without doing anything the list of 107 got reduced to 61 within a day. Combined with your output, the list now comprises just 28 indexes. Running those now.
Have thought further about vertical contour integration for zeta in the strip, but can't think of a good way. Numerical integration either needs an integrand that can be well approximated by a relatively low order polynomial, or the function should have a known regular periodicity which can also be handled. Both are violated for the zeta, Xi or H0 functions, which have a large number of roots in any decent sized interval and hence can't be approximated well by a polynomial, while the gaps between consecutive roots are all over the place, so regular periodicity is also out of the picture. The Ht deformation, especially as t becomes large, causes the roots to become regularly spaced, but counting the zeros of Ht won't help in the barrier method, and in any case the t required for reliable calculation at low X values may be larger than 0.5. So it seems we are back to counting sign changes of zeta or H0 or their proxies. If that is the case, then the thread of exploring the negative t region seems more interesting in the immediate term. What do you think?
@km-git-acc
Great, let me know when you have the 2x10e20 stored sums complete and I will kick off the tloop. I think our first priority should be to finalise the 0.11 and 0.10 cases, since it would be really nice to include it in the write-up (i.e. no need for an addendum).
Verifying the RH up to a certain height seems indeed more difficult than expected. Counting the sign changes on the critical line is not a problem up to very high T. I just did T=10e20 (687769382578810314775 zeros, and 76106294246675308886088 for 10e22) and it only takes 45 minutes. However, counting zeros through the argument principle along a rectangle is much more problematic. Even a run of Fredrik's algorithm at T=10^10 already takes 2.5 days and still no result (guess that's the reason he didn't publish any numbers higher than 10^9). Of course we could 'thread' or 'grid' the work to obtain some more speed, but I do expect the complexity to grow exponentially, so that 10^20 is certainly not within computational reach (although it is very tempting to use Boinc for some more heavy calculations now that we have so many supporting volunteers!).
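For reference, a minimal gp sketch (my own back-of-the-envelope, not the actual counting script) of the Riemann-von Mangoldt main term N(T) ~ theta(T)/Pi + 1, which already reproduces the order of magnitude of the counts quoted above; the hard part is of course the fluctuating S(T) correction, which is what the sign-change counting is really after:

\p 40
\\ asymptotic Riemann-Siegel theta: theta(T) ~ (T/2)*log(T/(2*Pi)) - T/2 - Pi/8 + 1/(48*T)
rvmtheta(T) = (T/2)*log(T/(2*Pi)) - T/2 - Pi/8 + 1/(48*T);
rvmcount(T) = rvmtheta(T)/Pi + 1;
rvmcount(10^20)  \\ ~ 6.8777e20, consistent with 687769382578810314775
rvmcount(10^22)  \\ ~ 7.611e22, consistent with 76106294246675308886088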
The more I think about the task of lowering the DBN upper bound, the more I become convinced it won't bring us any closer to the RH (although it does bring some good new insights). Assuming the RH is true, all imaginary zeros will have collided before t=0, so if there is anything to be learned about the RH itself, it is more likely to come from new observations in the t < 0 domain. Therefore I am very keen to see whether, with the Ht-integral and a rigorous root-finding algorithm, we could replicate the ABB_eff-based 'solidification' patterns of the imaginary zeros at increasingly negative t (the pattern in the real zeros is definitely there!). A provocative conjecture is that the downward facing peaks in the real zeros at negative t are actually causing the 'jumps' between the different curves on which the imaginary zeros appear to organise themselves at negative t. So, agreed that we should refocus on exploring this potentially interesting territory.
There seems to be something very special with the line t=0; i.e. we suddenly find the link to number theory (Euler product), the functional equation, etc., but also the derivative formula (see earlier) breaks down at t=0 and it is the transition point where sqrt(t) goes from real to imaginary. I believe something is happening at t=0 that prevents imaginary zeros passing through it. If we ever find a formula for the suprema of y =sig(x, -t) (we currently only have y=sig(t)...) then I expect it to be of the shape -1/sig(x,-t), i.e. we will get forever closer to t=0 but will never reach it (i.e. we have an infinite number of Lehmer pairs).
@rudolph-git-acc Well, an infinite number of Lehmer pairs is quite likely. Prof. Tao proved it conditional on the GUE hypothesis back when he came out with the non-negative dbn paper: https://terrytao.wordpress.com/2018/01/20/lehmer-pairs-and-gue/ Also in that light, it is interesting that you have been able to count sign changes up to 10^22. Is the algorithm rigorous against Lehmer phenomena? Also, using Fredrik's algo with the grid setup is a great idea.
About 10 indexes are still left for 2x10e20, which I expect should be completed in the next 2-3 hours. About 800 indexes are left for 9x10e21. Is there a script to add up all the subtotal matrices into the final matrix, or should I process it through the gp route?
Also, I will start working in gp on a root localization script in the negative t region. I assume you are doing the same in arb. While I agree that the current methods and lowering dbn can't get us to the RH by themselves, I am half-hopeful that a way will be figured out to bound zero velocities, and thus use the dbn results to bound max(sigma) of zeta in the critical strip, which is currently known only to be less than 1 (which gives the prime number theorem). max(sigma) is intimately tied to the error magnitude of the Li(x) approximation to the prime counting function (https://en.wikipedia.org/wiki/Riemann_hypothesis#Consequences), where the x^(1/2) in the error would get replaced by x^(max(sigma)).
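For reference, the standard statement behind this (with Theta the supremum of the real parts of the zeros):

\[
\pi(x) \;=\; \operatorname{Li}(x) + O\!\left(x^{\Theta}\log x\right), \qquad \Theta \;=\; \sup\{\operatorname{Re}\rho : \zeta(\rho)=0\},
\]

so under RH (Theta = 1/2) this reduces to the familiar O(sqrt(x) log x) error term.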
@km-git-acc
Boinc progress sounds good. I currently don't have an ARB-script to add up all the individual subtotals. I could develop it in ARB if needed, but that would take me some time to complete. Guess the pari-gp route is faster (but happy to step in if you are too busy).
I have indeed restarted working on the root finding in ARB. Also found a nice trick to obtain the same speed increase by using the similarity of the integrals (basically doing two simultaneous calculations of different integrands under the same integral sign, like you already did in pari/gp). Will also try to 'thread' the script and see what further speed increase that brings.
Just for fun I tried to verify the backward heat equation and it works out nicely. Also found that the logic can be extended to higher order derivatives, where the n-th derivative in t of H_t equals the 2n-th derivative in z. Do you happen to know whether this is a logical consequence of the heat equation? Also wonder whether a similar effect exists when we integrate instead of differentiate.
@rudolph-git-acc Here is the set of 2500 files https://drive.google.com/open?id=1xoFjxsONXZGEtZgXegidR4L96nGeHBlh I will add them in gp later in the day.
Interesting bit on the heat equation. But I will have to read more to understand whether these properties are shared by all functions satisfying such a PDE.
EDIT: Link to the final storedsum matrix near 2x10e20, https://github.com/km-git-acc/dbn_upper_bound/blob/master/output/storedsums/singlemat/storedsum_nolemma_200000000000000066447_dig10.txt I checked sample tloop output and compared it with the abbeff_approx function and the output more or less matches
./tloopsingle 0.1 0.2 0.2 1 storedsum_nolemma_200000000000000066447_dig10.txt
Although the abbeff value at individual mesh points was coming out to be only 5-6 digits accurate, so the tloop script may have to be tweaked a bit to retain close to 10 digits.
Also, these are the 439 indexes left for 9x10e21, and I expect half to 3/4 of this to get completed by tomorrow, but the residual may again get stuck, so we may have to close that manually.
1183,2710,2726,2727,4354,5922,5923,6089,6264,6265,6269,7057,7059,7067,7138,7141,7165,7176,7178,7301,7303,7321,7323,7325,7328,7329,7333,7334,7337,7338,7339,7379,7667,7669,7670,7671,7672,7674,7675,7676,7677,7678,7679,7684,7727,7762,8422,8426,8454,8456,8479,8483,8486,8489,8532,8533,8537,8541,8542,8546,8548,8549,8550,8552,8669,8676,8680,9051,9125,9191,9215,9218,9266,9275,9280,9287,9310,9426,9439,9447,9449,9454,9487,9494,9501,9503,9505,9507,9533,9666,9667,9673,9694,9767,9771,9772,9774,9775,9776,9778,9795,9902,9920,9922,9923,10112,10183,10230,10240,10242,10246,10288,10297,10299,10301,10309,10334,10386,10390,10392,10412,10416,10422,10426,10427,10428,10495,10640,10730,10731,10735,10737,10739,10743,10746,10781,10873,10903,10911,10940,11036,11052,11059,11061,11074,11089,11149,11151,11179,11191,11201,11203,11510,11515,11543,11560,11583,11593,11599,11600,11601,11611,11612,11613,11617,11621,11624,11626,11627,11630,11635,11652,11657,11671,11672,11673,11753,11770,11853,11858,11861,11928,11929,11935,12016,12017,12028,12043,12044,12046,12058,12073,12077,12079,12080,12197,12203,12370,12698,13133,13139,13152,13156,13281,13286,13287,13288,13289,13292,13293,13294,13295,13297,13444,13574,13841,13843,13869,13930,13939,13953,14078,14092,14098,14108,14111,14112,14114,14130,14132,14159,14163,14164,14169,14263,14311,14315,14317,14330,14351,14359,14369,14372,14374,14382,14425,14428,14792,14794,14854,14855,14857,14886,14916,14967,14969,14970,14971,14972,14973,14974,14975,15037,15169,15218,15254,15335,15360,15373,15383,15455,15486,15493,15494,15547,15571,15618,15620,15636,15637,15638,15656,15669,15674,15675,15676,15723,15728,15733,15794,15800,15806,15808,15819,15842,15843,15888,15906,15908,15912,15913,15916,15923,15927,15932,15948,15971,15974,15976,15982,16010,16083,16130,16155,16170,16171,16180,16184,16261,16262,16334,16344,16345,16347,16357,16377,16387,16392,16402,16405,16408,16410,16411,16448,16467,16474,16492,16647,16649,16654,16656,16658,16659,16713,16787,16810,16811,16833,16859,16862,16882,16884,16926,16929,16942,16980,16991,17048,17053,17059,17138,17149,17150,17154,17429,17430,17454,17487,17497,17504,17505,17506,17534,17543,17544,17558,17565,17611,17613,17614,17615,17619,17631,17632,17633,17638,17644,17645,17663,17665,17666,17667,17670,17681,17684,17688,17690,17701,17737,17762,18025,18033,18071,18091,18094,18110,18114,18125,18131,18135,18141,18143,18153,18154,18171,18687,18688,18689,19060,19065,19073,19075,19077,19090,19168,19176,19178,19179,19181,19183,19184,19207,19213,19215,19230,19270,19281,19564,19570
I am thinking that while you run the tloop for 2x10e20, we can run the 9x10e21's tloop through the grid thus parallelizing the efforts.
Proceeding now on root localization in gp.
EDIT2: Script used for subtotal matrix addition
terminal
---------------------------
mkdir gp_subt2x10e20;
cd gp_subt2x10e20;
for i in `seq 0 2499`;
do
  # keep lines 3-102 (the 100 matrix rows) of each subtotal file, join them with ';' and wrap in [...] so gp can read it as a matrix
  cat ../subt2x10e20/subt2x10e20_$i.txt | sed -n '3,102p;103q' | tr '\n' ';' | sed '1s/^/[/' | sed '$s/.$/]/' > gp_subt2x10e20_$i.txt; echo $i' done';
done
-------------------------------------------
gp
--------------------------------------------
\p 20
finalmat = read("gp_subt2x10e20_0.txt");
for(i=1,2499,finalmat = finalmat + read(concat(["gp_subt2x10e20_",i,".txt"]));print(i))
write("finalmatgp_2x10e20.txt",finalmat);
-------------------------------------------
terminal
--------------------------------------------
cat finalmatgp_2x10e20.txt | sed 's/ E/e/g' | sed '1s/^.//' | sed '$s/.$//' | sed 's/; /\n/g' | sed "1i 200000000000000066447.000000000, 100, 100, 10" > storedsum_nolemma_200000000000000066447_dig10.txt
-----------------------------------------------------
@km-git-acc
Great! Very nice to see that the first and last cells 100% match the checksums :)
Have the tloop running now on 64 cores. The first few rectangles take ~15 secs each, but these will accelerate as t (slowly) gets higher. Rough estimate of the time to completion: ~72 hours (I assumed 20k rectangles). So we definitely should use the grid to complete the 9x10e21. Happy to run a few of the remaining subtotals to get this completed. Would really be great if the checksums match again.
Let me know what sort of tweaks to the tloop you have in mind to bring the accuracy back to 10 digits (I could still stop/start the run tomorrow morning). Also, if accuracy is a concern, we could decide to rerun the batch of 2500 with a higher accuracy (there is so much computing power available now that this would complete within days).
It seems that the properties of the PDEs listed above are shared by all functions satisfying such a PDE. If we for instance take exp(t)sin(z), I do see the same effect. Also tried so-called 'fractional derivatives' to see whether delta_t H_t (fractional derivative = 1/2) would match delta_z H_t, but it doesn't (it is a factor v different). Not sure whether we could use the observation to make new inferences about the trajectories of complex zeros.
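A quick toy check of the iterated relation is easy in gp. Assuming the sign convention d/dt H_t = -d^2/dz^2 H_t, applying the equation n times gives the n-th t-derivative equal to (-1)^n times the 2n-th z-derivative. The sketch below is just my own illustration on the surrogate u(t,z) = exp(t)*sin(z), which satisfies the same backward equation, rather than on H_t itself; the two printed columns should agree:

t0 = 0.3; z0 = 1.1;  \\ arbitrary sample point
\\ n-th t-derivative and m-th z-derivative of u(t,z) = exp(t)*sin(z), read off from series coefficients
dt(n) = n! * polcoeff(exp(t0 + T + O(T^(n+1))) * sin(z0), n, T);
dz(m) = m! * polcoeff(exp(t0) * sin(z0 + Z + O(Z^(m+1))), m, Z);
for(n = 1, 4, print(n, ":  ", dt(n), "   vs   ", (-1)^n * dz(2*n)))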
I watched the Bristol presentation from prof Tao again today. He makes the sharp observation that all information about H_t seems to originate from H_0 and that there is not a single bit of information travelling back from H_t to H_0. That is exactly what we see in the derivative formulae as well. Everything stems from xi(s) one way or the other, but hey, maybe there is something surprisingly 'solid' in the 'negative t world' that sheds some light on xi(s)...
@rudolph-git-acc
I think that to increase the mesh abbeff accuracy we have to treat the storedsums output as containing values of more than 10 digits, say 20 digits (which is what I assumed in gp while adding the subtotals). In many cases the coefficients may simply get padded with zeroes, although in other cases, where cells with large and small values got added, we actually do have more than 10 significant digits. We would then be multiplying/adding 20 digit values in the tloop process.
Tried it with a Pi^2 approximation, where the 3rd product below (10-digit times 20-digit) matches the 1st (20-digit times 20-digit) more closely than the 2nd (10-digit times 10-digit) does.
3.1415926535897932384*3.1415926535897932384
9.8696044010893586184
3.141592653000000000*3.1415926530000000000
9.869604397
3.1415926530000000000*3.1415926535897932384
9.869604399
The index list is now down to 332. We could split it into 1183-12058 and 12073-19570. I will start with the latter. Assuming it takes us 2 days, some of our output will indeed become redundant by then.
1183,2710,2726,2727,4354,5922,5923,6089,6265,6269,7059,7067,7138,7165,7684,8454,8456,8669,8676,8680,9051,9191,9215,9218,9266,9275,9280,9287,9310,9426,9447,9449,9454,9487,9494,9501,9503,9505,9533,9666,9667,9673,9694,9767,9772,9775,9776,9778,9922,9923,10112,10183,10230,10240,10242,10246,10297,10299,10301,10309,10412,10416,10422,10426,10427,10428,10495,10640,10730,10731,10735,10737,10739,10743,10781,10911,11052,11059,11061,11074,11149,11179,11203,11510,11560,11583,11593,11599,11600,11601,11611,11612,11613,11617,11624,11626,11630,11635,11652,11657,11672,11673,11753,11770,11853,11858,11861,11928,11929,11935,12016,12017,12028,12043,12046,12058,12073,12077,12079,12080,12197,12698,13133,13139,13152,13156,13286,13287,13288,13289,13292,13293,13294,13295,13297,13444,13574,13841,13843,13930,13939,14078,14092,14098,14108,14111,14112,14114,14130,14132,14159,14163,14164,14169,14263,14311,14315,14330,14351,14359,14369,14372,14374,14382,14425,14428,14792,14794,14854,14855,14857,14886,14916,14967,14969,14971,14972,14973,14974,14975,15037,15169,15254,15360,15373,15383,15455,15486,15493,15494,15547,15571,15618,15620,15636,15637,15638,15656,15669,15674,15675,15676,15728,15733,15794,15808,15819,15842,15843,15888,15906,15908,15912,15913,15916,15923,15927,15932,15948,15971,15974,15976,15982,16010,16083,16130,16155,16170,16171,16180,16184,16262,16334,16345,16347,16357,16377,16387,16392,16402,16405,16408,16410,16411,16448,16467,16647,16649,16654,16656,16658,16659,16713,16810,16833,16859,16862,16882,16884,16926,16929,16942,16991,17048,17053,17138,17149,17150,17154,17429,17497,17504,17505,17506,17543,17544,17558,17611,17613,17615,17619,17631,17632,17633,17638,17644,17645,17665,17666,17667,17670,17681,17684,17688,17690,17701,17737,18033,18071,18091,18110,18114,18125,18135,18141,18143,18153,18154,18171,18687,18689,19060,19065,19073,19075,19077,19090,19168,19176,19178,19179,19181,19183,19184,19207,19213,19215,19230,19270,19281,19564,19570
The heat PDE question now seems worth asking on the blog, since we may get feedback on what properties can or cannot be generalized. It will certainly be cool if a simpler function like exp(t)sin(z) can help reveal more about Ht (and computations involving such functions are much more feasible too).
Exploring the negative t region suddenly feels like the LHC searching for new particles :) Hope we find something like higgs, or hopefully something more unusual.
@km-git-acc
Have started the work on the range 1183-12058 ! Tloop for 2x10e20 is progressing well. Already 5000 rectangles processed and time per rectangle has dropped to 9 secs.
Will study the PDE phenomena a bit further and then ask about it on the blog.
Exploring the negative t region suddenly feels like the LHC searching for new particles :)
Yeah, we are also studying high speed collisions (of complex zeros), so let's hope we find more than the 'Standard Model' :-)
@rudolph-git-acc The remainder list is now 266 indexes long. It may save time to skip any indexes you haven't started yet that are no longer in this list.
1183,4354,6269,7138,7684,8454,8456,8669,8680,9051,9191,9218,9280,9310,9426,9449,9454,9487,9494,9503,9505,9667,9673,9694,9767,9776,9778,10112,10183,10230,10242,10246,10297,10299,10301,10309,10412,10426,10427,10428,10495,10640,10730,10731,10735,10781,10911,11061,11074,11179,11510,11583,11599,11600,11601,11611,11612,11613,11617,11624,11626,11630,11635,11652,11657,11672,11673,11753,11770,11853,11861,11928,11929,11935,12016,12017,12028,12046,12058,12073,12077,12079,12080,12197,12698,13139,13286,13287,13289,13292,13293,13295,13444,13841,14078,14092,14098,14108,14111,14112,14114,14130,14132,14159,14163,14164,14169,14263,14311,14315,14330,14351,14359,14369,14374,14382,14425,14428,14792,14794,14854,14855,14857,14886,14972,14974,15037,15360,15373,15383,15486,15493,15494,15547,15618,15620,15636,15637,15638,15656,15669,15674,15675,15676,15733,15794,15808,15842,15843,15888,15908,15912,15913,15916,15923,15927,15932,15948,15971,15974,15976,15982,16010,16083,16130,16155,16170,16171,16180,16184,16262,16334,16345,16347,16357,16377,16387,16392,16402,16405,16408,16410,16411,16448,16647,16649,16654,16656,16658,16659,16713,16810,16833,16859,16862,16882,16926,16929,16942,16991,17048,17053,17138,17149,17150,17154,17429,17497,17504,17505,17506,17543,17544,17558,17611,17613,17615,17619,17631,17632,17633,17638,17644,17645,17665,17666,17667,17670,17681,17684,17688,17690,17701,17737,18033,18071,18091,18110,18114,18125,18135,18141,18143,18153,18154,18171,18687,18689,19060,19065,19073,19075,19090,19168,19176,19178,19179,19181,19183,19184,19207,19213,19270,19281,19564,19570
@km-git-acc
Nice to see that the list is still getting shorter! I have adjusted the sequence I am working on. Running at a rate of 6 subtotals per hour and expect to be complete tomorrow afternoon.
Tloop for 2x10e20 still running steadily (no zeros found so far ;-) ).
EDIT: I am done with the subtotals till 12058. Happy to run a few more if needed.
@rudolph-git-acc Great. I was able to cover 160 indexes. Combined with some that Boinc additionally gave, only these 24 are left now.
15923,15927,15932,15948,15971,16862,16926,16929,16942,17558,17611,17613,17615,17619,17631,18033,18110,18114,19168,19176,19178,19179,19181,19183
Great if you can take these up.
@km-git-acc
On it! Will also produce the additional graph requested by prof Tao and include it in the write-up.
@rudolph-git-acc Nice, just saw the graph. Also, in the tloop arb code, I just tried out the usual trick of increasing the prec value from 3.3*digits+60 to 3.3*digits+90, and I saw this change (using prt=1)
with +60
0.1, 0.19999999999999999999999999996, 200000000000000066447, 1.9182 - 0.2956*I
with +90
0.1, 0.2, 200000000000000066447, 1.9181569775806 - 0.2955538059430*I
Since the mesh values are more accurate with +90, I am planning to use this modified script for the 9x10e21 tloop.
Also, noticed something strange in the threaded vs non-threaded versions of the tloop. The threaded version has some repeated rows. Do check:
./tloop_nonthr 0.1 0.15 0.2 1 storedsum_nolemma_200000000000000066447_dig10.txt
./tloop_thread 0.1 0.15 0.2 1 1 storedsum_nolemma_200000000000000066447_dig10.txt
non-threaded (with +90 prec)
0.1, 0.2, 200000000000000066447, 1.9181569775806 - 0.2955538059430*I
0.1, 0.36, 200000000000000066447, 1.8360600999399 - 0.24107922548513*I
0.1, 0.52, 200000000000000066447, 1.7442294291759 - 0.2019482952726*I
0.1, 0.68, 200000000000000066447, 1.6650608388793 - 0.17110782787750*I
0.1, 0.84, 200000000000000066447, 1.59690857588728 - 0.14639096334581*I
0.1, 1, 200000000000000066447, 1.53786488561289 - 0.1263455823281*I
threaded (with +90)
0.1, 0.2, 200000000000000066447, 1.9181569775806 - 0.2955538059430*I
0.1, 0.2, 200000000000000066447, 1.9181569775806 - 0.2955538059430*I
0.1, 0.2, 200000000000000066447, 1.9181569775806 - 0.2955538059430*I
0.1, 0.2, 200000000000000066447, 1.9181569775806 - 0.2955538059430*I
0.1, 0.2, 200000000000000066447, 1.9181569775806 - 0.2955538059430*I
0.1, 1, 200000000000000066447, 1.53786488561289 - 0.1263455823281*I
@km-git-acc
I will check! Just uploaded a Threaded version V2 with precision at +90 (this is also what I use for the current 2x10e20 run; just passed 18K rectangles). Made some small corrections to the precision of the various print-statements. Most were set at 20 digits and we now exceed this when x is displayed.
P.S. please only use the threaded version for the Tloop. I haven't looked at the previous versions for a while (just put threads=1 and it should work).
@km-git-acc
Good news and bad news. The 2x10e20 run has successfully completed (22314 rectangles); however, the problem you discovered earlier turns out to be a serious bug. The deceiving part of this bug is that all the rectangles I tested somehow still produced the correct rectangle data, yet the details behind them (e.g. for the argument function) were (half) wrong. I believe I have found the problem (it is due to the recent threading extension using variables instead of matrices) and fixed it. A new version V2 has been uploaded. Unfortunately it will require a rerun of the last tloop once again (all the previous runs were using matrices, so they should be fine). Have restarted the 2x10^20 and you can go ahead with 9x10^21. Sorry for this ugly glitch, but I am really glad you found it before we went to the grid! Probably good to run some further tests at your end.
The remaining subtotals for 9x10e21 are ready and can be found here: https://drive.google.com/open?id=1VlNEU2tLuRHcKokHjk5kR3ehy33Snzwd
EDIT: spotted another bug when threads are > 1 (no issues at threads=1). Need to take a deep dive.
P.S. I noted a small potential problem when tstart and tend are very close together and there are only very few rectangles to be processed (I got a system error once). So, best not to make the final 't-tranche' too small.
@km-git-acc
I've finally found the error in the threaded tloop script; it was buried very deep... The rectangle summary output actually wasn't impacted, however the individual mesh points ended up partially mirrored, so the argument principle still ended miraculously with a 0 winding number. It now seems to work properly. Have uploaded the V3 version and started to rerun the 2x10E20 on 64 threads (will rerun the other tloops after that one is out of the way).
@rudolph-git-acc Great. The latest script seems to be working fine. For Boinc, I am in any case using the thread parameter set to 1. Have uploaded the 9x10e21 storedsums at https://github.com/km-git-acc/dbn_upper_bound/blob/master/output/storedsums/singlemat/storedsum_nolemma_9000000000000000070686_dig10.txt
The tloop simulation procedure we discussed earlier suggests that around 60000 t-steps will be required. These have been split into about 10000 jobs of 1 hour each, with the first few jobs comprising only 3-4 rectangles!! Have submitted the last 10 jobs to test things out, and if things go well, will submit the rest in a few hours.
Configuration of the last job
bin/boinc2docker_create_work.py --min_quorum 1 --target_nresults 1 dbnupperbound/arbcalc:v10 bash -c "wget http://anthgrid.com/dbnupperbound/storedsums/storedsum_nolemma_9000000000000000070686_dig10.txt -O sss; ./tloop 0.039317 0.093 0.11832 0 1 sss > /root/shared/results/tloop_X_9x10e21_y_0.11832_t_0.039317_0.093.txt;"
EDIT: All jobs submitted.
EDIT2: By the way, do you have the script with Fredrik's algo in parametrized form? If it takes just 2 inputs - T_start and T_end - then we can submit those jobs as well and start getting root counts for the 10^9+ region.
EDIT3: Had to cancel all the tloop jobs since they were aborting on the client machines. This time I had started the sequence from t=0 onwards instead of submitting the last jobs first (which had completed successfully), and I am assuming it's a memory issue. The number of mesh points calculated simultaneously near t=0 must be prohibitive. So I have now reversed the job submission order and will check at what point they again start failing (probably below t=0.003 for 2 GB of RAM). t=0 seems to require around 3 GB of RAM.
@km-git-acc
Did a check on the accuracy of the tloop at 2x10e20 at the +90 setting and all works out nicely at 12-13 digits.
t=0.11679020443607269987, y=0.2, x=200000000000000066447.5,
Tloop detail : 1.75170210599587 + 0.11690269057380*I
(A+B)/B0 eff : 1.7517021059958714893 + 0.11690269057379937826*I
Have started to rerun all Tloops with the corrected script and the +90 setting. Will upload to GitHub once complete. The 2x10e20 run is progressing well on my 64-threads machine and expect it to be complete tomorrow evening.
I will work today on changing Fredrik's script to include T_start and T_end. Have to be careful with the pole at s=1, which should only be counted (subtracted) when T_start = 0. Also have to watch the precision settings for the higher T's, but this should be manageable.
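A minimal sketch of how the segment version could work, assuming the usual decomposition N(T) = theta(T)/Pi + 1 + S(T):

\[
N(T_{\mathrm{end}}) - N(T_{\mathrm{start}}) \;=\; \frac{\theta(T_{\mathrm{end}})-\theta(T_{\mathrm{start}})}{\pi} \;+\; S(T_{\mathrm{end}}) - S(T_{\mathrm{start}}),
\]

so the "+1" coming from the pole at s=1 cancels in the difference and only survives when T_start = 0.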
On your 3rd edit: the memory issue must come from the growth of the ests-matrix, which contains all the mesh points pre-sorted for the argument principle to be applied. The first rectangle in the 2x10e20 run has a 'num' of 4 million, so the size of the ests-matrix will be 4 x 4 million x ~100 bytes = ~1.5 GB. The first 'num' for 9x10e21 at least doubles in size and will push it over the 2 GB limit. A while ago I tested some code that applies the argument principle on the fly and avoids the need for the ests matrix. This will however only work in a non-threaded situation (each thread calculates a 'chunk' of the mesh points and one has to wait until all are completed before the argument function can be applied to the end and beginning 'welding' points of each thread). I am also concerned about the detailed printing, which would no longer come out in the right sequence. Of course the latter could be fixed by only allocating/filling the ests matrix when the prt-switch is set to 1. For Boinc this could work, since we always use a single thread. As a fallback for the memory issue, I could try to run the first t-batch on my 64-thread (32 GB) machine once it has completed the 2x10e20 run. Maybe this just pushes us below the memory threshold.
P.S. would it be possible to allocate 'memory-hungry chunks' to higher-memory machines?
@rudolph-git-acc I have submitted 10 jobs as a test with the memory requirement set at 3.5 GB instead of the default 1.8 GB. Ideally the server should send these only to those who have allocated that much or more memory to their Vbox. If these keep failing, then indeed the local machine will save the day :)
Nice catch on the pole exclusion, I had forgotten about that. Before using it at large scale, I think we should in any case test it out on very small intervals, e.g. [10^9, 10^9+5], where we know a priori that there are only a specific number of roots. This link can help (it stores all the zeros calculated in the Platt paper): http://www.lmfdb.org/zeros/zeta/?limit=100&t=1000000000
@km-git-acc
Have tried to adjust the root counter for segments of T, but there is a snag (as always) in this formula:
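For context, the formula in question is presumably a variant of Backlund's counting formula, stated here in a standard form (not necessarily the exact one used in the script):

\[
N(T) \;=\; \frac{\theta(T)}{\pi} + 1 + \frac{1}{\pi}\,\operatorname{Im}\left[\int_{0}^{T}\frac{\zeta'}{\zeta}(2+it)\,i\,dt \;+\; \int_{2}^{1/2}\frac{\zeta'}{\zeta}(\sigma+iT)\,d\sigma\right],
\]

where the left integral runs up the line Re(s)=2 and the right one moves horizontally onto the critical line.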
In the lower limit of the left integral, I could simply inject T_start, however what to do with the theta(T)?
Also tried to just walk around the full rectangle and drop the theta function, but then I run into issues when I try to contour integrate the horizontal line segments. For some reason it always comes back with infinity.
I believe the secret is somewhere in simplifying this formula:
but then the Im-sign in front of the integrals spoils things. I played with some simplifications of the integrals and even got the right results, but with timings that are slower for Te-Ts=1 than running from 0...Te entirely. Do you see the trick (it's the integrals that slow things down; the Hardy theta is fast enough)?
@km-git-acc
Was re-running the tloop script for 10e16 and it ended in a kill 9 error (typically memory related). Spotted a memory leak that will only manifest itself at high x. The root cause was found in a few variables not being properly cleared. This actually could be related to the problems you experienced on Boinc, although the ests-matrix certainly plays a role as well. Have uploaded a fixed version that has stable memory.
@rudolph-git-acc Opening the new thread.. Agree dbn < 0.1 is a good place to stop. I think the tloop part could potentially also be passed through Boinc. Will try it out on a smaller X to see how it goes (essentially, apart from the t1, t2, y parameters (appropriately chosen for each job), we would also pass an additional parameter to the Boinc job - the web address of the relevant storedsums file).
I will check whether there are ready-made scripts for RH verification, to help in a detailed understanding of the process involved. The one at https://www.wstein.org/simuw/misc/zeta_grid.html seems to be a dead link. Is Gourdon-Demichel's code available?