ukwa / ukwa-pywb

GNU General Public License v3.0

Instances showing 502s #94

Closed crarugal closed 1 year ago

crarugal commented 1 year ago

This relates to www.mind.org.uk captures: https://www.webarchive.org.uk/act/wayback/archive/*/http://www.mind.org.uk/

anjackson commented 1 year ago

Okay, so this was quite nasty.

I looked up the most recent copy in the CDX index, and grabbed the WARC records from the WARC Server (this should probably be a helper Jupyter notebook, as it's pretty straightforward calls to internal APIs!).

CDX: Visit http://cdx.api.wa.bl.uk/data-heritrix?url=https%3A%2F%2Fwww.mind.org.uk&sort=reverse&limit=1

uk,org,mind)/ 20220721105603 https://www.mind.org.uk/ text/html 200 JVFSHQYYUHI3TW77S6ALQK4ZGEDL3Z2O - - 23663 55649806 /heritrix/output/frequent-npld/20220606093552/warcs/BL-NPLD-20220721103716543-20118-80~npld-heritrix3-worker-1~8443.warc.gz

WARC: Take the filename and the offset/length (55649806/23663). Convert these to a start-end byte range (55649806 to 55649806+23663-1, i.e. 55649806-55673468). Run a range request, like this:

curl -r 55649806-55673468 http://warc-server.api.wa.bl.uk/webhdfs/v1/by-filename/BL-NPLD-20220721103716543-20118-80~npld-heritrix3-worker-1~8443.warc.gz | gunzip - | head -50
WARC/1.0
WARC-Type: response
WARC-Target-URI: https://www.mind.org.uk/
WARC-Date: 2022-07-21T10:56:03Z
WARC-Payload-Digest: sha1:JVFSHQYYUHI3TW77S6ALQK4ZGEDL3Z2O
WARC-IP-Address: 104.16.26.83
WARC-Record-ID: <urn:uuid:97078652-9bf1-4a1e-8358-5c974b6d72fd>
Content-Type: application/http; msgtype=response
Content-Length: 92761

HTTP/1.1 200 OK
Date: Thu, 21 Jul 2022 10:56:03 GMT
Content-Type: text/html; charset=utf-8
Connection: close
Access-Control-Expose-Headers: Request-Context
...
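The CDX-to-range step above could be wrapped in a small helper, along the lines of the notebook idea. This is only a minimal sketch: the function name `cdx_to_range_request` is hypothetical, and it assumes the CDX line always ends with the length, offset and WARC path fields in that order, as in the example line above.

```python
# Hypothetical helper: turn one CDX line into a WARC Server URL and a Range header.
# Assumes the last three space-separated fields are: length, offset, WARC path.

WARC_SERVER = "http://warc-server.api.wa.bl.uk/webhdfs/v1/by-filename/"


def cdx_to_range_request(cdx_line, prefix=WARC_SERVER):
    fields = cdx_line.split()
    length = int(fields[-3])
    offset = int(fields[-2])
    warc_path = fields[-1]

    # The range is inclusive, so the last byte is offset + length - 1.
    start = offset
    end = offset + length - 1

    # The by-filename endpoint is called with just the file name, not the full path.
    filename = warc_path.rsplit("/", 1)[-1]
    return prefix + filename, f"bytes={start}-{end}"
```

The returned pair can be used with any HTTP client that supports range requests, e.g. by sending the second value as the `Range` header and gunzipping the response body, which is what the curl command above does with `-r`.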

That showed that we do have proper 200 content for that URL. It also indicated that there was an extremely large cookie in the original response.

Set-Cookie: personalisationGroupsNumberOfVisits="5863,5862,5862,11421,12761,1082,11343,11198,11725,24127,12003,14649,17178,20871,22039,22040,22041,22341,22342,22343,19220,20716,30600,30603,30604,30605,30606,30609,30610,14505,8519,17210,30550,12611,11719,30612,30615,30618,4112,11714,11717,11282,11283,12147,12150,28461,11746,11190,11233,23773,25407,11347,12584,12620,24411,12865,10928,30641,30642,28478,30637,12588,6931,30645,30645,11969,16229,19238,30620,11984,30644,30646,30623,30622,17559,6783,22250,30654,28285,28293,13605,14164,20459,14163,30648,30649,30650,30651,15666,15768,6872,30659,30571,30636,18094,12265,12385,11704,24684,21948,12997,30669,20525,30667,30668,30672,11415,16261,17684,25254,25256,25256,30621,7493,18069,11280,11759,13652,25351,30676,30677,30678,30624,30625,30626,21514,19382,16281,16280,16568,11261,22011,12148,11712,24060,30555,30680,30681,22109,21601,7582,7710,21952,21970,21972,12286,28122,30695,30684,30685,30686,30687,30688,30689,30632,15828,10745,15668,6782,8420,7009,21122,30729,30694,30708,30709,30710,30727,30728,30730,18932,17671,17672,11194,11200,11398,11973,11978,11979,11980,11981,11982,12171,12170,12172,12173,12174,12176,12177,12178,12179,12180,12182,15957,15968,28193,30731,30732,30733,30734,30735,30736,30744,6932,28436,30590,30750,30753,16236,8447,15640,28464,21902,21906,25034,30758,30536,30537,11748,11320,30781,11202,11205,30717,30718,30724,30719,30720,19283,22235,22359,28260,30582,30756,15825,15819,16569,18516,18518,11279,11281,11284,30822,30823,30825,30826,30828,30829,30830,30824,11412,11413,11414,11416,11417,11418,11419,30856,30849,30850,30851,30852,30872,30876,30878,16251,30745,30879,30885,30739,30742,30743,30538,30755,30821,30835,30883,30886,30887,14656,30930,30931,7611,30937,20736,23859,25428,25560,28082,30935,30950,30953,30954,30957,30958,30959,15679,22628,22628,22651,22651,16230,30977,18101,18102,18104,17594,17595,14659,14650,19478,20889,20945,22689,30978,30981,18289,12509,15696,17541,19056,7515,7515,25251,30873,7923,17179,19179,1601
3,19370,23927,30986,31001,20539,23700,24388,24500,24709,24929,25057,28509,30543,30979,30992,28377,30939,31002,24128,20892,15661,15661,19172,11185,11382,12127,13045,13047,30969,30991,31012,31013,31019,31021,31022,31023,31024,22107,14992,11383,12254,14651,21867,30881,31027,11236,11384,25755,11331,11385,11386,11387,31046,31035,31036,31037,31039,31041,31051,31053,31054,20606,17533,20453,20453,31045,15791,15794,15802,15810,14801,8638,8640,8641,8644,8646,8653,8657,8660,8661,8662,8664,8666,8667,8669,8682,8683,8685,8691,8698,8699,8706,8707,8709,8710,8717,8719,8723,8738,8744,8745,8747,8749,8752,8764,8773,8776,8782,8783,8785,8788,8793,8794,8795,8797,8804,8807,8809,8810,8812,8813,8814,8815,8817,8818,8820,8826,8827,8829,8864,8865,8958,8960,8966,8969,9532,9537,9538,9542,9546,9549,9554,9738,9802,9804,9834,9837,9840,9845,9849,9850,9851,9981,9994,10009,10035,10038,10042,10062,10092,10095,10098,10102,10110,10113,10175,10192,10197,10219,10223,10226,10253,10262,10272,10283,10314,10319,10323,10361,10369,10374,10398,10454,10471,10473,10494,10504,10510,10515,10519,10529,10653,10667,10700,10764,10808,10823,10827,10839,10859,10915,10975,11023,11052,11058,11087,11098,11143,11442,11444,11452,11475,11501,11528,11533,11540,11545,11563,11618,11621,11637,11639,11644,11682,12650,12654,12656,12665,12676,12682,12684,13064,13069,13076,13102,13127,13130,13138,13252,13276,13494,13730,13734,13740,13746,13754,13757,13796,13800,13803,13804,13810,14064,14069,14071,14075,14086,14092,14097,14101,14104,14158,14160,14162,14166,14170,14181,14188,14190,14192,14199,14201,14203,14205,14207,14257,14259,14264,14266,14269,14282,14285,14286,14288,14302,14353,14393,14400,14418,14425,31062,31063,31065,8861,8642,8521,7645,7430,7181,6994,6992,6991,16978,17054,17055,17056,17058,21922,21922,12614,7258,15550,12278,12512,11736,15244,8483,15662,7683,11342,15261,31028,8068,31052,28191,31066,31074,11329,11330,11332,11333,11334,11335,11336,11337,11338,11339,11340,30521,31075,31076,31077,31078,31080,31082,31086,31087,31092,31095,
17593,21107,11708,24406,24415,11709,16607,21202,31097,31088,8021,31079,31085,11223,11242,11301,11358,11393,15541,15541,15543,15542,15544,15545,15546,15547,15548,17063,15812,11603,28262,28262,31110,31112,31113,31114,31115,31116,31117,11260,11260,22065,18106,20590,20048,14661,14652,14657,14653,14654,16677,18281,19208,19879,14742,15290,15291,11285,16233,11697,11699,11971,11711,11293,23895,20537,25442,31103,11974,17596,6808,17162,22701,23654,24718,24749,25310,31125,31128,31151,17685,13041,12506,28345,5467,20317,31126,8042,31179,31235,24100,24700,31162,31171,31174,31186,31187,31194,31192,17602,5356,7761,8299,21364,21532,19837,20481,20483,25371,25587,25618,27786,5419,7504,7723,7978,7734,7734,7917,8774,11213,11968,12825,20146,20348,21373,20152,20364,23449,25358,25362,7990,27808,27812,27812,27813,28048,27814,27803,27815,28057,27804,27994,28065,5449,7656,8076,8372,14059,15305,15305,15404,15488,15622,16277,16675,16887,16887,16888,16888,21883,21883,21884,21884,22156,22156,22157,22157,22249,23531,23707,23707,23708,24033,24034,18502,18514,18515,18517,18519,18520,7937,8616,17643,19216,22243,22291,7970,31081,12572,8632,8633,8635,8636,8637,8639,8647,8650,8654,8659,8663,8665,8671,8673,8684,8686,8687,8689,8690,8692,8701,8702,8704,8705,8708,8711,8712,8714,8716,8718,8721,8724,8725,8726,8727,8728,8729,8730,8737,8739,8740,8741,8751,8753,8765,8766,8771,8775,8778,8779,8792,8800,8802,8803,8816,8821,8823,8825,8828,8831,8833,8862,8863,8962,8963,8965,8967,9533,9535,9541,9544,9548,9555,9558,9559,9562,9736,9737,9795,9798,9805,9829,9830,9832,9835,9846,9984,9985,9987,9988,9990,9992,9993,9996,9998,9999,10001,10004,10006,10011,10013,10015,10016,10021,10022,10032,10034,10040,10041,10048,10052,10054,10058,10059,10063,10066,10069,10070,10073,10076,10077,10078,10082,10084,10087,10096,10101,10104,10105,10106,10108,10109,10111,10115,10117,10150,10151,10158,10159,10161,10163,10170,10173,10178,10180,10181,10184,10203,10208,10210,10214,10230,10233,10235,10239,10251,10254,10258,10260,10263,10264,10267,10270,1
0271,10273,10275,10277,10280,10291,10305,10316,10321,10324,10328,10332,10336,10338,10339,10342,10345,10348,10351,10352,10354,10358,10360,10363,10366,10370,10376,10379,10381,10384,10387,10388,10390,10392,10402,10407,10408,10416,10417,10420,10424,10426,10428,10429,10433,10437,10438,10440,10445,10449,10453,10458,10463,10465,10479,10480,10482,10483,10486,10487,10490,10496,10500,10507,10518,10525,10528,10535,10543,10630,10636,10638,10640,10645,10647,10651,10656,10657,10661,10662,10663,10665,10669,10671,10675,10676,10678,10680,10682,10687,10690,10693,10702,10705,10738,10742,10744,10772,10774,10777,10779,10785,10786,10790,10794,10796,10798,10799,10809,10811,10813,10817,10819,10820,10826,10830,10832,10834,10836,10849,10894,10898,10899,10901,10905,10913,10917,10918,10924,10926,10933,10934,10936,10942,10945,10947,10948,10958,10977,10979,10983,10985,10987,10995,10997,10999,11004,11005,11011,11026,11033,11038,11062,11064,11072,11077,11090,11092,11094,11159,11436,11439,11448,11450,11454,11457,11458,11466,11467,11470,11477,11479,11480,11482,11497,11498,11505,11506,11509,11511,11520,11536,11537,11542,11549,11551,11553,11566,11569,11571,11572,11574,11577,11578,11579,11582,11584,11586,11589,11590,11592,11594,11596,11598,11600,11604,11609,11613,11619,11623,11624,11627,11631,11633,11634,11646,11648,11660,11662,11664,11668,11670,11671,11676,11678,11689,11692,11695,12209,12216,12625,12630,12640,12644,12647,12652,12659,12664,12667,12669,12673,12680,12696,12698,12701,12704,12708,12709,12712,12833,12834,12839,12841,12843,12861,13051,13055,13056,13059,13071,13072,13092,13111,13113,13115,13124,13125,13142,13143,13145,13274,13279,13386,13497,13658,13660,13661,13664,13665,13667,13669,13670,13673,13676,13678,13681,13684,13686,13690,13692,13694,13696,13698,13700,13702,13704,13706,13708,13710,13714,13717,13719,13723,13727,13729,13732,13742,13745,13752,13759,13798,13808,13815,13823,13833,13845,13923,14062,14067,14073,14088,14090,14099,14103,14106,14108,14116,14118,14120,14122,14124,14126,14128,141
30,14132,14134,14136,14138,14140,14142,14145,14146,14149,14150,14152,14155,14156,14168,14171,14172,14177,14179,14186,14194,14196,14210,14212,14214,14217,14219,14255,14261,14263,14268,14294,14298,14305,14310,14333,14337,14344,14350,14357,14361,14363,14396,14402,14405,14407,14412,14414,14429,14431,14433,14435,14437,14439,14441,14453,14455,14457,14461,14467,14470,14494,14495,14508,14509,14510,14521,14536,14575,14766,14767,14796,14868,14870,14946,14949,14951,14953,14983,15224,15241,15242,15243,15250,15417,15418,15424,15443,15487,15490,15602,15709,15820,15907,15932,15939,15951,16004,16125,16499,16540,16550,16551,16553,16556,16557,16562,16637,16674,16676,16708,16737,16738,16740,16741,16742,16764,16769,16804,16837,16841,16842,16884,16885,16886,17046,17047,17072,17121,17186,17209,17346,17472,17532,17780,17848,17914,18290,18406,18825,19022,19023,19031,19132,19134,19144,19198,19252,19253,19314,19356,19372,19375,19392,19405,19440,19457,19541,19542,19588,19611,19614,19674,19683,19774,19797,19835,19842,19892,19909,19929,19958,20084,20089,20148,20259,20276,20278,20342,20347,20380,20447,20468,20497,20529,20538,20570,20634,20642,20676,15656,20830,20853,20857,20919,20948,21118,21121,21165,21179,21204,21243,21308,21343,21439,21452,21458,21462,21481,21504,21512,21639,21691,21738,21863,21872,21877,21915,21920,21987,22070,22113,22593,22614,23307,23599,23606,23620,23693,23727,23795,23860,23933,24041,24102,24125,24130,24173,24201,24207,24238,24248,24291,24337,24410,24502,24538,24554,24556,24717,24809,24930,24969,24975,24981,25002,25105,25112,25149,25201,25244,25261,25430,25472,25475,25671,25743,25756,27784,28308,28309,28323,30546,30565,10247,10045,10044,10024,9995,9977,9954,9949,9823,9820,9724,9212,8970,8763,8735,8697,8618,8618,8609,8608,8607,8606,8605,8604,8603,8601,8600,8599,8598,8597,8596,8595,8594,8593,8592,8591,8590,8589,8588,8587,8586,8585,8584,8583,8580,8579,8578,8577,8576,8575,8572,8569,8568,8565,8564,8563,8562,8561,8559,8558,8557,8556,8555,8554,8553,8552,8551,8550,8549,8548,8547,
8546,8545,8544,8543,8542,8541,8540,8536,8533,8531,8530,8527,8526,8525,8524,8523,8522,8518,8517,8516,8515,8514,8513,8512,8511,8510,8509,8508,8507,8504,8501,8499,8497,8496,8495,8494,8493,8492,8491,8490,8489,8487,8486,8485,8484,8482,8481,8478,8477,8476,8475,8474,8472,8471,8470,8469,8468,8467,8466,8465,8464,8463,8462,8461,8460,8459,8458,8457,8455,8454,8453,8452,8451,8450,8449,8448,8442,8441,8440,8439,8436,8435,8434,8433,8432,8431,8430,8429,8428,8427,8425,8424,8422,8421,8419,8417,8416,8409,8408,8400,8399,8398,8367,8365,8364,8363,8362,8361,8360,8359,8357,8334,8333,8332,8331,8330,8324,8323,8322,8320,8192,8191,8190,8160,8154,8150,8149,8147,8146,8145,8144,8142,8141,8140,8139,8055,8054,8053,8051,8050,8049,8046,8039,8038,7976,7975,7973,7967,7966,7964,7963,7961,7936,7934,7933,7932,7924,7922,7921,7918,7916,7915,7914,7913,7911,7910,7907,7906,7715,7709,7708,7671,7670,7669,7668,7667,7666,7665,7664,7662,7661,7653,7639,7626,7625,7624,7623,7622,7621,7620,7619,7618,7615,7614,7613,7612,7610,7606,7605,7603,7600,7597,7591,7584,7581,7580,7579,7578,7577,7576,7575,7574,7573,7572,7571,7570,7569,7568,7567,7566,7565,7564,7563,7560,7559,7558,7554,7553,7552,7549,7548,7547,7543,7541,7539,7538,7535,7530,7528,7527,7526,7525,7524,7523,7522,7520,7517,7516,7513,7502,7500,7499,7496,7494,7492,7487,7486,7485,7484,7468,7466,7464,7461,7458,7454,7450,7448,7447,7446,7445,7444,7443,7442,7440,7439,7438,7437,7436,7434,7433,7432,7429,7428,7427,7426,7425,7424,7423,7422,7420,7419,7418,7417,7416,7415,7414,7413,7412,7411,7410,7409,7408,7407,7406,7405,7404,7403,7402,7401,7400,7395,7394,7393,7392,7391,7390,7389,7388,7387,7386,7385,7384,7383,7382,7381,7380,7378,7377,7376,7375,7357,7354,7353,7351,7350,7349,7348,7347,7346,7345,7344,7343,7342,7341,7340,7339,7338,7337,7336,7335,7334,7333,7331,7330,7329,7309,7308,7307,7306,7305,7304,7303,7302,7301,7300,7299,7298,7297,7274,7271,7270,7269,7268,7267,7265,7264,7261,7260,7259,7257,7255,7254,7253,7252,7251,7250,7249,7242,7241,7240,7238,7237,7236,7235,7234,7233,7228,7227,7226,7225,
7224,7223,7221,7219,7216,7211,7209,7208,7207,7206,7205,7204,7203,7202,7200,7199,7198,7197,7194,7193,7192,7191,7190,7189,7187,7186,7185,7184,7183,7182,7178,7177,7174,7173,7172,7171,7164,7163,7162,7161,7160,7159,7158,7157,7156,7155,7148,7147,7146,7145,7144,7143,7142,7141,7135,7133,7132,7053,7052,7051,7050,7049,7048,7047,7046,7045,7044,7043,7042,7041,7040,7039,7038,7035,7034,7033,7032,7031,7030,7028,7027,7026,7025,7024,7023,7022,7021,7020,7019,7018,7017,7016,7015,7013,7008,7007,7006,7005,7004,7003,7002,7001,7000,6999,6998,6996,6995,6993,10266,10278,10404,10502,10506,10683,10684,10685,10783,10887,10889,10950,10986,11151,11512,11602,12166,12255,12624,12786,12794,12829,12835,13103,13109,13110,13128,13522,13547,14068,14076,14182,14254,14423,14523,14526,14534,14538,14642,14692,14747,14815,14914,14981,15419,15515,15516,15694,15778,16838,17032,17194,17657,17699,18297,18537,18998,19083,19149,19243,19246,19433,19439,19442,19549,19666,19762,19901,19937,19940,20087,20318,20444,20464,20629,20630,20707,20723,20822,20823,20826,20953,20971,21100,21102,21111,21113,21309,21466,21633,21735,21742,21870,21933,21951,21978,21979,21980,22012,22069,22074,22128,22146,22147,22162,22168,22612,23173,23559,23616,23656,23709,24025,24026,24040,24065,24115,24137,24217,24254,24314,24404,24607,24615,24708,24710,24752,24902,24998,25007,25060,25225,25243,25321,25411,25467,25537,25584,25602,25711,25714,25762,28058,28059,28137,28140,28198,28199,28344,28484,28485,28493,30492,30526,30589,19890,19891,19911,20307,20507,20510,20511,20514,20515,20624,20627,20648,20715,21289,21313,21328,21372,21442,21467,21470,21549,21551,21574,21603,21678,21707,21719,21741,21746,21813,21873,21874,21895,21899,21900,21928,21929,21932,21947,21991,22004,22017,22020,22021,22022,22023,22133,22134,22210,22211,22214,22215,22217,22218,22219,22220,22221,22225,22226,22228,22229,22233,22238,22241,22299,22354,22355,22356,22357,22358,22360,22590,22594,22617,22618,23459,23462,23463,23534,23535,23545,23605,23623,23625,23627,23635,23639,23640,23
643,23651,23653,23663,23670,23679,23680,23682,23683,23688,23689,23697,23698,23699,23704,23760,23775,23816,23846,23854,23861,23862,23873,23904,23905,23938,23943,23944,23945,23951,23955,23980,23986,23999,24000,24014,24015,24024,24054,24068,24069,24070,24076,24080,24084,24086,24088,24089,24116,24121,24147,24148,24150,24151,24152,24153,24161,24163,24166,24168,24174,24178,24185,24187,24190,24191,24192,24206,24223,24235,24236,24241,24242,24243,24244,24246,24259,24293,24294,24295,24320,24335,24336,24341,24343,24351,24352,24354,24394,24395,24396,24398,24399,24400,24401,24402,24403,24414,24421,24431,24433,24434,24435,24436,24440,24499,24512,24529,24531,24533,24542,24546,24577,24592,24593,24594,24595,24597,24598,24690,24692,24697,24698,24701,24702,24712,24714,24716,24721,24722,24723,24724,24733,24735,24737,24740,24746,24748,24788,24792,24825,24888,24889,24897,24899,24900,24904,24908,24923,24928,24971,24973,24978,24980,24982,24985,24986,24989,24990,25006,25009,25022,25023,25026,25029,25030,25031,25032,25035,25053,25054,25058,25068,25070,25094,25098,25099,25100,25114,25141,25152,25153,25154,25158,25159,25160,25161,25174,25178,25180,25181,25191,25200,25221,25222,25223,25224,25234,25237,25238,25240,25257,25274,25282,25315,25359,25360,25372,25378,25391,25393,25410,25413,25415,25440,25441,25458,25459,25468,25469,25473,25498,25506,25539,25561,25576,25582,25606,25607,25636,25682,25683,25697,25698,25699,25706,25707,25708,25709,25710,25719,25721,25744,25760,25761,27783,27785,27787,28036,28037,28038,28041,28074,28079,28080,28081,28083,28084,28098,28111,28128,28149,28156,28183,28184,28192,28194,28201,28225,28227,28229,28231,28247,28255,28256,28257,28265,28266,28268,28294,28298,28302,28316,28317,28318,28319,28320,28321,28324,28325,28326,28327,28328,28329,28330,28383,28432,28437,28444,28453,28465,28476,28477,28513,28515,28516,28517,28518,28519,28520,30481,30482,30484,30485,30486,30488,30489,30496,30498,30501,30507,30520,30525,30539,30540,30569,30570,30572,30577,30583,30593,11287,11764,1260
9,25192,20604,20530,20951,16916,18839,20964,17037,17062,15262,15263,15264,15265,15266,15267,31508,31523,31532,31541,31541,20610,16501,31034,16449,31502,31503,31504,31505,31506,31507,31514,31515,16855,11317,16481,16480,16482,16483,16484,16485,16486,16487,16462,16462,16464,16465,16466,16467,16468,16469,15809,16438,6784,7220,6807,24800,31564,31598,31559,31049,31520,31522,31553,31555,31556,31557,31560,31561,31562,31565,31566,31567,31568,31572,31584,31587,15432,16117,15649,31574,31607,31614,31601,31602,31193,31575,31617,31618,31620,31630,31631,31626,31133,31067,31577,31621,11762,15651,7699,17682,31600,31629,31659,22662,31656,31656,17619,17619,31552,17127,15789,15803,31610,7716,14799,14533,16099,15657,15658,15754,15755,15761,17057,16419,16420,16421,16422,16424,16100,11288,11234,31683,12618,31665,31668,31669,19692,11394,17991,17996,17992,17993,31676,25101,25109,25113,25117,31725,31727,31729,31734,17159,31563,31660,31664,31724,31731,31732,31733,31736,31737,31738,31741,31742,31743,31744,31056,31709,31721,31682,24992,7495,31776,12160,31761,31763,31775,16311,16319,16358,31740,31801,31830,31770,31772,31688,31693,31706,23580,24810,25059,25501,25504,25603,28131,31814,31815,31816,31819,31834,31844,31856,31857,31858,22025,12125,31837,31867,31871,31873,31876,31825,31832,31833,20862,31874,31875,31880,31881,31885,31886,31887,31888,25512,15805,20138,28346,28362,28365,28370,28372,28386,28393,28398,28404,28408,28411,28419,28424,28426,28259,28261,28263,28272,31509,11408,31903,31904,17148,17150,17151,20334,21335,17167,17168,17169,17170,17171,17172,17173,31818,21644,24095,17644,17918,18635,19057,19057,23729,23729,31869,31870,23776,24101,31517,31905,31910,31912,31919,31920,31921,31923,11191,11367,18336,19098,17147,17166,17152,24743,20332,24753,15804,21420,19598,19781,20154,31925,31927,31929,31945,31947,31948,31949,31950,31957,31962,31963,31963,31964,31965,31966,31967,31968,31969,31970,31983,31971,31978,32002,31902,31936,31937,32004,31981,23796,23796,15767,12901,6871,31942,32020,32021,16500,1
2130,32046,32003,32030,32032,32034,32037,32040,32041,32042,32047,32050,32051,32052,32054,15406,15665,32024,32027,32063,32065,19165,23798,11176,12256,11713,32026,32029,32031,32033,32035,32057,32078,32080,32081,12889,12914,12915,12916,12917,12918,12920,13080,32090,32039,32094,32095,31999,32005,32059,12606,32067,32068,32069,32071,32072,32073,32074,15269,15269,15270,15271,15272,15273,15274,15673,12600,24557,28491,32175,32176,32177,32178,32179,12124,31951,32342,32343,32180,32181,32182,32350,32351,17185,11707,17331,7696,13016,13018,32058,16443,31930,31931,31932,31933,31934,32353,32354,32355,32356,30633,13096,32357,32384,32385,32386,7700,32395,32383,32387,11349,32473,32425,32426,16435,32427,32440,19199,19199,17751,23617,23617,11230,11178,11179,11181,11186,11187,11250,11252,11255,11254,11257,32007,32525,11313,20601,20596,20597,20598,20599,20600,20602,12760,12760,32496,12161,32480,16231,16235,21624,17929,31014,32410,11235,11237,11239,11361,16423,19306,32542,32544,22232,25233,32559,32560,32561,32562,32563,32564,32568,32569,7698,32399,32430,32481,32401,32402,32441,32447,7704,32406,32408,32572,32573,25246,15775,15635,9623,32434,32435,32436,32552,32575,32576,32577,32533,32590,32591,32605,32606,32607,32608,32609,11375,17725,17748,17750,22086,22112,21911,21912,22213,22234,22297,23585,23958,23971,24082,24120,24149,24164,24169,24170,24181,24183,24184,24193,24198,24199,24200,24237,24239,24258,24322,24324,24325,24331,24340,24342,24345,24346,24347,24348,24397,24439,24490,24491,24492,24494,24497,24516,24517,24518,24519,24521,24522,24523,24524,24525,24526,24535,24585,24599,24605,24682,24695,24704,24711,24713,24715,24756,24778,24793,24797,24801,24803,24807,24813,24866,24874,24924,24977,24983,24988,24991,24993,24995,24999,25033,25090,25102,25116,25131,25151,25170,25179,25182,25185,25190,25202,25203,25204,25218,25219,25242,25263,25264,25265,25311,25316,25337,25349,25361,25388,25417,25424,25425,25427,25429,25453,25481,25502,25503,25507,25508,25509,25532,25535,25540,25574,25600,25601,25609,25
674,25686,25691,25712,25715,25716,25722,25763,27802,28042,28091,28097,28100,28102,28109,28110,28112,28126,28148,28150,28155,28187,28190,28232,28245,28271,28275,28278,28295,28297,28305,28492,30483,30522,30523,30533,30562,30566,30567,30568,30586,30587,30588,31168,31500,31657,31802,31826,31835,31868,31914,32091,32611,32614,32617,32618,32619,17163,17164,17203,20722,31845,32089,12009,12010,12011,12929,12900,13000,13002,13003,12904,13019,13020,13021,12907,13042,13043,13044,23878,32620,32622,32623,32624,32625,32626,32629,32630,32631,32632,32633,12258,12305,17165,18835,18836,32621,32628,32635,15772,31820,32360,15785,15787,6786,7212,17071,32638,32640,7215,7314,16433,16606,32644,8128,32643,7705,11401,11238,12589,12597,12598,12599,32677"; expires=Wed, 19-Oct-2022 10:56:03 GMT; path=/; secure; HttpOnly

So, the suspicion fell on such large headers breaking some buffers or limits. But where? These requests go through a few proxies....

The 'raw' service is available at port 7171 on the prod1 server, where we can talk directly to the uwsgi server that is running pywb. Running the request against that service showed an OSError: write error, and the server was returning an empty response.

Running ukwa-pywb without uwsgi seemed to work fine.

cd github/ukwa-pywb
source venv/bin/activate
export UKWA_INDEX=cdx+http://cdx.api.wa.bl.uk/data-heritrix
export UKWA_ARCHIVE=http://warc-server.api.wa.bl.uk/webhdfs/v1/by-filename/
ukwa_pywb -p 8089
...

After quite a lot of guesswork and experimentation, where just upping the overall buffer size wasn't helping, I discovered an (AFAICT) undocumented uwsgi configuration parameter, response-headers-limit. The default value for this parameter seemed to be low enough to balk at that long header, because when the service was reconfigured with response-headers-limit = 262144 it started working.

However, the layers of proxies were then still throwing errors. In this case, it seems the initial proxy needs:

            uwsgi_buffers 16 256k;
            uwsgi_buffer_size 256k;
            uwsgi_busy_buffers_size 512k;

and the front-end NGINX proxy needed to match this (128k was not enough!):

        proxy_buffer_size  256k;
        proxy_buffers   4 256k;
        proxy_busy_buffers_size   256k;

So these same changes need rolling out, along with a new ukwa/ukwa-pywb:2.6.7.2 that contains the improved uwsgi configuration.

This was such a nightmare that I think we should open an issue in pywb to see if very large headers should be handled differently, e.g. dropped.

anjackson commented 1 year ago

Note that, having rolled out these changes on the DEV system, this now works: https://dev.webarchive.org.uk/wayback/archive/20220720112219/https://www.mind.org.uk/

But at the time of writing, prod is broken: https://www.webarchive.org.uk/act/wayback/archive/20220720112219/https:/www.mind.org.uk/

crarugal commented 1 year ago

Thanks for finding the solution, Andy, and thank you for the detailed breakdown. These steps are really helpful, and give me another thing I can try when investigating issues.

anjackson commented 1 year ago

I was hoping you'd find that useful!

Under ukwa/ukwa-services#100 I've worked through better tracing of this kind of thing, and BETA now works too. Still need to roll out to PROD.

anjackson commented 1 year ago

Rolled out now.

anjackson commented 1 year ago

Gah, of course, need to do the same for QA Wayback.... e.g. https://www.webarchive.org.uk/act/wayback/archive/20220723104224/https://www.mind.org.uk/

anjackson commented 1 year ago

Changes in be607d4 mean https://dev.webarchive.org.uk/act/wayback/archive/20220723104224/https://www.mind.org.uk/ now works. Needs rolling out by @GilHoggarth to BETA and then PROD

GilHoggarth commented 1 year ago

Rolled out ukwa-services/ingest/w3act master onto beta swarm.

GilHoggarth commented 1 year ago

Tagged the code in ingest/w3act and released onto production.