open-learning-exchange / planet

🌍 Planet Learning - Angular application
https://hub.docker.com/r/treehouses/planet
GNU Affero General Public License v3.0

Replication for large file failure #1563

Open paulbert opened 6 years ago

paulbert commented 6 years ago

@dogi @lmmrssa I found the error message in the CouchDB log. This is from an attempt to replicate a 602 MB file from dev to a Raspberry Pi:

[info] 2018-07-03T22:29:09.916116Z nonode@nohost <0.47.0> -------- alarm_handler: {clear,system_memory_high_watermark}
[notice] 2018-07-03T22:29:21.240591Z nonode@nohost <0.30196.12> 0e69c39755 192.168.0.101:2200 192.168.0.56 test GET /_active_tasks 200 ok 3
[error] 2018-07-03T22:29:23.913049Z nonode@nohost <0.6292.13> -------- Replicator, request GET to "https://dev.media.mit.edu:2200/resources/5620f429d9e6f8c5df84062efb01ef39?atts_since=%5B%221-67864fe65beb9f0443773055962d24f9%22%5D&revs=true&open_revs=%5B%222-8424b6e5d487f38eda80075fe0e7e97c%22%5D&latest=true" failed due to error {connection_closed,mid_stream}
[notice] 2018-07-03T22:29:23.913958Z nonode@nohost <0.10111.12> -------- Retrying GET to https://dev.media.mit.edu:2200/resources/5620f429d9e6f8c5df84062efb01ef39?atts_since=%5B%221-67864fe65beb9f0443773055962d24f9%22%5D&revs=true&open_revs=%5B%222-8424b6e5d487f38eda80075fe0e7e97c%22%5D&latest=true in 256.0 seconds due to error {error,{connection_closed,mid_stream}}
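
The query string in the failing GET above is percent-encoded, which makes it hard to see which revisions the replicator asked for. The following is a minimal sketch, using only the Python standard library, that decodes such a URL. The function name `describe_replicator_request` is illustrative and not part of CouchDB or Planet.

```python
import json
from urllib.parse import parse_qs, urlsplit


def describe_replicator_request(url: str) -> dict:
    """Split a replicator request URL into database, document id and revisions."""
    parts = urlsplit(url)
    # parse_qs also undoes the percent-encoding (%5B -> [, %22 -> ").
    query = parse_qs(parts.query)
    database, _, doc_id = parts.path.strip("/").partition("/")

    def json_list(name: str) -> list:
        # open_revs and atts_since are JSON arrays once decoded.
        return json.loads(query[name][0]) if name in query else []

    return {
        "database": database,
        "doc_id": doc_id,
        "open_revs": json_list("open_revs"),
        "atts_since": json_list("atts_since"),
    }


# URL copied from the log line above.
failed_url = (
    "https://dev.media.mit.edu:2200/resources/5620f429d9e6f8c5df84062efb01ef39"
    "?atts_since=%5B%221-67864fe65beb9f0443773055962d24f9%22%5D"
    "&revs=true"
    "&open_revs=%5B%222-8424b6e5d487f38eda80075fe0e7e97c%22%5D"
    "&latest=true"
)
info = describe_replicator_request(failed_url)
print(info)
```

Read this way, the request asks for revision 2 of the document and for attachments added since revision 1, so the transfer that was cut off mid-stream is the one carrying the attachment.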

paulbert commented 6 years ago

I was able to replicate a 483 MB file from dev to a Raspberry Pi running treehouses 57 and planet 0.2.7.

paulbert commented 6 years ago

Another failure, this time with a 236 MB file, from dev to a Raspberry Pi running treehouses stretchjune and planet 0.2.8:

[error] 2018-07-06T17:55:44.276983Z nonode@nohost <0.5739.54> -------- Replication crashing because GET https://dev.media.mit.edu:2200/resources/dfbc328c45c93f22ffd2088971108188?revs=true&open_revs=%5B%224-1d83b37ca3053d9574a83a6ab8ca318f%22%5D&latest=true failed
[error] 2018-07-06T17:55:44.277936Z nonode@nohost <0.4673.54> -------- Worker <0.4807.54> died with reason: {process_died,<0.5739.54>,kaboom}
[error] 2018-07-06T17:55:44.278852Z nonode@nohost <0.4673.54> -------- Replication `c47db1a0af04585f73551cd8af445e91` (`https://dev.media.mit.edu:2200/resources/` -> `http://192.168.0.101:2200/resources/`) failed: {worker_died,<0.4807.54>,{process_died,<0.5739.54>,kaboom}}
[error] 2018-07-06T17:55:44.279777Z nonode@nohost <0.4807.54> -------- gen_server <0.4807.54> terminated with reason: {process_died,<0.5739.54>,kaboom}
  last msg: {'EXIT',<0.5739.54>,kaboom}
     state: {state,<0.4673.54>,<0.4818.54>,20,{httpdb,"https://dev.media.mit.edu:2200/resources/",nil,[{"Accept","application/json"},{"Authorization","Basic dGVzdEBwYXVsMDUwNzIwMTg6dGVzdA=="},{"User-Agent","CouchDB-Replicator/2.1.1"}],300000,[{is_ssl,true},{socket_options,[{keepalive,true},{nodelay,false}]},{ssl_options,[{depth,3},{verify,verify_none}]}],5,250,<0.4697.54>,20,nil,undefined},{httpdb,"http://192.168.0.101:2200/resources/",nil,[{"Accept","application/json"},{"Authorization","Basic dGVzdDp0ZXN0"},{"User-Agent","CouchDB-Replicator/2.1.1"}],300000,[{socket_options,[{keepalive,true},{nodelay,false}]}],5,250,<0.5511.54>,20,nil,undefined},[<0.5739.54>],nil,nil,{<0.4818.54>,#Ref<0.0.24.88108>},[{missing_checked,1},{missing_found,1}],nil,nil,{batch,[],0}}
[error] 2018-07-06T17:55:44.281542Z nonode@nohost <0.4807.54> -------- CRASH REPORT Process  (<0.4807.54>) with 1 neighbors exited with reason: {process_died,<0.5739.54>,kaboom} at gen_server:terminate/6(line:737) <= proc_lib:init_p_do_apply/3(line:237); initial_call: {couch_replicator_worker,init,['Argument__1']}, ancestors: [<0.4673.54>,couch_replicator_scheduler_sup,couch_replicator_sup,...], messages: [], links: [<0.4818.54>,<0.4673.54>], dictionary: [{last_stats_report,{1530,898530,840350}}], trap_exit: true, status: running, heap_size: 376, stack_size: 27, reductions: 178
[notice] 2018-07-06T17:55:44.283415Z nonode@nohost <0.515.0> -------- couch_replicator_scheduler: Job {"c47db1a0af04585f73551cd8af445e91",[]} started as <0.28421.54>
[error] 2018-07-06T17:55:44.285461Z nonode@nohost <0.4673.54> -------- gen_server {couch_replicator_scheduler_job,{[99,52,55,100,98,49,97,48,97,102,48,52,53,56,53,102,55,51,53,53,49,99,100,56,97,102,52,52,53,101,57,49],[]}} terminated with reason: {worker_died,<0.4807.54>,{process_died,<0.5739.54>,kaboom}}
  last msg: {'EXIT',<0.4807.54>,{process_died,<0.5739.54>,kaboom}}
     state: [{rep_id,{"c47db1a0af04585f73551cd8af445e91",[]}},{source,"https://dev.media.mit.edu:2200/resources/"},{target,"http://192.168.0.101:2200/resources/"},{db_name,<<"shards/60000000-7fffffff/_replicator.1530801482">>},{doc_id,<<"resources_pull_1530898528680">>},{options,[{checkpoint_interval,10000},{connection_timeout,300000},{create_target,false},{http_connections,20},{retries,5},{selector,{[{<<"$or">>,[{[{<<"_id">>,<<"dfbc328c45c93f22ffd2088971108188">>},{<<"_rev">>,<<"4-1d83b37ca3053d9574a83a6ab8ca318f">>}]}]}]}},{socket_options,[{keepalive,true},{nodelay,false}]},{use_checkpoints,true},{worker_batch_size,500},{worker_processes,4}]},{session_id,<<"1b0ee92a8ef661c4fa84382299c237e7">>},{start_seq,{0,0}},{source_seq,<<"240-g1AAAAEzeJzLYWBg4MhgTmHgzcvPy09JdcjLz8gvLskBCjMlMiTJ____PytREIeCJAUgmWQPVqOKS40DSE08WI0iLjUJIDX1YDX8ONTksQBJhgYgBVQ2H7ebIOoWQNTtz0o0xavuAETd_axEMbzqHkDUAd2nlwUA_GJj5g">>},{committed_seq,{0,0}},{current_through_seq,{0,0}},{highest_seq_done,{0,0}}]
[error] 2018-07-06T17:55:44.287446Z nonode@nohost <0.4673.54> -------- CRASH REPORT Process  (<0.4673.54>) with 1 neighbors exited with reason: {worker_died,<0.4807.54>,{process_died,<0.5739.54>,kaboom}} at gen_server:terminate/6(line:737) <= proc_lib:init_p_do_apply/3(line:237); initial_call: {couch_replicator_scheduler_job,init,['Argument__1']}, ancestors: [couch_replicator_scheduler_sup,couch_replicator_sup,...], messages: [], links: [<0.4798.54>,<0.499.0>], dictionary: [{task_status_props,[{changes_pending,null},{checkpoint_interval,...},...]},...], trap_exit: true, status: running, heap_size: 2586, stack_size: 27, reductions: 33060
[error] 2018-07-06T17:55:44.288827Z nonode@nohost <0.499.0> -------- Supervisor couch_replicator_scheduler_sup had child undefined started with {couch_replicator_scheduler_job,start_link,undefined} at <0.4673.54> exit with reason {worker_died,<0.4807.54>,{process_died,<0.5739.54>,kaboom}} in context child_terminated
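
The `state` dump above lists the options this replication job ran with (a 300000 ms connection timeout, 20 HTTP connections, 5 retries, 4 worker processes, a worker batch size of 500, and a selector on one `_id`/`_rev` pair). As a rough sketch only, a `_replicator` document with equivalent settings might look like the one built below. The function name is hypothetical, and the per-document tuning field names (`connection_timeout`, `http_connections`, `retries_per_request`, `worker_processes`, `worker_batch_size`) are an assumption taken from CouchDB's replicator settings; check them against the documentation for the CouchDB version in use. Credentials are deliberately left out.

```python
import json


def build_pull_replication_doc(source: str, target: str, doc_id: str,
                               rev: str, created_ms: int) -> dict:
    """Build a one-document pull replication, mirroring the options in the log."""
    return {
        # The log shows ids of the form resources_pull_<millisecond timestamp>.
        "_id": f"resources_pull_{created_ms}",
        "source": source,
        "target": target,
        "create_target": False,
        # Same shape as the selector in the state dump: one _id/_rev pair.
        "selector": {"$or": [{"_id": doc_id, "_rev": rev}]},
        # Assumed per-document overrides; values copied from the log.
        "connection_timeout": 300000,
        "http_connections": 20,
        "retries_per_request": 5,
        "worker_processes": 4,
        "worker_batch_size": 500,
    }


replication_doc = build_pull_replication_doc(
    source="https://dev.media.mit.edu:2200/resources/",
    target="http://192.168.0.101:2200/resources/",
    doc_id="dfbc328c45c93f22ffd2088971108188",
    rev="4-1d83b37ca3053d9574a83a6ab8ca318f",
    created_ms=1530898528680,
)
print(json.dumps(replication_doc, indent=2))
```

Writing the job out like this makes it easier to experiment with lower values (for example fewer HTTP connections or worker processes) when testing on a Raspberry Pi. Whether that helps with this failure is not established by the logs.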

paulbert commented 6 years ago

An error that showed up partway through the logs during replication:

[error] 2018-07-10T16:04:53.849724Z nonode@nohost <0.14059.25> -------- Replicator, request GET to "https://dev.media.mit.edu:2200/resources/" failed due to error {error,{conn_failed,{error,ehostunreach}}}

paulbert commented 6 years ago

Another error, from the parent/source CouchDB:

[error] 2018-07-17T17:24:38.901699Z nonode@nohost <0.8396.0> -------- Replicator, request PUT to "https://dev.media.mit.edu:2200/resources/47b2bf59c96d58704147a870f110f879?new_edits=false" failed due to error {error,
    {'EXIT',
        {{{nocatch,{mp_parser_died,noproc}},
          [{couch_att,'-foldl/4-fun-0-',3,
               [{file,"src/couch_att.erl"},{line,613}]},
           {couch_att,fold_streamed_data,4,
               [{file,"src/couch_att.erl"},{line,664}]},
           {couch_att,foldl,4,[{file,"src/couch_att.erl"},{line,617}]},
           {couch_httpd_multipart,atts_to_mp,4,
               [{file,"src/couch_httpd_multipart.erl"},{line,208}]}]},
         {gen_server,call,
             [<0.14678.0>,
              {send_req,
                  {{url,
                       "https://dev.media.mit.edu:2200/resources/47b2bf59c96d58704147a870f110f879?new_edits=false",
                       "dev.media.mit.edu",2200,undefined,undefined,
                       "/resources/47b2bf59c96d58704147a870f110f879?new_edits=false",
                       https,hostname},
                   [{"Accept","application/json"},
                    {"Authorization","Basic ZGV2OnZlZA=="},
                    {"Content-Length",247482906},
                    {"Content-Type",
                     "multipart/related; boundary=\"b556865da86f4acaf91e457b681c5048\""},
                    {"User-Agent","CouchDB-Replicator/2.1.1"}],
                   put,
                   {#Fun<couch_replicator_api_wrap.11.3480007>,
                    {<<"{\"_id\":\"47b2bf59c96d58704147a870f110f879\",\"_rev\":\"4-fcfed687ef902caf9acd4bbb816d375a\",\"title\":\"A Collection of Episodes: Star Trek (The Next Generation)\",\"author\":\"\",\"year\":\"\",\"description\":\"TV Show episode\",\"language\":\"\",\"publisher\":\"\",\"linkToLicense\":\"\",\"subject\":[\"Agriculture\"],\"level\":[\"Early Education\"],\"openWith\":\"\",\"resourceFor\":null,\"medium\":\"\",\"articleDate\":1529700743514,\"resourceType\":\"\",\"addedBy\":\"earth\",\"openUrl\":null,\"openWhichFile\":\"\",\"isDownloadable\":\"\",\"filename\":\"Star Trek TNG - 5x02 - Darmok.avi.mp4\",\"mediaType\":\"video\",\"sourcePlanet\":\"earth\",\"resideOn\":\"earth\",\"createdDate\":1530911380349,\"updatedDate\":1530912269118,\"_revisions\":{\"start\":4,\"ids\":[\"fcfed687ef902caf9acd4bbb816d375a\",\"00c6f8f6c3e473eaa12a5d31ce0fd288\",\"3e6c17a22e3d233a5cd50da3f4a8e299\",\"3f91526c3fd3e3ac99c68c91faddc0a6\"]},\"_attachments\":{\"Star Trek TNG - 5x02 - Darmok.avi.mp4\":{\"content_type\":\"video/mp4\",\"revpos\":2,\"digest\":\"md5-DQjX7ueKUeEMEhETdg5RyA==\",\"length\":247481637,\"follows\":true}}}">>,
                     [{att,<<"Star Trek TNG - 5x02 - Darmok.avi.mp4">>,
                          <<"video/mp4">>,247481637,247481637,
                          <<13,8,215,238,231,138,81,225,12,18,17,19,118,14,81,
                            200>>,
                          2,
                          {follows,<0.8395.0>,#Ref<0.0.0.258077>},
                          identity}],
                     <<"b556865da86f4acaf91e457b681c5048">>,247482906}},
                   [{response_format,binary},
                    {inactivity_timeout,30000},
                    {is_ssl,true},
                    {socket_options,[{keepalive,true},{nodelay,false}]},
                    {ssl_options,[{depth,3},{verify,verify_none}]}],
                   infinity}},
              infinity]}}}}
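
Two details in the PUT request above can be cross-checked with a few lines of Python: the `Content-Length` header against the attachment length, and the raw digest bytes in the `att` record against the `md5-...` digest in the document JSON. This is only a consistency check on values copied from the log; the variable names are illustrative.

```python
import base64

# Values copied from the log entry above.
content_length = 247482906      # "Content-Length" header of the multipart PUT
attachment_length = 247481637   # "length" of the .mp4 attachment
digest_bytes = bytes([13, 8, 215, 238, 231, 138, 81, 225,
                      12, 18, 17, 19, 118, 14, 81, 200])
digest_in_doc = "md5-DQjX7ueKUeEMEhETdg5RyA=="

# Bytes in the request body that are not attachment data
# (document JSON, multipart boundaries and part headers).
overhead = content_length - attachment_length

# CouchDB reports attachment digests as "md5-" plus the base64 of the raw MD5.
encoded = base64.b64encode(digest_bytes).decode("ascii")
digest_matches = digest_in_doc == "md5-" + encoded

print(f"multipart overhead: {overhead} bytes")
print(f"digest matches: {digest_matches}")
```

The request body is almost entirely the video attachment, so the PUT that failed with `mp_parser_died` was streaming roughly 247 MB in a single multipart request.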

paulbert commented 6 years ago

Error on my Raspberry Pi:

[error] 2018-07-17T17:35:01.286158Z nonode@nohost <0.16582.1> -------- Replicator, request GET to "https://dev.media.mit.edu:2200/resources/dfbc328c45c93f22ffd2088971108188?revs=true&open_revs=%5B%226-a47984c7d1f0e6cfa5e0127c74dfe0b3%22%5D&latest=true" failed due to error req_timedout