shieldproject / shield

A standalone system that can perform backup and restore functions for a wide variety of pluggable data systems
MIT License

[BUG] mongodump and s3 together fail randomly while backing up larger DBs (>~15 GB) #701

Open JakubKC opened 3 years ago

JakubKC commented 3 years ago

Describe the bug

When mongodump and s3 are used together to back up larger DBs (>~15 GB), the backup fails randomly with either an "EOF" or a "net/http: HTTP/1.x transport connection broken: write tcp" error.

Example of the output:

2021-01-04T18:16:11.739+0000    [........................]  datagen_it_test.test   6502010/431321000  (1.5%)
2021-01-04T18:16:11.739+0000
DEBUG> 'store' action returned error: Put https://shield-backup-test-docker.s3.amazonaws.com:443/...?partNumber=121&uploadId=...: EOF
Put https://shield-backup-test-docker.s3.amazonaws.com:443/...?partNumber=121&uploadId=...: EOF
{"archive_size":0,"compression":"bzip2","key":""}
2021-01-04T18:16:14.478+0000    terminating writes
2021-01-04T18:16:14.478+0000    [#.......................]  datagen_it_test.link  20928162/421444000  (5.0%)
2021-01-04T18:16:14.478+0000    MuxIn close datagen_it_test.link
2021-01-04T18:16:14.478+0000    [........................]  datagen_it_test.test  6502010/431321000  (1.5%)
2021-01-04T18:16:14.478+0000    MuxIn close datagen_it_test.test
2021-01-04T18:16:14.479+0000    Mux close namespace datagen_it_test.test
2021-01-04T18:16:14.479+0000    Mux close namespace datagen_it_test.link
2021-01-04T18:16:14.479+0000    Mux finish
2021-01-04T18:16:14.479+0000    archive writer: error writing data for collection `datagen_it_test.link` to disk: received termination signal / write /dev/stdout: broken pipe
2021-01-04T18:16:14.479+0000    Failed: archive writer: error writing data for collection `datagen_it_test.link` to disk: received termination signal / write /dev/stdout: broken pipe
DEBUG> 'backup' action returned error: Unable to exec '/usr/bin/mongodump': exit status 1
Unable to exec '/usr/bin/mongodump': exit status 1
2021-01-04T07:02:46.292+0000    [###########.............]  datagen_it_test.test  200213344/431321000  (46.4%)
DEBUG> 'store' action returned error: Put https://...s3.amazonaws.com:443/...?partNumber=3527&uploadId=...: net/http: HTTP/1.x transport connection broken: write tcp 172.18.0.4:33702->52.219.75.197:443: write: broken pipe
Put https://...s3.amazonaws.com:443/...?partNumber=3527&uploadId=...: net/http: HTTP/1.x transport connection broken: write tcp 172.18.0.4:33702->52.219.75.197:443: write: broken pipe
{"archive_size":0,"compression":"bzip2","key":""}
2021-01-04T07:02:46.731+0000    [###########.............]  datagen_it_test.test  200213344/431321000  (46.4%)
2021-01-04T07:02:46.731+0000    MuxIn close datagen_it_test.test
2021-01-04T07:02:46.733+0000    Mux close namespace datagen_it_test.test
2021-01-04T07:02:46.733+0000    Mux finish
2021-01-04T07:02:46.733+0000    archive writer: error writing data for collection `datagen_it_test.test` to disk: error writing to file: short write / write /dev/stdout: broken pipe
2021-01-04T07:02:46.736+0000    Failed: archive writer: error writing data for collection `datagen_it_test.test` to disk: error writing to file: short write / write /dev/stdout: broken pipe
DEBUG> 'backup' action returned error: Unable to exec '/usr/bin/mongodump': exit status 1
Unable to exec '/usr/bin/mongodump': exit status 1

To Reproduce

Steps to reproduce the behavior:

  1. Configure a backup of a MongoDB database larger than ~50 GB to an S3 bucket (I noticed that the larger the DB, the higher the chance of hitting the error).
  2. Run the backup and check whether it fails with one of the errors shown above.

Expected behavior

The backup should always complete successfully, as it does for other, smaller DBs.

SHIELD versions (please complete the following information):