ono760 opened this issue 6 years ago
+1
It seems that only one thread is working, according to `top`.
Backing up all Influx databases (around 950 GB) with the following command line:
influxd backup -portable myBackupDir
On a VM with 4 CPUs and 16 GB RAM, only between 25% and 50% of the CPU is being used, and 12 GB out of 16 GB of RAM.
Started 5 hours ago: only 44 GB backed up so far...
I have 16 CPUs and 64 GB RAM and it's really slow: almost all memory is free, the CPU is about 95% idle, and the SSD is doing only about 100-200 IOPS, so the bottleneck is clearly in the backup tool's code.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
This issue has been automatically closed because it has not had recent activity. Please reopen if this issue is still important to you. Thank you for your contributions.
Please reopen. Same issue here.
TWELVE HOURS for 293M in /var/lib/influxdb/data/
using influxd backup -portable /var/backups/influxdb/raw/
...
Is this normal behaviour for InfluxDB backups & restores?
@hbokh yep. We no longer back up InfluxDB using the built-in tools; we use rsync and disk snapshots.
We currently do backups like this:
influxd backup -portable /var/backups/influxdb/raw/
They take a long time to finish (> 12h).
But from the docs, I found the `-host` option, so I gave it a try:
influxd backup -portable -host 127.0.0.1:8088 /var/backups/influxdb/raw2/
ALMOST instant!
the problem is the restore, not the backup :)
> the problem is the restore, not the backup :)
Well, the restore was a breeze too here, using a backup from 1.6.4 restored into 1.7.9:
influxd restore -portable -host 127.0.0.1:8088 /var/backups/influxdb/raw2/
Still testing stuff TBH.
for larger databases? ours takes days
du -chs /var/lib/influxdb
471G /var/lib/influxdb
> du -chs /var/lib/influxdb
We're in a different league here 😁
# du -chs /var/lib/influxdb
301M /var/lib/influxdb
@retoo could you provide logs of the server during this time? I'm particularly interested in whether your caches cannot be snapshotted by the backup tool because they're busy handling write workloads.
I tested this on some data in a local instance:
edd@tr:~|⇒ du -chs ~/.influxdb
11G /home/edd/.influxdb
11G total
edd@tr:~|⇒ time ossinfluxd-1.8 backup -portable ~/backup-test
2019/11/15 13:27:27 backing up metastore to /home/edd/backup-test/meta.00
2019/11/15 13:27:27 No database, retention policy or shard ID given. Full meta store backed up.
2019/11/15 13:27:27 Backing up all databases in portable format
2019/11/15 13:27:27 backing up db=
2019/11/15 13:27:27 backing up db=_internal rp=monitor shard=1 to /home/edd/backup-test/_internal.monitor.00001.00 since 0001-01-01T00:00:00Z
2019/11/15 13:27:27 backing up db=_internal rp=monitor shard=3 to /home/edd/backup-test/_internal.monitor.00003.00 since 0001-01-01T00:00:00Z
2019/11/15 13:27:27 backing up db=_internal rp=monitor shard=5 to /home/edd/backup-test/_internal.monitor.00005.00 since 0001-01-01T00:00:00Z
2019/11/15 13:27:27 backing up db=_internal rp=monitor shard=7 to /home/edd/backup-test/_internal.monitor.00007.00 since 0001-01-01T00:00:00Z
2019/11/15 13:27:27 backing up db=_internal rp=monitor shard=9 to /home/edd/backup-test/_internal.monitor.00009.00 since 0001-01-01T00:00:00Z
2019/11/15 13:27:27 backing up db=_internal rp=monitor shard=11 to /home/edd/backup-test/_internal.monitor.00011.00 since 0001-01-01T00:00:00Z
2019/11/15 13:27:27 backing up db=_internal rp=monitor shard=13 to /home/edd/backup-test/_internal.monitor.00013.00 since 0001-01-01T00:00:00Z
2019/11/15 13:27:27 backing up db=db rp=autogen shard=2 to /home/edd/backup-test/db.autogen.00002.00 since 0001-01-01T00:00:00Z
2019/11/15 13:27:27 backing up db=db2 rp=autogen shard=14 to /home/edd/backup-test/db2.autogen.00014.00 since 0001-01-01T00:00:00Z
2019/11/15 13:28:04 backup complete:
2019/11/15 13:28:04 /home/edd/backup-test/20191115T132727Z.meta
2019/11/15 13:28:04 /home/edd/backup-test/20191115T132727Z.s1.tar.gz
2019/11/15 13:28:04 /home/edd/backup-test/20191115T132727Z.s3.tar.gz
2019/11/15 13:28:04 /home/edd/backup-test/20191115T132727Z.s5.tar.gz
2019/11/15 13:28:04 /home/edd/backup-test/20191115T132727Z.s7.tar.gz
2019/11/15 13:28:04 /home/edd/backup-test/20191115T132727Z.s9.tar.gz
2019/11/15 13:28:04 /home/edd/backup-test/20191115T132727Z.s11.tar.gz
2019/11/15 13:28:04 /home/edd/backup-test/20191115T132727Z.s13.tar.gz
2019/11/15 13:28:04 /home/edd/backup-test/20191115T132727Z.s2.tar.gz
2019/11/15 13:28:04 /home/edd/backup-test/20191115T132727Z.s14.tar.gz
2019/11/15 13:28:04 /home/edd/backup-test/20191115T132727Z.manifest
ossinfluxd-1.8 backup -portable ~/backup-test 216.09s user 18.42s system 615% cpu 38.127 total
Then, with a constant write workload on `influxd`:
edd@tr:~|⇒ time ossinfluxd-1.8 backup -portable ~/backup-test
2019/11/15 13:30:37 backing up metastore to /home/edd/backup-test/meta.00
2019/11/15 13:30:37 No database, retention policy or shard ID given. Full meta store backed up.
2019/11/15 13:30:37 Backing up all databases in portable format
2019/11/15 13:30:37 backing up db=
2019/11/15 13:30:37 backing up db=_internal rp=monitor shard=1 to /home/edd/backup-test/_internal.monitor.00001.00 since 0001-01-01T00:00:00Z
2019/11/15 13:30:37 backing up db=_internal rp=monitor shard=3 to /home/edd/backup-test/_internal.monitor.00003.00 since 0001-01-01T00:00:00Z
2019/11/15 13:30:37 backing up db=_internal rp=monitor shard=5 to /home/edd/backup-test/_internal.monitor.00005.00 since 0001-01-01T00:00:00Z
2019/11/15 13:30:37 backing up db=_internal rp=monitor shard=7 to /home/edd/backup-test/_internal.monitor.00007.00 since 0001-01-01T00:00:00Z
2019/11/15 13:30:37 backing up db=_internal rp=monitor shard=9 to /home/edd/backup-test/_internal.monitor.00009.00 since 0001-01-01T00:00:00Z
2019/11/15 13:30:37 backing up db=_internal rp=monitor shard=11 to /home/edd/backup-test/_internal.monitor.00011.00 since 0001-01-01T00:00:00Z
2019/11/15 13:30:37 backing up db=_internal rp=monitor shard=13 to /home/edd/backup-test/_internal.monitor.00013.00 since 0001-01-01T00:00:00Z
2019/11/15 13:30:37 backing up db=db rp=autogen shard=2 to /home/edd/backup-test/db.autogen.00002.00 since 0001-01-01T00:00:00Z
2019/11/15 13:30:38 backing up db=db rp=autogen shard=15 to /home/edd/backup-test/db.autogen.00015.00 since 0001-01-01T00:00:00Z
2019/11/15 13:30:38 backing up db=db2 rp=autogen shard=14 to /home/edd/backup-test/db2.autogen.00014.00 since 0001-01-01T00:00:00Z
2019/11/15 13:32:18 backup complete:
2019/11/15 13:32:18 /home/edd/backup-test/20191115T133037Z.meta
2019/11/15 13:32:18 /home/edd/backup-test/20191115T133037Z.s1.tar.gz
2019/11/15 13:32:18 /home/edd/backup-test/20191115T133037Z.s3.tar.gz
2019/11/15 13:32:18 /home/edd/backup-test/20191115T133037Z.s5.tar.gz
2019/11/15 13:32:18 /home/edd/backup-test/20191115T133037Z.s7.tar.gz
2019/11/15 13:32:18 /home/edd/backup-test/20191115T133037Z.s9.tar.gz
2019/11/15 13:32:18 /home/edd/backup-test/20191115T133037Z.s11.tar.gz
2019/11/15 13:32:18 /home/edd/backup-test/20191115T133037Z.s13.tar.gz
2019/11/15 13:32:18 /home/edd/backup-test/20191115T133037Z.s2.tar.gz
2019/11/15 13:32:18 /home/edd/backup-test/20191115T133037Z.s15.tar.gz
2019/11/15 13:32:18 /home/edd/backup-test/20191115T133037Z.s14.tar.gz
2019/11/15 13:32:18 /home/edd/backup-test/20191115T133037Z.manifest
ossinfluxd-1.8 backup -portable ~/backup-test 243.15s user 27.64s system 264% cpu 1:42.27 total
edd@tr:~|⇒
So when the server was handling writes and the cache was busy, the backup took 64 seconds longer (102 seconds versus 38 seconds).
@retoo @hbokh can you verify your InfluxDB versions please?
> @retoo @hbokh can you verify your InfluxDB versions please?
Thank you for re-opening, @e-dard !
I am currently working on a migration to a new InfluxDB-server, from v1.6.4 to v1.7.9.
As said, adding the `-host 127.0.0.1:8088` option solved my specific issue.
In a discussion at work it was mentioned that the long backup might have to do with the process switching between the IPv6 and IPv4 addresses, timing out, retrying, and so on.
There is a huge difference in behaviour between Windows and Linux on version 1.7.8, by the way. For the same database, backup & restore on Windows take ~5 min and the dump files are ~12 MB on disk, while on Linux backup & restore finish almost instantly and the files are ~3 MB.
I think one of the issues is that when doing a local backup everything is still streamed out over a TCP connection. It seems more sensible to handle this usage differently. Rather than streaming hard-linked files over a TCP connection, we should just create archives directly off of the source files.
@e-dard please clarify. To me, "create archives directly off of the source files" implies locking up the live data for a much longer window of time compared to creating hard links. Perhaps I don't understand the problem fully.
Ah, local backups are different, yes. I'm still unclear re why we wouldn't create hard links, but I agree that avoiding the TCP layer locally would help.
Next steps: set up some realistic workload tests and figure out if we're saturating bandwidth or disk read speed (and especially if neither, why not).
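For illustration only, here is a minimal sketch of the "create archives directly off of the source files" idea discussed above for the local-backup case. The function name and paths are hypothetical, not InfluxDB's actual backup code:

```go
// localarchive.go: a hedged sketch of archiving shard files straight from disk,
// skipping the TCP snapshot stream for the local-backup case. The paths and the
// function name are illustrative, not InfluxDB's backup implementation.
package main

import (
	"archive/tar"
	"compress/gzip"
	"io"
	"log"
	"os"
	"path/filepath"
)

// archiveDir writes every regular file under srcDir into a gzipped tar at dstPath.
func archiveDir(srcDir, dstPath string) error {
	out, err := os.Create(dstPath)
	if err != nil {
		return err
	}
	defer out.Close()

	gz := gzip.NewWriter(out)
	defer gz.Close()
	tw := tar.NewWriter(gz)
	defer tw.Close()

	return filepath.Walk(srcDir, func(path string, info os.FileInfo, walkErr error) error {
		if walkErr != nil || !info.Mode().IsRegular() {
			return walkErr
		}
		rel, err := filepath.Rel(srcDir, path)
		if err != nil {
			return err
		}
		hdr, err := tar.FileInfoHeader(info, "")
		if err != nil {
			return err
		}
		hdr.Name = rel
		if err := tw.WriteHeader(hdr); err != nil {
			return err
		}
		f, err := os.Open(path)
		if err != nil {
			return err
		}
		defer f.Close()
		// Copy exactly the size recorded in the header, in case the file is still growing.
		_, err = io.CopyN(tw, f, info.Size())
		return err
	})
}

func main() {
	// Hypothetical paths for illustration only.
	if err := archiveDir("/var/lib/influxdb/data", "/var/backups/influxdb/local.tar.gz"); err != nil {
		log.Fatal(err)
	}
}
```

The only point of the sketch is that nothing crosses a TCP connection; whether live TSM files can safely be read like this, versus hard-linked snapshots, is exactly the open question raised above.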
The `-host` flag is interesting. The default is `localhost:8088`. It is surprising that changing it to `127.0.0.1:8088` would change things so much; we need to try to repro and understand that.
One thing to note is that this issue is really two issues: OSS backup and Enterprise backup (there is an engineering assistance request that has been closed in favor of tracking here). While some fixes to OSS backup will benefit Enterprise, there are other steps which will be Enterprise-only. We may need to split this ticket in two across the OSS and Enterprise repos after we have an initial design for the improvements and see whether there is work in both.
Backup timings in seconds comparing the default `localhost:8088` and the `-host 127.0.0.1:8088` parameter. Backups are about 6 GB. There may be differences between `localhost` and `127.0.0.1` in going through the network stack, and, of course, `localhost` can be mapped to a different address (like `127.0.1.1`). In the timings below, `127.0.0.1` uses slightly less computational time (user + sys) but more real time. I suspect local configuration and the installed network stack would affect whether one or the other is preferable, but the default should remain `localhost`.
@lesam @hbokh - TL;DR - I don't see much difference between `127.0.0.1` and `localhost`.
| real 127.0.0.1 | real localhost | user 127.0.0.1 | user localhost | sys 127.0.0.1 | sys localhost |
|---|---|---|---|---|---|
| 18.604 | 21.675 | 13.246 | 24.061 | 12.666 | 21.082 |
| 22.33 | 20.617 | 17.168 | 16.705 | 16.288 | 15.539 |
| 25.282 | 23.504 | 17.718 | 16.4 | 16.288 | 15.504 |
| 24.618 | 19.22 | 17.337 | 14.829 | 15.925 | 14.013 |
| **22.7085** | **21.254** | **16.36725** | **17.99875** | **15.29175** | **16.5345** |

(The last row is the mean of each column.)
@davidby-influx I think we discussed this since this comment, but for posterity: let's focus on why backing up a few hundred GB is taking hours, e.g. the private issue linked above where a backup of 125 GB took > 7 hours; that seems too long. And it maxed out disk IO. It seems like we have some inefficient pattern while a backup is in progress. It would be interesting to know whether the backup takes the same time when queries/writes are turned off.
The backup in that issue is progressing at ~5 MB/s per shard on extremely fast disks; something seems wrong.
And it looks like your tests aren't catching whatever the issue is, because they're running at ~6 GB / 25 s = 240 MB/s.
@lesam Just for the record, the backup that's in the EAR (2885) was of a clone that should have had no incoming or outgoing traffic except for the backup.
@lesam - we have the private issue mentioned above, which is a clearly pathological case, and a bug of some sort, then we have the general inefficiency of the backup process in both OSS and Enterprise. We need to solve both, IMHO; find the bug, and do things like parallelize backup in general.
My tests were just to check the comment above that the `-host` parameter made a huge difference, not an attempt to replicate the private issue linked above.
@davidby-influx the top comment on this issue:
> I want to be able to quickly backup my prod data and restore it to other environments (dev/staging). The current backup and restore process is slow, sometimes taking hours. Currently for ~500GB of data, a full backup takes ~7.5 hrs to complete; restore takes more than 10 hours to complete.
That's about 20MB/s, which still seems quite slow for a single node of OSS - sounds closer to our pathological case than just wanting a bit of speed improvement.
Internal developer notes:
The use of `io.CopyN` here wraps the source `io.Reader` in a `LimitReader`, which can prevent `io.Copy` from using alternate, perhaps faster interfaces such as `WriterTo` and `ReaderFrom`. The `io.CopyN` seems to be a safety measure for snapshotting a file that may still be growing, but it could also have a detrimental effect on performance.
Update: There seems to be no way to get the `ReaderFrom` and `WriterTo` interfaces used when combining `archive/tar` and `os.File`, so `CopyN` is not an issue.
Internal developer notes: On a 6-data node cluster, running on my local machine under docker, I'm seeing backup speeds like this:
Backed up to . in 13.321069249s, transferred 2141312247 bytes
Backed up to /root/.influxdb/backies in 12.497472908s, transferred 2144231159 bytes
So between 160 and 172 MB / second. Not terribly fast, but not pathologically slow.
The pathological case above (500 GB / 7.5 hours ≈ 19 MB/s) is about 1/9th as fast.
@davidby-influx Would you like me to spin up a cluster for you to test in AWS? I feel that maybe the drag comes from the inter-nodal communication (meta node doing the backup <-> data node(s)) and if you're running it on docker it doesn't have to leave your laptop, right? :)
@danatinflux - yes, please. Let's spin up a copy/clone of something that we have seen backup-slowness on, if possible.
I think on a laptop you are going to see lots and lots of disk contention.
@gwossum - you had some interesting results yesterday....
> I think on a laptop you are going to see lots and lots of disk contention. (@timhallinflux)
But, we have reports of OSS backups performing terribly, presumably on the same machine which were resolved by explicitly specifying the IP and other weirdness. So I am hoping that we can get a local repro of some problem involving local TCP/IP that perhaps affects both OSS and Enterprise.
Some assorted findings about backups and network throughput:

- The `DownloadShardSnapshot` RPC is essentially a file transfer operation, just like scp. `DownloadShardSnapshot` was also slower than using scp on the exact same file: normally about 5% to 20% slower than scp, usually closer to the 20% side. It's even worse when you consider I did not have TLS configured for InfluxDB, so there was no TLS data or CPU overhead.
- The `DownloadShardSnapshot` RPC being 5% to 20% slower than scp is a pretty good indicator that we are not using the network bandwidth efficiently, rather than being limited by disk I/O or CPU.
- `DownloadShardSnapshot` is done using `io.Copy`. If the `Reader` and `Writer` don't implement the `ReaderFrom` or `WriterTo` interfaces, then `io.Copy` falls back to transferring data using an internal 32 kB buffer. There are artifacts of this 32 kB buffer in the shape of the data going over the network.
- There is an `io.CopyBuffer` function that allows passing in your own buffer. A larger buffer, and/or a buffer which is a multiple of the underlying network interface MTU, may improve throughput (a sketch follows below).
- Using the `WriterTo` and `ReaderFrom` interfaces may be helpful as well.
- `tlv.EncodeTLV` causes 3 TCP segments + ACKs to be sent over the wire: one for the length byte, one for the type byte, and one for the value. This can be improved and might help RPC-heavy operations, but it probably won't help the backup.

In the comment above, the Go `tar` package seems to prevent `ReaderFrom` use. It may be difficult with the current Go packages to use the `ReaderFrom` or `WriterTo` interfaces.
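A minimal sketch of the `io.CopyBuffer` suggestion, assuming a Go 1.17-era toolchain like the one discussed in this thread; the address, file path, and buffer size are illustrative, not values from the actual RPC code:

```go
// copybuffer_sketch.go: swapping io.Copy's internal 32 kB buffer for a
// caller-supplied, larger one via io.CopyBuffer. Not InfluxDB's RPC code.
package main

import (
	"archive/tar"
	"io"
	"log"
	"net"
	"os"
)

// sendFileOver streams one file into a tar stream on conn using a caller-supplied buffer.
func sendFileOver(conn net.Conn, path string) error {
	f, err := os.Open(path)
	if err != nil {
		return err
	}
	defer f.Close()

	fi, err := f.Stat()
	if err != nil {
		return err
	}
	tw := tar.NewWriter(conn)
	hdr, err := tar.FileInfoHeader(fi, "")
	if err != nil {
		return err
	}
	if err := tw.WriteHeader(hdr); err != nil {
		return err
	}

	// With Go 1.17, neither *tar.Writer nor *os.File exposes a fast-path
	// interface here, so plain io.Copy would fall back to its internal 32 kB
	// buffer. io.CopyBuffer lets us supply a larger one (256 KiB below); a size
	// that is a multiple of the interface MTU could also be worth measuring.
	buf := make([]byte, 256*1024)
	if _, err := io.CopyBuffer(tw, f, buf); err != nil {
		return err
	}
	return tw.Close()
}

func main() {
	conn, err := net.Dial("tcp", "127.0.0.1:8088") // illustrative address
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()
	if err := sendFileOver(conn, "/var/lib/influxdb/data/db/autogen/2/000000001-000000001.tsm"); err != nil {
		log.Fatal(err)
	}
}
```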
I tested implementing `ReadFrom()` on the `tar.Writer` type (because a `tar.Writer.readFrom()` exists in the source file in Go 1.17). Because of the layering of the `tar.Writer`, the call does not end up using `sendfile` or `copy_file_range` even when `tar.Writer.readFrom` is exported. I believe we will have more luck tuning socket parameters and buffer sizes than trying to force the use of optimized system calls.
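And a hedged sketch of the "tune socket parameters and buffer sizes" direction, using the standard library's per-connection buffer knobs; the address and sizes are illustrative, and on Linux the kernel may clamp them to `net.core.rmem_max` / `net.core.wmem_max`:

```go
// sockettune_sketch.go: request larger TCP socket buffers before streaming a backup.
package main

import (
	"log"
	"net"
)

// dialTuned opens a TCP connection and asks for larger socket buffers before use.
func dialTuned(addr string) (*net.TCPConn, error) {
	conn, err := net.Dial("tcp", addr)
	if err != nil {
		return nil, err
	}
	tcp := conn.(*net.TCPConn)

	// Request 4 MiB read/write buffers; the kernel may silently cap these.
	if err := tcp.SetReadBuffer(4 << 20); err != nil {
		tcp.Close()
		return nil, err
	}
	if err := tcp.SetWriteBuffer(4 << 20); err != nil {
		tcp.Close()
		return nil, err
	}
	return tcp, nil
}

func main() {
	conn, err := dialTuned("127.0.0.1:8088") // illustrative address
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()
	log.Println("connected with enlarged socket buffers")
}
```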
Dropping a `bufio.Writer` of 1,024,000 bytes in between the `tar.Writer` and its underlying `io.Writer` has no statistically discernible effect on performance.
In `stream.go`:
func Stream(w io.Writer, dir, relativePath string, writeFunc func(f os.FileInfo, shardRelativePath, fullPath string, tw *tar.Writer) error) (rErr error) {
	bw := bufio.NewWriterSize(w, 1000*1024)
	tw := tar.NewWriter(bw)
	defer func() {
		errors2.Capture(&rErr, tw.Close)()
		errors2.Capture(&rErr, bw.Flush)()
	}()
I am unable to reproduce pathologically slow (10x slowdown) backups under "normal" conditions. However, a brand new cloud1 clone seems to consistently show the issue. After an initial very slow backup, subsequent backups will proceed at expected speed. Further investigation into how potato creates clones is required.
Nice work!
@gwossum Cool -- so do I understand this correctly? It's possibly an issue with Potato and not something customers are experiencing (assuming they're not committing whatever sin Potato might be committing)?
No. There are plenty of issues from community members about slow backups.
Yes, I can confirm: I have only ever used the community version and faced these issues.
I know you probably checked this already, but I have to ask---what's the size delta between the initial backup and the subsequent backup? I want to make sure that the 'second backup' isn't an incremental that's disguising itself as a full.
@danatinflux The subsequent backups are also full backups, using `-strategy full` and a different output directory.
@samhld @timhallinflux There are probably multiple issues that can cause slow backups, but the one involving slow backups after cloning a cloud1 cluster are the ones that I currently have a repro for.
We are seeing this in OSS, on machines other than our cloud1, so I am hoping that the problem is not specific to our hosting, but having the repro helps us pinpoint the issue.
Discussion of slow backups on cloud1 clones has been moved to a new ticket in a different repo: https://github.com/influxdata/plutonium/issues/3816