mchangrh / sb-mirror

Docker containers to mirror the SponsorBlock database + API

rsync --append (without --append-verify) won't make the output file identical #14

Closed Khang-NT closed 2 years ago

Khang-NT commented 2 years ago
➜  echo -n "Hello world" > source.txt
➜  echo -n "123" > dest.txt
➜  rsync -ztvP --zc=lz4 --append source.txt dest.txt
source.txt
             11 100%    0.00kB/s    0:00:00 (xfr#1, to-chk=0/1)

sent 96 bytes  received 35 bytes  262.00 bytes/sec
total size is 11  speedup is 0.08
➜  cat dest.txt
123lo world

# expected: dest.txt contains "Hello world"

rsync --append works on the assumption that the existing content of the source file is never edited and that new content is only ever appended to the end of it. Rows in sponsorTimes.csv can be modified or deleted from time to time, so rsync --append will end up with a corrupted file.
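
To make the failure mode concrete, here is a rough sketch that imitates the append behavior without rsync itself. It assumes the transfer copies only the source bytes past the destination's current length and never checks the bytes already there; `append_like_rsync` and `workdir` are hypothetical names used for illustration, not part of rsync or sb-mirror.

```shell
# Hypothetical helper: imitate an append-only transfer.
# Existing bytes in the destination are trusted and never compared.
append_like_rsync() {
  src=$1
  dst=$2
  # $(( ... )) strips any padding that wc -c may print.
  src_len=$(($(wc -c < "$src")))
  dst_len=$(($(wc -c < "$dst")))
  # Copy only the bytes past the destination's current length.
  if [ "$dst_len" -lt "$src_len" ]; then
    tail -c +"$((dst_len + 1))" "$src" >> "$dst"
  fi
}

workdir=$(mktemp -d)
printf 'Hello world' > "$workdir/source.txt"
printf '123' > "$workdir/dest.txt"
append_like_rsync "$workdir/source.txt" "$workdir/dest.txt"
result=$(cat "$workdir/dest.txt")
echo "$result"   # prints: 123lo world
rm -rf "$workdir"
```

The first three bytes of dest.txt are kept as they were, which is the same "123lo world" result as the rsync run above.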


I tried --append-verify, but it is slow and only saves 50% of the bandwidth, which isn't worth the CPU spent compressing and comparing the diff. ☹️

mchangrh commented 2 years ago

Yeah, unfortunately rsync at scale with constantly changing files is difficult. It might just have to be a very heavy hash comparison plus rechecking twice.
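
One possible reading of the hash comparison idea, sketched loosely and not necessarily what is planned for sb-mirror: compare a checksum of the source and destination, fall back to a full copy when they differ, then check again. `file_sum` and `sync_if_changed` are hypothetical helpers; `cksum` is only a lightweight stand-in for a stronger hash, and a plain `cp` stands in for a full re-transfer.

```shell
# Hypothetical helper: print a checksum for a file.
# cksum (a CRC) is a stand-in; a stronger hash tool could be swapped in.
file_sum() {
  cksum < "$1"
}

# Hypothetical helper: replace dst with a full copy of src when the
# checksums differ, then verify the copy.
sync_if_changed() {
  src=$1
  dst=$2
  if [ -f "$dst" ] && [ "$(file_sum "$src")" = "$(file_sum "$dst")" ]; then
    echo "unchanged"
    return 0
  fi
  cp "$src" "$dst"
  # Recheck after the copy so a bad transfer is not silently accepted.
  if [ "$(file_sum "$src")" = "$(file_sum "$dst")" ]; then
    echo "updated"
  else
    echo "mismatch after copy" >&2
    return 1
  fi
}

workdir=$(mktemp -d)
printf 'Hello world' > "$workdir/source.txt"
printf '123lo world' > "$workdir/dest.txt"   # the corrupted result from the issue
first=$(sync_if_changed "$workdir/source.txt" "$workdir/dest.txt")
second=$(sync_if_changed "$workdir/source.txt" "$workdir/dest.txt")
fixed=$(cat "$workdir/dest.txt")
echo "$first, then $second"   # prints: updated, then unchanged
rm -rf "$workdir"
```

The trade-off is the one raised in this thread: every run reads both files in full to hash them, so it costs CPU and I/O even when nothing has changed.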