mbentley / docker-timemachine

Docker image to run Samba (compatible Time Machine for macOS)
Apache License 2.0

Can't backup on two different machines when VOLUME_SIZE_LIMIT is enabled #81

Closed israsanc closed 9 months ago

israsanc commented 3 years ago

When I try to use the same timemachine volume on a second MacBook Pro, it fails and the log shows:

timemachine | fruit_tmsize_do_dirent: tmsize potential overflow: bandsize [67108864] nbands [1459]
timemachine | sys_disk_free: VFS disk_free failed. Error was : No error information

This doesn't occur for the first backup/machine, and it doesn't occur if I don't set VOLUME_SIZE_LIMIT.

mbentley commented 3 years ago

What is the full Docker command to start the time machine? How much disk space is being used on the Docker host where your persistent data is?

I know it works with two macs as I back up two myself but let's narrow down the potential issues.

israsanc commented 3 years ago

This is my service definition in the docker-compose yaml:

  timemachine:
    container_name: timemachine
    image: mbentley/timemachine:smb-armv7l
    hostname: timemachine
    domainname: {my_domain}
    mac_address: {random_mac_address}
    networks:
      macvlan:
        ipv4_address: {local_ip}
    environment:
      - CUSTOM_SMB_CONF=false
      - CUSTOM_USER=false
      - DEBUG_LEVEL=1
      - MIMIC_MODEL=TimeCapsule8,119
      - EXTERNAL_CONF=
      - HIDE_SHARES=no
      - TM_USERNAME=timemachine
      - TM_GROUPNAME=timemachine
      - TM_UID=1000
      - TM_GID=1000
      - PASSWORD={my_password}
      - SET_PERMISSIONS=false
      - SHARE_NAME=TimeMachine
      - SMB_INHERIT_PERMISSIONS=no
      - SMB_NFS_ACES=yes
      - SMB_METADATA=stream
      - SMB_PORT=445
      - SMB_VFS_OBJECTS=acl_xattr fruit streams_xattr
      - VOLUME_SIZE_LIMIT=1 T
      - WORKGROUP=WORKGROUP
    volumes:
      - ./timemachine-opt-timemachine:/opt/timemachine
      - ./timemachine-var-lib-samba:/var/lib/samba
      - ./timemachine-var-cache-samba:/var/cache/samba
      - ./timemachine-run-samba:/run/samba
    ports:
      - 137:137/udp
      - 138:138/udp
      - 139:139
      - 445:445
    restart: unless-stopped

I'm using the macvlan driver to avoid conflicts with avahi, and my filesystem is btrfs. My current backup uses only 92G according to du.

mbentley commented 3 years ago

I think what you're hitting is related to what is being seen, or at least what was attempted to be worked around, here: https://gitlab.com/artmg/samba/-/commit/b1714dbf74035550ff30494858e3d879c8d46003

Taking a look at the comment in the diff:

    /*
     * Arithmetic on 32-bit systems may cause overflow, depending on
     * size_t precision. First we check its unlikely, then we
     * force the precision into target off_t, then we check that
     * the total did not overflow either.
     */

Multiplying the two values from your log (bandsize * nbands) gives 97911832576 bytes, and that converted to GiB (which is what it is measuring against, not GB) is 91.1875 GiB, which matches what you're seeing on disk via du. I am not much of a programmer and I don't have experience in C, so I am not exactly sure what it is doing, but it just seems to be failing on https://gitlab.com/samba-team/samba/-/blob/b0ba7cd4f96a6ea227943cb05ef51a463e292b2d/source3/modules/vfs_fruit.c#L4995-4999

Based on the output you provided: bandsize [67108864] nbands [1459]

And then looking at the if statement's math: bandsize > SIZE_MAX/nbands

The actual math (I believe) should be:

67108864 > 1099511627776 / 1459
67108864 > 753606324

Which should evaluate to false, so it should never enter that branch and output the message you're seeing unless it was overflowing as warned.

That seems odd to me. Could you get the contents of the smb.conf that is generated inside your container? For example, mine:

# docker exec -it timemachine cat /etc/samba/smb.conf
[global]
   access based share enum = no
   hide unreadable = no
   inherit permissions = no
   load printers = no
   log file = /var/log/samba/log.%m
   logging = file
   max log size = 1000
   security = user
   server min protocol = SMB2
   server role = standalone server
   smb ports = 445
   workgroup = WORKGROUP
   vfs objects = acl_xattr fruit streams_xattr
   fruit:aapl = yes
   fruit:nfs_aces = yes
   fruit:model = TimeCapsule8,119
   fruit:metadata = stream
   fruit:veto_appledouble = no
   fruit:posix_rename = yes
   fruit:wipe_intentionally_left_blank_rfork = yes
   fruit:delete_empty_adfiles = yes

[TimeMachine]
   path = /opt/timemachine
   inherit permissions = no
   read only = no
   valid users = timemachine
   vfs objects = acl_xattr fruit streams_xattr
   fruit:time machine = yes
   fruit:time machine max size = 2 T

I want to make sure that it is setting fruit:time machine max size as expected.
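
Returning to the `bandsize > SIZE_MAX/nbands` comparison above: another way to read it is with SIZE_MAX as the largest value of the platform's size_t rather than the configured 1 T limit. In C, SIZE_MAX is 2^32 - 1 on a 32-bit build (which armv7l is) and 2^64 - 1 on a 64-bit build. A minimal Python sketch of the same comparison under that assumption follows; the function name is hypothetical and it models only the single `if`, not the Samba code around it.

```python
# Model of the guard `bandsize > SIZE_MAX / nbands` quoted in the thread.
# Assumption: SIZE_MAX is the platform's size_t maximum, not the 1 T limit.

SIZE_MAX_32 = 2**32 - 1  # size_t maximum on a 32-bit build
SIZE_MAX_64 = 2**64 - 1  # size_t maximum on a 64-bit build


def guard_trips(bandsize: int, nbands: int, size_max: int) -> bool:
    """Return True when the overflow guard would fire (C-style integer division)."""
    return bandsize > size_max // nbands


bandsize, nbands = 67108864, 1459  # values from the reported log line

print("32-bit threshold:", SIZE_MAX_32 // nbands)
print("fires on 32-bit:", guard_trips(bandsize, nbands, SIZE_MAX_32))
print("fires on 64-bit:", guard_trips(bandsize, nbands, SIZE_MAX_64))
```

Under this reading the guard fires on a 32-bit build and not on a 64-bit one, which would be consistent with the reports in this thread coming from the armv7l image.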

israsanc commented 3 years ago

Thank you for your help. Using du without the human-readable switch says 95520632.
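
As a quick cross-check of the two figures, here is a small sketch that converts both to GiB. It assumes du reported 1 KiB blocks (the usual default for GNU du); the thread does not say which du or block size was used.

```python
KIB = 1024
GIB = 1024**3

du_blocks = 95520632               # du output quoted above, assumed to be 1 KiB blocks
bandsize, nbands = 67108864, 1459  # values from the log line

du_gib = du_blocks * KIB / GIB
bands_gib = bandsize * nbands / GIB

print(f"du:    {du_gib:.2f} GiB")
print(f"bands: {bands_gib:.4f} GiB")
```

Both land at roughly 91 GiB, so the band count in the log is in line with what is on disk.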

It seems you've found a good clue to follow. I'll investigate this myself as well.

My current smb.conf:

[global]
   access based share enum = no
   hide unreadable = no
   inherit permissions = no
   load printers = no
   log file = /var/log/samba/log.%m
   logging = file
   max log size = 1000
   security = user
   server min protocol = SMB2
   server role = standalone server
   smb ports = 445
   workgroup = WORKGROUP
   vfs objects = acl_xattr fruit streams_xattr
   fruit:aapl = yes
   fruit:nfs_aces = yes
   fruit:model = TimeCapsule8,119
   fruit:metadata = stream
   fruit:veto_appledouble = no
   fruit:posix_rename = yes
   fruit:wipe_intentionally_left_blank_rfork = yes
   fruit:delete_empty_adfiles = yes

[TimeMachine]
   path = /opt/timemachine
   inherit permissions = no
   read only = no
   valid users = timemachine
   vfs objects = acl_xattr fruit streams_xattr
   fruit:time machine = yes
   fruit:time machine max size = 1 T

mbentley commented 3 years ago

Hmm yeah, it seems to be setting it correctly. I recall some strange compose behaviors with values that include spaces, but at first glance I see nothing that could be impacted here. I almost never use compose because of how often I find myself fighting syntax issues instead of the actual problem I am solving, so my memory there is a bit fuzzy.

xrvo commented 3 years ago

I have the same issue: I get the "tmsize potential overflow" error in the logs.

I'm using the armv7l docker image with VOLUME_SIZE_LIMIT = 500G

I also independently traced the problem to the same issue in the samba repository that @mbentley pointed out. Samba had a fix applied on Mar. 3, 2020, and that change is apparently present in the installed version, since the diff shows the error message string changing from tmsize overflow to tmsize potential overflow. The issue persists, however.

mbentley commented 3 years ago

From what I can tell from the code in this commit, the function exits at the return false;, so it never reaches the modified line tm_size = (off_t)bandsize * (off_t)nbands;. I am not sure that is the intent. The changed message makes it sound like it should report a potential overflow and then do some further check, but I might just be misunderstanding: the original implementation here mentions that it can't check for multiplication overflow by performing the multiplication. I don't know enough about what exactly it is doing and why to bring it up with someone who does.

hollie commented 3 years ago

I can confirm I am hitting the same issue running the armv7l docker image with VOLUME_SIZE_LIMIT set to 1 T.

The error in the log is:

fruit_tmsize_do_dirent: tmsize potential overflow: bandsize [67108864] nbands [6372]
sys_disk_free: VFS disk_free failed. Error was : No error information
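
To compare reports like this one, a small sketch that pulls bandsize and nbands out of the log line and computes the backup size they imply. The regular expression is written against the message format as it appears in this thread, and the function name is hypothetical.

```python
import re
from typing import Optional

# Matches the numeric fields of the fruit_tmsize_do_dirent message seen in this thread.
PATTERN = re.compile(r"bandsize \[(\d+)\] nbands \[(\d+)\]")


def implied_size_gib(line: str) -> Optional[float]:
    """Return bandsize * nbands in GiB, or None if the line does not match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    bandsize, nbands = int(match.group(1)), int(match.group(2))
    return bandsize * nbands / 1024**3


line = (
    "fruit_tmsize_do_dirent: tmsize potential overflow: "
    "bandsize [67108864] nbands [6372]"
)
print(implied_size_gib(line))
```

For this line the implied size is 398.25 GiB, which is below the configured 1 T limit.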

This error also prevents other clients from connecting via Samba: you can mount the share, but as soon as you start browsing it in Finder it results in a 'network share is temporarily unavailable' error. (This might not be the exact error in English; it is translated from my local language.)

My current workaround is to remove the VOLUME_SIZE_LIMIT parameter from the configuration when starting the docker container. Then all is working as expected.

mbentley commented 3 years ago

Looking at another image available, there might be another way to apply a limit: https://github.com/awlx/samba-timemachine/blob/main/entrypoint#L37

I'll have to look into the use of a .com.apple.TimeMachine.quota.plist file as an alternative.
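
For illustration, a sketch of generating such a file with Python's plistlib. The `GlobalQuota` key (an integer size in bytes) and the placement at the root of the share are assumptions that this thread does not confirm; check them against the linked entrypoint before relying on this.

```python
import plistlib


def build_quota_plist(limit_bytes: int) -> bytes:
    """Serialize a quota plist as XML.

    Assumption: Time Machine reads an integer `GlobalQuota` key, in bytes.
    """
    return plistlib.dumps({"GlobalQuota": limit_bytes}, fmt=plistlib.FMT_XML)


one_tib = 1024**4
payload = build_quota_plist(one_tib)
print(payload.decode("utf-8"))

# Hypothetical destination (nothing is written to disk in this sketch):
#   /opt/timemachine/.com.apple.TimeMachine.quota.plist
```

If this works as assumed, the limit would be enforced by the macOS client rather than by the Samba fruit module.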

bugsyb commented 2 years ago

Hi @mbentley ,

It looks like I am running into the very same issue:

fruit_tmsize_do_dirent: tmsize potential overflow: bandsize [8388608] nbands [2805]
sys_disk_free: VFS disk_free failed. Error was : Argument list too long

The limit is set to 1T too, and it happens during the initial copy (migration) of an existing Time Machine disk. I initialized the sparse bundle by adding a new disk; once the backup started, I cancelled it, mounted the sparse "disk image" and started copying over the source from the HDD (Time Machine).

Plenty of these messages pop up continuously.

Environment - it's aarch64 with alpine:latest as of today (PRETTY_NAME="Alpine Linux v3.14"). HW side is:

model name  : Amlogic S922X rev a
Hardware    : Hardkernel ODROID-N2
Revision    : 0400

Most probably it pulled armv7 and not armv8: with other images, unless I forced it with arm64v8/alpine:latest, it was using armv7. I am no longer sure how to check this on an existing container.

Would you be able to help me overcome the problem, or tell me what the consequences of leaving it like this might be? I wouldn't like to play with the backup if something is odd on the underlying fs.

Removing the quota doesn't seem to be a good idea here, as the same FS (ext4) is used for other services, so the quota needs to be enforced at the software level. If it is not limited, Time Machine will happily eat all the space, won't it?

mbentley commented 2 years ago

Could you please provide your docker run command or compose with credentials removed?

bugsyb commented 2 years ago

Thank you for the great Docker image!

Creds are in the user file, but thanks for pointing it out. I use a Dockerfile build, as I tried with arm64v8/alpine:latest, but I need to wait for a relative to kick-start the copy of files again, so I can't yet confirm whether it would go through on armv8.

version: "3.7"
#https://github.com/mbentley/docker-timemachine
services:
  timemachine:
    image: local/timemachine:smb
    build: ./docker-timemachine/
    container_name: timecapsule
    hostname: TimeCapsule
    environment:
      - CUSTOM_SMB_CONF=false
      - CUSTOM_USER=false
      - DEBUG_LEVEL=1
      - EXTERNAL_CONF=/users
      - HIDE_SHARES=no
      - MIMIC_MODEL=TimeCapsule8,119
      - TM_USERNAME=timemachine
      - TM_GROUPNAME=timemachine
      - TM_UID=1000
      - TM_GID=1000
      - PASSWORD=timemachine
      - SET_PERMISSIONS=false
      - SHARE_NAME=TimeMachine
      - SMB_INHERIT_PERMISSIONS=no
      - SMB_NFS_ACES=yes
      - SMB_METADATA=stream
      - SMB_PORT=445
      - SMB_VFS_OBJECTS=acl_xattr fruit streams_xattr
      - VOLUME_SIZE_LIMIT=0
      - WORKGROUP=WORKGROUP
    restart: unless-stopped
    volumes:
      - ${APPO}/timecapsule/users:/users:ro
      - ${HDD}/home-timecapsule:/opt/
      - ${APPO}/timecapsule/var-lib-samba:/var/lib/samba
      - ${APPO}/timecapsule/var-cache-samba:/var/cache/samba
      - ${APPO}/timecapsule/run-samba:/run/samba
    ulimits:
      nofile:
        soft: 65536
        hard: 65536
    networks:
      mvl_lan:
        ipv4_address: 1.2.3.4

networks:
  mvl_lan:
    external: true

bugsyb commented 2 years ago

@mbentley - after moving to arm64v8/alpine:latest as the base image, it kind of worked. The copy of the current backup stopped around 400GB in (out of ~960GB), complaining about one of the files. The new "issue" is described here: https://github.com/mbentley/docker-timemachine/issues/105. It might not be an issue, just HDD head movement that is simply too slow.