trapexit / mergerfs

a featureful union filesystem
http://spawn.link

docker container (gitlab-ce) fails to start when using mergerfs as mounted data #1267

Closed tarchive closed 8 months ago

tarchive commented 8 months ago

Describe the bug

When using a mergerfs pool as the data volume of a fresh docker container running the official gitlab-ce docker image, the container is unable to start because the startup reconfigure fails. After changing the docker volume mounts to use the disk directly, the container is able to run its reconfiguration script without error.

To Reproduce

Docker Compose file for the container. All the data directories are empty to begin with.

version: '3'
services:
  gitlab-ce:
    container_name: GitLab-CE
    image: 'gitlab/gitlab-ce:15.4.6-ce.0'
    ports:
      - '9081:80'
    volumes:
        - '/mnt/data/gitlab-ce/config:/etc/gitlab'
        - '/mnt/data/gitlab-ce/data:/var/opt/gitlab'
        - '/mnt/data/gitlab-ce/log:/var/log/gitlab'

Container starts then terminates.

GitLab container error message:

```
Recipe: gitlab::gitlab-shell
* storage_directory[/var/opt/gitlab/.ssh] action create
* ruby_block[directory resource: /var/opt/gitlab/.ssh] action run

================================================================================
Error executing action `run` on resource 'ruby_block[directory resource: /var/opt/gitlab/.ssh]'
================================================================================

Mixlib::ShellOut::ShellCommandFailed
------------------------------------
Expected process to exit with [0], but received '1'
---- Begin output of chgrp git /var/opt/gitlab/.ssh ----
STDOUT:
STDERR: chgrp: changing group of '/var/opt/gitlab/.ssh': No such file or directory
---- End output of chgrp git /var/opt/gitlab/.ssh ----
Ran chgrp git /var/opt/gitlab/.ssh returned 1

Cookbook Trace: (most recent call first)
----------------------------------------
/opt/gitlab/embedded/cookbooks/cache/cookbooks/package/libraries/storage_directory_helper.rb:35:in `run_command'
/opt/gitlab/embedded/cookbooks/cache/cookbooks/package/libraries/storage_directory_helper.rb:52:in `ensure_permissions_set'
/opt/gitlab/embedded/cookbooks/cache/cookbooks/package/resources/storage_directory.rb:42:in `block (3 levels) in class_from_file'
/opt/gitlab/embedded/cookbooks/cache/cookbooks/package/resources/storage_directory.rb:36:in `block in class_from_file'

Resource Declaration:
---------------------
# In /opt/gitlab/embedded/cookbooks/cache/cookbooks/package/resources/storage_directory.rb

 36:   ruby_block "directory resource: #{new_resource.path}" do
 37:     block do
 38:       # Ensure the directory exists
 39:       storage_helper.ensure_directory_exists(new_resource.path)
 40:
 41:       # Ensure the permissions are set
 42:       storage_helper.ensure_permissions_set(new_resource.path)
 43:
 44:       # Error out if we have not achieved the target permissions
 45:       storage_helper.validate!(new_resource.path)
 46:     end
 47:     not_if { storage_helper.validate(new_resource.path) }
 48:   end
 49: end

Compiled Resource:
------------------
# Declared in /opt/gitlab/embedded/cookbooks/cache/cookbooks/package/resources/storage_directory.rb:36:in `block in class_from_file'

ruby_block("directory resource: /var/opt/gitlab/.ssh") do
  action [:run]
  default_guard_interpreter :default
  declared_type :ruby_block
  cookbook_name "gitlab"
  recipe_name "gitlab-shell"
  block #
  not_if { #code block }
end

System Info:
------------
chef_version=17.10.0
platform=ubuntu
platform_version=20.04
ruby=ruby 2.7.6p219 (2022-04-12 revision c9c2245c0a) [x86_64-linux]
program_name=/opt/gitlab/embedded/bin/cinc-client
executable=/opt/gitlab/embedded/bin/cinc-client

================================================================================
Error executing action `create` on resource 'storage_directory[/var/opt/gitlab/.ssh]'
================================================================================

Mixlib::ShellOut::ShellCommandFailed
------------------------------------
ruby_block[directory resource: /var/opt/gitlab/.ssh] (gitlab::gitlab-shell line 36) had an error: Mixlib::ShellOut::ShellCommandFailed: Expected process to exit with [0], but received '1'
---- Begin output of chgrp git /var/opt/gitlab/.ssh ----
STDOUT:
STDERR: chgrp: changing group of '/var/opt/gitlab/.ssh': No such file or directory
---- End output of chgrp git /var/opt/gitlab/.ssh ----
Ran chgrp git /var/opt/gitlab/.ssh returned 1

Cookbook Trace: (most recent call first)
----------------------------------------
/opt/gitlab/embedded/cookbooks/cache/cookbooks/package/libraries/storage_directory_helper.rb:35:in `run_command'
/opt/gitlab/embedded/cookbooks/cache/cookbooks/package/libraries/storage_directory_helper.rb:52:in `ensure_permissions_set'
/opt/gitlab/embedded/cookbooks/cache/cookbooks/package/resources/storage_directory.rb:42:in `block (3 levels) in class_from_file'
/opt/gitlab/embedded/cookbooks/cache/cookbooks/package/resources/storage_directory.rb:36:in `block in class_from_file'

Resource Declaration:
---------------------
# In /opt/gitlab/embedded/cookbooks/cache/cookbooks/gitlab/recipes/gitlab-shell.rb

 34:   storage_directory dir do
 35:     owner git_user
 36:     group git_group
 37:     mode "0700"
 38:   end
 39: end

Compiled Resource:
------------------
# Declared in /opt/gitlab/embedded/cookbooks/cache/cookbooks/gitlab/recipes/gitlab-shell.rb:34:in `block in from_file'

storage_directory("/var/opt/gitlab/.ssh") do
  action [:create]
  default_guard_interpreter :default
  declared_type :storage_directory
  cookbook_name "gitlab"
  recipe_name "gitlab-shell"
  owner "git"
  group "git"
  mode "0700"
  path "/var/opt/gitlab/.ssh"
end

System Info:
------------
chef_version=17.10.0
platform=ubuntu
platform_version=20.04
ruby=ruby 2.7.6p219 (2022-04-12 revision c9c2245c0a) [x86_64-linux]
program_name=/opt/gitlab/embedded/bin/cinc-client
executable=/opt/gitlab/embedded/bin/cinc-client

[2023-10-20T12:20:17-04:00] INFO: Running queued delayed notifications before re-raising exception

Running handlers:
[2023-10-20T12:20:17-04:00] ERROR: Running exception handlers
There was an error running gitlab-ctl reconfigure:

storage_directory[/var/opt/gitlab/.ssh] (gitlab::gitlab-shell line 34) had an error: Mixlib::ShellOut::ShellCommandFailed: ruby_block[directory resource: /var/opt/gitlab/.ssh] (gitlab::gitlab-shell line 36) had an error: Mixlib::ShellOut::ShellCommandFailed: Expected process to exit with [0], but received '1'
---- Begin output of chgrp git /var/opt/gitlab/.ssh ----
STDOUT:
STDERR: chgrp: changing group of '/var/opt/gitlab/.ssh': No such file or directory
---- End output of chgrp git /var/opt/gitlab/.ssh ----
Ran chgrp git /var/opt/gitlab/.ssh returned 1

Running handlers complete
[2023-10-20T12:20:17-04:00] ERROR: Exception handlers complete
Infra Phase failed. 3 resources updated in 09 seconds
[2023-10-20T12:20:17-04:00] FATAL: Stacktrace dumped to /opt/gitlab/embedded/cookbooks/cache/cinc-stacktrace.out
[2023-10-20T12:20:17-04:00] FATAL: ---------------------------------------------------------------------------------------
[2023-10-20T12:20:17-04:00] FATAL: PLEASE PROVIDE THE CONTENTS OF THE stacktrace.out FILE (above) IF YOU FILE A BUG REPORT
[2023-10-20T12:20:17-04:00] FATAL: ---------------------------------------------------------------------------------------
[2023-10-20T12:20:17-04:00] FATAL: Mixlib::ShellOut::ShellCommandFailed: storage_directory[/var/opt/gitlab/.ssh] (gitlab::gitlab-shell line 34) had an error: Mixlib::ShellOut::ShellCommandFailed: ruby_block[directory resource: /var/opt/gitlab/.ssh] (gitlab::gitlab-shell line 36) had an error: Mixlib::ShellOut::ShellCommandFailed: Expected process to exit with [0], but received '1'
---- Begin output of chgrp git /var/opt/gitlab/.ssh ----
STDOUT:
STDERR: chgrp: changing group of '/var/opt/gitlab/.ssh': No such file or directory
---- End output of chgrp git /var/opt/gitlab/.ssh ----
Ran chgrp git /var/opt/gitlab/.ssh returned 1
```

System information:

trapexit commented 8 months ago

494   16:20:17.797566 newfstatat(AT_FDCWD, "/mnt/DISK/NytroWarpDrive/gitlab-ce/data/.ssh", 0x7f8d00b101a0, AT_SYMLINK_NOFOLLOW) = -1 EACCES (Permission denied) <0.000030>
494   16:20:17.797655 newfstatat(AT_FDCWD, "/mnt/DISK/FlashMaxIII/gitlab-ce/data/.ssh", 0x7f8d00b101a0, AT_SYMLINK_NOFOLLOW) = -1 EACCES (Permission denied) <0.000028>
494   16:20:17.797737 newfstatat(AT_FDCWD, "/mnt/SAN/NAS_LUN-1/gitlab-ce/data/.ssh", 0x7f8d00b101a0, AT_SYMLINK_NOFOLLOW) = -1 EACCES (Permission denied) <0.000029>

Because you don't have a strace of the offending app, I can't really correlate things properly, but this certainly looks like a problem. The underlying filesystems are returning permission denied for that file / path. Have you confirmed the perms are properly set?
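When a trace is large, it can help to pull out only the failing calls before comparing it with another trace. The following is a minimal sketch, assuming the line layout of the three `newfstatat` lines quoted above (pid, timestamp, syscall, `= -1 ERRNO`); the regular expression and the `failed_calls` helper are illustrative names, not part of strace or mergerfs.

```python
import re

# Matches lines shaped like the quoted strace output:
#   <pid> <time> <syscall>(..., "<path>", ...) = -1 <ERRNO> (<message>) <duration>
FAILED_CALL = re.compile(
    r'^(?P<pid>\d+)\s+(?P<time>[\d:.]+)\s+(?P<syscall>\w+)\('
    r'.*?"(?P<path>[^"]*)".*\)\s+=\s+-1\s+(?P<errno>E[A-Z]+)'
)


def failed_calls(lines, errno="EACCES"):
    """Return (time, syscall, path) for every call that failed with `errno`."""
    found = []
    for line in lines:
        match = FAILED_CALL.match(line.strip())
        if match and match.group("errno") == errno:
            found.append(
                (match.group("time"), match.group("syscall"), match.group("path"))
            )
    return found


# The three lines from the mergerfs strace in this thread.
sample = [
    '494   16:20:17.797566 newfstatat(AT_FDCWD, "/mnt/DISK/NytroWarpDrive/gitlab-ce/data/.ssh", 0x7f8d00b101a0, AT_SYMLINK_NOFOLLOW) = -1 EACCES (Permission denied) <0.000030>',
    '494   16:20:17.797655 newfstatat(AT_FDCWD, "/mnt/DISK/FlashMaxIII/gitlab-ce/data/.ssh", 0x7f8d00b101a0, AT_SYMLINK_NOFOLLOW) = -1 EACCES (Permission denied) <0.000028>',
    '494   16:20:17.797737 newfstatat(AT_FDCWD, "/mnt/SAN/NAS_LUN-1/gitlab-ce/data/.ssh", 0x7f8d00b101a0, AT_SYMLINK_NOFOLLOW) = -1 EACCES (Permission denied) <0.000029>',
]

for time, syscall, path in failed_calls(sample):
    print(time, syscall, path)
```

The timestamps are what make correlation with a second trace possible, so they are kept in the result rather than discarded.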

tarchive commented 8 months ago

Ah, I misread that support step and only gave the stack trace GitLab generated internally. Here is the strace from the app command gitlab-ctl reconfigure, since I'm not sure which underlying command is failing. This is just a snippet from where I see the error begin until the end. Let me know if you need the whole file (90M). Attachment: snippet.app.strace.txt

I don't think folder permissions are the issue, because these are all folders that the container creates and sets permissions on itself. Their script is here: update-permissions

Further testing: if I swap the volumes in my compose file to use /mnt/DISK/NytroWarpDrive directly instead of the /mnt/data pool, the container starts up normally without error. Issue only exists when using pool as a mount.

version: '3'
services: 
  gitlab-ce:
    container_name: GitLab-CE
    image: 'gitlab/gitlab-ce:15.4.6-ce.0'
    ports:
      - '9081:80'
    volumes:
        - '/mnt/DISK/NytroWarpDrive/gitlab-ce/config:/etc/gitlab'
        - '/mnt/DISK/NytroWarpDrive/gitlab-ce/data:/var/opt/gitlab'
        - '/mnt/DISK/NytroWarpDrive/gitlab-ce/log:/var/log/gitlab'

Other notes: NytroWarpDrive is the only disk with the gitlab-ce folders. This is a fresh vm install which is why disk sizes are small and empty.

trapexit commented 8 months ago

From mergerfs' perspective the OS is absolutely returning permission denied. It is there in the mergerfs strace as I shared. It stat'ed the .ssh path and all three mounts returned EACCES.

I need both the strace from mergerfs and the strace from the app at the same time so I can correlate what request the app sends with the behavior of mergerfs.

> Issue only exists when using pool as a mount

Yes, but perms can be different due to how containers work and how you have mergerfs set up. For instance, many people don't share groups between a container and the host, leading to a supplemental group difference. Some users use user namespacing, which further changes what is going on between a container and the host. You could have perm errors on parts of the path that don't translate to a bind mount. Not everything can be exactly replicated between mergerfs and the underlying filesystem. Hence why I need all the information possible about the setup to comment. Almost certainly the issue is permissions... but I need to know what they are.
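To illustrate why a group difference matters, here is a simplified sketch of the classic Unix owner/group/other check for traversing a directory. It is only a model under stated assumptions: it ignores ACLs, capabilities, and user namespaces, and the `can_search` name and all uid/gid values are hypothetical, not taken from this setup or from mergerfs.

```python
import stat


def can_search(mode, owner_uid, owner_gid, uid, gids):
    """Return True if a process may traverse a directory, classic mode bits only.

    Exactly one permission class applies: owner, then group, then other.
    """
    if uid == 0:
        return True  # root normally bypasses mode bits (capabilities not modelled)
    if uid == owner_uid:
        return bool(mode & stat.S_IXUSR)
    if owner_gid in gids:
        return bool(mode & stat.S_IXGRP)
    return bool(mode & stat.S_IXOTH)


# Hypothetical directory: mode 750, owned by uid 1000 / gid 1000.
# A host user who has gid 1000 as a supplemental group can traverse it.
print(can_search(0o750, 1000, 1000, uid=1001, gids={1001, 1000}))  # True
# A process whose group set lacks gid 1000 falls into "other" and cannot.
print(can_search(0o750, 1000, 1000, uid=998, gids={998}))  # False
```

The second call is the container-style case: the same directory, but the caller's group set does not include the directory's group, so only the "other" bits count.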

trapexit commented 8 months ago

> This is a fresh vm install

So what are the perms of /mnt/DISK/*? Are you positive they are set up properly? mergerfs sees /mnt/DISK/foo..., not what you bind mount. It does not evaluate from the bind-mounted point down; it looks at the whole path. If the base of the path is not permissioned such that it works from outside the container, then it won't work in the container either.
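One way to check the whole path rather than only the leaf is to walk every ancestor directory and look at its mode. The sketch below is an assumption-laden helper, not something mergerfs ships: `blocked_ancestors` is a hypothetical name, it only inspects the "other" execute bit, and it must be run by a user who can stat every component (for example root on the host).

```python
import os
import stat
import sys
from pathlib import Path


def blocked_ancestors(path):
    """List (directory, mode string) for each directory from / down to `path`
    that lacks the execute (search) bit for "other"."""
    target = Path(os.path.abspath(path))
    # parents runs from the nearest parent up to /, so reverse it to walk downwards.
    chain = list(target.parents)[::-1] + [target]
    blocked = []
    for directory in chain:
        st = os.stat(directory)
        if stat.S_ISDIR(st.st_mode) and not st.st_mode & stat.S_IXOTH:
            blocked.append((str(directory), stat.filemode(st.st_mode)))
    return blocked


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python check_path.py /mnt/DISK/NytroWarpDrive/gitlab-ce/data
    for directory, mode in blocked_ancestors(sys.argv[1]):
        print(f"{mode}  {directory}")
```

A directory showing up here is not automatically wrong, since owner or group bits may still grant access, but it is the first place to look when a process running as a different user gets EACCES.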

tarchive commented 8 months ago

D'oh! You were right, it was a permission issue. I went over the folder permissions earlier before opening this issue, but I must have misread the output. Specifically, /mnt/DISK had 750 for its permissions. Once I changed it to 755 and recreated the gitlab container, the startup ran smoothly without error. Sorry to waste your time. I swear I checked all this beforehand.
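For anyone comparing the two modes, this small sketch uses Python's standard `stat.filemode` to show what 750 and 755 look like in `ls -l` notation. The difference is the last triplet: with 750, users outside the owner and group cannot read or traverse the directory.

```python
import stat

# Render each octal mode the way `ls -l` would show it for a directory.
for mode in (0o750, 0o755):
    print(oct(mode), stat.filemode(stat.S_IFDIR | mode))
# 0o750 drwxr-x---
# 0o755 drwxr-xr-x
```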

Thank you for your ongoing support and overall awesome projects. Long time scorch user, but still a mergerfs newbie.

trapexit commented 8 months ago

no problem. glad we resolved it.

This is a somewhat complicated situation. If I rewrote mergerfs to work more like a bind mount does, it would 1) require a major rewrite and 2) mean that branches can't as easily be added and removed without explicitly configuring mergerfs, because it would require holding an open file for the life of the usage. It might be worth doing, but I need to carefully consider all the consequences.