
Error on backupNg.deleteVmBackup #5532

Closed 48mWsPK closed 3 years ago

48mWsPK commented 3 years ago
backupNg.deleteVmBackup
{
  "id": "078ff9bb-2a5a-4e6c-8018-b539458b3ced/xo-vm-backups/12a437a7-1713-26bb-1908-18c5c92295f3/20210121T231101Z.json"
}
{
  "type": "Error",
  "errno": -13,
  "code": "EACCES",
  "syscall": "open",
  "path": "/run/xo-server/mounts/078ff9bb-2a5a-4e6c-8018-b539458b3ced/xo-vm-backups/12a437a7-1713-26bb-1908-18c5c92295f3/vdis/23b86d51-c791-437c-a310-a5f44c1f3c03/68759324-fe94-4ec2-a611-8eeed950c0b8/20210121T231101Z.vhd",
  "message": "EACCES: permission denied, open '/run/xo-server/mounts/078ff9bb-2a5a-4e6c-8018-b539458b3ced/xo-vm-backups/12a437a7-1713-26bb-1908-18c5c92295f3/vdis/23b86d51-c791-437c-a310-a5f44c1f3c03/68759324-fe94-4ec2-a611-8eeed950c0b8/20210121T231101Z.vhd'",
  "name": "Error",
  "stack": "Error: EACCES: permission denied, open '/run/xo-server/mounts/078ff9bb-2a5a-4e6c-8018-b539458b3ced/xo-vm-backups/12a437a7-1713-26bb-1908-18c5c92295f3/vdis/23b86d51-c791-437c-a310-a5f44c1f3c03/68759324-fe94-4ec2-a611-8eeed950c0b8/20210121T231101Z.vhd'"
}

XO is built from sources. The delta backup job is hitting its retention limit and failing to merge the old deltas. Up until this point, no problems had been reported.

Seems related to the issue described here: https://xen-orchestra.com/forum/topic/1100/delta-backups-stuck-at-merge/73
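For anyone who wants to reproduce the check outside of xo-server: a minimal Node.js sketch (not XO code; it only calls fs.stat/fs.access on the path from the error above, and should be run as the same user xo-server runs as):

// check-access.ts — run as the xo-server user, e.g. with ts-node
import { access, stat } from 'node:fs/promises';
import { constants } from 'node:fs';

// Path taken from the EACCES error above; replace with the file that fails for you.
const path =
  '/run/xo-server/mounts/078ff9bb-2a5a-4e6c-8018-b539458b3ced/xo-vm-backups/12a437a7-1713-26bb-1908-18c5c92295f3/vdis/23b86d51-c791-437c-a310-a5f44c1f3c03/68759324-fe94-4ec2-a611-8eeed950c0b8/20210121T231101Z.vhd';

async function main(): Promise<void> {
  const info = await stat(path);
  console.log('mode:', (info.mode & 0o777).toString(8), 'uid:', info.uid, 'gid:', info.gid);
  try {
    await access(path, constants.R_OK | constants.W_OK);
    console.log('read/write access OK');
  } catch (error) {
    console.log('no read/write access:', (error as NodeJS.ErrnoException).code);
  }
}

main().catch(console.error);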

olivierlambert commented 3 years ago

You have a permission issue on your network share. Please check that :)

48mWsPK commented 3 years ago

I got to this after the backup job reported a failed status in XO:

{
  "data": {
    "mode": "delta",
    "reportWhen": "failure"
  },
  "id": "1611828935191",
  "jobId": "23b86d51-c791-437c-a310-a5f44c1f3c03",
  "jobName": "2 some server daily",
  "message": "backup",
  "scheduleId": "0b7981c3-9b9b-4a3a-9c8b-dea22a9f8fe9",
  "start": 1611828935191,
  "status": "failure",
  "tasks": [
    {
      "data": {
        "type": "VM",
        "id": "12a437a7-1713-26bb-1908-18c5c92295f3"
      },
      "id": "1611828935195",
      "message": "Starting backup of someserver. (23b86d51-c791-437c-a310-a5f44c1f3c03)",
      "start": 1611828935195,
      "status": "failure",
      "tasks": [
        {
          "id": "1611828935198",
          "message": "snapshot",
          "start": 1611828935198,
          "status": "success",
          "end": 1611828936585,
          "result": "9bcf2842-8373-4bc3-dad1-e3509138ebdf"
        },
        {
          "id": "1611828936589",
          "message": "add metadata to snapshot",
          "start": 1611828936589,
          "status": "success",
          "end": 1611828936600
        },
        {
          "id": "1611828936779",
          "message": "waiting for uptodate snapshot record",
          "start": 1611828936779,
          "status": "success",
          "end": 1611828936986
        },
        {
          "id": "1611828937422",
          "message": "start snapshot export",
          "start": 1611828937422,
          "status": "success",
          "end": 1611828937423
        },
        {
          "data": {
            "id": "078ff9bb-2a5a-4e6c-8018-b539458b3ced",
            "isFull": false,
            "type": "remote"
          },
          "id": "1611828937424",
          "message": "export",
          "start": 1611828937424,
          "status": "failure",
          "tasks": [
            {
              "id": "1611828937509",
              "message": "transfer",
              "start": 1611828937509,
              "status": "success",
              "end": 1611828941473,
              "result": {
                "size": 39878656
              }
            },
            {
              "id": "1611828941486",
              "message": "merge",
              "start": 1611828941486,
              "status": "failure",
              "end": 1611828942186,
              "result": {
                "type": "Error",
                "errno": -13,
                "code": "EACCES",
                "syscall": "open",
                "path": "/run/xo-server/mounts/078ff9bb-2a5a-4e6c-8018-b539458b3ced/xo-vm-backups/12a437a7-1713-26bb-1908-18c5c92295f3/vdis/23b86d51-c791-437c-a310-a5f44c1f3c03/68759324-fe94-4ec2-a611-8eeed950c0b8/20210113T231101Z.vhd",
                "message": "EACCES: permission denied, open '/run/xo-server/mounts/078ff9bb-2a5a-4e6c-8018-b539458b3ced/xo-vm-backups/12a437a7-1713-26bb-1908-18c5c92295f3/vdis/23b86d51-c791-437c-a310-a5f44c1f3c03/68759324-fe94-4ec2-a611-8eeed950c0b8/20210113T231101Z.vhd'",
                "name": "Error",
                "stack": "Error: EACCES: permission denied, open '/run/xo-server/mounts/078ff9bb-2a5a-4e6c-8018-b539458b3ced/xo-vm-backups/12a437a7-1713-26bb-1908-18c5c92295f3/vdis/23b86d51-c791-437c-a310-a5f44c1f3c03/68759324-fe94-4ec2-a611-8eeed950c0b8/20210113T231101Z.vhd'"
              }
            }
          ],
          "end": 1611828942186,
          "result": {
            "type": "Error",
            "errno": -13,
            "code": "EACCES",
            "syscall": "open",
            "path": "/run/xo-server/mounts/078ff9bb-2a5a-4e6c-8018-b539458b3ced/xo-vm-backups/12a437a7-1713-26bb-1908-18c5c92295f3/vdis/23b86d51-c791-437c-a310-a5f44c1f3c03/68759324-fe94-4ec2-a611-8eeed950c0b8/20210113T231101Z.vhd",
            "message": "EACCES: permission denied, open '/run/xo-server/mounts/078ff9bb-2a5a-4e6c-8018-b539458b3ced/xo-vm-backups/12a437a7-1713-26bb-1908-18c5c92295f3/vdis/23b86d51-c791-437c-a310-a5f44c1f3c03/68759324-fe94-4ec2-a611-8eeed950c0b8/20210113T231101Z.vhd'",
            "name": "Error",
            "stack": "Error: EACCES: permission denied, open '/run/xo-server/mounts/078ff9bb-2a5a-4e6c-8018-b539458b3ced/xo-vm-backups/12a437a7-1713-26bb-1908-18c5c92295f3/vdis/23b86d51-c791-437c-a310-a5f44c1f3c03/68759324-fe94-4ec2-a611-8eeed950c0b8/20210113T231101Z.vhd'"
          }
        },
        {
          "id": "1611828942192",
          "message": "set snapshot.other_config[xo:backup:exported]",
          "start": 1611828942192,
          "status": "success",
          "end": 1611828942200
        }
      ],
      "end": 1611828942894
    }
  ],
  "end": 1611828942894
}

Interestingly enough, even though the status is reported as failed, when I check the contents of the folder mentioned in the first post, I can see that the delete backups command actually went through: my retention is set to 15, and I only found 10 files.

48mWsPK commented 3 years ago

> You have a permission issue on your network share. Please check that :)

Well, testing the remote in XO (Settings -> Remotes) says

The remote appears to work correctly

Are there any other, more involved methods to check my remote share? I had problems bringing it up, but I think I have those addressed, as my delta backups were working fine up until this retention limit was hit on one of my VMs...

olivierlambert commented 3 years ago

It's hard to say; it depends on the detailed permissions on the share. Also check that you are on the latest master commit.

48mWsPK commented 3 years ago

I managed to turn the reported failure into a successful backup by deleting some of the deltas; even though the deletion was reported as failed, it did go through.

So now running the backup manually finishes successfully:

{
  "data": {
    "mode": "delta",
    "reportWhen": "failure"
  },
  "id": "1611830656173",
  "jobId": "23b86d51-c791-437c-a310-a5f44c1f3c03",
  "jobName": "2 some server daily",
  "message": "backup",
  "scheduleId": "0b7981c3-9b9b-4a3a-9c8b-dea22a9f8fe9",
  "start": 1611830656173,
  "status": "success",
  "tasks": [
    {
      "data": {
        "type": "VM",
        "id": "12a437a7-1713-26bb-1908-18c5c92295f3"
      },
      "id": "1611830656177",
      "message": "Starting backup of someserver. (23b86d51-c791-437c-a310-a5f44c1f3c03)",
      "start": 1611830656177,
      "status": "success",
      "tasks": [
        {
          "id": "1611830656181",
          "message": "snapshot",
          "start": 1611830656181,
          "status": "success",
          "end": 1611830657678,
          "result": "c19b3c92-006e-81e8-748b-240982efdcab"
        },
        {
          "id": "1611830657684",
          "message": "add metadata to snapshot",
          "start": 1611830657684,
          "status": "success",
          "end": 1611830657697
        },
        {
          "id": "1611830657872",
          "message": "waiting for uptodate snapshot record",
          "start": 1611830657872,
          "status": "success",
          "end": 1611830658078
        },
        {
          "id": "1611830658388",
          "message": "start snapshot export",
          "start": 1611830658388,
          "status": "success",
          "end": 1611830658389
        },
        {
          "data": {
            "id": "078ff9bb-2a5a-4e6c-8018-b539458b3ced",
            "isFull": true,
            "type": "remote"
          },
          "id": "1611830658390",
          "message": "export",
          "start": 1611830658390,
          "status": "success",
          "tasks": [
            {
              "id": "1611830658521",
              "message": "transfer",
              "start": 1611830658521,
              "status": "success",
              "end": 1611830778128,
              "result": {
                "size": 8592054784
              }
            },
            {
              "id": "1611830778134",
              "message": "merge",
              "start": 1611830778134,
              "status": "success",
              "end": 1611830778134,
              "result": {
                "size": 0
              }
            }
          ],
          "end": 1611830778134
        },
        {
          "id": "1611830778136",
          "message": "set snapshot.other_config[xo:backup:exported]",
          "start": 1611830778136,
          "status": "success",
          "end": 1611830778141
        }
      ],
      "end": 1611830778845
    }
  ],
  "end": 1611830778846
}

So this would mean there is some kind of problem with merging the old deltas. Any ideas on how I can narrow it down? I have other backups nearing their retention limit, so I suppose I will hit the same problem with those, too...
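One thing that might help narrow it down before the other jobs hit their retention limit is scanning the whole backup tree on the remote for .vhd files the xo-server user cannot write to. A rough sketch, assuming the mount point from the errors above and that it runs as the xo-server user (this is a hypothetical helper, not part of XO):

// scan-vhd-permissions.ts — hypothetical diagnostic helper
import { access, readdir } from 'node:fs/promises';
import { constants } from 'node:fs';
import { join } from 'node:path';

// Mount point taken from the error messages above; adjust to your remote.
const root = '/run/xo-server/mounts/078ff9bb-2a5a-4e6c-8018-b539458b3ced/xo-vm-backups';

async function scan(dir: string): Promise<void> {
  for (const entry of await readdir(dir, { withFileTypes: true })) {
    const path = join(dir, entry.name);
    if (entry.isDirectory()) {
      await scan(path); // recurse into per-VM and per-VDI folders
    } else if (entry.name.endsWith('.vhd')) {
      try {
        await access(path, constants.W_OK);
      } catch {
        console.log('not writable:', path);
      }
    }
  }
}

scan(root).catch(console.error);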

48mWsPK commented 3 years ago

> It's hard to say; it depends on the detailed permissions on the share. Also check that you are on the latest master commit.

Just updated to the most recent version, but I was not far behind.

And about the permissions: shouldn't the "Check remote" test catch it if there is a problem? Like I said, I had some issues making it work initially, but I have it working now...

olivierlambert commented 3 years ago

I'm not sure the test checks permissions on subfolders, etc. Please take a look at your share directly.

julien-f commented 3 years ago

Indeed, the test simply tries to write/read/delete a file at the root of the remote; it cannot test the permissions of all files.
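For illustration only (this is not xo-server's actual test code, and the mount path is a placeholder), a root-level check of that kind could look roughly like:

// remote-root-test.ts — illustration of the principle, not XO's implementation
import { writeFile, readFile, unlink } from 'node:fs/promises';
import { join } from 'node:path';

// Hypothetical mount point of the remote.
const remoteRoot = '/run/xo-server/mounts/<remote-id>';

async function testRemoteRoot(): Promise<void> {
  const file = join(remoteRoot, '.test-file');
  await writeFile(file, 'test'); // write a file at the root
  await readFile(file);          // read it back
  await unlink(file);            // delete it
  console.log('root of the remote is readable/writable');
}

testRemoteRoot().catch(console.error);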

olivierlambert commented 3 years ago

On that, it might be relevant to create a test folder, then a file inside it, and try to remove the file and then the folder.

julien-f commented 3 years ago

> On that, it might be relevant to create a test folder, then a file inside it, and try to remove the file and then the folder.

Sure, we can do that pretty easily, but how would it help? What new info would that give us?
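The proposed extension would roughly add a subfolder step on top of such a root check, e.g. (again only an illustration, with a placeholder mount path):

// remote-subfolder-test.ts — illustration of the proposed extra check
import { mkdir, writeFile, unlink, rmdir } from 'node:fs/promises';
import { join } from 'node:path';

// Hypothetical mount point of the remote.
const remoteRoot = '/run/xo-server/mounts/<remote-id>';

async function testRemoteSubfolder(): Promise<void> {
  const dir = join(remoteRoot, '.test-dir');
  const file = join(dir, '.test-file');
  await mkdir(dir);              // create a test folder
  await writeFile(file, 'test'); // create a file inside it
  await unlink(file);            // remove the file
  await rmdir(dir);              // remove the folder
  console.log('subfolder create/delete works');
}

testRemoteSubfolder().catch(console.error);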

48mWsPK commented 3 years ago

I am with julien-f on this one. I created a share on my NAS, which was an empty folder, and toyed with its permissions until XO was happy with it. All the subfolders in it were created by XO backups, so unless there is some problem with XO, I cannot see how this situation came to be in the first place. I certainly have not gone into the subfolder structure to play with permissions there... I simply created my delta backup schedule in XO's web UI and let XO do its thing.

olivierlambert commented 3 years ago

Error: EACCES is a very low-level Node.js error meaning we don't have write access to some file.

48mWsPK commented 3 years ago

So maybe it's worth looking at what credentials are being used in that call? Because if the test creates, modifies, and deletes a file with the credentials provided in XO's remotes config section, then the backup schedule should obviously also succeed in deleting a file, provided it uses the same correct credentials, right?

julien-f commented 3 years ago

Sorry if I was unclear, but I agree with @olivierlambert here: XO uses the same credentials and permissions to handle the backup files as it does for the test.

It works for a lot of users, so I would think the issue comes from your remote config here.

48mWsPK commented 3 years ago

OK, I will see what I can find on the other end. Since I am using SMB for backups, I suppose I am using a "known-working" config, right? I have no experience with NFS, unlike SMB...

48mWsPK commented 3 years ago

I have this in my NAS logs:

smbd:  ../../source3/smbd/notify_inotify.c:369(watch_destructor)
smbd: inotify_rm_watch returned Invalid argument

Does this help in any way?

olivierlambert commented 3 years ago

What is your NAS model? Is it up to date?

48mWsPK commented 3 years ago

XigmaNAS, updated yesterday to the latest build. Running the embedded edition on an old computer.

olivierlambert commented 3 years ago

We are not SMB experts either, so if you don't know how to investigate the SMB permission issue further, feel free to try NFS and report whether the outcome changes :)

48mWsPK commented 3 years ago

Well, it seems smbd does not like FreeBSD's kernel fsnotify implementation. What I did was add a single line to my smbd config:

notify:inotify = false

As a result, merging of deltas now seems to be working fine, at least with my small VM that hit the issue in the first place.

This proved helpful: https://www.mail-archive.com/samba@lists.samba.org/msg98083.html
For the record, this was on XigmaNAS 12.2.0.4 - Ornithopter (revision 8030), embedded edition. And besides this issue, I am really happy with XigmaNAS; even now it proved agile enough to fix things with a simple config change.
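For anyone else hitting this on Samba, a sketch of where such an option would sit in a standard smb.conf; depending on the NAS, it may need to be entered through the web UI's extra/auxiliary smbd parameters rather than a hand-edited file:

# smb.conf — sketch only; the single relevant line is the notify one
[global]
    # Stop smbd from using kernel inotify for change notifications,
    # working around the inotify_rm_watch "Invalid argument" errors seen on FreeBSD
    notify:inotify = false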

Thanks, everyone! Your comments proved helpful!