Closed fulalas closed 1 year ago
Hello fulalas,
fulalas:
I noticed that since kernel 6.3.0 during the shutdown/reboot process the system can no longer unmount /union (on Porteus 5, which mounts the whole system at boot using aufs). By reverting the kernel to any 6.2.x it can unmount /union properly. This has been confirmed by many users.
It means someone is using aufs, and that makes aufs busy (in use). So let's find out who is using which file in aufs. The first approach is "sudo lsof", which you may have already tried.
So I'd suggest you try the module parameter 'debug=1' and the MagicSysrq key. The module parameter may not be so helpful in this case, but I hope MagicSysrq will help. By default, MagicSysrq + 'A' dumps all files aufs is using, and I hope it helps you find the process (and the file).
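For readers following along, the "who is using what file" question can also be answered by reading the fd symlinks under /proc directly. The following is only a minimal sketch, assuming a Linux-style /proc layout; the function name find_open_under and its optional second argument are hypothetical and not part of aufs or aufs-util. It covers open file descriptors only, not mmap'ed files or working directories.

```shell
# Hypothetical helper: print "PID TARGET" for every open file descriptor
# whose symlink target lies under a given path prefix.
#   $1: path prefix, e.g. /union
#   $2: proc root to scan (assumption for testability; defaults to /proc)
find_open_under() {
    prefix=$1
    procroot=${2:-/proc}
    for fd in "$procroot"/[0-9]*/fd/*; do
        # Skip the unexpanded glob and anything that is not a symlink.
        [ -L "$fd" ] || continue
        target=$(readlink "$fd") || continue
        case $target in
            "$prefix"/*)
                # Strip the proc root, then keep only the leading PID component.
                pid=${fd#"$procroot"/}
                pid=${pid%%/*}
                printf '%s %s\n' "$pid" "$target"
                ;;
        esac
    done
}
```

Run as root it would be called as find_open_under /union; without root it only sees the caller's own processes.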
.TP
.B debug= 0 | 1
Specifies disable(0) or enable(1) debug print in aufs. This parameter can be changed dynamically. You need to enable CONFIG_AUFS_DEBUG. Currently this is for developers only. The default is `0' (disable).
J. R. Okajima
Hello @fulalas and @sfjro from the Puppy Linux community.
I'm not sure how many Puppy users are using the 6.3 kernel with aufs but there are certainly a few (3+ including me) and so far I haven't seen any problem reports. I'll PM this thread to the users I know about. We can use the Porteus kernel in Puppy so if I get time I'll see if that shows any problems.
Regards
PeaBee
Seems OK - no error messages on shutdown......
@sfjro, thanks for the tips. Here's what I have so far:
kernel 6.3.9 (just before unmounting everything to reboot/shutdown) + nvidia driver:
And heaps more of screens like the ones above until this last one:
Now with kernel 6.2.11 + nvidia driver:
As you can see, for some reason kernel 6.3.x locks the whole system so the system can't unmount /union, while kernel 6.2.x doesn't.
As said, what's funny is that I can easily rm -fr /union/* (after unmounting its children, of course) and no single file is left, still I can't unmount it, and calling MagicSysrq (alt+print+a) prints the same long list of files in use.
In case you're wondering why I'm specifying '+ nvidia driver', that's because on my system if I don't use nvidia driver in 6.3.x it works perfectly:
Oh, in all scenarios lsof returns the same result:
We could blame nvidia but people on Porteus forum without nvidia card reported failure during /union unmount.
Question: when you updated your aufs project from 6.2.x to 6.3.x have you changed any part of the code apart from some adaptation to make it work on 6.3.x?
What's clear is that something has changed since 6.3.x. If it's the kernel or aufs or the combination of both or some mystical explanation, I don't know :D
Thanks once again!
PorteuX:
@sfjro, thanks for the tips. Here's what I have so far:
kernel 6.3.9 (just before unmounting everything to reboot/shutdown) + nvidia driver: :::
It shows that you have still many files opened in aufs.
Now with kernel 6.2.11 + nvidia driver:
Your last two lines are "aufs: files" and "aufs: done", which means your files are all closed and nothing is in use.
As said, what's funny is that I can easily rm -fr /union/* (after unmounting its children, of course) and no single file is left, still I can't unmount it, and calling MagicSysrq (alt+print+a) prints the same long list of files in use.
Yes, removing the files never means closing them. The files are still kept in use.
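The point that removing a file does not close it can be seen without aufs at all. The following is a minimal sketch on an ordinary temporary file; the function name read_after_unlink is made up for this illustration. It removes a file's name while a descriptor is still open and then reads the data through that descriptor.

```shell
# Hypothetical demo: a file stays readable through an open descriptor
# even after its name has been removed.
read_after_unlink() {
    tmp=$(mktemp) || return 1
    printf 'still here\n' > "$tmp"
    exec 3< "$tmp"          # open the file on descriptor 3
    rm -f "$tmp"            # remove the name; the open descriptor keeps the file in use
    IFS= read -r line <&3   # the content is still reachable via descriptor 3
    exec 3<&-               # only closing the descriptor releases the file
    printf '%s\n' "$line"
}
```

Until descriptor 3 is closed, the filesystem holding the file stays busy, which is the same situation rm -fr /union/* leaves behind.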
In case you're wondering why I'm specifying '+ nvidia driver', that's because on my system if I don't use nvidia driver in 6.3.x it works perfectly:
That is a mystery for me because I don't know what 'nvidia driver' does.
Question: when you updated your aufs project from 6.2.x to 6.3.x have you changed any part of the code apart from some adaptation to make it work on 6.3.x?
Essentially I made only two commits.
8a331ddd0 2023-03-07 aufs: for v6.3-rc1, new header filelock.h
322e81b5f 2023-03-07 aufs: for v6.3-rc1, mnt_user_ns() is replaced by mnt_idmap()
You can see them by running 'git log aufs6.2..aufs6.3'.
What's clear is that something has changed since 6.3.x. If it's the kernel or aufs or the combination of both or some mystical explanation, I don't know :D
Unfortunately I cannot dive into aufs now because my SSD broke, and I'm still struggling to restore my environment. One possible scenario is that aufs has a bug around the file reference count, which is incremented when a file is opened and decremented when it is closed. When the reference count reaches zero, the file is no longer in use. Your magic sysrq output shows many files are still open from aufs' point of view.
J. R. Okajima
@sfjro, thank you for the support.
No rush. Whenever you have time to look at this it will be great. If we can help you with your environment, let us know. :)
I'm wondering, if this is a bug in aufs why it didn't show up in 6.2.x and lower versions? I looked at your 2 commits and they look OK and small, although I'm not an expert in anything related to kernel.
Not sure if this helps, but the mount command used to load .xzm modules into the union is
mount -no remount,add:1:"$MOD"=rr aufs /
where $MOD is pointing to the loop image previously mounted with
mount -no loop,ro "$targetmod" "$MOD"
where $targetmod is the module file path.
Also, not sure if this is related, but since util-linux 2.39 I can no longer mount .xzm modules into the union: https://github.com/util-linux/util-linux/issues/2309#issuecomment-1612771116
as far as i can see, this is to do with the util-linux mount regression. as seen in the util-linux#2309 issue the mount command :
mount -no remount,add:1:"$MOD"=rr aufs /
needs to be
mount -no remount,add=1:"$MOD"=rr aufs /
...
similarly, for the removal:
mount -t aufs -o remount,del:$MOD aufs
should be
mount -t aufs -o remount,del=$MOD aufs
.. kernel version doesn't seem to matter.
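The colon-to-equal rewrite described above can be scripted for callers that build their option strings elsewhere. The following is a minimal sketch, assuming the only affected options are br, add, ins, del, mod, append and prepend, and that no branch path contains a comma followed by one of those keywords. The function name aufs_colon_to_equal is hypothetical and is not part of aufs-util.

```shell
# Hypothetical helper: rewrite the first colon of aufs branch options
# ("add:1:/path" -> "add=1:/path") in a comma-separated option string.
#   $1: mount option string, e.g. 'remount,add:1:/mnt/mod=rr'
aufs_colon_to_equal() {
    # Only a keyword at the start of the string or right after a comma is
    # touched, so colons inside branch paths and indexes are left alone.
    printf '%s\n' "$1" |
        sed -E 's/(^|,)(br|add|ins|del|mod|append|prepend):/\1\2=/g'
}
```

A caller could then pass the result to mount, for example mount -n -o "$(aufs_colon_to_equal 'remount,add:1:/mnt/mod=rr')" aufs /, which should behave the same with older and newer util-linux since aufs accepts both spellings.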
in my skimming of the manual(possibly not even the right version) the syntax/usage of remount,add/del seems a bit vague, perhaps some expansion could be useful in that area.
EDIT: ok :p on closer inspection, there's something else going on here causing 'unclean' unmounting on shutdown, with the newer kernels >6.3
Hello,
ncmprhnsbl:
as far as i can see, this is to do with the util-linux mount regression. as seen in the util-linux#2309 the mount command :
mount -no remount,add:1:"$MOD"=rr aufs /
needs to be
mount -no remount,add=1:"$MOD"=rr aufs /
...
similarly, for the removal:
mount -t aufs -o remount,del:$MOD aufs
should be
mount -t aufs -o remount,del=$MOD aufs
.. kernel version doesn't seem to matter.
You're right. For aufs, the change in util-linux is a regression. But I can understand the change and I won't ask util-linux to handle aufs differently.
The kernel version is a problem only in my environment. The old kernel cannot be compiled by the new compiler. This is a matter of my development environment.
in my skimming of the manual(possibly not even the right version) the syntax/usage of remount,add/del seems a bit vague, perhaps some expansion could be useful in that area.
Hmm, there are a few examples in EXAMPLE section. But I will add one or two.
J. R. Okajima
Hello @sfjro Any news?
Regards Blaze
Hello,
TurboBlaze:
Hello @sfjro Any news?
For fulalas' original problem, aufs cannot be unmounted because of EBUSY, there is no progress on my side.
For ncmprhnsbl's post about the change in util-linux (libmount), I made a workaround in the mount.aufs(8) helper, and I am going to test it now. But my development environment still suffers from my SSD damage. I'm still working, but the pace is very slow.
Here is the patch I wrote to follow the libmount's change.
J. R. Okajima
commit 0d80114bf831a7ac6fe9b9f8adc8657349c15b9c
Author: J. R. Okajima @.***>
Date: Fri Jul 21 09:27:01 2023 +0900
workaround for fsctx in util-linux 2.39
util-linux (libmount) v2.39 issues fsmount(2) families and it rejects
the aufs mount option using the "colon" syntax such as "br:rw:ro" and
"del:rw".
In order to make it keep working, mount.aufs(8) helper translates the
colon to equal sign, such like "br=rw:ro" and "del=rw".
This tranlation should always work regardless the version of
util-linux.
See-also: ***@***.***/msg05912.html
Reported-by: Thomas Weißschuh ***@***.***>
Signed-off-by: J. R. Okajima ***@***.***>
diff --git a/aufs.in.5 b/aufs.in.5
index 1e112c9..ace1973 100644
--- a/aufs.in.5
+++ b/aufs.in.5
@@ -55,6 +55,8 @@ whplink-dir(*[AUFS_WH_PLINKDIR]) if necessary
 .
 .TP
 .B br:BRANCH[:BRANCH ...] (dirs=BRANCH[:BRANCH ...])
+.TQ
+.B br=BRANCH[:BRANCH ...]
 Adds new branches.
 (cf. Branch Syntax).
@@ -70,9 +72,16 @@ work correctly. By default (since linux-3.2 until linux-3.18-rc1), aufs
 prohibits such operation internally,
 but there left a way to do.
 (cf. Branch Syntax).
+
+If you use mount(8) from util-linux v2.39 and later, you cannot use
+the colon (br:) and you have to use the equal sign (br=) instead.
+But if you install aufs-util release 20230724 (or later), you can use
+the colon too.
 .
 .TP
 .B [ add | ins ]:index:BRANCH
+.TQ
+.B [ add | ins ]=index:BRANCH
 Adds a new branch.
 The index begins with 0.
 Aufs creates
@@ -95,9 +104,14 @@ If you want to update the contents of a process address space after
 adding, you need to restart your process or open/mmap the file again.
 .\" Usually, such files are executables or shared libraries.
 (cf. Branch Syntax).
+
+If you want to use the colon (add:), then you need to install
+aufs-util release 20230724 or later.
 .
 .TP
 .B del:dir
+.TQ
+.B del=dir
 Removes a branch.
 Aufs does not remove
 whiteout-base(*[AUFS_WH_BASE]) and
@@ -109,9 +123,14 @@ If a process is referencing the file/directory on the deleting branch
 (by open, mmap, current working directory, etc.), aufs will return an
 error EBUSY. In this case, a script `aubusy' (in aufs\-util.git and aufs2\-util.git)
 is useful to identify which process (and which file) makes the branch busy.
+
+If you want to use the colon (del:), then you need to install
+aufs\-util release 20230724 or later.
 .
 .TP
 .B mod:BRANCH
+.TQ
+.B mod=BRANCH
 Modifies the permission flags of the branch.
 Aufs creates or removes
 whiteout\-base(\*[AUFS_WH_BASE]) and/or
@@ -127,14 +146,21 @@ Additionally when you enable CONFIG_IMA (in linux\-2.6.30 and later), IMA
 may produce some wrong messages. But this is equivalent when the filesystem is changed
 `ro' in emergency.
 (cf. Branch Syntax).
+
+If you want to use the colon (mod:), then you need to install
+aufs-util release 20230724 or later.
 .
 .TP
 .B append:BRANCH
+.TQ
+.B append:BRANCH
 equivalent to `add:(last index + 1):BRANCH'.
 (cf. Branch Syntax).
 .
 .TP
 .B prepend:BRANCH
+.TQ
+.B prepend=BRANCH
 equivalent to `add:0:BRANCH.'
 (cf. Branch Syntax).
 .
diff --git a/mount.aufs.c b/mount.aufs.c
index e085d4c..515be81 100644
--- a/mount.aufs.c
+++ b/mount.aufs.c
@@ -219,6 +219,45 @@ static int drop_level(int argc, char **argv, int idx)
 	return 0;
 }
+/*
 static void do_mount(char *dev, char *mntpnt, int argc, char *argv[],
 		     unsigned char flags[])
 {
@@ -242,8 +281,10 @@ static void do_mount(char *dev, char *mntpnt, int argc, char *argv[],
 	for (i = 3; i < argc; i++)
 		if (strcmp(argv[i], "-f") && strcmp(argv[i], "-n")
Hi,
fulalas:
No rush. Whenever you have time to look at this it would be great. If we could help you with your environment, let us know. :)
I've reviewed aufs6.3 and found some suspicious code around the file reference count. Please try this patch and see if you can unmount aufs cleanly. But I am not sure this is the cause of your problem. Also, the line numbers in the patch may differ from your source file; if so, please apply it manually.
J. R. Okajima
diff --git a/mm/mmap.c b/mm/mmap.c
index 61a4bede666e..8ff923ccfe2b 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2863,21 +2865,21 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size,
	if (vma->vm_flags & VM_LOCKED)
		flags |= MAP_LOCKED;
	vma_get_file(vma);
	file = vma->vm_file;
	prfile = vma->vm_prfile;
	ret = do_mmap(vma->vm_file, start, size,
			prot, flags, pgoff, &populate, NULL);
	if (!IS_ERR_VALUE(ret) /* && file */ && prfile) {
		struct vm_area_struct *new_vma;
		new_vma = find_vma(mm, ret);
		if (!new_vma->vm_prfile)
			new_vma->vm_prfile = prfile;
@sfjro, it seems your patch fixed the issue, yes! I didn't have time to make solid tests though. I'll let you know.
Thank you once again for the hard work!
fulalas:
@sfjro, it seems your patch fixed the issue, yes! I didn't have time to make solid tests though. I'll let you know.
Thanx for the test. I'll merge and release the patch next Monday.
J. R. Okajima
@sfjro, I did more tests and unfortunately your patch doesn't fix the issue in all scenarios -- in my case, it fails using Nvidia drivers.
Other people in Porteus forum also reported your patch didn't fix the issue for them.
So I guess we need to continue investigating.
Thanks!
fulalas:
@sfjro, I did more tests and unfortunately your patch doesn't fix the issue in all scenarios -- in my case, it fails using Nvidia drivers.
Other people in Porteus forum also reported your patch didn't fix the issue for them.
Thanx for the report. I will try more.
J. R. Okajima
fulalas:
So I guess we need to continue investigating.
Please try this one-liner patch. I'm still testing.
J. R. Okajima
diff --git a/mm/mmap.c b/mm/mmap.c
index 90ab9002f976..1e286c19f9c9 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2639,7 +2639,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
"J. R. Okajima":
Please try this one-liner patch. I'm still testing.
My local test is not bad, but it is doubtful that the patch will solve the problem.
Now I'm considering removing aufs[56]-mmap.patch entirely. The purpose of aufs[56]-mmap.patch is to show the correct path in /proc/PID/maps and the symlink target of /proc/PID/fd/N. It was necessary for some applications to work correctly. For instance, debian apt-get(1), if I remember correctly. And probably lsof(1) wants the correct path too.
If users and their applications, I mean the use-cases, allow the incorrect path, aufs[56]-mmap.patch can be removed.
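To see what "the path in /proc/PID/maps" means in practice, the sixth column of each maps line can be filtered by prefix. The following is a minimal sketch, assuming the usual six-column maps layout (address, perms, offset, dev, inode, pathname) and paths without spaces; the function name mapped_files_under is hypothetical.

```shell
# Hypothetical helper: list the distinct file paths under a prefix that
# appear in a /proc/PID/maps style file.
#   $1: path prefix, e.g. /union
#   $2: maps file to read (defaults to this shell's own /proc/$$/maps)
mapped_files_under() {
    # Anonymous mappings have fewer than six fields and are skipped;
    # index(...) == 1 is a plain prefix match, not a regular expression.
    awk -v prefix="$1" 'NF >= 6 && index($6, prefix) == 1 { print $6 }' \
        "${2:-/proc/$$/maps}" | sort -u
}
```

Whether the paths printed for files on an aufs mount are the aufs paths or the branch paths is exactly what aufs[56]-mmap.patch influences, per the explanation above.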
J. R. Okajima
My local test is not bad, but it is doubtful that the patch will solve the problem.
@sfjro, your guess is correct: this last patch unfortunately didn't fix the issue.
Now I'm considering removing aufs[56]-mmap.patch entirely.
I'm happy to test your last idea. How should I proceed? :)
fulalas:
Now I'm considering removing aufs[56]-mmap.patch entirely.
I'm happy to test your last idea. How should I proceed? :)
Revert the patch by running
$ for i in the_last_two_patches_I_sent
> do patch -p1 -R < $i
> done
$ patch -p1 -R < aufs6-standalone.git/aufs6-mmap.patch
and then rebuild your kernel.
J. R. Okajima
I thought you were saying that we could remove all patches made to mmap.c file. Well, I tried but the system didn't work properly -- I could not even load the GUI.
fulalas:
I thought you were saying that we could remove all patches made to mmap.c file. Well, I tried but the system didn't work properly -- I could not even load the GUI.
Arg, I forgot to mention this one small patch.
J. R. Okajima
diff --git a/fs/aufs/file.h b/fs/aufs/file.h
index 4ed41bb59d3d..7d2be2d7f619 100644
--- a/fs/aufs/file.h
+++ b/fs/aufs/file.h
@@ -317,12 +317,14 @@ static inline void au_vm_file_reset(struct vm_area_struct *vma,
 static inline void au_vm_prfile_set(struct vm_area_struct *vma,
 				    struct file *file)
 {
+#if 0
 	get_file(file);
 	vma->vm_prfile = file;
 	get_file(file);
 	vma->vm_region->vm_prfile = file;
+#endif
 }
fulalas:
I thought you were saying that we could remove all patches made to mmap.c file. Well, I tried but the system didn't work properly -- I could not even load the GUI.
Not only mm/mmap.c. aufs6-mmap.patch modifies several files.
If it is not easy for you to revert aufs6-mmap.patch, then I'd suggest you try the approach that keeps aufs6-mmap.patch.
J. R. Okajima
diff --git a/mm/mmap.c b/mm/mmap.c
index a042cf64c9f0..90ab9002f976 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -593,7 +593,7 @@ static inline void vma_complete(struct vma_prepare *vp,
 		if (vp->file) {
 			uprobe_munmap(vp->remove, vp->remove->vm_start,
 				      vp->remove->vm_end);
@sfjro, I'm always patching and building from scratch, so I'm always applying the extra patches manually. Would it be easier to have a special branch for testing so I can just pull from it? :)
BTW, this patch (the one before the last) prevents the kernel from building.
OK, good news! It seems it's fixed now! I applied all the patches in this thread, except this (because it breaks the build process), and the system is finally able to unmount everything now!
Gonna wait for more people from Porteus forum to confirm.
Nice work, man! Thanks a lot!
With @fulalas' kernel I no longer have any issue unmounting the union at shutdown/reboot in Porteus. https://i.imgur.com/Of7n4HB.png
Thanks to @sfjro aka Junjiro Okajima for your hard work!
P.S. waiting a new aufs patch ;)
TurboBlaze:
With @fulalas' kernel I no longer have any issue unmounting the union at shutdown/reboot.
Guys, thank you for the tests. The patch (one-liner) will be merged in the release on Monday (14 Aug).
J. R. Okajima
------- Blind-Carbon-Copy
From: "J. R. Okajima" @.>
To: @.
Subject: aufs5 and aufs6 GIT release
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-ID: @.>
Date: Mon, 14 Aug 2023 01:45:43 +0900
Message-ID: @.>
o bugfix
o misc
(aufs-util.git)
J. R. Okajima
------- End of Blind-Carbon-Copy
@sfjro, do you have any plans to create a new branch for 6.4.x and 6.5.x?
Thanks!
fulalas:
@sfjro, do you have any plans to create a new branch for 6.4.x and 6.5.x?
Of course. Here is my plan.
J. R. Okajima
Not bad to see aufs support for kernel 6.5.x
As an experiment..... tried to apply aufs6.x-rcN to 6.5-rc7 .......... One failure to patch fs/splice.c from aufs6-base.patch
patch -N -p1 < aufs6-base.patch
patching file fs/splice.c
Hunk #1 succeeded at 928 (offset 63 lines).
Hunk #2 FAILED at 876.
1 out of 2 hunks FAILED -- saving rejects to file fs/splice.c.rej
code doesn't seem to exist anymore.... splice.c.txt attached
@@ -876,9 +876,9 @@ static long do_splice_from(struct pipe_inode_info *pipe, struct file *out,
/*
* Attempt to initiate a splice from a file to a pipe.
*/
-static long do_splice_to(struct file *in, loff_t *ppos,
- struct pipe_inode_info *pipe, size_t len,
- unsigned int flags)
+long do_splice_to(struct file *in, loff_t *ppos,
+ struct pipe_inode_info *pipe, size_t len,
+ unsigned int flags)
{
unsigned int p_space;
int ret;
6.5 has been released.... @sfjro would it be possible to say what changes you think will be needed to aufs6-base.patch for splice.c in 6.5? Many thanks
PB:
@sfjro would it be possible to say what changes you think will be needed to aufs6-base.patch for splice.c in 6.5?
I have not reached v6.5 yet. Please wait a week or two.
J. R. Okajima
Many thanks for aufs-6.4 and aufs-6.5
Kernel 6.5.1 built with aufs-6.5 and seems fine.... :-))
@sfjro, thanks a looot for the hard work. All recent branches are working flawlessly, including not only 6.3.x but also 6.4.x and 6.5.x. You're a hero! :)
I'm closing this issue.
fulalas:
@sfjro, thanks a looot for the hard work. All recent branches are working flawlessly, including not only 6.3.x but also 6.4.x and 6.5.x. You're a hero! :)
Haha, glad to hear that. Thank you.
J. R. Okajima
Hi!
I noticed that since kernel 6.3.0 during the shutdown/reboot process the system can no longer unmount /union (on Porteus 5, which mounts the whole system at boot using aufs). By reverting the kernel to any 6.2.x it can unmount /union properly. This has been confirmed by many users.
Now, I don't know if this is an aufs issue or a kernel (upstream) issue. What I know is that using Porteus 5 with OverlayFS (instead of aufs) this issue doesn't happen.
These are the commands Porteus uses to unmount everything in the shutdown/reboot process:
The last (third) command is the one that fails in 6.3.x with the message "Device or resource busy", while in 6.2.x it unmounts without any complaints.
It's funny because using 6.3.x I can cd /union, execute rm -fr * and it removes everything just fine, but /union itself is locked and can't be removed or unmounted.
Any ideas?
Thanks once again for the hard work! :)