tcsh-org / tcsh

This is a read-only mirror of the tcsh code repository.
https://www.tcsh.org/
Other
229 stars 42 forks source link

[PATCH] Fix nice test for several architectures #94

Closed josefs10 closed 4 months ago

josefs10 commented 4 months ago

For several architectures such as loong64, the 'nice' test fails because of permissions issues. This patch will test the 'nice' command in 2 ways and works for all architectures. Original debian bug is here. priority-values-nice-test.patch.txt

suominen commented 4 months ago

Running nice without a priority is the same as nice +4. I don't see how nice +19 addresses "permissions issues" over nice +4 or just nice. Can you elaborate on that, please?

josefs10 commented 4 months ago

It's only testing the + and - values for nice to address permissions differences. The values were chosen to be more extreme values. You could use nice +4 instead of nice +19 and get the same result.

suominen commented 4 months ago

I'm not opposed to adding more testing for nice, but I don't understand why the original test should be dropped. The title of this issue suggests that the original test failed "for several architectures." How did it fail and why?

josefs10 commented 4 months ago

Here is the failed test from the loong64 debian buildd log before this patch:

## ---------------------- ##
## Detailed failed tests. ##
## ---------------------- ##

#                             -*- compilation -*-
68. commands.at:1041: testing nice ...
./commands.at:1045: tcsh -f -c 'nice set var=1; echo $?var'
--- /dev/null   2024-02-28 13:18:15.000000000 +0000
+++ /<<PKGBUILDDIR>>/testsuite.dir/at-groups/68/stderr  2024-03-01 05:44:44.991239491 +0000
@@ -0,0 +1 @@
+setpriority: Permission denied.
68. commands.at:1041: 68. nice (commands.at:1041): FAILED (commands.at:1045)

A couple other less common architectures like alpha also failed this way. We are not exactly sure why these architectures failed and others passed. Perhaps the different architecture build environments use different default nice values? This was a strange bug, and that's why we explicitly defined the + and - values in the new tests.

thep commented 4 months ago

Hi. I tested this with @josefs10. So, let me add some input.

My hypothesis is that the build process can be run with a specified nice priority, which can be higher than the default +4 when omitted, making the plain 'nice' command fail with the permission. So, to be safe, I proposed to use the extreme value +19 in the test, to make sure it's always permitted, regardless of the current nice priority of the build process.

suominen commented 4 months ago

From reading setpriority(2) prio is an absolute value. It looks like it is following setpriority -> sys_setpriority -> donice -> sched_nice in the NetBSD kernel, although I'm not 100% sure I'm reading the code right (especially when we get to resetpriority).

From reading nice(3) it is not quite clear if incr is an absolute value or not, but as its name hints, it is a relative value.

This means that the nice command in tcsh sets an absolute value on systems with setpriority(2), and a relative value otherwise.

The error message above clearly identifies that setpriority was called, so it is odd that it could not set the value at +4.

And even if the value were relative, it seems odd that a process could not make its priority lower ("worse"), but again, I'm not sure what the interface standard for nice requires.

I would really like to keep testing the nice command in tcsh without an argument to make sure it is parsed correctly. As I mentioned before, I don't mind adding more test cases for the forms that provide explicit values.

I think we should get more information about when this error is triggered.

Maybe it is just an unusual setup for compiling? Running bulk compilation at a negative priority of -5 or higher does seem unusual, does it not?

Or is the resulting tcsh somehow not defaulting to +4 in the problem setup?

josefs10 commented 4 months ago

That's interesting and thanks for looking into this! I can do some more testing with setting the nice argument to see if the value actually does affect the nice test and seeing the setpriority error.

A value of 19 or 20 will schedule a process only when nothing at priority <= 0 is runnable.

@thep I'm not familiar with default priority values of background processes but thought this might have the ability to affect our tests on some machines (although the +19 test passed in my testing).

thep commented 4 months ago

This means that the nice command in tcsh sets an absolute value on systems with setpriority(2), and a relative value otherwise.

Thanks for deep check. So, I wonder how the traditional csh behaves on NetBSD. Assuming it calls nice(3) the same way, wouldn't it contradict what described in csh(1) for the nice builtin command?

"...Sets the scheduling priority for the shell to number, or, without number, to 4..."

And what about our tcsh on NetBSD?

Do we have to ensure that the actual behavior matches what described in the man page?

The error message above clearly identifies that setpriority was called, so it is odd that it could not set the value at +4.

And even if the value were relative, it seems odd that a process could not make its priority lower ("worse"), but again, I'm not sure what the interface standard for nice requires.

I think the error would not happen on systems with relative nice. As you observed, it's caused by setpriority() call.

I would really like to keep testing the nice command in tcsh without an argument to make sure it is parsed correctly. As I mentioned before, I don't mind adding more test cases for the forms that provide explicit values.

I agreed with the goal. Just couldn't think of a practical way to maintain that. Let's try it anyway.

I think we should get more information about when this error is triggered.

I hope this can be some hint:

$ nice -n 10 tcsh
tcsh> nice echo hello
setpriority: Permission denied.
tcsh> nice +10 echo hello
hello
tcsh> exit
$

Maybe it is just an unusual setup for compiling? Running bulk compilation at a negative priority of -5 or higher does seem unusual, does it not?

You mean a priority +5 or higher? It's possible on a shared machine used as a build pool for several packages in a distro. See buildd.conf example for the initially-commented-out $nice_level value of 10.

Or is the resulting tcsh somehow not defaulting to +4 in the problem setup?

This seems unlikely, albeit still possible.

suominen commented 4 months ago

@thep : Thanks for the clarification — I get it now. (The higher/lower +/- stuff does not lend itself for clarity too easily...)

I'm beginning to think that maybe we should call getpriority(2) to obtain the current priority and treat the argument as relative instead of absolute. Let me think about that a bit.

Not surprisingly nice on csh behaves just like on tcsh.

cc: @zoulasc

suominen commented 4 months ago

This issue has been addressed by making nice priority incremental (https://github.com/tcsh-org/tcsh/commit/a400eaad81a693c7151ea15994659dd2ff752139).