openwebwork / pg

Problem rendering engine for WeBWorK
http://webwork.maa.org/wiki/Category:Authors
Other
46 stars 76 forks source link

Definition of factorial in PG can lead to what WW sees as an infinite loop and causes CPU load which may sometimes persist until a restart or manual intervention #675

Closed taniwallach closed 2 years ago

taniwallach commented 2 years ago

I ran into a problem where my server was showing very high CPU usage for a long time. It seems to have resulted from an Apache process which got into an overly long calculation while processing a problem render request and for some reason the system failed to kill that process.

The root cause seems to be that the definition of factorial has no upper bound on the (non-negative) integer whose factorial it will attempt to calculate in a while loop. This issue currently allows intentional or accidental denial-of-service attacks against a webwork server (particularly those with a small number of CPU cores), and I think we should try to prevent that possibility for the future.

Entering 171! in an answer which is expecting a Formula gave me the response Can't convert an Infinity to a constant (as does n! for larger values of n) but does not cause an error where a Real is expected. 170! does not trigger that message. That seems to indicate that n=170 is essentially the largest value for which Perl is obtaining a (finite) floating point number as the result of the calculation, so there is no reason to run the calculation for values over 171.

Where a Real is expected:

I propose that we change the _eval method in pg/lib/Parser/UOP/factorial.pm to avoid overly long calculations by making some change before the while loop. However, I am not sure what the best approach is. Some options I can think of are:

@drgrice1 @dpvc @pstaabp @mgage - What do you think is the best approach to take?


When investigating the problem on my server I found quite a few error.log records reporting Timeout after processing this problem for 60 seconds. Check for infinite loops in problem source. Those I could track down were for the problem Library/Berkeley/StewCalcET7e/17.4/17-4-05.pg and for students whose last_answer from the course_problem_user table included a double factorial.

Ex (for a user who had seed 3944)

+--------------------------------------------------------------------------------------------------------------------------+
| last_answer                                                                                                              |
+--------------------------------------------------------------------------------------------------------------------------+
| ["AnSwEr0001","((2^(n))/(2n!!))","MaThQuIlL_AnSwEr0001","\\frac{2^n}{2n!!}","AnSwEr0002","0","MaThQuIlL_AnSwEr0002","0"] |
+--------------------------------------------------------------------------------------------------------------------------+

Trying to view the problem as that user would eventually work, and show the warning about the timeout. However, it cased one CPU core to run at 100% for about a minute, after which the "runaway" Apache process got killed.

The problem loaded without delay after I deleted the last_answer directly in the database.

Hardcopy generation (apparently for the same users) also ran into problems.

This seems to indicate that the last_answer is being processed during a request to view the problem, even without a submission.

dlglin commented 2 years ago

It seems like for your specific case the problem stems from students using n!! to mean n(n-2)(n-4)..., whereas WW is interpreting it as (n!)!. Putting limits on factorial is a good idea, but perhaps MathObjects should either interpret n!! as n(n-2)(n-4)... or not accept it at all. This would prevent unnecessary factorial calculations when they weren't intended.

Also, since factorials will only work for numbers up to 170, would it make sense to replace the while loop with a lookup table? That would save a lot of unnecessary processor usage.

Alex-Jordan commented 2 years ago

I like Danny's idea to make !! a new unary operator that has the conventional double factorial meaning.

Googling says that perl's maximum value for a float is 1.797693e+308, and 170! and 171! surround this number.

The recursive loop for factorial is good when it produces an exact value integer result. But precision is lost at some point when the number is large enough. The smallest factorial with actually 16~17 digits is 18!. If you go up to 21!, it has 16 digits followed by 0s, so it will be a float but not exactly off since the trailing digits are 0s. (.) But after that, looping seems wasteful.

An alternative to a lookup table is once you are greater than 21!, maybe just use something like Stirling approximation, since you can't keep precision anyway? I'm curious what the error messages would be like if you used a version of Stirling with n ≥ 171.

taniwallach commented 2 years ago

@dlglin and @Alex-Jordan - Thanks!

Alex - I'm not sure we would want to use an approximation for n! (for values of n between about 20 and 50) which is less accurate the best Perl can calculate. Any additional inaccuracy in the floating point approximation of n! will increase the inaccuracy of the overall numerical estimate of whatever value is being calculated.

I tend to suspect that many problems where factorial is "needed" are where terms using something like n! are expected in an answer, and where the grading code probably already limits the values of n used during testing to avoid very large values.

I suspect that typical use only require calculating n! for any n up to around say 30 or even 40. Maybe I'm wrong.

As such, CPU intensive calculations are probably most likely to occur for student input of things like 20!!, (20!)!, or 500000000!. It is things like that where a cap at n=170 would prevent unwanted and excessive CPU load from being permitted.

It seems like for your specific case the problem stems from students using n!! to mean n(n-2)(n-4)..., whereas WW is interpreting it as (n!)!. ... but perhaps MathObjects should either interpret n!! as n(n-2)(n-4)... or not accept it at all. This would prevent unnecessary factorial calculations when they weren't intended.

Yes, that is probably what the students intended, but it is not a syntax WW currently understands. I'm less worried about the question of whether WW should understand that notation than avoiding the loop being run for very large values of n.

I'm pretty sure I recall the formatted answer display showing (n!)! or something similar, which can help students realize that WW does not understand the special !! notations.

I don't know how feasible it would be to "teach" MathObjects !! as a different symbol or to reject it with a warning.

If it is not difficult to implement, it would certainly be nice (but also with a built in upper limit on what values of n will get calculated/available via a lookup table).

Putting limits on factorial is a good idea, ... Also, since factorials will only work for numbers up to 170, would it make sense to replace the while loop with a lookup table? That would save a lot of unnecessary processor usage.

Thanks. We just need to decide on the best approach to take, and how best to handle n > 170.

Since no one complained about the CPU load issue before, I suspect that the most important thing is to cap the calculation at n=170, which would prevent excessively long calculations from being attempted by the system.

I think that deciding on the optimal approach for n <= 170 is less urgent from a server load perspective. A run of the loop to calculate 170! in a single Perl script took about 0.05 seconds on my laptop (including the time to startup and stop the Perl interpreter) not that huge a CPU drain in the context of parsing a typical webwork problem, and to the best of my knowledge there are not any frequent complaints about factorial calculations causing frequent server performance issues.

The benefit of a fixed lookup table probably is a question of loading/parsing time vs. calculation time. For infrequent use, it could be that the need to read the data in from disk and pull out the values out of the lookup table might not be significantly more efficient than just running the loop. It probably depends on how often factorial gets used in practice by a "typical" webwork process, and to what extent the lookup data was cached (in memory) by mod_perl / WW.

The bigger issue is that questions using factorial in the answers probably evaluate n! at several values which recalculate values which had been calculated before, which could be done more efficiently if previously known values were "saved" for reuse. So long as only "small" values of n are used, even the inefficient approach is probably not that noticeable a drag on processing time. However, even if the saved values of n! for smaller values of n was retained even just in a given Apache process, that could still improve efficiency if the process was in use long enough to hit many calls which required calculations of n! for several values of n.

Note: If we do want to use a lookup table, we probably would want such a table to retain the "maximum accuracy" of Perl's internal floating point values (where they are needed) and exact integer values where that suffices, to minimize any additional inaccuracy in the calculations in which the factorial term is used beyond what is necessary.

Potentially we could hard-code the data in a file, using some sort of binary dump / load of Perl's internal format of a float. See for example the section on floats at https://perldoc.perl.org/perlpacktut .

Alternately, the implementation could calculate the values once, filling up an internal lookup table with values needed "so far" and remembering the largest value calculated so far, and filling in more values only when they are needed. If the collected values were cached at least at the level of a given Apache/Perl process, it should probably provide some reasonable performance improvement.


@Alex-Jordan suggested:

maybe just use something like Stirling approximation, since you can't keep precision anyway? I'm curious what the error messages would be like if you used a version of Stirling with n ≥ 171.

For n >= 171 the correct value of n! is simply beyond what Perl's internal floating format can handle, so would just end up as Perl's inf. As such, I don't think there is any benefit to a different approach for calculating/approximating such values. Numeric usage of such factorials will just "break". For all reasonable purposes in "grading" calculations, PG probably needs to just treat such values as infinite, and it is probably best to avoid any question where grading code would be expected to evaluate n! for such large values of n.

We could document the recommendation to avoid grading code which would intentionally evaluate n! for values of n where problems are likely to occur (such as n > 170, but probably for even smaller values where the lack of precision in the calculations renders the values quite inaccurate).

Alex-Jordan commented 2 years ago

A local perl script I wrote consistently takes less time with a subroutine that uses a preconstructed hash than with a recursive subroutine. It's about 10 times faster for 170!. Even with 0!, where the loop subroutine immediately halts (but has to do a conditional test first), it takes over twice as long than just looking up $factoral{0}.

Alex-Jordan commented 2 years ago

I take it back about 0!. I think earlier I was not watching the order of magnitude closely. A local perl script handles return Inf; OK. That's not to say it would work the same way in WeBWorK.

Code:

use Time::HiRes 'gettimeofday', 'tv_interval';
use Math::Trig;

sub fact {
 my $n = shift;
 return 1 if ($n == 0);
 return $n*fact($n - 1);
}

for my $n (0..170) {$f{$n} = fact($n)};

sub facttable {
 my $n = shift;
 return $f{$n} if ($n <= 170);
 return Inf;
}

sub stirling {
 my $n = shift;
 return 1 if $n == 0;
 return sqrt(2*pi()*$n)*($n/exp(1))**$n*exp(1/(12*$n));
}

print "Enter an integer 0 to 170: ";
$n = <>;
chomp($n);

print "Using recursive loop, $n! is ";
my $start = [ gettimeofday() ];
print fact($n);
$elapsed_secs = tv_interval($start);
print " and the code took: ",$elapsed_secs,"seconds\n";

print "Using table, $n! is ";
my $start = [ gettimeofday() ];
print facttable($n);
$elapsed_secs = tv_interval($start);
print " and the code took: ",$elapsed_secs,"seconds\n";

print "Using Stirling, $n! is ";
my $start = [ gettimeofday() ];
print stirling($n);
$elapsed_secs = tv_interval($start);
print " and the code took: ",$elapsed_secs,"seconds\n";

Output:

cass317m4138:Documents alex.jordan$ perl factorial.perl 
Enter an integer 0 to 170: 0
Using recursive loop, 0! is 1 and the code took: 2.1e-05seconds
Using table, 0! is 1 and the code took: 8e-06seconds
Using Stirling, 0! is 1 and the code took: 4e-06seconds

cass317m4138:Documents alex.jordan$ perl factorial.perl 
Enter an integer 0 to 170: 1
Using recursive loop, 1! is 1 and the code took: 2.2e-05seconds
Using table, 1! is 1 and the code took: 1.2e-05seconds
Using Stirling, 1! is 1.00227444918223 and the code took: 2.3e-05seconds

cass317m4138:Documents alex.jordan$ perl factorial.perl 
Enter an integer 0 to 170: 5
Using recursive loop, 5! is 120 and the code took: 2.5e-05seconds
Using table, 5! is 120 and the code took: 1.2e-05seconds
Using Stirling, 5! is 120.002637086197 and the code took: 2.3e-05seconds

cass317m4138:Documents alex.jordan$ perl factorial.perl 
Enter an integer 0 to 170: 20
Using recursive loop, 20! is 2432902008176640000 and the code took: 3.8e-05seconds
Using table, 20! is 2432902008176640000 and the code took: 1.3e-05seconds
Using Stirling, 20! is 2.43290285233216e+18 and the code took: 2.3e-05seconds

cass317m4138:Documents alex.jordan$ perl factorial.perl 
Enter an integer 0 to 170: 50
Using recursive loop, 50! is 3.04140932017134e+64 and the code took: 7.1e-05seconds
Using table, 50! is 3.04140932017134e+64 and the code took: 1.2e-05seconds
Using Stirling, 50! is 3.0414093877505e+64 and the code took: 2.3e-05seconds

cass317m4138:Documents alex.jordan$ perl factorial.perl 
Enter an integer 0 to 170: 100
Using recursive loop, 100! is 9.33262154439441e+157 and the code took: 7.4e-05seconds
Using table, 100! is 9.33262154439441e+157 and the code took: 9e-06seconds
Using Stirling, 100! is 9.3326215703177e+157 and the code took: 1.8e-05seconds

cass317m4138:Documents alex.jordan$ perl factorial.perl 
Enter an integer 0 to 170: 170
Using recursive loop, 170! is 7.25741561530799e+306 and the code took: 0.000102seconds
Using table, 170! is 7.25741561530799e+306 and the code took: 1e-05seconds
Using Stirling, 170! is 7.25741561941138e+306 and the code took: 1.8e-05seconds

cass317m4138:Documents alex.jordan$ perl factorial.perl 
Enter an integer 0 to 170: 171
Using recursive loop, 171! is Inf and the code took: 0.000119seconds
Using table, 171! is Inf and the code took: 6e-06seconds
Using Stirling, 171! is Inf and the code took: 2e-05seconds

This code isn't robust against non-integer input, but there's reason to hope that a preconstructed hash combined with returning Inf could work out.

dpvc commented 2 years ago

A recursive subroutine is a terrible way to implement factorial, and I wish CS classes would not use this as their canonical example fo recursive functions, as recursion is completely unnecessary for this, and indeed far worse than a straight-forward loop. MathObjects doesn't use a recursive function, it uses a multiplication loop:

sub factorial {
  $n = shift;
  return Inf if $n > 170;
  $f = 1;
  while ($n > 0) {$f *= $n--;}
  return $f;
}

(but without the check for 170). I have added this to your timing script and get the following output:

dpvc@Tumithak:[pg]2.> perl factorial.perl 
Enter an integer 0 to 170: 0
Using recursive loop, 0! is 1 and the code took: 3.5e-05seconds
Using while loop, 0! is 1 and the code took: 1.3e-05seconds
Using table, 0! is 1 and the code took: 2.1e-05seconds
Using Stirling, 0! is 1 and the code took: 1e-05seconds

dpvc@Tumithak:[pg]3.> perl factorial.perl
Enter an integer 0 to 170: 1
Using recursive loop, 1! is 1 and the code took: 6.1e-05seconds
Using while loop, 1! is 1 and the code took: 2.6e-05seconds
Using table, 1! is 1 and the code took: 1.7e-05seconds
Using Stirling, 1! is 1.00227444918223 and the code took: 3.9e-05seconds

dpvc@Tumithak:[pg]4.> perl factorial.perl
Enter an integer 0 to 170: 5
Using recursive loop, 5! is 120 and the code took: 6e-05seconds
Using while loop, 5! is 120 and the code took: 1.6e-05seconds
Using table, 5! is 120 and the code took: 5.7e-05seconds
Using Stirling, 5! is 120.002637086197 and the code took: 3.5e-05seconds

dpvc@Tumithak:[pg]5.> perl factorial.perl
Enter an integer 0 to 170: 20
Using recursive loop, 20! is 2432902008176640000 and the code took: 0.00014seconds
Using while loop, 20! is 2432902008176640000 and the code took: 2.6e-05seconds
Using table, 20! is 2432902008176640000 and the code took: 1.9e-05seconds
Using Stirling, 20! is 2.43290285233216e+18 and the code took: 4.9e-05seconds

dpvc@Tumithak:[pg]6.> perl factorial.perl
Enter an integer 0 to 170: 50
Using recursive loop, 50! is 3.04140932017134e+64 and the code took: 0.000257seconds
Using while loop, 50! is 3.04140932017134e+64 and the code took: 4.1e-05seconds
Using table, 50! is 3.04140932017134e+64 and the code took: 1.8e-05seconds
Using Stirling, 50! is 3.0414093877505e+64 and the code took: 5.1e-05seconds

dpvc@Tumithak:[pg]7.> perl factorial.perl
Enter an integer 0 to 170: 100
Using recursive loop, 100! is 9.33262154439441e+157 and the code took: 0.00028seconds
Using while loop, 100! is 9.33262154439442e+157 and the code took: 6e-05seconds
Using table, 100! is 9.33262154439442e+157 and the code took: 2.1e-05seconds
Using Stirling, 100! is 9.3326215703177e+157 and the code took: 4.4e-05seconds

dpvc@Tumithak:[pg]8.> perl factorial.perl
Enter an integer 0 to 170: 170
Using recursive loop, 170! is 7.25741561530799e+306 and the code took: 0.000393seconds
Using while loop, 170! is 7.257415615308e+306 and the code took: 7.6e-05seconds
Using table, 170! is 7.257415615308e+306 and the code took: 1.8e-05seconds
Using Stirling, 170! is 7.25741561941138e+306 and the code took: 5.2e-05seconds

dpvc@Tumithak:[pg]9.> perl factorial.perl
Enter an integer 0 to 170: 171
Using recursive loop, 171! is Inf and the code took: 0.000668seconds
Using while loop, 171! is Inf and the code took: 1.3e-05seconds
Using table, 171! is Inf and the code took: 9e-06seconds
Using Stirling, 171! is Inf and the code took: 4.1e-05seconds

As you can see, this implementation is roughly comparable to the table and Stirling approaches, and I don't think there is much point in changing the algorithm (other than to terminate for n > 170), as the while-loop is roughly 2 to 4 times the table lookup, and generally comparable to the Stirling formula. The recursive function should not be used.

The reason you were getting the time out for 21!! is, as you have pointed out, this means (21!)!, and the second factorial is of 5.10909421717094e+19, and so will loop that many times. Of course, the factorial function should have a cap, which it doesn't. Adding that should resolve the problem. Having it return an Infinity object in that case should resolve the error message about not being able to convert infinity to a constant (I think).

In terms of making n!! be n(n-2)(n-4)... that would be straight forward, if you wanted that.

pstaabp commented 2 years ago

I would recommend staying with the loop of @dpvc with the cutoff at 171.

I like the double factorial function--I don't think I've written a problem with it. It would be nice addition and if we add it, let's make sure to document it.

taniwallach commented 2 years ago

Thank you all for your input. PR https://github.com/openwebwork/pg/pull/682 makes the small patch to limit the loop to n<=170.

Adding double factorial is something which someone who is more comfortable editing MathObject's internals can try to implement.

taniwallach commented 2 years ago

Some additional comments for the record:

Problem authors should avoid using such as part of their answers, and should probably avoid any potential evaluation of n! for n>170 in problems. It is probably safer to keep n significantly small than 170 in tests.

One additional though about the limitation on factorials: Due to 171! effectively being infinity for WeBWorK, student answers which intentionally or not use such large factorials may misbehave.

taniwallach commented 2 years ago

Fixed for PG-2.17 by https://github.com/openwebwork/pg/pull/682