maths / moodle-qtype_stack

Stack question type for Moodle
GNU General Public License v3.0

6 failing unit tests with 2020120600 stack version #720

Closed by golenkovm 1 year ago

golenkovm commented 3 years ago

Hi guys,

Has anyone seen these failing unit tests before?

Moodle 3.9.7+ (Build: 20210520), 49d363745316c3eb77f003ba086d33ea86fc0347
Php: 7.4.3, pgsql: 11.5 (Debian 11.5-1.pgdg90+1), OS: Linux 5.4.0-72-generic x86_64
PHPUnit 7.5.20 by Sebastian Bergmann and contributors.

.............................................................   61 / 3352 (  1%)
.............................................................  122 / 3352 (  3%)
.............................................................  183 / 3352 (  5%)
.............................................................  244 / 3352 (  7%)
.............................................................  305 / 3352 (  9%)
.............................................................  366 / 3352 ( 10%)
.............................................................  427 / 3352 ( 12%)
.............................................................  488 / 3352 ( 14%)
.............................................................  549 / 3352 ( 16%)
.............................................................  610 / 3352 ( 18%)
.............................................................  671 / 3352 ( 20%)
.............................................................  732 / 3352 ( 21%)
.............................................................  793 / 3352 ( 23%)
.............................................................  854 / 3352 ( 25%)
.............................................................  915 / 3352 ( 27%)
.............................................................  976 / 3352 ( 29%)
............................................................. 1037 / 3352 ( 30%)
............................................................. 1098 / 3352 ( 32%)
............................................................. 1159 / 3352 ( 34%)
............................................................. 1220 / 3352 ( 36%)
............................................................. 1281 / 3352 ( 38%)
............................................................. 1342 / 3352 ( 40%)
............................................................. 1403 / 3352 ( 41%)
............................................................. 1464 / 3352 ( 43%)
............................................................. 1525 / 3352 ( 45%)
............................................................. 1586 / 3352 ( 47%)
............................................................. 1647 / 3352 ( 49%)
............................................................. 1708 / 3352 ( 50%)
............................................................. 1769 / 3352 ( 52%)
............................................................. 1830 / 3352 ( 54%)
............................................................. 1891 / 3352 ( 56%)
............................................................. 1952 / 3352 ( 58%)
............................................................. 2013 / 3352 ( 60%)
............................................................. 2074 / 3352 ( 61%)
.....................F....................................... 2135 / 3352 ( 63%)
....................................F........................ 2196 / 3352 ( 65%)
............................................................. 2257 / 3352 ( 67%)
............................................................. 2318 / 3352 ( 69%)
............................................................. 2379 / 3352 ( 70%)
............................................................. 2440 / 3352 ( 72%)
............................................................. 2501 / 3352 ( 74%)
............................................................. 2562 / 3352 ( 76%)
............................................................. 2623 / 3352 ( 78%)
................S............................................ 2684 / 3352 ( 80%)
............................................................. 2745 / 3352 ( 81%)
............................................................. 2806 / 3352 ( 83%)
......FFFF................................................... 2867 / 3352 ( 85%)
............................................................. 2928 / 3352 ( 87%)
............................................................. 2989 / 3352 ( 89%)
............................................................. 3050 / 3352 ( 90%)
............................................................. 3111 / 3352 ( 92%)
............................................................. 3172 / 3352 ( 94%)
...........................................................SS 3233 / 3352 ( 96%)
SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSS............... 3294 / 3352 ( 98%)
..........................................................    3352 / 3352 (100%)

Time: 9.04 minutes, Memory: 916.00 MB

There were 6 failures:

1) stack_cas_session2_test::test_scientific_notation
Failed asserting that two strings are equal.
--- Expected
+++ Actual
@@ @@
-'1.0E+50'
+'1.0e+50'

/siteroot/question/type/stack/tests/cassession2_test.php:1453
/siteroot/lib/phpunit/classes/advanced_testcase.php:80
/siteroot/mod/googlemeetxdrive/vendor/phpunit/phpunit/src/Framework/TestResult.php:693
/siteroot/mod/googlemeetxdrive/vendor/phpunit/phpunit/src/TextUI/TestRunner.php:652

To re-run:
 vendor/bin/phpunit "stack_cas_session2_test" question/type/stack/tests/cassession2_test.php

2) stack_cas_text_test::test_numerical_display_float_scientific_small
Failed asserting that two strings are equal.
--- Expected
+++ Actual
@@ @@
-'Decimal number \({1.0e-6}\).'
+'Decimal number \({9.999999999999999e-7}\).'

/siteroot/question/type/stack/tests/fixtures/test_base.php:128
/siteroot/question/type/stack/tests/castext_test.php:818
/siteroot/lib/phpunit/classes/advanced_testcase.php:80
/siteroot/mod/googlemeetxdrive/vendor/phpunit/phpunit/src/Framework/TestResult.php:693
/siteroot/mod/googlemeetxdrive/vendor/phpunit/phpunit/src/TextUI/TestRunner.php:652

To re-run:
 vendor/bin/phpunit "stack_cas_text_test" question/type/stack/tests/castext_test.php

3) stack_studentinput_testcase::test_studentinput with data set #27 ('1E+3', 'php_true', '1E+3', 'cas_true', '1.0E+3', '', 'Scientific notation')
Failed asserting that two strings are equal.
--- Expected
+++ Actual
@@ @@
-'1.0E+3'
+'1.E+3'

/siteroot/question/type/stack/tests/fixtures/test_base.php:183
/siteroot/question/type/stack/tests/studentinput_test.php:43
/siteroot/lib/phpunit/classes/advanced_testcase.php:80
/siteroot/mod/googlemeetxdrive/vendor/phpunit/phpunit/src/Framework/TestResult.php:693
/siteroot/mod/googlemeetxdrive/vendor/phpunit/phpunit/src/TextUI/TestRunner.php:652

To re-run:
 vendor/bin/phpunit "stack_studentinput_testcase" question/type/stack/tests/studentinput_test.php

4) stack_studentinput_testcase::test_studentinput with data set #28 ('3E2', 'php_true', '3E2', 'cas_true', '3.0E+2', '', '')
Failed asserting that two strings are equal.
--- Expected
+++ Actual
@@ @@
-'3.0E+2'
+'3.E+2'

/siteroot/question/type/stack/tests/fixtures/test_base.php:183
/siteroot/question/type/stack/tests/studentinput_test.php:43
/siteroot/lib/phpunit/classes/advanced_testcase.php:80
/siteroot/mod/googlemeetxdrive/vendor/phpunit/phpunit/src/Framework/TestResult.php:693
/siteroot/mod/googlemeetxdrive/vendor/phpunit/phpunit/src/TextUI/TestRunner.php:652

To re-run:
 vendor/bin/phpunit "stack_studentinput_testcase" question/type/stack/tests/studentinput_test.php

5) stack_studentinput_testcase::test_studentinput with data set #29 ('3e2', 'php_true', '3e2', 'cas_true', '3.0E+2', '', '')
Failed asserting that two strings are equal.
--- Expected
+++ Actual
@@ @@
-'3.0E+2'
+'3.E+2'

/siteroot/question/type/stack/tests/fixtures/test_base.php:183
/siteroot/question/type/stack/tests/studentinput_test.php:43
/siteroot/lib/phpunit/classes/advanced_testcase.php:80
/siteroot/mod/googlemeetxdrive/vendor/phpunit/phpunit/src/Framework/TestResult.php:693
/siteroot/mod/googlemeetxdrive/vendor/phpunit/phpunit/src/TextUI/TestRunner.php:652

To re-run:
 vendor/bin/phpunit "stack_studentinput_testcase" question/type/stack/tests/studentinput_test.php

6) stack_studentinput_testcase::test_studentinput with data set #30 ('3e-2', 'php_true', '3e-2', 'cas_true', '3.0E-2', '', '')
Failed asserting that two strings are equal.
--- Expected
+++ Actual
@@ @@
-'3.0E-2'
+'3.E-2'

/siteroot/question/type/stack/tests/fixtures/test_base.php:183
/siteroot/question/type/stack/tests/studentinput_test.php:43
/siteroot/lib/phpunit/classes/advanced_testcase.php:80
/siteroot/mod/googlemeetxdrive/vendor/phpunit/phpunit/src/Framework/TestResult.php:693
/siteroot/mod/googlemeetxdrive/vendor/phpunit/phpunit/src/TextUI/TestRunner.php:652

To re-run:
 vendor/bin/phpunit "stack_studentinput_testcase" question/type/stack/tests/studentinput_test.php

FAILURES!
Tests: 3352, Assertions: 100034, Failures: 6, Skipped: 49.

Some details on my case:

Please let me know your thoughts on what might cause these 6 tests to fail.

Kind regards, Mikhail

aharjula commented 3 years ago

Seems like differences in the floats between LISP+Maxima combinations. The tests are correct and show the logic not functioning on a particular LISP implementation of floats. So those are true failures, and changing the tests is not the way to deal with this. The changes need to be made on the logic side that generates the output, and that logic needs to be tested with a wide range of LISP+Maxima combinations.
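A minimal sketch of what normalising on the output side could look like, written in Python rather than in STACK's own PHP/Maxima code. The function name and the choice of canonical form (uppercase `E`, at least one digit after the decimal point, an explicit exponent sign) are assumptions read off the expected strings in failures 1 and 3 to 6 above. It is not the change that was eventually merged.

```python
import re

# A float literal in scientific notation: digits, optional fraction, e/E,
# optional sign, exponent digits. The lookbehind avoids matching inside
# identifiers such as "x2e3".
_SCI_FLOAT = re.compile(r"(?<![\w.])(\d+)(?:\.(\d*))?[eE]([+-]?)(\d+)")


def canonical_float_literal(text: str) -> str:
    """Rewrite scientific-notation floats in `text` to the form d.dE+d.

    Hypothetical helper: the canonical form is an assumption, not STACK's rule.
    """
    def fix(match: re.Match) -> str:
        whole, frac, sign, exponent = match.groups()
        frac = frac or "0"   # "3." and "3" both become "3.0"
        sign = sign or "+"   # a missing sign becomes an explicit "+"
        return f"{whole}.{frac}E{sign}{exponent}"

    return _SCI_FLOAT.sub(fix, text)


# The "Actual" strings from the report map onto the "Expected" ones.
print(canonical_float_literal("1.0e+50"))  # 1.0E+50
print(canonical_float_literal("3.E+2"))    # 3.0E+2
print(canonical_float_literal("3.E-2"))    # 3.0E-2
```

This only addresses the spelling of the literal. It does nothing for failure 2, where the printed digits themselves differ (`1.0e-6` versus `9.999999999999999e-7`).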

sangwinc commented 3 years ago

Thank you for pushing some code here! We really do appreciate you taking the time and trouble to do this.

As Matti says, this is a Lisp difference. It has been on my "to do" list for a while to make sure the test setup has a Lisp setting, so that we check for the right behaviour based on your setup! This is a helpful pull request, and will prompt me to actually do this. I know having failing tests is a problem and we should not ignore it. The "e" vs "E" is not a blocker to using STACK.

Chris
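One way to read the "Lisp setting" idea, sketched in Python with hypothetical names: `EXPECTED_DISPLAY`, `expected_display`, and the keys `"default"` and `"sbcl-1.3.14"` are invented for illustration and are not part of STACK. The strings themselves come from this report: the `"default"` row holds the values the tests expect, and the other row holds the values actually observed in the failing run.

```python
# Expected display of student input, keyed by a (hypothetical) Lisp setting.
EXPECTED_DISPLAY = {
    # Values the existing tests expect.
    "default": {
        "1E+3": "1.0E+3",
        "3E2": "3.0E+2",
        "3e2": "3.0E+2",
        "3e-2": "3.0E-2",
    },
    # Values observed in the failing run reported in this issue.
    "sbcl-1.3.14": {
        "1E+3": "1.E+3",
        "3E2": "3.E+2",
        "3e2": "3.E+2",
        "3e-2": "3.E-2",
    },
}


def expected_display(raw: str, lisp: str = "default") -> str:
    """Return the display string a test should expect for `raw` under `lisp`.

    Unknown Lisp settings fall back to the default expectations.
    """
    table = EXPECTED_DISPLAY.get(lisp, EXPECTED_DISPLAY["default"])
    return table[raw]


print(expected_display("3e2"))                 # 3.0E+2
print(expected_display("3e2", "sbcl-1.3.14"))  # 3.E+2
```

The trade-off is that every new Lisp+Maxima combination needs its own row, which is why normalising the output itself may be preferable where the difference is purely cosmetic.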

golenkovm commented 3 years ago

Hi @sangwinc and @aharjula

Thank you for your feedback.

I agree that the "e" vs "E" is not a blocker to using STACK. However, failing unit tests are a blocker, at least for us. Could you please provide an ETA for fixing this on the Lisp end?

Kind regards, Mikhail

sangwinc commented 3 years ago

Yes, I quite understand why failing unit tests would be a compliance blocker! Sorry about this. Leave it with me.

timhunt commented 3 years ago

We already have a thing in place that is used in some tests to ignore 'irrelevant' differences in floats. I wonder how easy it is to start using it here?

(E.g. https://github.com/maths/moodle-qtype_stack/blob/master/tests/test_base_test.php#L43 - incidentally, those tests don't make sense to me. Is it me, or is that testing the same thing repeatedly?)
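For illustration only, here is a rough Python sketch of what ignoring irrelevant float differences in a string comparison can mean. It is not the helper in STACK's `test_base.php`; the function name, the regular expression, and the tolerance are all assumptions. The idea is to require the non-numeric text to match exactly while comparing the embedded numbers by value within a relative tolerance.

```python
import math
import re

# Decimal or scientific-notation number, e.g. "3", "3.", "1.0e-6", "3.E+2".
_NUMBER = re.compile(r"[+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?")


def floats_equivalent(expected: str, actual: str, rel_tol: float = 1e-12) -> bool:
    """True if the strings differ only in how equal-valued numbers are written.

    Hypothetical helper; rel_tol is an arbitrary choice for this sketch.
    """
    # The text around the numbers must be identical.
    if _NUMBER.sub("#", expected) != _NUMBER.sub("#", actual):
        return False
    expected_numbers = _NUMBER.findall(expected)
    actual_numbers = _NUMBER.findall(actual)
    if len(expected_numbers) != len(actual_numbers):
        return False
    # The numbers must agree by value, not by spelling.
    return all(
        math.isclose(float(e), float(a), rel_tol=rel_tol)
        for e, a in zip(expected_numbers, actual_numbers)
    )


# Failure 2 from the report: same value up to rounding, different digits.
print(floats_equivalent(
    r"Decimal number \({1.0e-6}\).",
    r"Decimal number \({9.999999999999999e-7}\).",
))
```

A comparison like this would accept all six reported differences, which is also its risk: it hides exactly the display differences that the earlier comment argues are real failures in the output logic.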

sangwinc commented 3 years ago

Tim, I think those tests do have some subtle differences!

sangwinc commented 3 years ago

Mikhail, what version of Maxima and LISP are you using, please?

Just start Maxima from the command line and let me know the result (email is fine if you don't want to paste it here). Chris

golenkovm commented 3 years ago

Hi @sangwinc

As far as I can see, it's Lisp SBCL 1.3.14.debian:

root@da58fbc0117c:/usr/local/tomcat# maxima 
WARNING:
Couldn't re-execute SBCL with proper personality flags (/proc isn't mounted? setuid?)
Trying to continue anyway.
Maxima 5.41.0 http://maxima.sourceforge.net
using Lisp SBCL 1.3.14.debian
Distributed under the GNU Public License. See the file COPYING.
Dedicated to the memory of William Schelter.
The function bug_report() provides bug reporting information.
(%i1) 

I'm using Maximapool https://github.com/uni-halle/maximapool-docker with the STACK plugin manually updated to 2020120600: https://github.com/catalyst/maximapool-docker/commit/419f37a21231fc29c41455ac2a3c7b24adaab2f5

Did you manage to replicate these failing unit tests?

Cheers, Mikhail

cameron1729 commented 1 year ago

We should be able to close this now that #935 has been merged.