trampgeek / jobe

jobe is a server that runs small programming jobs in a variety of programming languages
MIT License

Time limit exceeded when running python3 scripts with numpy #66

Closed spetzreborn closed 1 year ago

spetzreborn commented 1 year ago

We have a Debian installation of Jobe that gets "time limit exceeded" when importing numpy.

lsb_release -a

No LSB modules are available.
Distributor ID: Debian
Description:    Debian GNU/Linux 12 (bookworm)
Release:        12
Codename:       bookworm

free -m

               total        used        free      shared  buff/cache   available
Mem:            8902         583        8239           7         342        8319
Swap:           1955           0        1955

We have 16 processors on this VM.

We have tried raising the memory limit to 2000 for Python, and the CPU run time, with no result other than the tests taking longer to respond.

According to debugging in README.md:

" If something unexpected happened with the actual run of a program, find the run in /home/jobe/runs and try executing the program manually. [The run directory contains the source file, the bash command used to run it, plus the compile output and (if it ran) the stderr and stdout outputs. "

We have tried to capture the program and run it from the CLI on the server: while :; do time python3 jobe_/__test; echo $?; sleep 1; done

Output from our program:

Test
[1, 3, 5, 7, 9, 2, 4, 6, 8, 10]

real 0m0.166s
user 0m0.931s
sys 0m0.883s
0

Where the program is:

import numpy as np

def task12(param1, param2):
    print("Test")
    return [1, 3, 5, 7, 9, 2, 4, 6, 8, 10]

testsubmit.py only complains about C, which does not seem relevant to our case.

***** FAILED TEST ***** (8 results)

{'run_id': None, 'outcome': 11, 'cmpinfo': 'prog.c: In function ‘silly’:\nprog.c:4:6: error: infinite recursion detected [-Werror=infinite-recursion]\n    4 | void silly(int i) {\n      |      ^~~~~\nprog.c:7:9: note: recursive call\n    7 |         silly(j);\n      |         ^~~~\nprog.c:9:9: note: recursive call\n    9 |         silly(j + 1);\n      |         ^~~~\ncc1: all warnings being treated as errors\n', 'stdout': '', 'stderr': ''}

Infinite recursion (stack error) on C
Jobe result: Compile error

Compiler output:
prog.c: In function ‘silly’:
prog.c:4:6: error: infinite recursion detected [-Werror=infinite-recursion]
    4 | void silly(int i) {
      |      ^~~~~
prog.c:7:9: note: recursive call
    7 |         silly(j);
      |         ^~~~
prog.c:9:9: note: recursive call
    9 |         silly(j + 1);
      |         ^~~~
cc1: all warnings being treated as errors


We don't understand why it completes quickly in the CLI but times out in Moodle. Any suggestions or tips?

Regards,
Björn and Sebastian
DSV, Stockholm University

trampgeek commented 1 year ago

Hi Björn and Sebastian

I'm assuming you're getting these failures when using Jobe from CodeRunner, right?

This does sound to me like a memory error. numpy is greedy and I recently raised the default memory allocation for Python to 1000 because of numpy. When the memory limit is reached, the call to malloc returns null and much/most software never checks this. So the code can easily get stuck in an endless loop trying to allocate enough memory.
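Richard's malloc explanation can be reproduced in miniature. The sketch below is only an illustration, not Jobe's actual enforcement mechanism (though Jobe applies a comparable resource limit to each run): it caps the process's address space and then over-allocates. Pure Python surfaces the failed allocation cleanly as a MemoryError, whereas native code (such as numpy's C internals) that never checks malloc's return value gets no such signal and can loop or crash instead:

```python
import resource

# Remember the current limits so we can restore them afterwards.
old_soft, old_hard = resource.getrlimit(resource.RLIMIT_AS)

# Cap this process's virtual address space at roughly 300 MB (Linux).
cap = 300 * 1024 * 1024
resource.setrlimit(resource.RLIMIT_AS, (cap, old_hard))

outcome = None
try:
    # Ask for far more memory than the cap permits.
    buffer = bytearray(1024 * 1024 * 1024)  # ~1 GB
except MemoryError:
    # CPython checks the failed allocation and raises cleanly;
    # C code that ignores a NULL from malloc gets no such warning.
    outcome = "MemoryError"

# Restore the original limit.
resource.setrlimit(resource.RLIMIT_AS, (old_soft, old_hard))
print(outcome)
```

Under such a cap, an import that needs more memory than allowed doesn't necessarily fail fast, which is consistent with the symptom here being a timeout rather than an out-of-memory error.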

So... I'm wondering how you set the memory limit. Did you perhaps change the default setting within Jobe itself? If so, it's possible that that default is being overridden somewhere else along the line, e.g. by the question itself or its prototype.

Try customising a CodeRunner question that's failing (click "Customise" near the top of the question-authoring form), opening the Advanced customisation section, and entering 1000 into the Memory Limit field.

If that doesn't help, post back and I'll think further on it.

Richard



spetzreborn commented 1 year ago

Thank you Richard!

We had set the memory limit to 2000, in both the CodeRunner GUI and in python3_task.php ($this->default_params['memorylimit'] = 2000;).

But now we are testing with even higher values, up to 3000, and that seems to make it work. We don't know if numpy actually uses all that memory, but your explanation of a failing malloc seems plausible.
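For reference, the memory limit can also be supplied per run through Jobe's REST API (which is how CodeRunner passes it along): the run_spec may carry a parameters object with a memorylimit in MB. A minimal sketch of such a payload, with a stand-in source program:

```python
import json

# A minimal Jobe run_spec carrying an explicit memory limit (MB).
run_spec = {
    "run_spec": {
        "language_id": "python3",
        "sourcecode": "import numpy as np\nprint(np.ones(3).sum())\n",
        "parameters": {
            "memorylimit": 3000,  # the value that worked in this thread
            "cputime": 10,        # seconds of CPU time allowed
        },
    }
}

# This JSON body would be POSTed to <jobe-server>/jobe/index.php/restapi/runs
payload = json.dumps(run_spec)
print(payload[:40])
```

Setting the limit at this level (via the question or its prototype in CodeRunner) avoids patching Jobe's own source.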

Thank you, and have a nice week. We can close this issue.

Regards,
Björn and Sebastian

trampgeek commented 1 year ago

Thanks for reporting back, Björn and Sebastian. I'm a bit puzzled by the fact you need such a large memory allocation. We teach numpy and matplotlib to around 1300 students with a memory limit of 1000 and have never seen this problem. I've seen a few other users report needing huge resource limits for numpy in the past, too, but have never understood why.

One factor might be that you have 16 cores. We always run with 8. I know that numpy allocates workers based on the available CPUs, and each worker requires memory, so that would raise your memory requirements. But a factor-of-3 increase seems unlikely from this alone.
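If per-core worker threads are indeed inflating memory use, one common mitigation (an assumption worth testing, not something verified on this installation) is to pin the BLAS/OpenMP thread counts before numpy is first imported, e.g. at the top of the question's template code:

```python
import os

# These must be set before numpy is imported for the first time;
# they limit the worker threads spawned by OpenMP/OpenBLAS/MKL.
for var in ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS"):
    os.environ[var] = "1"

try:
    import numpy as np  # now starts with a single BLAS worker thread
    print(np.arange(4).sum())
except ImportError:
    # numpy not installed in this environment; the env vars still apply.
    pass
```

On a 16-core host this should make numpy's memory footprint closer to what an 8-core (or 1-core) host would see, which would help confirm or rule out the worker-count theory.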

By the way: you shouldn't need to raise the memory limit in the Jobe code itself. Setting the higher value in the CodeRunner question should suffice (though rather than doing that for every question, you'd set it in the question prototype).

If you ever discover why your memory needs are so much higher than usual, please do let me know. In the meantime, I'll close this.