tdulcet / Distributed-Computing-Scripts

🖧 Distributed Computing Scripts for GIMPS, BOINC and Folding@home
MIT License

Installing GpuOwl seems to have failed #18

Closed: SyauqiMA closed this issue 1 year ago

SyauqiMA commented 1 year ago

I am running the script on Google Colab, and it seems GpuOwl was not installed successfully.

This is the output that I think shows the problem:

Downloading, building and setting up GpuOwl

PrimeNet User ID:   <my_id>
Computer name:          <my_comp_name>
Type of work:       150
Idle time to run:   10 minutes

System idle time for all processor (CPU) threads since the last boot:   23 minutes 2 seconds

Number of platforms                               0
Error: This computer does not have an OpenCL platform
sed: can't read gpuowl/1/gpuowl: No such file or directory
sed: can't read gpuowl/1/gpuowl: No such file or directory
sed: can't read gpuowl/1/gpuowl: No such file or directory
sed: can't read gpuowl/1/gpuowl: No such file or directory
Registering computer with PrimeNet

/bin/bash: line 0: cd: gpuowl/1: No such file or directory
Downloading and setting up Prime95
and continues...

And this is my Colab instance info:

Graphics Processor (GPU):   Tesla T4

Linux Distribution:     Ubuntu 20.04.5 LTS
Linux Kernel:           5.10.147+
Computer Model:         Google Google Compute Engine 
Processor (CPU):        Intel(R) Xeon(R) CPU @ 2.20GHz
CPU Sockets/Cores/Threads:  1/1/2
Architecture:           x86_64 (64-bit)
Total memory (RAM):     12,985 MiB (13GiB) (13,616 MB (14GB))
Total swap space:       0 MiB (0 MB)
Disk space:         sda: 83,968 MiB (82GiB) (88,046 MB (89GB))
Computer name:          2710e45cb6ad
Hostname:           2710e45cb6ad
IPv4 address:           eth0: 172.28.0.12
MAC address:            eth0: 02:42:ac:1c:00:0c
Computer ID:            45b5db723f224a57a4852b1acb1b61d5
Time zone:          
Language:           en_US.UTF-8
Virtualization container:   docker
Virtual Machine (VM) hypervisor:kvm
Bash Version:           5.0.17(1)-release
bash: line 171: /dev/tty: No such device or address
bash: line 172: /dev/tty: No such device or address
bash: line 173: /dev/tty: No such device or address
Terminal:           xterm-color

Python 3.9.16

Is this a problem with my assigned GPU?

The Prime95 installation works perfectly, by the way, and is able to get tasks from PrimeNet.

But when I restarted my instance and ran it again, the output seemed to get stuck at the Prime95 menu:

Prime95 is already downloaded

Optimizing Prime95 for your computer
This may take a while…

Skipped

Setting up Prime95

spawn ./mprime -m -A1
[Main thread Apr 19 00:52] Mersenne number primality test program version 30.8
[Main thread Apr 19 00:52] Optimizing for CPU architecture: Core i3/i5/i7, L2 cache size: 256 KB, L3 cache size: 55 MB
         Main Menu

     1.  Test/Primenet
     2.  Test/Workers
     3.  Test/Status
     4.  Test/Continue
     5.  Test/Exit
     6.  Advanced/Test
     7.  Advanced/Time
     8.  Advanced/P-1
     9.  Advanced/ECM
    10.  Advanced/Manual Communication
    11.  Advanced/Unreserve Exponent
    12.  Advanced/Quit Gimps
    13.  Options/CPU
    14.  Options/Resource Limits
    15.  Options/Preferences
    16.  Options/Torture Test
    17.  Options/Benchmark
    18.  Help/About
    19.  Help/About PrimeNet Server
Your choice: 

And when I try to stop the Colab cell, it outputs this:

Skipped

Setting it to start if the computer has not been used in the specified idle time and stop it when someone uses the computer

Skipped

Starting PrimeNet

/bin/bash: line 0: cd: gpuowl/1: No such file or directory
nohup: redirecting stderr to stdout
Waiting for 'worktodo.ini' access...
Waiting for 'worktodo.ini' access...
Waiting for 'worktodo.ini' access...
Waiting for 'worktodo.ini' access...
Waiting for 'worktodo.ini' access...
Waiting for 'worktodo.ini' access...
Waiting for 'worktodo.ini' access...
Waiting for 'worktodo.ini' access...
Waiting for 'worktodo.ini' access...
Waiting for 'worktodo.ini' access...
Waiting for 'worktodo.ini' access...
Waiting for 'worktodo.ini' access...
Waiting for 'worktodo.ini' access...
Waiting for 'worktodo.ini' access...
Waiting for 'worktodo.ini' access...
Waiting for 'worktodo.ini' access...
Waiting for 'worktodo.ini' access...
Waiting for 'worktodo.ini' access...

and on and on...

What is the problem here? I am really interested in prime hunting and really appreciate this script! Thanks for your time!

SyauqiMA commented 1 year ago

This problem refers to the GoogleColabGPU.ipynb notebook. Sorry for not pointing that out earlier!

tdulcet commented 1 year ago

Thanks for the detailed bug report! I am sorry our GPU notebook did not work for you.

The issue is that OpenCL, which GpuOwl uses, is currently broken on Colab. See this post and my response a few posts below for more information: https://mersenneforum.org/showthread.php?p=627996#post627996.

From your provided output, note that my GpuOwl install script has a simple OpenCL check, which is currently failing:

Number of platforms                               0
Error: This computer does not have an OpenCL platform
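For illustration, here is a minimal sketch of what a check like this might look like. It is not the actual code from the GpuOwl install script. The function names count_opencl_platforms and check_opencl are hypothetical, and the sketch assumes clinfo-style output containing a line that starts with "Number of platforms", like the one above.

```shell
#!/bin/bash
# Hypothetical sketch, NOT the real install script's code: read clinfo-style
# output on stdin and print the number from the "Number of platforms" line.
count_opencl_platforms() {
    # Print the last field of the first matching line; print 0 if none matches
    awk '/^Number of platforms/ { print $NF; found = 1; exit }
         END { if (!found) print 0 }'
}

# Fail with the same message seen above when no OpenCL platform is found.
check_opencl() {
    local count
    count=$(count_opencl_platforms)
    if [[ $count -eq 0 ]]; then
        echo "Error: This computer does not have an OpenCL platform" >&2
        return 1
    fi
    echo "Found $count OpenCL platform(s)"
}
```

On a system with clinfo installed, you would run it as clinfo | check_opencl. With the output from this Colab instance, it returns a non-zero status, which is what stops the rest of the GpuOwl setup.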

Our GPU notebook never expected the install script to fail in this way, which is what causes the cascading series of errors you saw. As a workaround, you could add a simple sed command to the install() function in the notebook to disable this OpenCL check:

!sed -i '/^if command -v clinfo/,/^fi/ s/^/# /' gpuowl2.sh # Do not check for clinfo

This would allow the GpuOwl and Prime95/MPrime install scripts to work as expected, but of course GpuOwl will not run until Colab fixes their OpenCL issue...
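To show what the sed workaround does, here is a small demonstration on a made-up stand-in for gpuowl2.sh. The real script's check will differ. The address /^if command -v clinfo/,/^fi/ selects every line from the one starting with "if command -v clinfo" through the next line starting with "fi", and s/^/# / prefixes each selected line with "# ", commenting out the whole block. This assumes GNU sed, as on Colab, since the -i syntax differs on BSD/macOS sed.

```shell
#!/bin/bash
# Demonstration only: a made-up stand-in for gpuowl2.sh, NOT its real contents.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT  # remove the temporary file when the shell exits

cat > "$sample" <<'EOF'
echo "Downloading, building and setting up GpuOwl"
if command -v clinfo >/dev/null; then
    if ! clinfo | grep -q 'Number of platforms'; then
        echo "Error: no OpenCL platform" >&2
        exit 1
    fi
fi
echo "continuing install"
EOF

# Comment out every line from "if command -v clinfo" through the next line
# that starts with "fi" (GNU sed in-place edit).
sed -i '/^if command -v clinfo/,/^fi/ s/^/# /' "$sample"

cat "$sample"   # show the edited script
bash "$sample"  # the check no longer runs, so both echo lines execute
```

Because the end pattern is anchored with ^, the indented inner fi does not end the range early. Only a fi at the very start of a line closes it.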

In the meantime, I would recommend using our CPU only notebook and/or the old version of our GPU notebook, which ran CUDALucas. CUDALucas is slower than GpuOwl and it only supports LL tests, but it uses CUDA, which currently still works on Colab. Here is a link to that older GPU notebook: https://github.com/tdulcet/Distributed-Computing-Scripts/blob/933d9916a8bc841c3313d77a010d73e95e9dee65/google-colab/GoogleColabGPU.ipynb.
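If you want to pick between these options automatically, a rough sketch might look like the following. The helper names choose_gpu_program and detect_gpu_program are hypothetical. Treating a responding nvidia-smi as a sign that CUDA is usable is also an assumption: it only shows that the NVIDIA driver responds, not that every CUDA program will run.

```shell
#!/bin/bash
# Hypothetical helper: pick the GPU program most likely to work on this VM.
#   $1 = number of OpenCL platforms (e.g. from clinfo)
#   $2 = "yes" if the NVIDIA driver responds (assumed proxy for CUDA), else "no"
choose_gpu_program() {
    local opencl_platforms=$1 cuda_ok=$2
    if (( opencl_platforms > 0 )); then
        echo "gpuowl"      # OpenCL works: GpuOwl, the faster option
    elif [[ $cuda_ok == yes ]]; then
        echo "cudalucas"   # Only CUDA works: CUDALucas (LL tests only)
    else
        echo "cpu-only"    # Neither works: use the CPU only notebook
    fi
}

# Gather the inputs on a live system; missing tools count as "not available".
detect_gpu_program() {
    local platforms cuda=no
    platforms=$(clinfo 2>/dev/null | awk '/^Number of platforms/ { print $NF; exit }')
    if nvidia-smi >/dev/null 2>&1; then
        cuda=yes
    fi
    choose_gpu_program "${platforms:-0}" "$cuda"
}
```

With the clinfo result above (0 platforms) and a responding NVIDIA driver, choose_gpu_program 0 yes prints cudalucas, which matches the recommendation to use the older CUDALucas notebook for now.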

CC: @Danc2050

tdulcet commented 1 year ago

I just pushed that change to the GPU notebook I suggested above: https://github.com/tdulcet/Distributed-Computing-Scripts/commit/23162e2cb7055a618a909facef2c091a3f0763ba. Let me know if you have any additional questions...

SyauqiMA commented 1 year ago

Thanks for the explanation! The broken OpenCL is indeed unfortunate. I will try running the old notebook then!

tdulcet commented 1 year ago

No problem. Yes, we are hoping they will fix OpenCL soon...

Anyway, I just tested the old version of our GPU notebook that I linked to above and it does still work as expected.