Open iKaHibi opened 3 years ago
I believe you installed the release from 4 years ago. Please install from master.
Thank you for your quick reply! I did use an old version. After installing from master, I got the following error:
Elapsed time for retina simulation: 5.39s
retina0: 100%|██████████| 1/1 [00:00<00:00, 1693.30it/s]
An error occured during execution of LPU retina0 at step 0:
Traceback (most recent call last):
File "/anaconda3/envs/nk/lib/python3.7/site-packages/pycuda/tools.py", line 470, in wrapper
return ctx_dict[cur_ctx][cache_key]
KeyError: <pycuda._driver.Context object at 0x7fe35f234c30>
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/anaconda3/envs/nk/lib/python3.7/site-packages/neurokernel/tools/misc.py", line 144, in catch_exception
func(*args, **kwargs)
File "/neurodriver-master/neurokernel/LPU/LPU.py", line 942, in pre_run
self.init_variable_memory()
File "/neurodriver-master/neurokernel/LPU/LPU.py", line 1171, in init_variable_memory
info=d)
File "/neurodriver-master/neurokernel/LPU/MemoryManager.py", line 93, in memory_alloc
CircularArray(size, buffer_length, dtype, init)}
File "/neurodriver-master/neurokernel/LPU/MemoryManager.py", line 244, in __init__
(buffer_length, size), dtype)
File "/neurodriver-master/neurokernel/LPU/utils/parray.py", line 2016, in zeros
result.fill(0)
File "/neurodriver-master/neurokernel/LPU/utils/parray.py", line 1528, in fill
self.dtype, pitch = True)
File "/anaconda3/envs/nk/lib/python3.7/site-packages/pycuda/tools.py", line 474, in wrapper
result = func(*args, **kwargs)
File "/neurodriver-master/neurokernel/LPU/utils/parray_utils.py", line 29, in get_fill_function
}, options=["--ptxas-options=-v"]).get_function(name)
File "/anaconda3/envs/nk/lib/python3.7/site-packages/pycuda/compiler.py", line 358, in __init__
include_dirs,
File "/anaconda3/envs/nk/lib/python3.7/site-packages/pycuda/compiler.py", line 298, in compile
return compile_plain(source, options, keep, nvcc, cache_dir, target)
File "/anaconda3/envs/nk/lib/python3.7/site-packages/pycuda/compiler.py", line 87, in compile_plain
checksum.update(preprocess_source(source, options, nvcc).encode("utf-8"))
File "/anaconda3/envs/nk/lib/python3.7/site-packages/pycuda/compiler.py", line 59, in preprocess_source
"nvcc preprocessing of %s failed" % source_path, cmdline, stderr=stderr
pycuda.driver.CompileError: nvcc preprocessing of /tmp/tmpfzca4emr.cu failed
[command: nvcc --preprocess --ptxas-options=-v -arch sm_75 -I/anaconda3/envs/nk/lib/python3.7/site-packages/pycuda/cuda /tmp/tmpfzca4emr.cu --compiler-options -P]
[stderr:
b"nvcc fatal : Value 'sm_75' is not defined for option 'gpu-architecture'\n"]
Does this mean that I made a mistake when installing CUDA or neurodriver?
What's your CUDA version? Try
nvcc --version
My CUDA version is V9.0.176 and I am using OpenMPI-4.1.0.
That version of CUDA does not support sm_75. I wonder where that came from. What GPU card do you have?
I have an NVIDIA RTX 2060.
Yeah, your CUDA version is too old for this card. Try updating to the latest NVIDIA driver and CUDA.
Thank you for your response! After updating CUDA to 11.3 and reinstalling PyCUDA and scikit-cuda, I ran into a new error:
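For reference, here is a minimal sketch of this compatibility check in Python. The table lists the first CUDA toolkit release whose nvcc accepts each architecture, to the best of my knowledge. The function names are made up for illustration and are not part of PyCUDA or Neurokernel.

```python
import re

# First CUDA toolkit release whose nvcc accepts each -arch value.
MIN_CUDA_FOR_ARCH = {
    "sm_60": (8, 0),   # Pascal
    "sm_61": (8, 0),
    "sm_70": (9, 0),   # Volta
    "sm_75": (10, 0),  # Turing, e.g. RTX 2060
    "sm_80": (11, 0),  # Ampere
    "sm_86": (11, 1),
}


def parse_nvcc_release(version_output):
    """Extract (major, minor) from the text printed by `nvcc --version`."""
    match = re.search(r"release (\d+)\.(\d+)", version_output)
    if match is None:
        raise ValueError("could not find a 'release X.Y' field")
    return int(match.group(1)), int(match.group(2))


def toolkit_supports(arch, version_output):
    """True if the toolkit that printed version_output can target arch."""
    return parse_nvcc_release(version_output) >= MIN_CUDA_FOR_ARCH[arch]


# Last line of `nvcc --version` for the toolkit reported in this thread.
sample = "Cuda compilation tools, release 9.0, V9.0.176"
print(toolkit_supports("sm_75", sample))  # False
```

With CUDA 9.0 the check fails for sm_75, which matches the "Value 'sm_75' is not defined" message from nvcc above.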
Manager spawned
/home/anaconda3/envs/nk/lib/python3.7/site-packages/skcuda/cublas.py:284: UserWarning: creating CUBLAS context to get version number
warnings.warn('creating CUBLAS context to get version number')
retina0: Number of PhotoreceptorModel: 4326
retina0: Number of BufferPhoton: 4326
retina0: Number of Port: 8652
retina0: Number of Input: {'photon': 4326}
retina0: 100%|██████████| 1/1 [00:00<00:00, 15.17it/s]
Elapsed time for retina simulation: 7.22s
An error occured during execution of LPU retina0 at step 0:
Traceback (most recent call last):
File "/home/anaconda3/envs/nk/lib/python3.7/site-packages/neurokernel/tools/misc.py", line 144, in catch_exception
func(*args, **kwargs)
File "/home/Code/flybrain/new_src/neurodriver-master/neurokernel/LPU/LPU.py", line 1299, in pre_run
p._pre_run()
File "/home/Code/flybrain/new_src/neurodriver-master/neurokernel/LPU/InputProcessors/BaseInputProcessor.py", line 247, in _pre_run
self.pre_run()
File "/home/Code/flybrain/src_code/retina-master/retina/InputProcessors/RetinaInputProcessor.py", line 29, in pre_run
self.generate_receptive_fields()
File "/home/Code/flybrain/src_code/retina-master/retina/InputProcessors/RetinaInputProcessor.py", line 101, in generate_receptive_fields
rfs.generate_filters()
File "/home/Code/flybrain/src_code/retina-master/retina/vrf/vrf.py", line 90, in generate_filters
(N_filters, self.size), self.dtype)
File "/home/Code/flybrain/src_code/retina-master/retina/vrf/utils/parray.py", line 270, in __init__
self.shape[0], np.dtype(dtype).itemsize)
pycuda._driver.MemoryError: cuMemAllocPitch failed: out of memory
Does this mean the video memory of my GPU is not enough to run the code?
The full model requires about 4 GB of GPU memory, and the standard memory configuration for the 2060 is 6 GB. If you really have less memory available, try running a smaller model. To start with, reduce the number of rings to 0 (see also Figure 4a of Neurokernel RFC #3) by setting
[Retina]
rings = 0
in your config file. That will run only 1 ommatidium in the center with 6 photoreceptors.
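Before shrinking the model, it may also be worth checking how much GPU memory is actually free, since other processes can hold part of the 6 GB. The sketch below parses the CSV output of `nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader,nounits`. The helper names and the sample numbers are hypothetical.

```python
import subprocess

QUERY = [
    "nvidia-smi",
    "--query-gpu=memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_memory_csv(text):
    """Return a list of (used_mib, total_mib) tuples, one per GPU."""
    rows = []
    for line in text.strip().splitlines():
        used, total = (int(field) for field in line.split(","))
        rows.append((used, total))
    return rows


def free_mib_per_gpu():
    """Run nvidia-smi and return the free memory of each GPU in MiB."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return [total - used for used, total in parse_memory_csv(result.stdout)]


# Hypothetical output for a 6 GB card with most of its memory in use.
print(parse_memory_csv("5210, 5934\n"))  # [(5210, 5934)]
```

Calling `free_mib_per_gpu()` on the machine itself needs the NVIDIA driver installed. If it reports much less than 4096 MiB free, the out-of-memory error is expected even on a 6 GB card.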
I changed the config to
[Retina]
rings = 0
and got the following feedback:
/home/anaconda3/envs/nk/lib/python3.7/site-packages/skcuda/cublas.py:284: UserWarning: creating CUBLAS context to get version number
warnings.warn('creating CUBLAS context to get version number')
/home/anaconda3/envs/nk/lib/python3.7/site-packages/skcuda/cublas.py:284: UserWarning: creating CUBLAS context to get version number
warnings.warn('creating CUBLAS context to get version number')
Starting getting configuration
Elapsed time for getting configuration: 0.00s
Starting instantiation of retina
Using input generating function
Elapsed time for instantiation of retina: 0.07s
Starting retina simulation
Manager spawned
/home/anaconda3/envs/nk/lib/python3.7/site-packages/skcuda/cublas.py:284: UserWarning: creating CUBLAS context to get version number
warnings.warn('creating CUBLAS context to get version number')
retina0: Number of PhotoreceptorModel: 6
retina0: Number of BufferPhoton: 6
retina0: Number of Port: 12
retina0: Number of Input: {'photon': 6}
Compilation of executable circuit completed in 0.4685242176055908 seconds
retina0: 100%|██████████| 1/1 [00:00<00:00, 2.02it/s]
Elapsed time for retina simulation: 2.09s
closing natural_xy file
--------------------------------------------------------------------------
mpiexec has exited due to process rank 0 with PID 0 on
node akiohibi-Sys exiting improperly. There are three reasons this could occur:
1. this process did not call "init" before exiting, but others in
the job did. This can cause a job to hang indefinitely while it waits
for all processes to call "init". By rule, if one process calls "init",
then ALL processes must call "init" prior to termination.
2. this process called "init", but exited without calling "finalize".
By rule, all processes that call "init" MUST call "finalize" prior to
exiting or it will be considered an "abnormal termination"
3. this process called "MPI_Abort" or "orte_abort" and the mca parameter
orte_create_session_dirs is set to false. In this case, the run-time cannot
detect that the abort call was an abnormal termination. Hence, the only
error message you will receive is this one.
This may have caused other processes in the application to be
terminated by signals sent by mpiexec (as reported here).
You can avoid this message by specifying -quiet on the mpiexec command line.
--------------------------------------------------------------------------
Surprisingly, after I rebooted my computer and changed the config back to
[Retina]
rings = 14
I got a different error saying:
/home/anaconda3/envs/nk/lib/python3.7/site-packages/skcuda/cublas.py:284: UserWarning: creating CUBLAS context to get version number
warnings.warn('creating CUBLAS context to get version number')
/home/anaconda3/envs/nk/lib/python3.7/site-packages/skcuda/cublas.py:284: UserWarning: creating CUBLAS context to get version number
warnings.warn('creating CUBLAS context to get version number')
Starting getting configuration
Elapsed time for getting configuration: 0.00s
Starting instantiation of retina
Using input generating function
Elapsed time for instantiation of retina: 1.65s
Starting retina simulation
Manager spawned
/home/anaconda3/envs/nk/lib/python3.7/site-packages/skcuda/cublas.py:284: UserWarning: creating CUBLAS context to get version number
warnings.warn('creating CUBLAS context to get version number')
retina0: Number of BufferPhoton: 4326
retina0: Number of Port: 8652
retina0: Number of PhotoreceptorModel: 4326
retina0: Number of Input: {'photon': 4326}
Compilation of executable circuit completed in 2.6839797496795654 seconds
retina0: 100%|██████████| 1/1 [00:00<00:00, 5.84it/s]
An error occured during execution of LPU retina0 at step 0:
Traceback (most recent call last):
File "/home/anaconda3/envs/nk/lib/python3.7/site-packages/neurokernel/tools/misc.py", line 144, in catch_exception
func(*args, **kwargs)
File "/home/Code/flybrain/new_src/neurodriver-master/neurokernel/LPU/LPU.py", line 1550, in run_step
for p in self.input_processors: p.run_step()
File "/home/Code/flybrain/new_src/neurodriver-master/neurokernel/LPU/InputProcessors/BaseInputProcessor.py", line 91, in run_step
self.update_input()
File "/home/Code/flybrain/src_code/retina-master/retina/InputProcessors/RetinaInputProcessor.py", line 107, in update_input
inputs = self.rfs.filter_image_use(im).get().reshape((-1))
File "/home/Code/flybrain/src_code/retina-master/retina/vrf/vrf.py", line 184, in filter_image_use
handle = la.cublashandle()
File "/home/Code/flybrain/src_code/retina-master/retina/vrf/utils/linalg.py", line 18, in __init__
self.create()
File "/home/Code/flybrain/src_code/retina-master/retina/vrf/utils/linalg.py", line 22, in create
self.handle = cublas.cublasCreate()
File "/home/anaconda3/envs/nk/lib/python3.7/site-packages/skcuda/cublas.py", line 203, in cublasCreate
cublasCheckStatus(status)
File "/home/anaconda3/envs/nk/lib/python3.7/site-packages/skcuda/cublas.py", line 179, in cublasCheckStatus
raise e
skcuda.cublas.cublasNotInitialized
Elapsed time for retina simulation: 7.92s
closing natural_xy file
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
Is this caused by a mistake in installing scikit-cuda?
It is not clear to me why cublasCreate failed to initialize: a handle was already created when getting the version number (that is what those warnings are about), and no error was reported then.
You might want to check your LD_LIBRARY_PATH to make sure that it points to the libcublas.so from the current version of CUDA. You can check the cuBLAS version with
import skcuda.cublas as cublas
print(cublas._cublas_version)
Make sure that this version is consistent with the libcublas.so.x.x.x in your CUDA path.
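To see which cuBLAS libraries are reachable through LD_LIBRARY_PATH, something like the following sketch can help. The function name is hypothetical. Note also that the dynamic loader consults other locations (for example the ldconfig cache), so this only covers the environment variable.

```python
import glob
import os


def find_cublas_libs(ld_library_path):
    """List libcublas.so* files in each directory of a path-separated string."""
    found = []
    for directory in ld_library_path.split(os.pathsep):
        if not directory:
            continue  # skip empty entries such as a trailing separator
        pattern = os.path.join(directory, "libcublas.so*")
        found.extend(sorted(glob.glob(pattern)))
    return found


if __name__ == "__main__":
    for lib in find_cublas_libs(os.environ.get("LD_LIBRARY_PATH", "")):
        # realpath resolves symlinks down to the fully versioned file name
        print(lib, "->", os.path.realpath(lib))
```

If the first match resolves to a library from an older CUDA installation, that would be one plausible explanation for the mismatch.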
Sorry for repeatedly adding new issues. This time I will leave this issue open until all the problems with running retina_worker_only_demo.py are solved.
After I switched to a Python 3 environment to run the example code, the imports started to fail, with errors like
So I added the required path by adding code
to the top of retina_worker_only_demo.py
This did not solve the problem: once the code starts to use neurokernel, the import path still causes errors.
How can I fix the import path problem?
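One thing that may be worth trying, offered as an assumption and not a confirmed diagnosis: changes to sys.path made inside a script only affect that one interpreter, whereas the PYTHONPATH environment variable is normally inherited by child processes. The sketch below builds an environment with extra directories prepended to PYTHONPATH. The helper name and the directory paths are placeholders.

```python
import os


def with_pythonpath(extra_dirs, env=None):
    """Return a copy of env with extra_dirs prepended to PYTHONPATH."""
    env = dict(os.environ if env is None else env)
    existing = [p for p in env.get("PYTHONPATH", "").split(os.pathsep) if p]
    env["PYTHONPATH"] = os.pathsep.join(list(extra_dirs) + existing)
    return env


# Placeholder checkout locations; substitute your own.
env = with_pythonpath(["/path/to/neurodriver", "/path/to/retina"], env={})
print(env["PYTHONPATH"])
```

The resulting dictionary can be passed as `env=` to `subprocess.run` when launching the demo, or the same value can be exported in the shell before running it.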
By the way, to work around the error
TypeError: add_node() takes 2 positional arguments but 3 were given
I changed the code in retina.py.
I hope this will not cause bugs in the future.