Closed brightson999 closed 4 years ago
I suspect the 1st issue is because your UID on the host computer is not 1000. The jovyan user is 1000 in the docker container. If your UID is not 1000 you'll have issues writing to directories owned by a different UID, unless you're in the same group.
Try:
$ echo $UID
If your UID is NOT 1000, a quick and dirty fix is to give the host-machine directory /data/underworld/uwgeodynamics
open access. (This might not be suitable depending on your host machine.)
To do that run:
$ chmod a+xrw -R /data/underworld/uwgeodynamics
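As a way to see whether this UID mismatch is actually the problem, a small host-side check (not part of the thread's commands; the function name is just illustrative) can compare the directory owner's UID with your own and test writability:

```python
import os

def diagnose_permissions(path):
    """Print the owner UID of `path`, the current UID, and whether we can write to it."""
    st = os.stat(path)
    writable = os.access(path, os.W_OK)
    print(f"{path}: owner UID {st.st_uid}, current UID {os.getuid()}, writable: {writable}")
    # Writing works if we own the directory or otherwise have write access
    return st.st_uid == os.getuid() or writable

# Example: check the bind-mounted directory on the host before starting the container
# diagnose_permissions("/data/underworld/uwgeodynamics")
```

If this reports a different owner UID and `writable: False`, the chmod above (or matching the UIDs) is the fix.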
Re: 2nd question.
Yes, just one way: via mpirun -np 4 python my_script.py
IPython Clusters is not the same parallelisation as running the above. N.B. you can modify the hardware resources used by the docker container with the commands explained here https://docs.docker.com/config/containers/resource_constraints/
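For reference, those resource constraints are passed as flags when the container is started; a sketch (the container name and limit values are illustrative, not from the thread):

```shell
# Limit the container to 4 CPUs and 8 GiB of RAM (values are illustrative)
docker run -d --name UD-limited \
    --cpus=4 --memory=8g \
    -p 8888:8888 \
    -v /data/underworld/uwgeodynamics:/home/jovyan/workspace \
    underworldcode/uwgeodynamics
```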
I followed your suggestion and got the following:
jovyan@bd2c3b412458:~$ echo $UID
1000
jovyan@bd2c3b412458:~$ ls
README.md  UWGeodynamics  cheatsheet  development  examples  install_guides  joss  test  user_guide  workspace
jovyan@bd2c3b412458:~$ ls -l
total 36
-rw-r--r-- 1 jovyan users  991 Mar 26 00:04 README.md
drwxr-xr-x 1 jovyan users 4096 Mar 26 00:37 UWGeodynamics
drwxr-xr-x 2 jovyan users 4096 Mar 26 00:04 cheatsheet
drwxr-xr-x 4 jovyan users 4096 Mar 26 00:04 development
drwxr-xr-x 5 jovyan users 4096 Mar 26 00:04 examples
drwxr-xr-x 2 jovyan users 4096 Mar 26 00:04 install_guides
drwxr-xr-x 2 jovyan users 4096 Mar 26 00:04 joss
drwxr-xr-x 3 jovyan users 4096 Mar 26 00:04 test
drwxr-xr-x 3 jovyan users 4096 Mar 26 00:04 user_guide
drwxrwxrwx 5 1009   1010    125 May  9 23:30 workspace
It is 1000, but it still doesn't work.
Finally I gave the host-machine directory /data/underworld/uwgeodynamics open access. Now it's ready to run.
Hi @julesghub
Thank you for your earlier suggestion, but I get another error when I test Tutorial_11_Coupling_with_Badlands.py:
jovyan@bd2c3b412458:~/UWGeodynamics/tutorials$ mpirun -np 24 python Tutorial_11_Coupling_with_Badlands.py
It ran for a while and produced some result folders for both badlands and UW, but then failed with the error below:
[5]PETSC ERROR: ------------------------------------------------------------------------
[5]PETSC ERROR: Caught signal number 7 BUS: Bus Error, possibly illegal memory access
[5]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[5]PETSC ERROR: or see https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
[5]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
[5]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run
[5]PETSC ERROR: to get more information on the crash.
[5]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
[5]PETSC ERROR: Signal received
[5]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
[5]PETSC ERROR: Petsc Release Version 3.12.1, Oct, 22, 2019
[5]PETSC ERROR: application called MPI_Abort(MPI_COMM_WORLD, 59) - process 6
Tutorial_11_Coupling_with_Badlands.py on a named bd2c3b412458 by Unknown Mon May 11 08:25:27 2020
[5]PETSC ERROR: Configure options --with-debugging=0 --prefix=/usr/local --COPTFLAGS="-g -O3" --CXXOPTFLAGS="-g -O3" --FOPTFLAGS="-g -O3" --with-zlib=1 --download-hdf5=1 --download-mumps=1 --download-parmetis=1 --download-metis=1 --download-superlu=1 --download-scalapack=1 --download-superlu_dist=1 --useThreads=0 --download-superlu=1 --with-shared-libraries --with-cxx-dialect=C++11 --with-make-np=8
[5]PETSC ERROR: #1 User provided function() line 0 in unknown file
application called MPI_Abort(MPI_COMM_WORLD, 59) - process 5
===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   PID 9424 RUNNING AT bd2c3b412458
=   EXIT CODE: 59
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
I know badlands does not perform very well with MPI; serial is quite good. So I don't know whether that is the cause of this error or something else.
thanks for your help again.
Cheers
@brightson999 I don't know the exact reason for this error based on your output. I'd suspect 24 procs is over-decomposing the problem. Try fewer procs; 1-4 should be sufficient for this problem.
You are right, fewer procs works fine. I will close this issue. Thank you again.
Hi everyone,
I first created the container with the following command:
docker run -d --name UD-cu11 -i -t -p 11.11.11.11:8888:8888 -v /data/underworld/uwgeodynamics:/home/jovyan/workspace underworldcode/uwgeodynamics
Then I copied part of the example files from the docker file system into the volume. The jupyter notebooks can be run, but they cannot write the calculation results to the local disk mounted by docker. All files in the local folders are read-only, showing: Not enough permissions.
Another small question: is there only one way to parallelize, namely jupyter nbconvert --to python my_script.ipynb and then running the script with mpirun -np 4 python my_script.py? Or can I set up IPython Clusters in jupyter notebooks to parallelize?
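For reference, that notebook-to-script workflow in one place (assuming jupyter and mpirun are on the container's PATH, and my_script.ipynb stands in for your notebook):

```shell
# Convert the notebook to a plain Python script, then launch it under MPI
jupyter nbconvert --to python my_script.ipynb   # writes my_script.py
mpirun -np 4 python my_script.py
```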
Cheers.