koszullab / instaGRAAL

Large genome reassembly based on Hi-C data, continuation of GRAAL
https://research.pasteur.fr/fr/software/graal-software-for-genome-assembly-from-chromosome-contact-frequencies/
GNU General Public License v3.0

InstaGRAAL docker crash #13

Closed bistace closed 3 years ago

bistace commented 4 years ago

Hi,

I am trying to run instaGRAAL using docker. However, a black window appears for 1 second before the program crashes without an error message. Here is the command that I ran:

sudo docker run --net=host -e DISPLAY=$DISPLAY -v="/home/bistace/.Xauthority:/root/.Xauthority:rw" -v /datastore:/datastore koszullab/instagraal --debug /datastore/hicstuff /datastore/genome.fasta /datastore/instagraal

I also tried entering the container with a /bin/bash shell to launch the program manually, but I only get a segmentation fault at instaGRAAL startup.

The docker container is running on Ubuntu 18.04, with 64 GB of RAM and a GeForce GTX 1070 using the recommended driver (nvidia-driver-440).

Could you please help me troubleshoot this issue?

bistace commented 4 years ago

I also tried to run instaGRAAL outside the docker container. instagraal -h works fine, but launching it with the same options as above gives me the following error:

Traceback (most recent call last):
  File "/home/bistace/.local/bin/instagraal", line 11, in <module>
    sys.exit(main())
  File "/home/bistace/.local/lib/python3.6/site-packages/instagraal/instagraal.py", line 2161, in main
    output_folder=output_folder,
  File "/home/bistace/.local/lib/python3.6/site-packages/instagraal/instagraal.py", line 208, in __init__
    self.cuda_gl_init()
  File "/home/bistace/.local/lib/python3.6/site-packages/instagraal/instagraal.py", line 1442, in cuda_gl_init
    cuda.init()
pycuda._driver.Error: cuInit failed: unknown error
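
In case it helps with the diagnosis: the failure happens inside pycuda's own cuInit call, so a minimal check like the one below (pycuda only, no instaGRAAL; just a sanity check) should show whether the CUDA driver initialises at all:

# Sanity check: does the CUDA driver initialise outside instaGRAAL?
# Assumes pycuda is installed in the same Python environment.
import pycuda.driver as cuda

cuda.init()  # the same call that fails in instagraal.py
print("CUDA devices found:", cuda.Device.count())
for i in range(cuda.Device.count()):
    dev = cuda.Device(i)
    print(i, dev.name(), "compute capability:", dev.compute_capability())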

I don't know if the issues are related but I prefer giving you all the information that I have.

EDIT: I fixed this issue by following this StackOverflow answer https://stackoverflow.com/a/45319156.

Now I am getting the following error:

INFO :: Selected_device: GeForce GTX 1070
Traceback (most recent call last):
  File "/home/bistace/.local/bin/instagraal", line 11, in <module>
    sys.exit(main())
  File "/home/bistace/.local/lib/python3.6/site-packages/instagraal/instagraal.py", line 2161, in main
    output_folder=output_folder,
  File "/home/bistace/.local/lib/python3.6/site-packages/instagraal/instagraal.py", line 208, in __init__
    self.cuda_gl_init()
  File "/home/bistace/.local/lib/python3.6/site-packages/instagraal/instagraal.py", line 1448, in cuda_gl_init
    curr_gpu, flags=cudagl.graphics_map_flags.NONE
pycuda._driver.Error: cuGLCtxCreate failed: unknown error

This error is, I believe, the same as in #12.

cmdoret commented 4 years ago

Hi @bistace, thanks for the detailed info! I can't help you with the docker issue (I encountered the same problem myself), but could you show me the output of lspci -k | grep -i nvidia, to make sure the driver is loaded correctly?

bistace commented 4 years ago

Here is the output of the command:

01:00.0 VGA compatible controller: NVIDIA Corporation GP104 [GeForce GTX 1070] (rev a1)
        Kernel driver in use: nvidia
        Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia
01:00.1 Audio device: NVIDIA Corporation GP104 High Definition Audio Controller (rev a1)
cmdoret commented 4 years ago

OK, this error can apparently happen when no OpenGL driver is available. Did you build pycuda with OpenGL support, as specified in the readme?

git clone --recurse-submodules https://github.com/inducer/pycuda.git
cd pycuda
python3 configure.py --cuda-enable-gl --no-use-shipped-boost
sudo python3 setup.py install
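
A quick way to double-check that the installed pycuda was really built with GL support (assuming it is the pycuda picked up by the Python running instaGRAAL) is to import the GL submodule directly:

# If pycuda was built without --cuda-enable-gl, this import should fail
# with an ImportError instead of exposing the GL interop module.
import pycuda.gl as cudagl

print("pycuda.gl available, default map flag:", cudagl.graphics_map_flags.NONE)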

If you did, then since this is an OpenGL problem, perhaps you could try the no_opengl branch of instaGRAAL instead and see if that works for you: https://github.com/koszullab/instaGRAAL/tree/no_opengl

bistace commented 4 years ago

Indeed, I did build pycuda with the --cuda-enable-gl option. I will try the no_opengl branch and keep you informed.

cmdoret commented 4 years ago

Also, does nvidia-smi work properly, and what CUDA version does it show?

bistace commented 4 years ago

nvidia-smi works and here is its output:

Wed May 20 14:25:19 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.48                 Driver Version: 410.48                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1070    Off  | 00000000:01:00.0 Off |                  N/A |
|  0%   43C    P8    13W / 180W |      0MiB /  8118MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Is there something specific I need to do to build instaGRAAL from the no_opengl branch, or is it the same as for the master branch?

nadegeguiglielmoni commented 4 years ago

I do not see the CUDA version in your nvidia-smi output; were you able to install CUDA successfully?

cmdoret commented 4 years ago

It seems CUDA is not correctly installed on your system; this will also cause errors with the other branch. (The CUDA version should show up in the top-right corner of nvidia-smi's output.)

The no_opengl branch disables the GUI, which solves some display-related issues, but in your case you first need to make sure you have a working CUDA installation. There are instructions in the readme to install it: https://github.com/koszullab/instaGRAAL/tree/master#external-libraries
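
As a quick cross-check of what the Python side actually sees (assuming pycuda imports at all), you can also print the CUDA versions it reports:

# Prints the CUDA toolkit version pycuda was built against and the CUDA
# version supported by the installed driver; a mismatch between the two
# is a common source of "unknown error" failures.
import pycuda.driver as cuda

print("pycuda built against CUDA:", cuda.get_version())
print("driver supports CUDA:", cuda.get_driver_version())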

bistace commented 4 years ago

This is strange, as nvcc and the other CUDA commands are available. Have you already encountered a case like this? Otherwise, I will have to track down the cause myself and report back to you.

bistace commented 4 years ago

I found here (https://forums.developer.nvidia.com/t/nvidia-smi-doesnt-show-cuda-version-even-after-installation/68738) that the CUDA version is only displayed from driver 410.72 onwards, and I have 410.48. Would you recommend upgrading the drivers?

cmdoret commented 4 years ago

Yeah, that would probably be good. Also make sure that nvcc --version works and reports the expected version.

nadegeguiglielmoni commented 4 years ago

I am a bit confused, because you said at the beginning that you installed nvidia-driver-440, but you have 410... I remember having a lot of trouble recently with the NVIDIA 410 drivers; maybe that is the reason.

bistace commented 4 years ago

I mistyped the driver's version number in my opening post, my bad!

I upgraded the NVIDIA drivers to 440.82 and here is nvidia-smi's new output:

Wed May 20 14:45:03 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.82       Driver Version: 440.82       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1070    Off  | 00000000:01:00.0 Off |                  N/A |
| 23%   34C    P5    22W / 180W |      0MiB /  8118MiB |      2%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

nvcc --version shows:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130

which is the version I installed.

nadegeguiglielmoni commented 4 years ago

And is instaGRAAL still throwing the same error?

bistace commented 4 years ago

Yes, the same error that I got in my second post's edit:

INFO :: Selected_device: GeForce GTX 1070
Traceback (most recent call last):
  File "/home/bistace/.local/bin/instagraal", line 11, in <module>
    sys.exit(main())
  File "/home/bistace/.local/lib/python3.6/site-packages/instagraal/instagraal.py", line 2161, in main
    output_folder=output_folder,
  File "/home/bistace/.local/lib/python3.6/site-packages/instagraal/instagraal.py", line 208, in __init__
    self.cuda_gl_init()
  File "/home/bistace/.local/lib/python3.6/site-packages/instagraal/instagraal.py", line 1448, in cuda_gl_init
    curr_gpu, flags=cudagl.graphics_map_flags.NONE
pycuda._driver.Error: cuGLCtxCreate failed: unknown error
nadegeguiglielmoni commented 4 years ago

We are still investigating this problem; there is an open issue on this subject on the pycuda GitHub (https://github.com/inducer/pycuda/issues/212) where you can participate.

One thing you could perhaps try is an earlier version of pycuda: https://github.com/inducer/pycuda/tree/9c024f3875aa144463d0cc12edb7193a6e830336

bistace commented 4 years ago

I have installed the no_opengl branch and instaGRAAL now seems to be running. Here is its output so far; if this looks OK to you, I will report back when the execution ends.

WARNING :: /home/bistace/.local/lib/python3.6/site-packages/instagraal-0.1.6-py3.6.egg/instagraal/pyramid_sparse.py:242: H5pyDeprecationWarning: The default file mode will change to 'r' (read-only) in h5py 3.0. To suppress this warning, pass the mode you need to h5py.File(), or set the global default h5.get_config().default_file_mode, or set the environment variable H5PY_DEFAULT_READONLY=1. Available modes are: 'r', 'r+', 'w', 'w-'/'x', 'a'. See the docs for details.
  pyramid_handle = h5py.File(hdf5_pyramid_file)

INFO :: Start filling the pyramid
INFO :: pyramid built.
INFO :: start filtering
WARNING :: /home/bistace/.local/lib/python3.6/site-packages/instagraal-0.1.6-py3.6.egg/instagraal/pyramid_sparse.py:103: H5pyDeprecationWarning: The default file mode will change to 'r' (read-only) in h5py 3.0. To suppress this warning, pass the mode you need to h5py.File(), or set the global default h5.get_config().default_file_mode, or set the environment variable H5PY_DEFAULT_READONLY=1. Available modes are: 'r', 'r+', 'w', 'w-'/'x', 'a'. See the docs for details.
  pyramid_0 = h5py.File(init_pyramid_file)

INFO :: nfrags = 714466
INFO :: n init frags = 714466
INFO :: mean sparsity = 3.1304625736083835e-05
INFO :: median sparsity = 2.7992935429210775e-05
INFO :: std sparsity = 2.7630851036519744e-05
INFO :: max_sparsity = 0.0034445305354893208
INFO :: thresh sparsity = 3.673774699564092e-06
INFO :: cleaning : start
INFO :: number of fragments to remove = 203936
INFO :: Sc0000073_polished_polished has been deleted...
INFO :: Sc0000078_polished_polished has been deleted...
INFO :: Sc0000083_polished_polished has been deleted...
INFO :: Sc0000087_polished_polished has been deleted...
INFO :: Sc0000088_polished_polished has been deleted...
INFO :: Sc0000090_polished_polished has been deleted...
INFO :: Sc0000091_polished_polished has been deleted...
INFO :: Sc0000093_polished_polished has been deleted...
INFO :: Sc0000105_polished_polished has been deleted...
INFO :: Sc0000106_polished_polished has been deleted...
INFO :: Sc0000107_polished_polished has been deleted...
INFO :: xfSc0000017_polished_polished has been deleted...
INFO :: xfSc0000019_polished_polished has been deleted...
INFO :: xfSc0000025_polished_polished has been deleted...
INFO :: xfSc0000037_polished_polished has been deleted...
INFO :: xfSc0000048_polished_polished has been deleted...
INFO :: xpSc0000208_polished_polished has been deleted...
INFO :: xpSc0000211_polished_polished has been deleted...
INFO :: xpSc0000224_polished_polished has been deleted...
INFO :: update contacts files...
WARNING :: /home/bistace/.local/lib/python3.6/site-packages/instagraal-0.1.6-py3.6.egg/instagraal/pyramid_sparse.py:122: H5pyDeprecationWarning: The default file mode will change to 'r' (read-only) in h5py 3.0. To suppress this warning, pass the mode you need to h5py.File(), or set the global default h5.get_config().default_file_mode, or set the environment variable H5PY_DEFAULT_READONLY=1. Available modes are: 'r', 'r+', 'w', 'w-'/'x', 'a'. See the docs for details.
  pyramid_handle = h5py.File(hdf5_pyramid_file)

INFO :: level already built...
INFO :: Start filling the pyramid
INFO :: writing new_files..
INFO :: subsampling : start
INFO :: size matrix before sub sampling = 510530
INFO :: size matrix after sub sampling = 170241
INFO :: sum length contigs = 510530
INFO :: nfrags = 170241
INFO :: new fragments list written...
INFO :: update sparse contacts file...
INFO :: subsampling: done.
INFO :: Start filling the pyramid
INFO :: writing new_files..
INFO :: subsampling : start
INFO :: size matrix before sub sampling = 170241
INFO :: size matrix after sub sampling = 56820
INFO :: sum length contigs = 170241
INFO :: nfrags = 56820
INFO :: new fragments list written...
INFO :: update sparse contacts file...
INFO :: subsampling: done.
INFO :: Start filling the pyramid
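
Side note on the H5pyDeprecationWarning messages above: they come from h5py.File() being called without an explicit mode in pyramid_sparse.py. If I read the warning correctly, the explicit-mode form it asks for would look like this (placeholder file name, read-only mode as an example):

# Passing the mode explicitly avoids the deprecation warning;
# "pyramid.hdf5" and the read-only mode are only placeholders here.
import h5py

with h5py.File("pyramid.hdf5", "r") as pyramid_handle:
    print(list(pyramid_handle.keys()))
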
nadegeguiglielmoni commented 4 years ago

As soon as you have cycle=0, you're good.

bistace commented 4 years ago

It seems everything is running well; there are some warnings, but it doesn't crash:

INFO :: pyramid loaded
INFO :: loading data from level = 4
INFO :: import reference genome
INFO :: loading data from level = 3
INFO :: mean frag area = 719.2686767578125
INFO :: N frag duplicated = 0
INFO :: MAX ID CONTIG = 169
INFO :: total mem used by sparse data = 81.171024
INFO :: loading kernels ...
INFO :: size array in shared memory = 1536
INFO :: kernels compiled
INFO :: setup jumping distribution: start
INFO :: Shape sub mat = (6450, 6450)
INFO :: setup jumping distribution: done
INFO :: recompiling for non-existent cache dir (/home/bistace/.cache/codepy/codepy-compiler-cache-v5-py3.6.9.final.0/dae89e8ccda59996631e33bac0bcb5e2).
x86_64-linux-gnu-g++-8 -pthread -Wno-unused-result -Wsign-compare -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -fPIC -c -DNDEBUG -I/usr/include/python3.6m -I/usr/local/cuda-10.0/include /home/bistace/.cache/codepy/codepy-compiler-cache-v5-py3.6.9.final.0/dae89e8ccda59996631e33bac0bcb5e2/module.cpp -o /home/bistace/.cache/codepy/codepy-compiler-cache-v5-py3.6.9.final.0/dae89e8ccda59996631e33bac0bcb5e2/module.o
INFO :: recompiling for non-existent cache dir (/home/bistace/.cache/codepy/codepy-compiler-cache-v5-py3.6.9.final.0/513f2f30d213ffd46935cebb6b8709f0).
nvcc -Xcompiler -pthread,-Wno-unused-result,-Wsign-compare,-g,-fwrapv,-O2,-Wall,-g,-fstack-protector-strong,-Wformat,-Werror=format-security,-fPIC -c -DNDEBUG -U__BLOCKS__ -I/usr/include/python3.6m -I/usr/local/cuda-10.0/include /home/bistace/.cache/codepy/codepy-compiler-cache-v5-py3.6.9.final.0/513f2f30d213ffd46935cebb6b8709f0/gpu.cu -o /home/bistace/.cache/codepy/codepy-compiler-cache-v5-py3.6.9.final.0/513f2f30d213ffd46935cebb6b8709f0/gpu.o

x86_64-linux-gnu-g++-8 -pthread -Wno-unused-result -Wsign-compare -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -fPIC -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro -Xlinker -export-dynamic -Wl,-O1 -Wl,-Bsymbolic-functions -DNDEBUG -I/usr/include/python3.6m -I/usr/local/cuda-10.0/include /home/bistace/.cache/codepy/codepy-compiler-cache-v5-py3.6.9.final.0/dae89e8ccda59996631e33bac0bcb5e2/module.o /home/bistace/.cache/codepy/codepy-compiler-cache-v5-py3.6.9.final.0/513f2f30d213ffd46935cebb6b8709f0/gpu.o -L/usr/lib -L/usr/local/cuda-10.0/lib -L/usr/local/cuda-10.0/lib64 -lcuda -lcudart -lboost_python-py36 -lpython3.6m -lpthread -ldl -lutil -o /home/bistace/.cache/codepy/codepy-compiler-cache-v5-py3.6.9.final.0/dae89e8ccda59996631e33bac0bcb5e2/codepy.temp.dae89e8ccda59996631e33bac0bcb5e2.513f2f30d213ffd46935cebb6b8709f0.module.so
INFO :: max dist kb = 20297.691
INFO :: mean size kb = 14.87440885690271
INFO :: min fragment length =  0.199
INFO :: estimation of the parameters of the model
WARNING :: /home/bistace/.local/lib/python3.6/site-packages/instagraal-0.1.6-py3.6.egg/instagraal/optim_rippe_curve_update.py:47: RuntimeWarning: invalid value encountered in log
  + (d - 2) / ((np.power((lm * x / kuhn), 2) + d))

(array([ 8.72323093e+02,  4.03182462e-04, -1.12862724e+00,  1.94821148e+04]), 1)
INFO :: p from estimate parameters  = [872.3230927003598, 0.0004031824620164196, -1.1286272378317719, 2, 19482.11484704844]
INFO :: mean value trans = 0.014333512055576312
INFO :: BEWARE!!! : I will lower mean value trans  !!!
INFO :: estimate max dist cis trans = 39317.84886664265
INFO :: cycle = 0
0.015503875968992248% proceeded
WARNING :: /home/bistace/.local/lib/python3.6/site-packages/instagraal-0.1.6-py3.6.egg/instagraal/cuda_lib_gl_single.py:1868: RuntimeWarning: invalid value encountered in less
  filtered_score[filtered_score < 0] = 0

0.031007751937984496% proceeded
WARNING :: /home/bistace/.local/lib/python3.6/site-packages/instagraal-0.1.6-py3.6.egg/instagraal/cuda_lib_gl_single.py:1867: RuntimeWarning: invalid value encountered in subtract
  filtered_score = scores_ok - (max_score - thresh_overflow)

0.046511627906976744% proceeded
0.06201550387596899% proceeded
0.07751937984496124% proceeded
0.09302325581395349% proceeded
0.10852713178294573% proceeded
0.12403100775193798% proceeded

To make it work, I had to copy the kernels from the master branch into the kernels folder of the new install. I also had to install gcc-8, but after that everything went smoothly.

Thanks to both of you for your invaluable help; I will report back when instaGRAAL has finished running.

bistace commented 4 years ago

Hi,

instaGRAAL finished running without any error. However, it produced surprising results: the input contigs have an N50 of 3.8 Mb for a cumulative size of 373 Mb, while instaGRAAL gave scaffolds with an N50 of either 204 kb or 278 Mb, and a cumulative size of 366 Mb (I tried two sets of parameters; see below for the commands I ran).

I ran the following hicstuff command in both cases:

hicstuff pipeline --aligner bowtie2 --enzyme DpnII --iterative --outdir hicstuff --threads 48 --genome /home/bistace/Z1/Z1_all_reads_final_120318.fa R1.fastq R2.fastq

Then I used these results to run two instaGRAAL commands. The first one (which resulted in an N50 of 204 kb):

instagraal /datastore/Z1/hicstuff/ /datastore/Z1/Z1_all_reads_final_120318.fa instagraal_c1 -c 1 -l 4 -n 100

The second one (which resulted in an N50 of 218 Mb):

instagraal /datastore/Z1/hicstuff/ /datastore/Z1/Z1_all_reads_final_120318.fa instagraal -c 0 -l 4 -n 100

The contigs are from a plant genome and were polished three times with Nanopore reads and three times with Illumina reads. I have also uploaded the log files from hicstuff and from both instaGRAAL runs. Would you have any suggestions to help me improve these results?

hicstuff.log instagraal_c0.log instagraal_c1.log
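
For reference, the N50 values above are the scaffold length at which the cumulative length of scaffolds, sorted from longest to shortest, reaches half of the total assembly size; a minimal sketch of that computation from a list of scaffold lengths (illustration only, not instaGRAAL's code):

# Smallest length L such that scaffolds of length >= L cover at least
# half of the total assembly size.
def n50(lengths):
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0

print(n50([5000, 4000, 3000, 2000, 1000]))  # toy example, prints 4000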

nadegeguiglielmoni commented 4 years ago

Hello,

The first thing to look at is the quality of the Hi-C read mapping. Here I see that only 54% of the reads map, which is not very high, especially with 151 bp reads. I generally get higher mapping rates, but maybe it is normal for a plant. You could check the quality of the assembly after polishing with KAT (kat comp, more precisely), for example, and look at the k-mer completeness. You can also try mapping with bowtie2, to see if the mapping is similar.