Closed aryamazaheri closed 5 years ago
I also manually compiled the generated CUCL code to ptx using the nvcc --ptx command, and it is surprising to me that the ptx file generated by boda is different from the one generated by nvcc.
hmm, i'm not sure offhand. certainly, there can be issues with making sure you're using valid compilation/arch settings for whatever card you are using, and some of that might be hard-coded in a way that could break on an upgrade. but, more to the point, when errors like this happen, i think you'll generally need to investigate the actual CUDA and/or PTX code and see what's up. also, it'd be interesting to know if all cases fail (i.e. even the simplest unit tests that exercise be=nvrtc, like test_rtc_nvrtc), or only complex ones, or what.
to say more i guess i'll need a test case i can replicate and/or at least i'll need to see more details for a failing example.
FWIW, i'm using cuda 9.1, and at least basic RTC stuff still seems to work fine on boda HEAD, although i'm not currently using/testing it day-to-day. i might be able to install/test cuda 9.2 soon. but, it might be more a hardware-related issue, so i'd be curious about the compute capabilities/etc of the card you're using and/or if that changed recently.
after that, inspecting the compile options and such in nvrtc_util.cc might be a good place to look -- maybe the hard-codes just aren't good for cuda 9.2 + your-card-arch.
anyway, as usual, testing with a clean slate is good: clear your compute/NV cache, make sure you're not mixing parts of different CUDA versions, etc ...
moskewcz@mazda5:~/git-work/boda/run/tr1$ time boda test_cmds --verbose=999 --filt='.*rtc.*' ; date
(test_name=test_rtc_nvrtc,needs=nvrtc,command=(mode=rtc_test,boda_output_dir=%(test_name),prog_fn=%(boda_test_dir)/nvrtc_test_dot.cu),cli_str=boda rtc_test --rtc='(be=nvrtc)' --prog-fn='%(boda_test_dir)/nvrtc_test_dot.cu)
(test_name=test_rtc_ocl,command=(mode=rtc_test,boda_output_dir=%(test_name),prog_fn=%(boda_test_dir)/ocl_test_dot.cl,rtc=(be=ocl)),cli_str=boda rtc_test --rtc='(be=ocl)' --prog-fn='%(boda_test_dir)/ocl_test_dot.cl')
(test_name=test_rtc_cucl_nvrtc,needs=nvrtc,command=(mode=rtc_test,boda_output_dir=%(test_name)),cli_str=boda rtc_test --rtc='(be=nvrtc)' )
(test_name=test_rtc_cucl_ocl,command=(mode=rtc_test,boda_output_dir=%(test_name),rtc=(be=ocl)),cli_str=boda rtc_test --rtc='(be=ocl)' )
(test_name=test_rtc_cucl_ocl_struct,command=(mode=rtc_test,boda_output_dir=%(test_name),rtc=(be=ocl),func_name=my_dot_struct),cli_str=boda rtc_test --rtc='(be=ocl)' --func-name=my_dot_struct )
(test_name=test_rtc_cucl_ipc,command=(mode=rtc_test,boda_output_dir=%(test_name),rtc=(be=ipc)),cli_str=boda rtc_test --rtc='(be=ipc)' )
(test_name=test_rtc_cucl_ipc_tcp,command=(mode=rtc_test,boda_output_dir=%(test_name),rtc=(be=ipc,boda_parent_addr=tcp:127.0.0.1:12791)),cli_str=boda rtc_test --rtc='(be=ipc,boda_parent_addr=tcp:127.0.0.1:12791)' )
(test_name=test_dense_boda_rtc_1,command=(mode=test_dense,boda_output_dir=%(test_name),imgs=(mode=test_dense,boda_output_dir=%(test_name)),run_cnet=(mode=test_dense,boda_output_dir=%(test_name),in_dims=(img=1)),run_cnet_dense=(mode=test_dense,boda_output_dir=%(test_name),in_dims=(img=1)),wins_per_image=10000,mrd_toler=5e-05),cli_str=boda test_dense --model-name=nin_imagenet_nopad --wins_per_image=10000 --in_dims='(img=1)' --conv_fwd='(mode=rtc)' --run_cnet='()' --run_cnet_dense='()')
(test_name=test_dense_boda_rtc_2,command=(mode=test_dense,boda_output_dir=%(test_name),imgs=(mode=test_dense,boda_output_dir=%(test_name)),run_cnet=(mode=test_dense,boda_output_dir=%(test_name),in_dims=(img=1,x=227,y=227),out_node_name=cccp8),run_cnet_dense=(mode=test_dense,boda_output_dir=%(test_name),in_dims=(img=1,x=227,y=227),out_node_name=cccp8),wins_per_image=10000,mrd_toler=5e-05),cli_str=boda test_dense --model-name=nin_imagenet --wins_per_image=10000 --in_dims='(img=1,y=227,x=227)' --out_node_name=cccp8 --conv_fwd='(mode=rtc)' --run_cnet='()' --run_cnet_dense='()')
(test_name=test_upsamp_1_nvrtc,command=(mode=test_upsamp,boda_output_dir=%(test_name),imgs=(mode=test_upsamp,boda_output_dir=%(test_name)),run_cnet=(mode=test_upsamp,boda_output_dir=%(test_name),in_dims=(img=1,x=516,y=516),out_node_name=cccp8,enable_upsamp_net=1,conv_fwd_upsamp=(mode=rtc,op_tune=(tconv=1))),wins_per_image=3,mrd_toler=0.0002),cli_str=boda test_upsamp --model-name nin_imagenet_nopad --wins-per-image=3 --run-cnet='(in_dims=(img=1,y=516,x=516),enable_upsamp_net=1,out_node_name=cccp8,conv_fwd=(mode=rtc),conv_fwd_upsamp=(mode=rtc,op_tune=(tconv=1)))')
WARNING: skipped some tests due to missing features: num_skipped=8 missing_needed_features=octave
TIMERS: CNT TOT_DUR AVG_DUR TAG
10 24.499s 2.449s test_cmds_cmd
8 8.588s 1.073s nvrtc_compile
18864 253.126ms 0.013ms cu_launch_and_sync
10 54.664ms 5.466ms diff_command
3 71.288ms 23.762ms ocl_compile
3 28.756ms 9.585ms read_pascal_image_list_file
163 43.270ms 0.265ms caffe_copy_layer_blob_data
668 58.473ms 0.087ms img_copy_to
668 1.055s 1.579ms subtract_mean_and_copy_img_to_batch
20 2.742s 137.111ms dense_cnn
668 172.312ms 0.257ms conv_pipe_fwd_t::set_vars
668 13.885s 20.787ms conv_pipe_fwd_t::run_fwd
668 115.286ms 0.172ms conv_pipe_fwd_t::get_vars
588 7.039s 11.971ms sparse_cnn
30 2.837s 94.570ms net_upsamp_cnn
30 195.558ms 6.518ms upsample_2x
30 1.661s 55.391ms img_upsamp_cnn
real 0m24.815s
user 0m39.660s
sys 0m3.856s
Thu Jul 5 10:21:35 PDT 2018
moskewcz@mazda5:~/git-work/boda/run/tr1$
in particular, you might play with the setting(s) for cc_opts_arch and/or the other options set where cc_opts_arch is used. it's been a bit brittle in the past to find a good default for this, and i'm semi-convinced that nvrtc doesn't really handle all the arch options right -- or at least it didn't! maybe these need to be cuda-version dependent now if nvidia 'fixed' something in the nvrtc options handling ...
string cc_opts_arch; //NESI(default="--gpu-architecture=compute_60",help="this entire string will be passed (unchanged) to the nvrtc compiler phase as an option")
oh, and as per the comment, although this is an 'option', there's currently no good way to globally set/configure it ... if need be some system for that (env vars? boda config file entry? etc?) could be introduced.
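to make the env-var idea concrete, here is a minimal sketch of how an override could be layered on top of the compiled-in default. nothing like this exists in boda as far as this thread shows: the variable name `BODA_CC_OPTS_ARCH` and both helper functions are hypothetical, made up for illustration only.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical environment variable name; boda does not define this.
static char const * const kArchEnvVar = "BODA_CC_OPTS_ARCH";

// Return the override if it is set and non-empty, otherwise the default.
// Taking the raw value as a parameter keeps this testable without touching
// the process environment.
std::string pick_cc_opts_arch( char const * const env_value, std::string const & default_opts ) {
  if( env_value && env_value[0] != '\0' ) { return std::string( env_value ); }
  return default_opts;
}

// Convenience wrapper that reads the (hypothetical) variable from the environment.
std::string cc_opts_arch_from_env( std::string const & default_opts ) {
  return pick_cc_opts_arch( std::getenv( kArchEnvVar ), default_opts );
}
```

with something like this, the NESI default (`--gpu-architecture=compute_60`) would still apply when the variable is unset, and a user on a different card could export e.g. `BODA_CC_OPTS_ARCH=--gpu-architecture=compute_52` without rebuilding.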
You are right. I had to change the arch parameter to get it working. I also realized that maaya's configuration/GPUs have been changed.
It would be nice to be able to select the arch parameter (cc_opts_arch) based on the given GPU, in the future.
yeah, either a better way to set the arch and/or an 'auto' setting would be good. i guess it could be as simple as using some cuda driver APIs to get the compute capability for the current device -- you're welcome to open an issue to track this!
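a rough sketch of what such an 'auto' setting might look like, assuming the CUDA driver API calls `cuInit`, `cuDeviceGet`, and `cuDeviceGetAttribute` are used to read the compute capability. the function names and the `SKETCH_USE_CUDA_DRIVER` macro are hypothetical (not part of boda); the driver-dependent part is guarded so the string-building helper compiles on a machine without CUDA.

```cpp
#include <cassert>
#include <string>

// Build the nvrtc arch option for a compute capability, e.g. (6,1) -> "...compute_61".
std::string arch_opt_from_cc( int const major, int const minor ) {
  assert( major >= 1 && minor >= 0 && minor <= 9 );
  return "--gpu-architecture=compute_" + std::to_string( major ) + std::to_string( minor );
}

#ifdef SKETCH_USE_CUDA_DRIVER // hypothetical macro: define it and link with -lcuda
#include <cuda.h>
#include <stdexcept>

// Query the compute capability of one device via the CUDA driver API.
std::string arch_opt_for_device( int const device_ordinal ) {
  CUdevice dev;
  int major = 0, minor = 0;
  if( cuInit( 0 ) != CUDA_SUCCESS ) { throw std::runtime_error( "cuInit failed" ); }
  if( cuDeviceGet( &dev, device_ordinal ) != CUDA_SUCCESS ) {
    throw std::runtime_error( "cuDeviceGet failed" );
  }
  if( cuDeviceGetAttribute( &major, CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MAJOR, dev ) != CUDA_SUCCESS ||
      cuDeviceGetAttribute( &minor, CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MINOR, dev ) != CUDA_SUCCESS ) {
    throw std::runtime_error( "cuDeviceGetAttribute failed" );
  }
  return arch_opt_from_cc( major, minor );
}
#endif
```

one caveat with this approach: the installed nvrtc may not accept every architecture the driver reports (for example a newer card with an older toolkit), so a real implementation would probably still want a manual override as a fallback.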
I tried to run boda using the cuda backend and apparently something has broken, maybe after migrating to the latest version of cuda (?). Here is the error message:
I am wondering why cuda is generating invalid ptx output that cannot be run using boda. Do you know the reason?