Closed Jarrome closed 4 years ago
Hi Jarrome,
I’m not sure what might be causing the problem, though I suspect it’s the custom TensorFlow ops. Were the ops compiled using the same CUDA version?
Zi Jian
On Wed, 4 Dec 2019 at 9:56 PM, Jarrome notifications@github.com wrote:
When I run inference.sh, I got
gpu: 0
output_dir: ./example_data/results
num_samples: 64
checkpoint: ./ckpt/checkpoint.ckpt
base_scale: 2.0
data_dir: ./example_data
num_points: -1
model: 3DFeatNet
feature_dim: 32
randomize_points: True
use_keypoints_from: None
data_dim: 6
max_keypoints: 1024
min_response_ratio: 0.01
nms_radius: 0.5

2019-12-04 14:52:33,039 [DEBUG] main - In compute_descriptors()
2019-12-04 14:52:33,039 [INFO] main - Computed descriptors will be saved to ./example_data/results
2019-12-04 14:52:33,039 [INFO] main - Found 4 bin files in directory: ./example_data, each assumed to be of dim 6
2019-12-04 14:52:33,039 [INFO] Feat3dNet - Model parameters: {'num_samples': 64, 'NoRegress': False, 'Attention': True, 'BaseScale': 2.0, 'feature_dim': 32, 'num_clusters': -1}
Segmentation fault
With
Python 3.5.3
tensorflow-gpu 1.14.0
cuda 10.0
What might be the problem?
Yes, with cuda-10.0. I will try running inference.sh again on my laptop (probably cuda-8); perhaps it will not segfault there.
Ok, then that’s weird. I’m currently away and won’t have access to my computer for the next few days. I suggest trying an older version of TensorFlow. Perhaps something broke in the new version.
Will test the code out on TF1.14 when I get back next week.
Zi Jian
Hi Zi Jian,
My PC has python 3.6.8, tf-gpu 1.11 and cuda 9.0, and it works well there, except for the resource issue ;)
As for the earlier setup with tf 1.14, I don't know what goes wrong, but the other tf versions are not compatible with cuda 10.0.
Hi Jarrome,
Yes, you're right, the code segfaults when running the custom ops on Tensorflow 1.14 (but weirdly seems to run fine on TF1.15).
I have no idea how to fix this, sorry. My recommendation is to stick with an older version of Tensorflow.
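For anyone who wants to fail early instead of hitting the segfault, a small guard along these lines could be run before the custom ops are loaded. This is only a sketch: the function name `is_known_bad_tf` is made up for illustration and is not part of the repository, and the only version it flags is 1.14, the one reported to segfault in this thread.

```python
def is_known_bad_tf(version):
    """Return True if `version` (e.g. "1.14.0") is a TensorFlow release
    reported in this thread to segfault with the custom ops.

    Hypothetical helper: only the 1.14 series is flagged, based on the
    report above; other versions are simply not known to be bad.
    """
    major, minor = version.split(".")[:2]
    return (int(major), int(minor)) == (1, 14)


if __name__ == "__main__":
    try:
        import tensorflow as tf
    except ImportError:
        print("tensorflow is not installed in this environment")
    else:
        if is_known_bad_tf(tf.__version__):
            print("TensorFlow", tf.__version__, "is reported to segfault with the custom ops")
```

The check compares only the major and minor numbers, so any 1.14.x patch release is treated the same way.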
Thank you, Zi Jian. I appreciate your help ;)
I changed to another system and it finally runs smoothly.
Here is the setting:
GeForce RTX 2080 Ti
Driver Version: 418.74
Cuda 10.0 (cuda 10.1 does not seem to be compatible with tf-gpu)
tensorflow-gpu 1.13.1
Then, for tensorflow.python.framework.errors_impl.NotFoundError, follow the issue in the original tf_op repo and uncomment -D_GLIBCXX_USE_CXX11_ABI=0
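To see whether the installed TensorFlow was itself built with that ABI setting, one option is to look at the compile flags it reports. The sketch below assumes a TensorFlow version that provides `tf.sysconfig.get_compile_flags()`; the helper `cxx11_abi_from_flags` is a made-up name for illustration, not something from the repository.

```python
def cxx11_abi_from_flags(flags):
    """Return the value of -D_GLIBCXX_USE_CXX11_ABI found in a list of
    compiler flags (as a string such as "0" or "1"), or None if absent.

    Hypothetical helper for illustration only.
    """
    prefix = "-D_GLIBCXX_USE_CXX11_ABI="
    for flag in flags:
        if flag.startswith(prefix):
            return flag[len(prefix):]
    return None


if __name__ == "__main__":
    try:
        import tensorflow as tf
    except ImportError:
        print("tensorflow is not installed in this environment")
    else:
        # get_compile_flags() lists the flags TensorFlow recommends for
        # building custom ops against this installation.
        flags = tf.sysconfig.get_compile_flags()
        print("TensorFlow", tf.__version__, "ABI flag:", cxx11_abi_from_flags(flags))
```

If the reported value is 0, compiling the ops with -D_GLIBCXX_USE_CXX11_ABI=0 keeps them consistent with the TensorFlow binary, which is what the fix above amounts to.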