talmo / leap

LEAP is now deprecated -- check out its successor SLEAP!
https://sleap.ai
Apache License 2.0

Syntax error on fast train #5

Closed rishabhchandak closed 6 years ago

rishabhchandak commented 6 years ago

Hello,

Training.py ln 99 throws this syntax error when running fast train on the sample dataset:

def train(data_path, *, 
                      ^
SyntaxError: invalid syntax

I am running this on Ubuntu 16.04 with Matlab 2018a on a machine with a GPU.

Thanks!

talmo commented 6 years ago

Hi there,

This sounds like an issue with the Python version that MATLAB is detecting. You'll need to be on Python 3.x to run LEAP. If you have it installed but you're still getting that error, it might not be the one that your system finds when running commands from MATLAB.
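For context on why the interpreter version matters here: the bare `*` in `def train(data_path, *,` marks every parameter after it as keyword-only, which is Python 3 syntax that a Python 2 parser rejects at exactly that character. A minimal sketch follows; the `epochs` parameter and the `"box.h5"` argument are made up for illustration, and only `data_path` and the bare `*` come from the traceback.

```python
# Hypothetical signature: only `data_path` and the bare `*` appear in the
# traceback; `epochs` is an illustrative stand-in for the real options.
def train(data_path, *, epochs=10):
    """Parameters after the bare `*` can only be passed by keyword."""
    return data_path, epochs


# Passing the option by keyword works on Python 3.
print(train("box.h5", epochs=5))

# Passing it positionally is rejected at call time on Python 3.
try:
    train("box.h5", 5)
except TypeError as err:
    print("TypeError:", err)
```

On Python 2 this file would not get as far as running: it fails to parse with the same `SyntaxError: invalid syntax` shown in the report.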

If you type this in MATLAB, you should get the path to the Python installation that it's finding:

>> !which python
/tigress/tdp/anaconda3/bin/python

If you have Python 3.x installed in an environment, try activating the env from the command line and then launching MATLAB from that same terminal (just type matlab) so that it inherits all the environment variables.
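To confirm which interpreter actually gets picked up, a small script along these lines can help. This is only a sketch: the file name `check_python.py` is hypothetical, and the script is written so that it also runs under Python 2, which is the case being diagnosed.

```python
import sys


def describe_interpreter():
    """Return the interpreter path and whether it is Python 3 or newer."""
    return sys.executable, sys.version_info[0] >= 3


path, is_py3 = describe_interpreter()
# %-formatting keeps the output identical on Python 2 and Python 3.
print("interpreter: %s" % path)
print("version: %s" % sys.version.split()[0])
print("Python 3 or newer: %s" % is_py3)
```

Saved as `check_python.py`, it could be run from the MATLAB prompt with `!python check_python.py`, the same way as `!which python` above.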

Let me know if you're still running into any issues!

Talmo

rishabhchandak commented 6 years ago

Hi,

Turns out my MATLAB was running Python 2.7 so I fixed that issue and it is on Python 3.5 now. For this, I also modified 'python' to 'python3' in label_joints line 748. Then I got a syntax error for training.py line 124, which I assumed was because of an extra comma before the close parenthesis, so I deleted that. All these changes made the code progress and make a models folder, but then I got this error:

Created folder: /home/rishabh/leap/models/180711_095827-n=1
2018-07-11 09:58:30.737017: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-07-11 09:58:30.882649: E tensorflow/stream_executor/cuda/cuda_driver.cc:406] failed call to cuInit: CUDA_ERROR_NO_DEVICE
2018-07-11 09:58:30.882948: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:158] retrieving CUDA diagnostic information for host: rishabh-Alienware-Aurora-R7
2018-07-11 09:58:30.882961: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:165] hostname: rishabh-Alienware-Aurora-R7
2018-07-11 09:58:30.883012: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:189] libcuda reported version is: 390.67.0
2018-07-11 09:58:30.883043: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:193] kernel reported version is: 396.37.0
2018-07-11 09:58:30.883048: E tensorflow/stream_executor/cuda/cuda_diagnostics.cc:303] kernel version 396.37.0 does not match DSO version 390.67.0 -- cannot find working devices in this configuration
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/numpy/lib/shape_base.py", line 463, in array_split
    Nsections = len(indices_or_sections) + 1
TypeError: object of type 'numpy.float64' has no len()

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/rishabh/leap/leap/training.py", line 265, in <module>
    clize.run(train)
  File "/usr/local/lib/python3.5/dist-packages/sigtools/modifiers.py", line 158, in __call__
    return self.func(*args, kwargs)
  File "/usr/local/lib/python3.5/dist-packages/clize/runner.py", line 360, in run
    ret = cli(args)
  File "/usr/local/lib/python3.5/dist-packages/clize/runner.py", line 220, in __call__
    return func(posargs, kwargs)
  File "/home/rishabh/leap/leap/training.py", line 209, in train
    val_datagen = PairedImageAugmenter(val_box, val_confmap, batch_size=batch_size, shuffle=True, theta=(-rotate_angle, rotate_angle))
  File "/home/rishabh/leap/leap/image_augmentation.py", line 60, in __init__
    self.batches = np.array_split(all_idx, np.ceil(self.num_samples / self.batch_size))
  File "/usr/local/lib/python3.5/dist-packages/numpy/lib/shape_base.py", line 469, in array_split
    raise ValueError('number sections must be larger than 0.')
ValueError: number sections must be larger than 0.
Exception ignored in: <bound method BaseSession.__del__ of <tensorflow.python.client.session.Session object at 0x7f57b7925668>>
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 712, in __del__
TypeError: 'NoneType' object is not callable

As an additional note, I only labeled 1 frame in the sample video and the "initialized" count still shows 0/395. I am not sure if that is causing the 0 error above. I'm also unsure whether MATLAB is finding my GPU; it doesn't look like it is.

Thanks,
Rishabh

rishabhchandak commented 6 years ago

Update: Rebooting my computer fixed the CPU/GPU issue. The error now shows:

Created folder: /home/rishabh/leap/models/180711_102501-n=1
2018-07-11 10:25:03.826331: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-07-11 10:25:03.914753: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:898] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-07-11 10:25:03.915120: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1212] Found device 0 with properties:
name: GeForce GTX 1080 major: 6 minor: 1 memoryClockRate(GHz): 1.7335
pciBusID: 0000:01:00.0
totalMemory: 7.93GiB freeMemory: 7.36GiB
2018-07-11 10:25:03.915132: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1312] Adding visible gpu devices: 0
2018-07-11 10:25:04.076893: I tensorflow/core/common_runtime/gpu/gpu_device.cc:993] Creating TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7106 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0, compute capability: 6.1)
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/numpy/lib/shape_base.py", line 463, in array_split
    Nsections = len(indices_or_sections) + 1
TypeError: object of type 'numpy.float64' has no len()

with the rest of the error the same as in the previous email.

Thanks,
Rishabh

talmo commented 6 years ago

Great -- good to hear you got TensorFlow working with CUDA!

I think this error comes from training with only a single labeled sample (or none) in the training data. You need to make sure that you've set the position for all body parts within the image in order for the frame to register as "labeled". You can also press the F key to mark all the parts as correctly positioned if you don't need to move some of them.

If you just want to test it out, you can download the labels.mat files with existing annotations for the datasets we provide here: https://github.com/talmo/leap/tree/master/data
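The traceback earlier in the thread fails inside `np.array_split`, called with `np.ceil(self.num_samples / self.batch_size)` while building the validation generator. If that split ends up with zero samples, the section count is `0.0` and NumPy raises the `ValueError` shown. The sketch below reproduces that failure and shows one possible guard; `make_batches` is a hypothetical helper written for illustration, not part of LEAP, and the batch size of 32 is an arbitrary choice.

```python
import numpy as np


def make_batches(num_samples, batch_size):
    """Split sample indices into batches, failing early with a clear message.

    Hypothetical helper for illustration only; not LEAP's implementation.
    """
    if num_samples < 1:
        raise ValueError("no samples to batch; label more frames first")
    # Cast to int so array_split receives a section count, not a float.
    num_batches = int(np.ceil(num_samples / batch_size))
    return np.array_split(np.arange(num_samples), num_batches)


# Reproduce the failure mode: zero samples give a section count of 0.0.
try:
    np.array_split(np.arange(0), np.ceil(0 / 32))
except ValueError as err:
    print("ValueError:", err)

# With at least one sample the split succeeds.
print([batch.tolist() for batch in make_batches(5, 2)])
```

The first `TypeError` in the traceback is a side effect of passing a float: NumPy first tries `len()` on the argument, then falls back to treating it as a scalar section count.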

GitwellAnyohub commented 6 years ago

I think I am getting a similar type of error when I try the fast train:

[Screenshot: 180724_leaperrors]

talmo commented 6 years ago

Going to go ahead and close this issue since it seems @rishabhchandak has resolved their problems and @GitwellAnyohub opened a separate issue. Feel free to re-open if needed!