talmo / leap

LEAP is now deprecated -- check out its successor SLEAP!
https://sleap.ai
Apache License 2.0

ChunkSize must be specified #14

Closed kbakhurin closed 5 years ago

kbakhurin commented 5 years ago

Hi Talmo,

I'm encountering a new error I haven't had before:

    Found 28/390 labeled frames.
    Loaded 28 images [0.24s]
    Starting parallel pool (parpool) using the 'local' profile ... connected to 4 workers.
    Generated confidence maps [31.41s] 8.75 MB
    Symmetric channels:
    Error using h5create>validate_options (line 123)
    The 'ChunkSize' option must be specified if any extents are to be extendible.

    Error in h5create (line 72)
    options = validate_options(p.Results);

    Error in h5save (line 63)
    h5create(filepath, dset, size(X), 'Datatype', class(X), args{:})

    Error in h5savegroup (line 31)
    h5save(filepath, S.(fns{i}), dset, varargin{:})

    Error in generate_training_set (line 212)
    h5savegroup(savePath,skeleton)

    Error in label_joints/fastTrain (line 733)
    dataPath = generate_training_set(boxPath,'savePath',dataPath,...

    Error in label_joints>@(h,~)fastTrain() (line 351)
    'Callback', @(h,~)fastTrain(), ...

    Error while evaluating UIControl Callback.

This is what I am inputting to the fast training parameters:

[image: screenshot of the fast training parameters]

Any idea what this could be due to? Let me know if I can provide more information.

Thanks!

Konstantin

talmo commented 5 years ago

Hey Konstantin,

It's possible that it has to do with the body symmetry stuff. Try unchecking Mirror images to see if it still crashes.

If not but you did want to do mirroring for augmentation, be sure that the joints that are left/right symmetric are named "partL"/"partR" in the skeleton so that they're auto-detected appropriately.
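To illustrate the naming convention, here is a minimal Python sketch that pairs joints whose names differ only by a trailing "L" or "R". It is an assumption about how such auto-detection could work, not LEAP's actual MATLAB implementation, and `find_symmetric_pairs` is a hypothetical name.

```python
def find_symmetric_pairs(joint_names):
    """Pair joints whose names differ only by a trailing 'L' or 'R'.

    Hypothetical helper for illustration; LEAP's own detection may differ.
    Returns a list of (left_index, right_index) tuples.
    """
    index_by_name = {name: i for i, name in enumerate(joint_names)}
    pairs = []
    for name, left_idx in index_by_name.items():
        if not name.endswith("L"):
            continue
        partner = name[:-1] + "R"  # e.g. "earL" -> "earR"
        if partner in index_by_name:
            pairs.append((left_idx, index_by_name[partner]))
    return pairs


print(find_symmetric_pairs(["nose", "earL", "earR", "tail"]))  # [(1, 2)]
print(find_symmetric_pairs(["nose"]))                          # []
```

A skeleton with a single unpaired joint yields no pairs at all, so any downstream step has to cope with an empty result.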

Let me know if that works for you or if you're in the latter case!

Cheers,

Talmo

kbakhurin commented 5 years ago

Ah, interesting. In this run I was just using the program to identify a non-symmetrical body part (the nose of a mouse viewed from below) as it moves in 2 dimensions. Does the algorithm need skeleton parts that have left and right designations?

I tried unchecking the Mirror images option and I had the same problem. I also tried reducing the number of clusters and samples per cluster that I saved in the cluster_sample program but that didn't help.

Thank you for your help, Konstantin

talmo commented 5 years ago

Aha, nope, if you don't care about mirror augmentation you don't have to worry about it.

I recall seeing this problem before though I'm not sure what in particular is special about the skeleton that leads to this issue.

As a quick and dirty fix, try making sure your h5savegroup.m function matches the current one at: https://github.com/talmo/leap/blob/master/leap/toolbox/hdf5/h5savegroup.m

You can quickly edit it in MATLAB by typing edit h5savegroup and then just copy paste the contents of that file above.
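If you'd rather check for differences than eyeball them, a rough Python sketch like the one below compares your local copy against a downloaded copy of the current file. The function names and any paths you pass in are hypothetical; it ignores line-ending style and trailing whitespace, which often differ on Windows checkouts.

```python
import difflib
from pathlib import Path


def normalized_lines(text):
    """Split text into lines, ignoring line-ending style and trailing whitespace."""
    return [line.rstrip() for line in text.splitlines()]


def diff_files(local_path, reference_path):
    """Return unified-diff lines between two files; an empty list means they match."""
    local = normalized_lines(Path(local_path).read_text(encoding="utf-8"))
    reference = normalized_lines(Path(reference_path).read_text(encoding="utf-8"))
    return list(
        difflib.unified_diff(
            local,
            reference,
            fromfile=str(local_path),
            tofile=str(reference_path),
            lineterm="",
        )
    )
```

For example, `diff_files("h5savegroup.m", "h5savegroup_master.m")` (both file names are placeholders) returns an empty list when the two copies agree.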

If you don't mind, could you also send me your skeleton MAT file so I can debug this issue? (You can email it to me or post a link.)

kbakhurin commented 5 years ago

Hi Talmo,

Thanks for the reply. I checked my h5savegroup.m file and it does not match the code that is in the link. I tried to run the fetch in my github desktop app to update it, but that specific file (h5savegroup.m) did not update. I thought that was how one updates github files on a desktop, but I may be wrong. How exactly do I get your updates into my computer as you make them? That is the whole idea with GitHub, right?

I went ahead and updated the file based on your instructions. It got rid of the error above, but I encountered a new one. It finished one epoch then gave this:

    Traceback (most recent call last):
      File "C:\Users\Yinlab\Documents\GitHub\leap\leap\training.py", line 272, in <module>
        clize.run(train)
      File "C:\Users\Yinlab\Anaconda3\lib\site-packages\sigtools\modifiers.py", line 158, in __call__
        return self.func(*args, **kwargs)
      File "C:\Users\Yinlab\Anaconda3\lib\site-packages\clize\runner.py", line 360, in run
        ret = cli(*args)
      File "C:\Users\Yinlab\Anaconda3\lib\site-packages\clize\runner.py", line 220, in __call__
        return func(*posargs, **kwargs)
      File "C:\Users\Yinlab\Documents\GitHub\leap\leap\training.py", line 251, in train
        viz_grid_callback
      File "C:\Users\Yinlab\AppData\Roaming\Python\Python36\site-packages\keras\legacy\interfaces.py", line 91, in wrapper
        return func(*args, **kwargs)
      File "C:\Users\Yinlab\AppData\Roaming\Python\Python36\site-packages\keras\engine\training.py", line 2280, in fit_generator
        callbacks.on_epoch_end(epoch, epoch_logs)
      File "C:\Users\Yinlab\AppData\Roaming\Python\Python36\site-packages\keras\callbacks.py", line 77, in on_epoch_end
        callback.on_epoch_end(epoch, logs)
      File "C:\Users\Yinlab\Documents\GitHub\leap\leap\training.py", line 229, in <lambda>
        viz_pred_callback = LambdaCallback(on_epoch_end=lambda epoch, logs: show_pred(model, *viz_sample, save_path=os.path.join(run_path, "viz_pred/pred_%03d.png" % epoch), show_figure=False))
      File "c:\users\yinlab\documents\github\leap\leap\viz.py", line 58, in show_pred
        plt.imshow(Y2[:,:,joint_idx], alpha=alpha_pred)
    IndexError: too many indices for array

Here is a link to my skeleton file:

https://duke.box.com/s/qvpihdop8rkvt7g4e9sjxz5d39bke7we

Apologies for the scattered questions.

Konstantin

talmo commented 5 years ago

Whoa, very weird! Haven't seen that one before. I'm still inclined to believe there's something curious about this skeleton.

It seems that I can't access the Box link without a Box Business account -- do you mind sharing a public link or uploading the skeleton MAT file to justbeamit.com or Google Drive?
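One plausible reading of the `IndexError` in the traceback above, offered as an assumption rather than a confirmed diagnosis: with a single-joint skeleton, the confidence-map array can lose its trailing joint axis (for example through a squeeze), so indexing it with three indices fails. The NumPy sketch below uses made-up shapes; `Y2` and `joint_idx` mirror the names in the traceback, and everything else is hypothetical.

```python
import numpy as np

# Made-up shapes: a batch of one image with a single joint channel.
height, width, n_joints = 4, 4, 1
Y = np.zeros((1, height, width, n_joints))

# squeeze() drops every length-1 axis, including the joint axis.
Y2 = Y.squeeze()
print(Y2.shape)  # (4, 4)

joint_idx = 0
try:
    Y2[:, :, joint_idx]  # three indices into a 2-D array
except IndexError as err:
    print("IndexError:", err)

# Removing only the batch axis keeps the joint axis, so the same indexing works.
Y2_safe = Y[0]
print(Y2_safe[:, :, joint_idx].shape)  # (4, 4)
```

With two or more joints the squeeze would leave the joint axis alone, which would fit the error appearing only for this one-part skeleton.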

kbakhurin commented 5 years ago

OK, does this link work?

https://drive.google.com/open?id=16DyzH77B2gl5k2Kn5YFsZnZtH9qBwpyo

talmo commented 5 years ago

Hey Konstantin,

So I tried to reproduce your error by loading the skeleton file you provided to train on an example video, but I don't seem to be running into any issues (???).

Some ideas that come to mind for troubleshooting:

  1. Just to confirm, before seeing the error, you see the progress being printed in the Command Window, correct?

  2. Can you try pulling the most recent version from the repository? Typically you'd just do a git pull or click Fetch origin in the GitHub desktop app, but let's be sure we're exactly at the current version by typing this in MATLAB:

    >> !git fetch --all && git reset --hard origin/master

If it still doesn't work:

  1. Let's check some versions. Type this out in MATLAB and let me know the results:

    >> !python -c "import tensorflow as tf; print(tf.__version__)"
    >> !python -c "import keras; print(keras.__version__)"
    >> !python -c "import matplotlib; print(matplotlib.__version__)"

  2. Let's check the input data. Could you run h5disp('box.h5') (replacing box.h5 with your filename) and paste the output here?

  3. If you don't mind, you could email me (talmo@princeton.edu) your movie file and the labels.mat file that you're using to train and I'll try to reproduce the error again.
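The version checks above can also be collected in one go. This is a minimal sketch assuming Python 3.8 or newer (for `importlib.metadata`); `installed_versions` is a hypothetical helper, and the lookup uses installed distribution names, so a GPU build of TensorFlow registered under a different name may show up as not installed.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(["tensorflow", "keras", "matplotlib"]).items():
    print(f"{name}: {version or 'not installed'}")
```

Unlike importing each package, this reads package metadata only, so it reports missing packages instead of raising an import error.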

We'll get it working :)

Talmo

talmo commented 5 years ago

Had a chance to take another look at this issue -- I'm pretty sure that this should be solved by updating to the newest version. The last pull request I merged (7ebc1bb0afedcf7c38e95c1fd5d3528a78a4b560) actually had a solution for what I think is exactly your problem.

Give it a go using the instructions above or by just deleting and downloading the whole repo from scratch again. Should do the trick, but let me know if you're still having issues!

kbakhurin commented 5 years ago

Hi Talmo, yes, you are right: it worked once I updated everything to the current version.

Thanks :)

Konstantin