Closed iriyagupta closed 2 years ago
@Sy-Zhang pls have a check.
Hi,
On running the eval for TACoS I get the following error:

File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "h5py/h5o.pyx", line 190, in h5py.h5o.open
KeyError: "Unable to open object (object 's30-d52.avi' doesn't exist)"

I am unsure whether something is broken. Can you please help?
Which hdf5 file are you using, and which cloud drive did you download it from?
Hi @Sy-Zhang I used tall_c3d_features from this link https://rochester.app.box.com/s/8znalh6y5e82oml2lr7to8s6ntab6mav/folder/137471786054
I tried and didn't get this error. Could you check whether your hdf5 file is broken?
That is weird. These are the steps I used: in the TACoS data yml file I changed the checkpoint name to the pretrained model file I downloaded, like ./checkpoints/TACoS/pretrained_pkl_file, and then ran moment_localization/test.py. I hope that is the correct method. I kept the .hdf5 feature file in the ./data/TACoS/ folder after downloading it from this link. There was a merge_npys_to_hdf5.py in that folder as well, which also throws an error on running, but I think that is not supposed to be used anyway?
The other thing is I am using nn.DataParallel. Do you think that could be causing the error? @Sy-Zhang
Any help would be appreciated.
Could you try the code shown in this figure to check whether your hdf5 file has 's30-d52.avi'?
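The figure is not reproduced here; a minimal check along these lines should do the same thing (the feature file path is an assumption based on the download name mentioned above):

```python
import h5py

def has_video(hdf5_path: str, video_key: str) -> bool:
    """Return True if video_key exists as a top-level object in the hdf5 file."""
    with h5py.File(hdf5_path, "r") as f:
        return video_key in f

# e.g. has_video("./data/TACoS/tall_c3d_features.hdf5", "s30-d52.avi")
```

If this returns False, the file was likely corrupted during download.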
Thank you. I checked and it exists. I re-downloaded the data and ran it, and it seems to load correctly now. However, when using 4 GPUs, even just for evaluation, it shows:
RuntimeError: CUDA out of memory. Tried to allocate 308.00 MiB (GPU 3; 10.92 GiB total capacity; 4.16 GiB already allocated; 72.38 MiB free; 4.19 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Then you need to reduce the batch size or use a GPU with more memory.
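Besides reducing the batch size, the error message itself points at PyTorch's allocator setting; a sketch (the value 128 is just an example to tune):

```shell
# Set before launching evaluation; caps allocator block splits to reduce
# fragmentation of the memory PyTorch reserves.
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
# then run evaluation as before, e.g. python moment_localization/test.py ...
```

This only helps when reserved memory is much larger than allocated memory, as the error text notes; it is not a substitute for a smaller batch size.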
Makes sense, that would be changed in the .yaml file as per my understanding. I will run it again and check.
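For reference, a hypothetical yaml fragment; the exact key names depend on this repo's config schema, so check the actual TACoS yaml file:

```yaml
# Hypothetical fragment: lower the evaluation batch size until it fits in memory.
TEST:
  BATCH_SIZE: 16
```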
I guess it was all some broken file issue and some lack of memory from my end. Thank you for your help :)