microsoft / VideoX

VideoX: a collection of video cross-modal models

hdf5 broken for TACoS? #75

Closed iriyagupta closed 2 years ago

iriyagupta commented 2 years ago

Hi,

On running the eval for TACoS I get the following error:

File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "h5py/h5o.pyx", line 190, in h5py.h5o.open
KeyError: "Unable to open object (object 's30-d52.avi' doesn't exist)"

I am unsure whether this is a broken file or something on my end; can you please help?

penghouwen commented 2 years ago

@Sy-Zhang pls have a check.

Sy-Zhang commented 2 years ago

> Hi,
>
> On running the eval for TACoS I get the following error:
> File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
> File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
> File "h5py/h5o.pyx", line 190, in h5py.h5o.open
> KeyError: "Unable to open object (object 's30-d52.avi' doesn't exist)"
>
> I am unsure whether this is a broken file or something on my end; can you please help?

Which hdf5 file are you using, and which cloud drive did you download it from?

iriyagupta commented 2 years ago

Hi @Sy-Zhang, I used tall_c3d_features from this link: https://rochester.app.box.com/s/8znalh6y5e82oml2lr7to8s6ntab6mav/folder/137471786054

Sy-Zhang commented 2 years ago

> Hi @Sy-Zhang, I used tall_c3d_features from this link: https://rochester.app.box.com/s/8znalh6y5e82oml2lr7to8s6ntab6mav/folder/137471786054

I tried and didn't get this error. Could you check whether your hdf5 file is broken?

iriyagupta commented 2 years ago

That is weird. These are the steps I used: in the TACoS data yml file I changed the checkpoint entry to the name of the pretrained model file I downloaded (e.g. ./checkpoints/TACoS/pretrained_pkl_file), and then ran moment_localization/test.py. I hope that is the correct method. I kept the .hdf5 feature file in the ./data/TACoS/ folder after downloading it from this link. There was also merge_npys_to_hdf5.py in that folder, which throws an error when run as well, but I assume it is not supposed to be used anyway?

The other thing is that I am using nn.DataParallel; do you think that could cause the error? @Sy-Zhang

Any help would be appreciated.
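For reference, my setup looks roughly like the sketch below. The config path and any extra flags are placeholders based on my reading of the README, not the exact invocation:

```bash
# Layout as described above (names are placeholders where I was not specific):
#   data/TACoS/tall_c3d_features.hdf5        features downloaded from the Box link
#   checkpoints/TACoS/pretrained_pkl_file    pretrained model referenced in the data yml
#
# Evaluation; TACOS_CONFIG stands in for the TACoS experiment yaml, and any extra
# flags (split/verbose) are whatever the repo's README lists -- I am not certain of them.
python moment_localization/test.py --cfg "$TACOS_CONFIG"
```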

Sy-Zhang commented 2 years ago

> That is weird. These are the steps I used: in the TACoS data yml file I changed the checkpoint entry to the name of the pretrained model file I downloaded (e.g. ./checkpoints/TACoS/pretrained_pkl_file), and then ran moment_localization/test.py. I hope that is the correct method. I kept the .hdf5 feature file in the ./data/TACoS/ folder after downloading it from this link. There was also merge_npys_to_hdf5.py in that folder, which throws an error when run as well, but I assume it is not supposed to be used anyway?
>
> The other thing is that I am using nn.DataParallel; do you think that could cause the error? @Sy-Zhang

[Figure: screenshot of a short h5py snippet that opens the feature file and checks for the 's30-d52.avi' key]

Could you try the code shown in this figure to check whether your hdf5 file has 's30-d52.avi'?
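Something along these lines should work (the path assumes the tall_c3d_features file from the Box link is kept under data/TACoS/ as you described; adjust the name to whatever you downloaded):

```python
import h5py

# Open the TACoS feature file and check whether the video's features are present.
with h5py.File("data/TACoS/tall_c3d_features.hdf5", "r") as f:
    print("s30-d52.avi" in f)                    # should print True if the key exists
    print(len(f.keys()), "video entries in the file")
```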

iriyagupta commented 2 years ago

Thank you. I checked and the key exists. I re-downloaded the data and ran it, and it seems to load correctly now. However, when using 4 GPUs, even just for evaluation, it shows:

RuntimeError: CUDA out of memory. Tried to allocate 308.00 MiB (GPU 3; 10.92 GiB total capacity; 4.16 GiB already allocated; 72.38 MiB free; 4.19 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
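The error message suggests trying max_split_size_mb, which I take to mean setting something like the following before running the script (128 is only an example value):

```bash
# As suggested by the error message itself; the 128 MB split size is just an example.
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
```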

Sy-Zhang commented 2 years ago

> Thank you. I checked and the key exists. I re-downloaded the data and ran it, and it seems to load correctly now. However, when using 4 GPUs, even just for evaluation, it shows:
>
> RuntimeError: CUDA out of memory. Tried to allocate 308.00 MiB (GPU 3; 10.92 GiB total capacity; 4.16 GiB already allocated; 72.38 MiB free; 4.19 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Then you need to reduce the batch size or use a GPU with more memory.

iriyagupta commented 2 years ago

Makes sense; as I understand it, that would be changed in the .yaml config file. I will run it again and check.
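Something like the sketch below is what I have in mind; the key names are my assumption about how the experiment yaml is organized, not copied from the actual config:

```yaml
# Assumed layout of the batch-size setting in the experiment yaml; key names may differ.
TEST:
  BATCH_SIZE: 16   # reduce this if evaluation still runs out of GPU memory
```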

iriyagupta commented 2 years ago

I guess it was a broken file plus insufficient GPU memory on my end. Thank you for your help :)