v-iashin / BMT

Source code for "Bi-modal Transformer for Dense Video Captioning" (BMVC 2020)
https://v-iashin.github.io/bmt
MIT License

RuntimeError: Vector for token b'27ll' has 38 dimensions, but previously read vectors have 300 dimensions #34

Closed xanthan011 closed 3 years ago

xanthan011 commented 3 years ago

Hello Vladimir,

I'm trying to run your repository on Google Colab and I'm facing an error which I hope you can address and give me some insight on.

I even have read your readme file, but I'm trying to replicate the instructions mentioned in this article.

So when I'm trying to run this command after activating the bmt environment:

python ./sample/single_video_prediction.py \
    --prop_generator_model_path /content/BMT/sample/best_prop_model.pt \
    --pretrained_cap_model_path /content/BMT/best_cap_model.pt.1  \
    --vggish_features_path /content/BMT/test/y2mate_vggish.npy \
    --rgb_features_path /content/BMT/test/y2mate_rgb.npy \
    --flow_features_path /content/BMT/test/y2mate_flow.npy \
    --duration_in_secs 99 \
    --device_id 0 \
    --max_prop_per_vid 100 \
    --nms_tiou_thresh 0.4

This error shows up:

Contructing caption_iterator for "train" phase
100%|█████████▉| 753786/753787 [01:21<00:00, 9194.76it/s]
Traceback (most recent call last):
  File "./sample/single_video_prediction.py", line 279, in <module>
    cap_cfg, cap_model, train_dataset = load_cap_model(args.pretrained_cap_model_path, args.device_id)
  File "./sample/single_video_prediction.py", line 136, in load_cap_model
    train_dataset = ActivityNetCaptionsDataset(cfg, 'train', get_full_feat=False)
  File "/content/BMT/sample/../datasets/captioning_dataset.py", line 310, in __init__
    self.train_vocab, self.caption_loader = caption_iterator(cfg, self.batch_size, self.phase)
  File "/content/BMT/sample/../datasets/captioning_dataset.py", line 40, in caption_iterator
    CAPTION.build_vocab(dataset.caption, min_freq=cfg.min_freq_caps, vectors=cfg.word_emb_caps)
  File "/usr/local/envs/bmt/lib/python3.7/site-packages/torchtext/data/field.py", line 273, in build_vocab
    self.vocab = self.vocab_cls(counter, specials=specials, **kwargs)
  File "/usr/local/envs/bmt/lib/python3.7/site-packages/torchtext/vocab.py", line 88, in __init__
    self.load_vectors(vectors, unk_init=unk_init, cache=vectors_cache)
  File "/usr/local/envs/bmt/lib/python3.7/site-packages/torchtext/vocab.py", line 147, in load_vectors
    vectors[idx] = pretrained_aliases[vector](**kwargs)
  File "/usr/local/envs/bmt/lib/python3.7/site-packages/torchtext/vocab.py", line 401, in __init__
    super(GloVe, self).__init__(name, url=url, **kwargs)
  File "/usr/local/envs/bmt/lib/python3.7/site-packages/torchtext/vocab.py", line 280, in __init__
    self.cache(name, cache, url=url, max_vectors=max_vectors)
  File "/usr/local/envs/bmt/lib/python3.7/site-packages/torchtext/vocab.py", line 361, in cache
    dim))
RuntimeError: Vector for token b'27ll' has 38 dimensions, but previously read vectors have 300 dimensions. All vectors must have the same number of dimensions.

I'm not sure why this issue is occurring, and I have tried everything in my power to solve it.

For reproducing this issue, here is the colab file; you can run it to reproduce the issue. Just to be clear, many improvisations had to be made after following the article to run this repository, but I'm stuck at the very last step and I hope you can help me.

I have also attached the video we are running on; you can upload it directly to the test folder, and then you won't have to change any paths in any cells (if you wish not to).

https://user-images.githubusercontent.com/69425158/130767285-b58d4aca-0ee4-4007-9e0f-2bad4c5008b8.mp4

v-iashin commented 3 years ago

Hi, @xanthan011

Thanks for submitting the issue.

Google Colab environment is not supported. I decided to check it out because I have seen this error before in #21, which was troublesome to debug without an MWE (which, admittedly, you did provide). However, again, there is nothing wrong with this code; it is a problem with the environment you are trying to use.

Anyway, the main problem is the lack of disk space on Google Colab. When you run https://github.com/v-iashin/BMT/blob/d45ad8f11d35bc7d1757ac1d735952384b4fa4c1/datasets/captioning_dataset.py#L40 it unpacks the pre-trained GloVe zip, which expands dramatically in size, and torchtext re-saves it (2.1G -> 5.3G + 2.6G). Since there is no disk space left, only a part of the file is saved to disk. Google Colab shows no error and continues execution.
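
If you want to watch this happening, a rough check from a Colab cell (just a sketch; the cache path assumes the scripts are run from /content/BMT, where torchtext creates .vector_cache by default) is:

%%bash
# how much space is left on the Colab disk
df -h /content
# size of the torchtext cache, if it has been created yet
du -sh /content/BMT/.vector_cache 2>/dev/null || true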

To fix it, you need to remove something to free up disk space. Here is what I did:

  1. I removed the second captioning checkpoint (best_cap_model.pt.1), which you are mistakenly downloading a second time.
  2. I removed the extracted features, as you don't need them for the single video example. These features are downloaded and unpacked in https://github.com/v-iashin/BMT/blob/d45ad8f11d35bc7d1757ac1d735952384b4fa4c1/download_data.sh#L1-L62
    %%bash
    rm -r /content/BMT/data/i3d_25fps_stack64step64_2stream_npy
  3. Run conda clean --all after installing the conda environments, because conda caches package tarballs, which usually take up a lot of space. After doing so, step 2 should be optional (see the cell sketch below).
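
A minimal sketch of such a cell (this assumes conda is on the PATH inside the %%bash cell; otherwise call it by its full path):

%%bash
# drop cached package tarballs and index caches to free up several gigabytes
conda clean --all --yes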

There is another error in your code:

conda install -c conda-forge spacy
python -m spacy download en

Likely, it does not do what you expect it to do. The python here is aliased to Colab's default Python interpreter, not conda's, so you are installing the language model there. Plus, spacy is already installed in the bmt environment – you only need to install the language model. You can do it as follows instead:

%%bash
source activate bmt
/usr/local/envs/bmt/bin/python -m spacy download en
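
To double-check that the model ended up in the bmt environment rather than in Colab's default Python, a quick sanity check (just a sketch, not part of the original instructions) could be:

%%bash
source activate bmt
# should print the confirmation only if the 'en' shortcut was installed into the bmt environment
/usr/local/envs/bmt/bin/python -c "import spacy; spacy.load('en'); print('en model OK')"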

I managed to run the notebook after these fixes on ./sample/women_long_jump.mp4.

I am planning to add a Google Colab notebook to the repo which will support the single video example. If you want to author a PR, please create a .ipynb notebook and I will merge it. If you decide to do so, please clean and reorganize your current version a bit.


I'm trying to replicate the instructions mentioned in this article. Just to be clear, many improvisations had to be made after following the article to run this repository, but I'm stuck at the very last step and I hope you can help me.

Just to be clear, I didn't write the article you need to improvise upon; I wrote the paper and released the code that the article uses. By the way, I checked your colab file and there is no evidence of improvisation. It just copies stuff from the README.md, which I already provided, and tries to adapt it to the Google Colab environment.

I EVEN have read your readme file

This is cute!

You need to understand that this code was not designed to run in Google Colab, but you are still trying to run it there and asking for my help. I don't have to help you here, or even to release this repo to the public. Please be respectful not only in how you write but also of others' time. Just because you did not manage to debug your work properly, you are now asking me to do it, which shows exactly the opposite.

xanthan011 commented 3 years ago

Hello Vladimir,

Firstly, thank you for the response.

Google Colab environment is not supported. I decided to check it out because I have seen this error before in #21, which was troublesome to debug without an MWE (which, admittedly, you did provide). However, again, there is nothing wrong with this code; it is a problem with the environment you are trying to use.

Yes, I knew this before I created the colab file, but CUDA isn't supported on my local machine, so I had no choice.

it unpacks the pre-trained GloVe zip, which expands dramatically in size, and torchtext re-saves it (2.1G -> 5.3G + 2.6G). Since there is no disk space left, only a part of the file is saved to disk. Google Colab shows no error and continues execution.

I'm wondering if having Colab Pro would help in this case. Please share your thoughts.

So, I too had a 60% guess that this might be the issue, as Colab did throw a warning saying that the disk was full, and I wasn't able to locate the .vector_cache folder.

I removed the second captioning checkpoint (best_cap_model.pt.1), which you are mistakenly downloading a second time.

Actually, this wasn't a mistake. Somehow the best_cap_model.pt file was getting corrupted when I ran the last cell. It threw an error (eof expected or corrupt file). I don't know why, but I thought the best solution was to replace it by downloading a new one.

Just to be clear, I didn't write the article you need to improvise upon; I wrote the paper and released the code that the article uses. By the way, I checked your colab file and there is no evidence of improvisation. It just copies stuff from the README.md, which I already provided, and tries to adapt it to the Google Colab environment.

I totally understand. I saw who the author of the article was beforehand. The only reason I mentioned the article was so that you might have an idea of why the structure of the notebook differs from your readme; by improvisation, I meant the structure of the code and a few minor bits here and there. Nothing new that would contribute.

You need to understand that this code was not designed to run in Google Colab, but you are still trying to run it there and asking for my help. I don't have to help you here, or even to release this repo to the public. Please be respectful not only in how you write but also of others' time. Just because you did not manage to debug your work properly, you are now asking me to do it, which shows exactly the opposite.

Firstly, I have the utmost respect for the time of you and every developer out there who provides code for their papers. I never intended to waste your time, as I already knew that there was probably nothing wrong with the code, but I was really stuck, so I thought it was worth mentioning since only one similar issue (#21) existed. Secondly, I mentioned that I read your readme so that you would know I have gone through your explanation of the code and tried everything else present there.

Thirdly, somehow GitHub exaggerated my line; it was supposed to be this:

I even have read your readme file

but somehow it capitalized the word to make it look dramatic 😂. But anyway, I have respect for your time and I appreciate that you took the time to go through my problem.

I hope there is no misunderstanding now. I understand that your duty only extends to solving errors under the specifications and conditions you provide for the code to work, and as a developer myself, I respect that.

I am planning to add a Google Colab notebook to the repo which will support the single video example. If you want to author a PR, please create a .ipynb notebook and I will merge it. If you decide to do so, please clean and reorganize your current version a bit.

If I'm able to make this notebook run successfully then sure, I would love to contribute back, after all the effort is all yours.

v-iashin commented 3 years ago

somehow it capitalized the word

I capitalized the word to point out what bothered me there.

xanthan011 commented 3 years ago

Hello Vladimir, I was able to run the colab file, so thank you for the suggestions.

it unpacks the pre-trained GloVe zip, which expands dramatically in size, and torchtext re-saves it (2.1G -> 5.3G + 2.6G)

I wanted to ask: will this 5.3G + 2.6G addition to the disk space happen every time we input a different video, or is it a one-time thing?

I am planning to add a Google Colab notebook to the repo which will support the single video example. If you want to author a PR, please create a .ipynb notebook and I will merge it. If you decide to do so, please clean and reorganize your current version a bit.

And now that the colab file is running, I will clean it up, add text to the cells explaining the code, and send a merge request in a couple of days.

v-iashin commented 3 years ago

Hi,

I haven't looked into this much, but what torchtext does is check whether the GloVe model is in .vector_cache: if not, it downloads it from its own servers; if it is present (*.txt or *.txt.pt – I don't know which), it loads it from the disk.

If you are curious, it is triggered by specifying cfg.word_emb_caps='glove.840B.300d' in https://github.com/v-iashin/BMT/blob/18b5ee109ac48e6023bad5d8c52b3a9c9c53e86f/datasets/captioning_dataset.py#L40
This might remind you of the way torchvision.models.* are initialized.

Since the default server is quite slow, we mirrored the pre-trained GloVe model on our own servers, so you can download it at high speed. This is just to give you a rough idea of what is happening there.
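
In other words, the cost is paid once per environment: as long as the cached files survive (which, on Colab, means within the same VM session), they are reused for any video. A rough way to check (assuming the scripts are run from /content/BMT, where torchtext creates .vector_cache by default):

%%bash
# if the cached GloVe files are present and complete, torchtext loads them instead of re-downloading
ls -lh /content/BMT/.vector_cache/ 2>/dev/null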

I am glad it worked out for you. Looking forward to seeing your PR.

xanthan011 commented 3 years ago

Hello Vladimir,

I wanted to ask whether the repository works on short videos, as I'm facing a problem.

You see, I wanted to get captions from a gif (converted into an mp4 and then run through the code).

So, while running this:

python main.py \
    --feature_type i3d \
    --on_extraction save_numpy \
    --device_ids 0 \
    --extraction_fps 25 \
    --video_paths /content/BMT/test/1419275450975539203.mp4 \
    --output_path /content/BMT/test 

It gives the rgb.npy and flow.npy files as output. But when I run this:

python main.py \
    --feature_type vggish \
    --on_extraction save_numpy \
    --device_ids 0 \
    --video_paths /content/BMT/test/1419275450975539203.mp4 \
    --output_path /content/BMT/test

There is no vggish.npy file in the output, which usually appears. So, I'm wondering if this is an issue with the video being too short or a Colab issue (seems unlikely).

I will also attach 2 gif (in .mp4 format) files for reproducing.

Thank you

https://user-images.githubusercontent.com/69425158/131094986-8a3cb7a9-8a53-4cb3-9337-c5710f06e5ba.mp4

https://user-images.githubusercontent.com/69425158/131095061-22ab5240-a3d6-4a9b-bfd8-3342697f9c57.mp4

v-iashin commented 3 years ago

😅

VGGish extracts audio features.
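
If you want to verify this for your own files (just a sketch; it assumes ffprobe is available on the machine, which it usually is wherever ffmpeg is installed), you can check whether the mp4 has an audio stream at all; a gif converted to mp4 typically has none:

%%bash
# prints "audio" only if the file contains an audio stream; empty output means there is nothing for VGGish to extract
ffprobe -v error -select_streams a -show_entries stream=codec_type -of csv=p=0 /content/BMT/test/1419275450975539203.mp4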

By the way, PR with a notebook that uses the sample video (./sample) would do just fine.

xanthan011 commented 3 years ago

VGGish extracts audio features.

Ohh okay, then my bad.

Then can you please give me an idea of how I should run it without that file? Removing the --vggish_features_path flag throws an error while running ./sample/single_video_prediction.py, and since there is no vggish.npy file, I can't provide it 😅