rese1f / MovieChat

[CVPR 2024] MovieChat: From Dense Token to Sparse Memory for Long Video Understanding
https://rese1f.github.io/MovieChat/
BSD 3-Clause "New" or "Revised" License

Mismatch between the `.json` and `.tar` files in MovieChat-1K_train dataset #55

Open LZHgrla opened 6 months ago

LZHgrla commented 6 months ago

Hi @Espere-1119-Song

I found some pairing issues between the JSON and TAR files in the MovieChat-1K_train dataset.

There are a total of 830 JSON files (json.txt) and 769 TAR files (tar.txt), and they do not match one-to-one. Comparing the two lists, I found 74 missing TAR files, i.e. JSON files with no matching TAR (tar_missing.txt), and 13 extra TAR files with no matching JSON (tar_extra.txt).
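For anyone who wants to reproduce the check, here is a minimal sketch of the comparison. It assumes that each JSON annotation and its video archive share the same file stem (for example `AWB-8.json` and `AWB-8.tar`); the dataset's actual naming scheme is not confirmed in this thread, and `compare_pairs` is a hypothetical helper. The reported counts are at least self-consistent: 830 - 74 = 769 - 13 = 756 paired files.

```python
from pathlib import Path


def compare_pairs(json_names, tar_names):
    """Pair annotation files with video archives by file stem (assumed convention).

    Returns (missing_tar, extra_tar):
      missing_tar: stems that have a JSON file but no TAR file
      extra_tar:   stems that have a TAR file but no JSON file
    """
    # Path(...).stem drops both the directory and the extension,
    # so "movies/S01E2-4.tar" becomes "S01E2-4".
    json_stems = {Path(name).stem for name in json_names}
    tar_stems = {Path(name).stem for name in tar_names}

    # Set differences give the unpaired files on each side.
    missing_tar = sorted(json_stems - tar_stems)
    extra_tar = sorted(tar_stems - json_stems)
    return missing_tar, extra_tar
```

The file lists could come from a local download, e.g. `[str(p) for p in Path(root).rglob("*.json")]` and the same for `*.tar`. Comparing stems is case-sensitive, so names that differ only in case would be reported as unpaired.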

Additionally, there seem to be problems with the AWB-8.tar and earth9-2.tar files on the Hugging Face Hub, possibly caused by a compression or upload failure. (AWB-8.tar is an extra TAR file and can simply be deleted, while earth9-2.tar should be considered for re-uploading.)

Espere-1119-Song commented 6 months ago

Thanks for the reminder, I will resolve this issue as soon as possible.

LZHgrla commented 6 months ago

Hi, @Espere-1119-Song

We found two more invalid TAR files: movies/s01e08-1.tar (10.6 GB) and movies/S01E2-4.tar (6.24 GB).
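One way to screen downloaded archives for this kind of damage is to open each one with Python's standard `tarfile` module and read every member's data to the end. The sketch below is an assumption about what "invalid" means here: it flags archives that cannot be opened at all and archives truncated inside a member's data, which is the typical result of an interrupted upload. It will not catch every form of corruption (for example, an archive cut off exactly at a member boundary). `find_invalid_tars` is a hypothetical helper, and reading multi-GB files in full is slow.

```python
import tarfile
from pathlib import Path


def find_invalid_tars(paths):
    """Return the archives that cannot be opened or read to the end (hypothetical helper)."""
    invalid = []
    for path in paths:
        try:
            # Default mode "r" detects compression transparently.
            with tarfile.open(path) as tf:
                for member in tf:
                    if not member.isfile():
                        continue
                    f = tf.extractfile(member)
                    # Read the member in 1 MiB chunks; if the file ends early,
                    # tarfile raises ReadError("unexpected end of data").
                    while f.read(1 << 20):
                        pass
        except (tarfile.TarError, EOFError, OSError):
            invalid.append(path)
    return invalid
```

For example, `find_invalid_tars(sorted(Path(root).rglob("*.tar")))` would list the suspect archives under a local copy of the dataset.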

Espere-1119-Song commented 6 months ago

Thanks, we are hurrying to upload them.

Espere-1119-Song commented 6 months ago

We have uploaded the raw videos of the training set :)