Open carandraug opened 8 months ago
The file vggsound.csv lists 199467 entries. That number does not match the sum of the test and train files. See:

    $ wc -l data/train.csv data/test.csv
      183730 data/train.csv
       15446 data/test.csv
      199176 total
    $ wc -l data/vggsound.csv
    199467 data/vggsound.csv
The vggsound.csv file has an extra 291 entries. The extra entries are in both the train and test splits:

    $ python3 -c 'import csv; [print(x[3]) for x in csv.reader(open("data/vggsound.csv"))]' | sort | uniq -c
      15496 test
     183971 train
I happen to have a copy of the file vggsound.csv as downloaded from the VGG website and these numbers matched.
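For reference, the per-split surplus can be worked out from the counts reported above. This is only a minimal sketch: the helper names `split_counts`, `split_counts_from_file` and `surplus_per_split` are made up for illustration, and the only assumption about the file layout is the one the one-liner above already makes, namely that the split label is in column index 3 of vggsound.csv.

```python
import csv
from collections import Counter


def split_counts(rows, split_column=3):
    """Count rows per split label (e.g. "train"/"test") in an iterable of CSV rows."""
    return Counter(row[split_column] for row in rows)


def split_counts_from_file(path, split_column=3):
    """Same as split_counts, but reading the rows from a CSV file on disk."""
    with open(path, newline="") as fh:
        return split_counts(csv.reader(fh), split_column)


def surplus_per_split(master_counts, split_file_counts):
    """Entries listed in the master file minus entries in each split file."""
    return {
        split: master_counts.get(split, 0) - count
        for split, count in split_file_counts.items()
    }


# Counts taken from the wc and uniq output in this issue.
master = Counter({"train": 183971, "test": 15496})
split_files = {"train": 183730, "test": 15446}

surplus = surplus_per_split(master, split_files)
print(surplus)                # {'train': 241, 'test': 50}
print(sum(surplus.values()))  # 291
```

With a local checkout, `split_counts_from_file("data/vggsound.csv")` should reproduce the `master` counts, so the 291 extra entries break down as 241 in train and 50 in test.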
I checked the full compressed video package provided by the author here, and the total number of videos after decompression is 199,176, which is consistent with the combined number of entries in the training and test files. I think vggsound.csv does list 291 extra entries.