Closed by tongzhang111 9 months ago
Hey, @tongzhang111! Thanks for your interest in our work. The behaviour you described is a bit surprising, and I'm not sure why it happened. Could you please count the number of rows in the .csv file where a valid path is present? Also, how many files are inside the images folder? If both numbers are in the ballpark of 3M, then you probably don't need to worry. In that case, I would just modify train.csv to remove the rows that don't have a path in the first column. But if a lot of paths are missing, we might have to investigate further.
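In case it helps, here is a minimal Python sketch of the check and cleanup described above. It is not part of the repository's tooling, and it rests on a few assumptions: the function names (`count_rows_with_path`, `count_files`, `drop_rows_without_path`) are made up for illustration, train.csv is assumed to have a header row, the path is assumed to sit in the first column, and a "valid path" is taken to mean a non-empty entry that points to an existing file relative to `root`. Adjust those to match your setup.

```python
import csv
from pathlib import Path


def _has_valid_path(row, root):
    # Assumption: the image path is in the first column, relative to `root`.
    if not row or not row[0].strip():
        return False
    return (Path(root) / row[0].strip()).is_file()


def count_rows_with_path(csv_path, root=".", has_header=True):
    """Return (total_rows, rows_whose_first_column_is_an_existing_file)."""
    total = valid = 0
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        if has_header:
            next(reader, None)  # skip the header row
        for row in reader:
            total += 1
            if _has_valid_path(row, root):
                valid += 1
    return total, valid


def count_files(folder):
    """Count regular files under `folder`, including subfolders."""
    return sum(1 for p in Path(folder).rglob("*") if p.is_file())


def drop_rows_without_path(csv_path, out_path, root=".", has_header=True):
    """Write a copy of the CSV that keeps only rows with a valid path.

    Returns the number of rows kept (header not counted).
    """
    kept = 0
    with open(csv_path, newline="", encoding="utf-8") as src, \
            open(out_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        if has_header:
            header = next(reader, None)
            if header is not None:
                writer.writerow(header)
        for row in reader:
            if _has_valid_path(row, root):
                writer.writerow(row)
                kept += 1
    return kept
```

For example, `count_rows_with_path("train.csv")` and `count_files("images")` give the two numbers to compare, and `drop_rows_without_path("train.csv", "train_clean.csv")` writes the filtered file to a new name (a hypothetical one here) so the original stays untouched.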
I am very grateful for your reply. Could you compress the data you obtained and share a link, so that I can get the training data that way?
The data is really big (it usually takes 2-3 days to download) so I don't think it is practically possible for me to host it somewhere. Were you able to check how many files were downloaded?
I checked earlier, and there were indeed relatively few images. I think it may be due to network issues. I will try downloading the data again today.
Yes, that is what I would have also suggested. Please feel free to comment again if needed :) Also, in case it wasn't clear, here is how you can download the dataset:
I found that some entries did not have corresponding paths in the generated file (train.csv).