voxel51 / fiftyone

Refine high-quality datasets and visual AI models
https://fiftyone.ai
Apache License 2.0

[BUG] foz.load_zoo_dataset not working - BadZipFile: File is not a zip file #4083

Open Guydada opened 8 months ago

Guydada commented 8 months ago

Describe the problem

When I try to load datasets using foz.load_zoo_dataset, a BadZipFile: File is not a zip file error is raised.

Code to reproduce issue

To reproduce:

import fiftyone.zoo as foz
ds = foz.load_zoo_dataset("quickstart")


---------------------------------------------------------------------------
BadZipFile                                Traceback (most recent call last)
Cell In[4], line 1
----> 1 ds = foz.load_zoo_dataset("quickstart")

File ~/miniconda3/envs/shooter/lib/python3.10/site-packages/fiftyone/zoo/datasets/__init__.py:258, in load_zoo_dataset(name, split, splits, label_field, dataset_name, dataset_dir, download_if_necessary, drop_existing_dataset, persistent, overwrite, cleanup, progress, **kwargs)
    253     zoo_dataset_cls = _get_zoo_dataset_cls(name)
    254     download_kwargs, _ = fou.extract_kwargs_for_class(
    255         zoo_dataset_cls, kwargs
    256     )
--> 258     info, dataset_dir = download_zoo_dataset(
    259         name,
    260         splits=splits,
    261         dataset_dir=dataset_dir,
    262         overwrite=overwrite,
    263         cleanup=cleanup,
    264         **download_kwargs,
    265     )
    266     zoo_dataset = info.get_zoo_dataset()
    267 else:

File ~/miniconda3/envs/shooter/lib/python3.10/site-packages/fiftyone/zoo/datasets/__init__.py:170, in download_zoo_dataset(name, split, splits, dataset_dir, overwrite, cleanup, **kwargs)
    135 """Downloads the dataset of the given name from the FiftyOne Dataset Zoo.
    136 
    137 Any dataset splits that already exist in the specified directory are not
   (...)
    165     -   dataset_dir: the directory containing the dataset
    166 """
    167 zoo_dataset, dataset_dir = _parse_dataset_details(
    168     name, dataset_dir, **kwargs
    169 )
--> 170 return zoo_dataset.download_and_prepare(
    171     dataset_dir=dataset_dir,
    172     split=split,
    173     splits=splits,
    174     overwrite=overwrite,
    175     cleanup=cleanup,
    176 )

File ~/miniconda3/envs/shooter/lib/python3.10/site-packages/fiftyone/zoo/datasets/__init__.py:1133, in ZooDataset.download_and_prepare(self, dataset_dir, split, splits, overwrite, cleanup)
   1124 else:
   1125     logger.info(
   1126         "Downloading dataset to '%s'%s", dataset_dir, suffix
   1127     )
   1129 (
   1130     dataset_type,
   1131     num_samples,
   1132     classes,
-> 1133 ) = self._download_and_prepare(dataset_dir, scratch_dir, None)
   1135 if self.supports_partial_downloads and num_samples is None:
   1136     logger.info("Existing download is sufficient")

File ~/miniconda3/envs/shooter/lib/python3.10/site-packages/fiftyone/zoo/datasets/base.py:2934, in QuickstartDataset._download_and_prepare(self, dataset_dir, scratch_dir, _)
   2933 def _download_and_prepare(self, dataset_dir, scratch_dir, _):
-> 2934     _download_and_extract_archive(
   2935         self._GDRIVE_ID,
   2936         self._ARCHIVE_NAME,
   2937         self._DIR_IN_ARCHIVE,
   2938         dataset_dir,
   2939         scratch_dir,
   2940     )
   2942     logger.info("Parsing dataset metadata")
   2943     dataset_type = fot.FiftyOneDataset()

File ~/miniconda3/envs/shooter/lib/python3.10/site-packages/fiftyone/zoo/datasets/base.py:3261, in _download_and_extract_archive(fid, archive_name, dir_in_archive, dataset_dir, scratch_dir)
   3258     logger.info("Using existing archive '%s'", archive_path)
   3260 logger.info("Extracting dataset...")
-> 3261 etau.extract_archive(archive_path)
   3262 _move_dir(os.path.join(scratch_dir, dir_in_archive), dataset_dir)

File ~/miniconda3/envs/shooter/lib/python3.10/site-packages/eta/core/utils.py:3590, in extract_archive(archive_path, outdir, delete_archive)
   3577 """Extracts the contents of an archive.
   3578 
   3579 The following formats are supported:
   (...)
   3587         default, this is False
   3588 """
   3589 if archive_path.endswith(".zip"):
-> 3590     extract_zip(archive_path, outdir=outdir, delete_zip=delete_archive)
   3591 elif archive_path.endswith((".tar", ".tar.gz", ".tgz", ".tar.bz", ".tbz")):
   3592     extract_tar(archive_path, outdir=outdir, delete_tar=delete_archive)

File ~/miniconda3/envs/shooter/lib/python3.10/site-packages/eta/core/utils.py:3616, in extract_zip(zip_path, outdir, delete_zip)
   3605 """Extracts the contents of a .zip file.
   3606 
   3607 Args:
   (...)
   3612         this is False
   3613 """
   3614 outdir = outdir or os.path.dirname(zip_path) or "."
-> 3616 with zf.ZipFile(zip_path, "r", allowZip64=True) as f:
   3617     f.extractall(outdir)
   3619 if delete_zip:

File ~/miniconda3/envs/shooter/lib/python3.10/zipfile.py:1269, in ZipFile.__init__(self, file, mode, compression, allowZip64, compresslevel, strict_timestamps)
   1267 try:
   1268     if mode == 'r':
-> 1269         self._RealGetContents()
   1270     elif mode in ('w', 'x'):
   1271         # set the modified flag so central directory gets written
   1272         # even if no files are added to the archive
   1273         self._didModify = True

File ~/miniconda3/envs/shooter/lib/python3.10/zipfile.py:1336, in ZipFile._RealGetContents(self)
   1334     raise BadZipFile("File is not a zip file")
   1335 if not endrec:
-> 1336     raise BadZipFile("File is not a zip file")
   1337 if self.debug > 1:
   1338     print(endrec)

BadZipFile: File is not a zip file
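The last frames show that the error comes from Python's own zipfile module, which raises BadZipFile when it cannot find a zip end-of-central-directory record in the file. One plausible cause, consistent with the rate-limit explanation later in this thread, is that the "archive" on disk is really an HTML page from Google Drive. Below is a minimal sketch for checking what a downloaded file actually contains. It uses only the standard library (zipfile.is_zipfile); describe_download is a hypothetical helper, and the HTML check is a heuristic assumption, not something FiftyOne does.

```python
import zipfile
from pathlib import Path


def describe_download(path):
    """Return a short label for what a downloaded file actually contains.

    Hypothetical helper: "missing", "zip", "html", or "unknown".
    """
    path = Path(path)
    if not path.exists():
        return "missing"
    # is_zipfile looks for the end-of-central-directory record, the same
    # structure whose absence makes ZipFile raise BadZipFile.
    if zipfile.is_zipfile(path):
        return "zip"
    # Read only the first bytes; an HTML error or confirmation page
    # usually starts with a doctype or an <html> tag (heuristic).
    with path.open("rb") as f:
        head = f.read(512).lstrip().lower()
    if head.startswith(b"<!doctype html") or head.startswith(b"<html"):
        return "html"
    return "unknown"
```

Point it at the archive path from the "Using existing archive" log line, if one was printed; the exact location of the scratch directory is not shown in the traceback above.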


swheaton commented 8 months ago

Hmm, this should have been fixed in the last patch release, which you say you have. Can you give the version of the "voxel51-eta" package?

pip show voxel51-eta
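The same check can be run from Python, which is handy inside a notebook like the one in the traceback. A minimal sketch using the standard library's importlib.metadata; installed_version is a hypothetical helper name, and it returns None instead of raising when a distribution is not installed.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Print the versions relevant to this issue (None means "not installed").
for name in ("fiftyone", "voxel51-eta"):
    print(name, installed_version(name))
```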

Guydada commented 8 months ago

Thanks for the quick answer. It's:

Name: voxel51-eta
Version: 0.12.5
Summary: Extensible Toolkit for Analytics
Home-page: https://github.com/voxel51/eta

swheaton commented 8 months ago

@Guydada Thanks for the report. You're on the latest version. It appears that we're being rate-limited by Google Drive, which is where we store some of the dataset zoo assets, such as quickstart. We'll look into a resolution, but unfortunately, in the short term, we don't have much recourse besides waiting up to 24 hours, as Google's error message says 😞 If this is your first experience with FiftyOne, I apologize; this isn't normal! In the meantime, you could check out our publicly hosted version of the FiftyOne App, which has the quickstart dataset and many others preloaded: https://try.fiftyone.ai/
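Because the failure is transient (a rate limit that clears within about 24 hours), a script could retry with growing delays instead of failing on the first BadZipFile. The sketch below is a generic helper, not part of FiftyOne; retry_on_bad_zip and its parameters are hypothetical. Note that the traceback shows the zoo can reuse an existing archive ("Using existing archive"), so a plain retry might hit the same bad file; load_zoo_dataset's signature in the traceback includes an overwrite parameter, and it is an assumption (check the FiftyOne docs) that passing overwrite=True forces a fresh download.

```python
import time
import zipfile


def retry_on_bad_zip(fn, attempts=4, base_delay=60.0, sleep=time.sleep):
    """Call fn(); on BadZipFile, wait base_delay * 2**i seconds and retry.

    Hypothetical helper. attempts should be >= 1. The last BadZipFile is
    re-raised if every attempt fails. `sleep` is injectable for testing.
    """
    for i in range(attempts):
        try:
            return fn()
        except zipfile.BadZipFile:
            if i == attempts - 1:
                raise  # out of attempts: surface the original error
            sleep(base_delay * 2 ** i)  # exponential backoff
```

For example: retry_on_bad_zip(lambda: foz.load_zoo_dataset("quickstart", overwrite=True)). Given a per-source limit that can last hours, backoff only helps with short outages; for a long one, waiting or using https://try.fiftyone.ai/ as suggested above is the practical option.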

Guydada commented 8 months ago

Thanks for the update. I've been using FiftyOne since its very early versions and I really like it, so this is completely understandable. Thanks again. We can close it.

swheaton commented 8 months ago

I would like to keep this open as a reminder of the issue, if that's OK.

swheaton commented 8 months ago

@Guydada I'm able to download the dataset again, if you want to retry. We're still looking into a longer-term solution, so I'll keep this issue open.

hanqi-monarch commented 2 months ago

I'm seeing a similar issue. I ran fiftyone quickstart from the CLI once, a few hours ago. Now, loading dataset = foz.load_zoo_dataset("quickstart-3d"), following the sample code at https://docs.voxel51.com/user_guide/using_datasets.html#orthographic-projection-images, raises BadZipFile: File is not a zip file.

If this is due to a rate limit, is the limit per client/user or per data source? That seems really strict, since I assume the CLI fiftyone quickstart only downloads about 200 images.

swheaton commented 2 months ago

I believe it is per data source across all users.

I was able to download the file just now. Have you tried again after some time?

hanqi-monarch commented 2 months ago

@swheaton I just tried again by entering the quickstart-3d URL directly, https://drive.usercontent.google.com/download?id=1EnQ2-gGDktEd8pAWwdXNK-FeHUFTFl5K, and it shows network failures in both Chrome and Edge.
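The Drive file ID sits in the URL's id query parameter. The traceback shows QuickstartDataset passing self._GDRIVE_ID to the downloader, so comparing IDs is one way to confirm that a manual browser test targets the same file the zoo fetches. This is a minimal sketch using only urllib.parse; drive_file_id is a hypothetical helper, and it is an assumption that the quickstart-3d dataset class exposes its ID under the same private attribute name.

```python
from urllib.parse import parse_qs, urlparse


def drive_file_id(url):
    """Return the value of the `id` query parameter in a URL, or None."""
    values = parse_qs(urlparse(url).query).get("id")
    return values[0] if values else None
```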

swheaton commented 2 months ago

Interesting, I downloaded through that link just now with no problems. I just had to click the button confirming that I accept the file can't be scanned for viruses. Maybe your network failures are different from the rate limit. Can you share the contents of the network tab?

hanqi-monarch commented 1 month ago

I can't reproduce the network error from clicking the Google Drive link anymore: it started working a few hours later, and it still works now.