microsoft / computervision-recipes

Best Practices, code samples, and documentation for Computer Vision.
MIT License

[BUG] URL Issue Image Classification, OD, IS Fridge dataset #692

Open rjaincc opened 2 weeks ago


Description

Downloading the image classification, object detection, and instance segmentation fridge datasets from the URLs below intermittently fails with the HTTP 404 error shown in the traceback that follows.

Dataset URLs:
1/ https://cvbp-secondary.z19.web.core.windows.net/datasets/image_classification/fridgeObjects.zip
2/ https://cvbp-secondary.z19.web.core.windows.net/datasets/image_classification/multilabelFridgeObjects.zip
3/ https://cvbp-secondary.z19.web.core.windows.net/datasets/object_detection/odFridgeObjects.zip
4/ https://cvbp-secondary.z19.web.core.windows.net/datasets/object_detection/odFridgeObjectsMask.zip

---------------------------------------------------------------------------
HTTPError                                 Traceback (most recent call last)
Cell In[3], line 23
     20 data_file = os.path.join(dataset_parent_dir, f"{dataset_name}.zip")
     22 # Download the dataset
---> 23 urllib.request.urlretrieve(download_url, filename=data_file)
     25 # extract files
     26 with ZipFile(data_file, "r") as zip:

File c:\Users\rupaljain\.conda\envs\ft_acft_local_comp\lib\urllib\request.py:247, in urlretrieve(url, filename, reporthook, data)
    230 """
    231 Retrieve a URL into a temporary location on disk.
    232 
   (...)
    243 data file as well as the resulting HTTPMessage object.
    244 """
    245 url_type, path = _splittype(url)
--> 247 with contextlib.closing(urlopen(url, data)) as fp:
    248     headers = fp.info()
    250     # Just return the local path and the "headers" for file://
    251     # URLs. No sense in performing a copy unless requested.

File c:\Users\rupaljain\.conda\envs\ft_acft_local_comp\lib\urllib\request.py:222, in urlopen(url, data, timeout, cafile, capath, cadefault, context)
    220 else:
    221     opener = _opener
--> 222 return opener.open(url, data, timeout)

File c:\Users\rupaljain\.conda\envs\ft_acft_local_comp\lib\urllib\request.py:531, in OpenerDirector.open(self, fullurl, data, timeout)
    529 for processor in self.process_response.get(protocol, []):
    530     meth = getattr(processor, meth_name)
--> 531     response = meth(req, response)
    533 return response

File c:\Users\rupaljain\.conda\envs\ft_acft_local_comp\lib\urllib\request.py:640, in HTTPErrorProcessor.http_response(self, request, response)
    637 # According to RFC 2616, "2xx" code indicates that the client's
    638 # request was successfully received, understood, and accepted.
    639 if not (200 <= code < 300):
--> 640     response = self.parent.error(
    641         'http', request, response, code, msg, hdrs)
    643 return response

File c:\Users\rupaljain\.conda\envs\ft_acft_local_comp\lib\urllib\request.py:569, in OpenerDirector.error(self, proto, *args)
    567 if http_err:
    568     args = (dict, 'default', 'http_error_default') + orig_args
--> 569     return self._call_chain(*args)

File c:\Users\rupaljain\.conda\envs\ft_acft_local_comp\lib\urllib\request.py:502, in OpenerDirector._call_chain(self, chain, kind, meth_name, *args)
    500 for handler in handlers:
    501     func = getattr(handler, meth_name)
--> 502     result = func(*args)
    503     if result is not None:
    504         return result

File c:\Users\rupaljain\.conda\envs\ft_acft_local_comp\lib\urllib\request.py:649, in HTTPDefaultErrorHandler.http_error_default(self, req, fp, code, msg, hdrs)
    648 def http_error_default(self, req, fp, code, msg, hdrs):
--> 649     raise HTTPError(req.full_url, code, msg, hdrs, fp)

HTTPError: HTTP Error 404: The requested content does not exist.

In which platform does it happen?

Azure Cluster Run. Sample workflow: https://github.com/Azure/azureml-examples/actions/runs/9501325986

How do we replicate the issue?

Run the "2.1. Download the Data" section of any of the notebooks below:
1/ https://github.com/Azure/azureml-examples/blob/main/sdk/python/jobs/automl-standalone-jobs/automl-image-classification-multiclass-task-fridge-items/automl-image-classification-multiclass-task-fridge-items.ipynb
2/ https://github.com/Azure/azureml-examples/blob/main/sdk/python/jobs/automl-standalone-jobs/automl-image-classification-multilabel-task-fridge-items/automl-image-classification-multilabel-task-fridge-items.ipynb
3/ https://github.com/Azure/azureml-examples/blob/main/sdk/python/jobs/automl-standalone-jobs/automl-image-object-detection-task-fridge-items-batch-scoring/image-object-detection-batch-scoring-non-mlflow-model.ipynb
4/ https://github.com/Azure/azureml-examples/blob/main/sdk/python/jobs/automl-standalone-jobs/automl-image-instance-segmentation-task-fridge-items/automl-image-instance-segmentation-task-fridge-items.ipynb
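
Because the failure is intermittent, a single notebook run may not reproduce it. A small probe that requests each URL several times and tallies the status codes may make the failure rate easier to see. The following is only a sketch: `probe` and `DATASET_URLS` are hypothetical names, not part of the notebooks, and it assumes the storage endpoint answers HEAD requests the same way it answers the GET that `urlretrieve` issues.

```python
import urllib.error
import urllib.request
from collections import Counter

BASE = "https://cvbp-secondary.z19.web.core.windows.net/datasets"

# The four dataset URLs listed in this issue.
DATASET_URLS = [
    f"{BASE}/image_classification/fridgeObjects.zip",
    f"{BASE}/image_classification/multilabelFridgeObjects.zip",
    f"{BASE}/object_detection/odFridgeObjects.zip",
    f"{BASE}/object_detection/odFridgeObjectsMask.zip",
]


def probe(url, tries=10, opener=urllib.request.urlopen):
    """Send `tries` HEAD requests to `url` and count the HTTP status codes.

    `opener` is injectable so the function can be exercised without network
    access; by default it is urllib.request.urlopen.
    """
    statuses = Counter()
    for _ in range(tries):
        # HEAD avoids downloading the archive (assumption: same status as GET).
        request = urllib.request.Request(url, method="HEAD")
        try:
            with opener(request) as response:
                statuses[response.status] += 1
        except urllib.error.HTTPError as err:
            # urlopen raises HTTPError for non-2xx responses such as the 404.
            statuses[err.code] += 1
    return statuses


# Example usage (performs real network requests):
# for url in DATASET_URLS:
#     print(url, dict(probe(url)))
```

A mix of 200 and 404 counts for the same URL would be consistent with the intermittent behavior described above.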

Expected behavior (i.e. solution)

All four dataset URLs should download reliably on every request, without raising HTTPError: HTTP Error 404: The requested content does not exist.
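
Until the hosting side is fixed, a client-side retry around the `urllib.request.urlretrieve` call from the traceback might mask the intermittent 404. The following is a minimal sketch of that workaround, not a fix for the underlying problem: `download_with_retry` and its parameters are hypothetical, and it assumes the 404 is transient, so that a later attempt against the same URL can succeed.

```python
import time
import urllib.error
import urllib.request


def download_with_retry(url, filename, attempts=5, delay=2.0,
                        retrieve=urllib.request.urlretrieve,
                        sleep=time.sleep):
    """Call retrieve(url, filename=...) until it succeeds or attempts run out.

    Hypothetical helper for illustration. `retrieve` and `sleep` are
    injectable so the retry logic can be checked without network access.
    """
    for attempt in range(1, attempts + 1):
        try:
            return retrieve(url, filename=filename)
        except urllib.error.HTTPError as err:
            # Retry only the 404 described in this issue; re-raise any other
            # HTTP error immediately, and re-raise the 404 on the last attempt.
            if err.code != 404 or attempt == attempts:
                raise
            sleep(delay * attempt)  # linear backoff: delay, 2*delay, ...


# Example usage, mirroring the failing line in the traceback:
# download_with_retry(download_url, data_file)
```

Retrying only on 404 keeps genuine failures (for example a 403 or 500) visible instead of hiding them behind repeated attempts.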

Other Comments