Closed · farhodfm closed this issue 3 years ago
Hi @farhodfm
The only reason that can happen with the code is if you use the --debug
flag. However, I assume that's not the case here.
In order to figure out what happened, can you please elaborate a little bit more?
Thank you for replying, @royorel
You are right! The problem is not related to --debug, since it is set to False by default.
- Did you get any error messages?
As I remember (sorry, I cleaned up the overall process), there was one error stating "Too many open files".
Somehow I did not pay attention to it, as the deeplab model was working just fine.
- How many images were downloaded overall?
Every subfolder contains a different number of images; some have fewer than 20. Overall, across the 70 subfolders of the ffhq_aging256x256 folder, there are more than 700 images.
- The script takes a while to run, so in case you were running it on a remote server, did your connection time out by any chance?
I don't think there is a timeout problem, since I downloaded the dataset twice and could not get the full dataset in either case.
I just tried to call get_ffhq_aging.sh to get a full running log, but got a 403 error.
So, I will try to call it again tomorrow.
But if you have any suggestions in the meantime, I will be glad to discuss them with you.
@farhodfm Can you recreate the error message and share it here? That would give me a hint at what went wrong...
@royorel, I tried to download it again.
Below, I attached the file where you can see the error: log.txt
It seems like this is an error from Python's multithreading library. It looks like your machine has a limit on the number of files that can be open in parallel.
Two possible solutions are:
Please let me know if this helps.
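One common way to deal with a per-process open-file limit, as described above, is to raise the soft file-descriptor limit. A minimal sketch (the target value 4096 is an arbitrary example, not a value from the repo; the `resource` module is Unix-only):

```python
import resource

# Check the per-process limit on open file descriptors, which is what
# triggers "Too many open files" when many download threads run at once.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"soft limit: {soft}, hard limit: {hard}")

# The soft limit can be raised up to the hard limit without root.
# 4096 is an arbitrary example target.
target = 4096 if hard == resource.RLIM_INFINITY else min(hard, 4096)
if soft < target:
    resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
```

The other usual option is simply to reduce the number of parallel download threads so fewer files are open at once.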
@royorel
However, the following message is presented:
Authentication successful.
authorized access to google drive API!
Downloading JSON metadata...
\ done processing 1/2 files
Traceback (most recent call last):
File "download_ffhq_aging.py", line 374, in <module>
run_cmdline(sys.argv)
File "download_ffhq_aging.py", line 369, in run_cmdline
run(**vars(args))
File "download_ffhq_aging.py", line 333, in run
download_files([json_spec, license_specs['json']], drive=drive, **download_kwargs)
File "download_ffhq_aging.py", line 209, in download_files
raise exc_info[1].with_traceback(exc_info[2])
File "download_ffhq_aging.py", line 219, in _download_thread
pydrive_utils.pydrive_download(drive, spec['file_url'], spec['file_path'])
File "/home/farhod/Documents/FFHQ-Aging-Dataset/pydrive_utils.py", line 40, in pydrive_download
pydrive_file.GetContentFile(save_path)
File "/home/farhod/anaconda3/envs/pytorch/lib/python3.6/site-packages/pydrive/files.py", line 210, in GetContentFile
self.FetchContent(mimetype, remove_bom)
File "/home/farhod/anaconda3/envs/pytorch/lib/python3.6/site-packages/pydrive/files.py", line 43, in _decorated
return decoratee(self, *args, **kwargs)
File "/home/farhod/anaconda3/envs/pytorch/lib/python3.6/site-packages/pydrive/files.py", line 255, in FetchContent
self.content = io.BytesIO(self._DownloadFromUrl(download_url))
File "/home/farhod/anaconda3/envs/pytorch/lib/python3.6/site-packages/pydrive/auth.py", line 75, in _decorated
return decoratee(self, *args, **kwargs)
File "/home/farhod/anaconda3/envs/pytorch/lib/python3.6/site-packages/pydrive/files.py", line 505, in _DownloadFromUrl
raise ApiRequestError('Cannot download file: %s' % resp)
pydrive.files.ApiRequestError: Cannot download file: {'x-guploader-uploadid': 'ABg5-UxUB1kGn4T5glfP_f-xQ__oNXit0o15UJMVoEq1UFND3ct_skVuSlU8jJfgD4F_kLB60xAHyYYgsWFDawGsLNs', 'vary': 'Origin, X-Origin', 'content-type': 'application/json; charset=UTF-8', 'date': 'Fri, 09 Oct 2020 16:37:28 GMT', 'expires': 'Fri, 09 Oct 2020 16:37:28 GMT', 'cache-control': 'private, max-age=0', 'content-length': '320', 'server': 'UploadServer', 'alt-svc': 'h3-Q050=":443"; ma=2592000,h3-29=":443"; ma=2592000,h3-27=":443"; ma=2592000,h3-T051=":443"; ma=2592000,h3-T050=":443"; ma=2592000,h3-Q046=":443"; ma=2592000,h3-Q043=":443"; ma=2592000,quic=":443"; ma=2592000; v="46,43"', 'status': '403'}
Traceback (most recent call last):
File "run_deeplab.py", line 91, in <module>
main()
File "run_deeplab.py", line 44, in main
assert os.path.isdir(dataset_root)
AssertionError
It seems like you got a 403 error once again (see the line that starts with pydrive.files.ApiRequestError).
This is an error coming from the Google Drive API. Right now it seems like the quota for downloading the JSON file was exceeded. I just tried to download the file manually from the Google Drive web interface and got the same error.
In that case, the PyDrive interface won't work either, and the only solution is to wait.
PyDrive is useful when you get an error from the regular script but you're still able to download the files manually from the Google Drive web interface.
I gave it another try, and this is the result.
Authentication successful.
authorized access to google drive API!
Downloading JSON metadata...
/ done processing 2/2 files
Parsing JSON metadata...
Downloading 70001 files...
- done processing 3894/70001 files
Traceback (most recent call last):
File "/home/farhod/anaconda3/envs/pytorch/lib/python3.6/site-packages/pydrive/files.py", line 237, in FetchMetadata
.execute(http=self.http)
File "/home/farhod/anaconda3/envs/pytorch/lib/python3.6/site-packages/googleapiclient/_helpers.py", line 134, in positional_wrapper
return wrapped(*args, **kwargs)
File "/home/farhod/anaconda3/envs/pytorch/lib/python3.6/site-packages/googleapiclient/http.py", line 907, in execute
raise HttpError(resp, content, uri=self.uri)
googleapiclient.errors.HttpError: <HttpError 500 when requesting https://www.googleapis.com/drive/v2/files/1aMCLSu17QL1K50o6RepCu3udocQKwD6a?alt=json returned "Internal Error">
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "download_ffhq_aging.py", line 374, in <module>
run_cmdline(sys.argv)
File "download_ffhq_aging.py", line 369, in run_cmdline
run(**vars(args))
File "download_ffhq_aging.py", line 348, in run
download_files(specs, dst_dir, output_size, drive=drive, **download_kwargs)
File "download_ffhq_aging.py", line 209, in download_files
raise exc_info[1].with_traceback(exc_info[2])
File "download_ffhq_aging.py", line 219, in _download_thread
pydrive_utils.pydrive_download(drive, spec['file_url'], spec['file_path'])
File "/home/farhod/Documents/FFHQ-Aging-Dataset/pydrive_utils.py", line 40, in pydrive_download
pydrive_file.GetContentFile(save_path)
File "/home/farhod/anaconda3/envs/pytorch/lib/python3.6/site-packages/pydrive/files.py", line 210, in GetContentFile
self.FetchContent(mimetype, remove_bom)
File "/home/farhod/anaconda3/envs/pytorch/lib/python3.6/site-packages/pydrive/files.py", line 42, in _decorated
self.FetchMetadata()
File "/home/farhod/anaconda3/envs/pytorch/lib/python3.6/site-packages/pydrive/auth.py", line 75, in _decorated
return decoratee(self, *args, **kwargs)
File "/home/farhod/anaconda3/envs/pytorch/lib/python3.6/site-packages/pydrive/files.py", line 239, in FetchMetadata
raise ApiRequestError(error)
pydrive.files.ApiRequestError: <HttpError 500 when requesting https://www.googleapis.com/drive/v2/files/1aMCLSu17QL1K50o6RepCu3udocQKwD6a?alt=json returned "Internal Error">
processed 1/3894 images
processed 2/3894 images
processed 3/3894 images
processed 4/3894 images
processed 5/3894 images
..........................
processed 3890/3894 images
processed 3891/3894 images
processed 3892/3894 images
processed 3893/3894 images
processed 3894/3894 images
PyDrive is useful when you get an error from the regular script but you're able to download the files manually from the google drive web interface.
Do you mean to download the original FFHQ and then modify download_ffhq_aging.py (no downloading, but further pre-processing)?
@farhodfm, I googled "pydrive.files.ApiRequestError: <HttpError 500 when requesting". The results seem to indicate that this is some sort of error in the servers that host the files; it has nothing to do with the download code. I think you should retry downloading and see if the problem persists, as I suspect it was something temporary. The download script is designed to continue from the point where it stopped.
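For transient server errors like this HTTP 500, retrying with exponential backoff is the usual pattern. A minimal sketch (`download_with_retries` and `flaky` are illustrative names, not part of the repo; the real script's resume logic is separate):

```python
import time

def download_with_retries(download_fn, max_retries=5, base_delay=1.0):
    """Retry a flaky operation with exponential backoff.

    download_fn stands in for whatever performs one download attempt;
    transient server errors often succeed on a later try.
    """
    for attempt in range(max_retries):
        try:
            return download_fn()
        except Exception:
            if attempt == max_retries - 1:
                raise  # give up after the last attempt
            time.sleep(base_delay * (2 ** attempt))

# Example with a stand-in that fails twice, then succeeds.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("HTTP 500: Internal Error")
    return "ok"

result = download_with_retries(flaky, base_delay=0.01)
print(result)  # prints "ok" after 3 attempts
```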
What I meant is that PyDrive emulates whatever you get when using the web interface. If you get a quota-exceeded error (403) in the web interface, you will also get it with PyDrive. If you can download the file manually, you will also be able to download it with PyDrive.
That is not the case with the default setup without PyDrive, where quota-exceeded errors are returned more often.
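For reference, the PyDrive path boils down to very little code. A sketch of what a helper like the repo's pydrive_utils.pydrive_download presumably does (`pydrive_fetch` is a hypothetical name; `CreateFile` and `GetContentFile` are real PyDrive calls, and `drive` is assumed to be an authenticated GoogleDrive object from the auth step):

```python
# Hypothetical helper: fetch a single file by its Google Drive id via an
# authenticated GoogleDrive object ("drive"), saving it to save_path.
def pydrive_fetch(drive, file_id, save_path):
    pydrive_file = drive.CreateFile({'id': file_id})  # PyDrive file handle
    pydrive_file.GetContentFile(save_path)            # download content to disk
```

Because these calls go through the same Drive backend as the web interface, they inherit the same quota behavior described above.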
@royorel
I continued downloading from the stopped point. I downloaded 377 more images, and then got the same problem again. I thought I could keep continuing, but the "Quota Exceeded" 403 error is always there. I know this problem is not related to your code and the only way is to wait. So, thank you for helping.
Hi @royorel!
Thanks for sharing the full implementation and data!
I downloaded the data by following your instructions (using PyDrive). But then I realized that not all images were downloaded; there are only a few images in each subfolder ('00000' ~ '69000'). I am sorry if I missed something. Could you give instructions to download all the images, please?
Thanks once again