googlecolab / colabtools

Python libraries for Google Colaboratory
Apache License 2.0

files.download support for directories #122

Open pointyointment opened 6 years ago

pointyointment commented 6 years ago

I was trying to download the configuration so I could just reupload it in the future instead of going through all of the OAuth/API key steps every time I run the notebook.

Typical google-drive-ocamlfuse setup:

!apt-get install -y -qq software-properties-common python-software-properties module-init-tools
!add-apt-repository -y ppa:alessandro-strada/ppa > /dev/null 2>&1
!apt-get update -qq > /dev/null 2>&1
!apt-get -y install -qq google-drive-ocamlfuse fuse
from google.colab import auth
auth.authenticate_user()
from oauth2client.client import GoogleCredentials
creds = GoogleCredentials.get_application_default()
import getpass
!google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret} < /dev/null 2>&1 | grep URL
vcode = getpass.getpass()
!echo {vcode} | google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret}

!mkdir drive
!google-drive-ocamlfuse drive

That all succeeds.

Attempt to download .gdfuse folder using the technique shown in the external data notebook:

import os
from google.colab import files

os.chdir(os.path.expanduser('~'))  # just in case we were elsewhere
!ls -a  # I can see .gdfuse listed
files.download('.gdfuse')

That gives this output:

.   .cache   datalab  .forever  .ipython  .rnd
..  .config  drive    .gdfuse   .local
---------------------------------------------------------------------------
MessageError                              Traceback (most recent call last)
<ipython-input-18-eac250f532e0> in <module>()
      3 os.chdir(os.path.expanduser('~'))
      4 get_ipython().system('ls -a')
----> 5 files.download('.gdfuse')

/usr/local/lib/python3.6/dist-packages/google/colab/files.py in download(filename)
    170       'port': port,
    171       'path': os.path.abspath(filename),
--> 172       'name': os.path.basename(filename),
    173   })

/usr/local/lib/python3.6/dist-packages/google/colab/output/_js.py in eval_js(script, ignore_result)
     37   if ignore_result:
     38     return
---> 39   return _message.read_reply_from_input(request_id)
     40 
     41 

/usr/local/lib/python3.6/dist-packages/google/colab/_message.py in read_reply_from_input(message_id, timeout_sec)
     84         reply.get('colab_msg_id') == message_id):
     85       if 'error' in reply:
---> 86         raise MessageError(reply['error'])
     87       return reply.get('data', None)
     88 

MessageError: Error: Failed to download: 

There's no further explanation given for the failure, though it looks like there should be. I tried to search for other people having this problem and didn't find any. There is Issue #83, but the problem there was determined to be that the file was too large, which a configuration folder shouldn't be.

craigcitro commented 6 years ago

files.download doesn't understand what to do with a folder. for now, tar it up and you should be okay.

let's use this issue to track a better error in that case.
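A minimal sketch of the tar workaround suggested above, not part of colabtools. The helper name tar_directory and its out_dir parameter are hypothetical; only the standard library is used, and the actual download call is left as a comment because google.colab is only importable inside Colab.

```python
import os
import tarfile
import tempfile


def tar_directory(directory, out_dir=None):
    """Pack `directory` into a .tar.gz archive and return the archive path.

    Hypothetical helper for illustration; not part of google.colab.
    """
    directory = os.path.normpath(os.path.abspath(directory))
    if not os.path.isdir(directory):
        raise ValueError("not a directory: %r" % directory)

    # By default, write the archive to a fresh temp directory so it
    # cannot end up inside the folder that is being packed.
    out_dir = out_dir or tempfile.mkdtemp()
    name = os.path.basename(directory)
    archive_path = os.path.join(out_dir, name + ".tar.gz")

    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname keeps paths inside the archive relative (e.g. ".gdfuse/...").
        tar.add(directory, arcname=name)
    return archive_path


# In a Colab cell, the resulting single file can then be downloaded:
#   from google.colab import files
#   files.download(tar_directory(os.path.expanduser('~/.gdfuse')))
```

Since the result is a regular file, files.download receives the kind of path it already handles.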

Androbin commented 6 years ago

Solution 1: Error out

if os.path.isdir(filename):
  raise ValueError("filename: must not be a directory")

Pro: simple
Contra: useless

Solution 2: Download one by one

if os.path.isdir(filename):
  directory = filename
  # List the target directory itself, not the current working directory.
  for entry in os.listdir(directory):
    download(os.path.join(directory, entry))
  return

Pro: straightforward
Contra: painful

Solution 3: Download as archive

import shutil
if os.path.isdir(filename):
  filename = os.path.normpath(filename)
  filename = shutil.make_archive(filename, 'zip', filename)
  download(filename)
  return

Pro: works
Contra: none
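The archive approach can also be tried today as a standalone wrapper, without patching files.download. This is only a sketch: download_path and its download parameter are hypothetical names, and the callable is injectable purely so the function can be exercised outside Colab. It defaults to google.colab.files.download.

```python
import os
import shutil
import tempfile


def download_path(path, download=None):
    """Download a file as-is, or a directory as a zip archive.

    Hypothetical wrapper for illustration. `download` is any callable
    that accepts a file path; it defaults to google.colab.files.download.
    Returns the path that was handed to `download`.
    """
    if download is None:
        from google.colab import files  # only importable inside Colab
        download = files.download

    path = os.path.normpath(path)
    if not os.path.exists(path):
        raise FileNotFoundError(path)

    if os.path.isdir(path):
        # Build the zip in a scratch directory so nothing is written
        # next to (or inside) the directory being archived.
        name = os.path.basename(os.path.abspath(path))
        base = os.path.join(tempfile.mkdtemp(), name)
        path = shutil.make_archive(base, "zip", root_dir=path)

    download(path)
    return path
```

In a notebook, download_path('.gdfuse') would then hand a .gdfuse.zip file to the browser, while plain files pass through unchanged.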

ArhamInGithub commented 3 years ago

How do I get the file path for os.chdir if the file is on GitHub?