rclone / rclone

"rsync for cloud storage" - Google Drive, S3, Dropbox, Backblaze B2, One Drive, Swift, Hubic, Wasabi, Google Cloud Storage, Azure Blob, Azure Files, Yandex Files
https://rclone.org
MIT License

Google Drive: No way to delete orphan files #4166

Open Zeioth opened 4 years ago

Zeioth commented 4 years ago

On Google Drive, I deleted a directory that was created by rclone, and now all 40,000 files that were inside it are orphaned. Is there a way to use rclone to delete all orphaned files on my drive? Thank you.

What is your rclone version (output from rclone version)?

Latest (AUR) -> rclone v1.51.0

Which OS are you using and how many bits (e.g. Windows 7, 64 bit)?

Manjaro

Which cloud storage system are you using? (e.g. Google Drive)

Google Drive

Zeioth commented 4 years ago

Possible solution: https://medium.com/@staticmukesh/how-i-managed-to-delete-1-billions-files-from-my-google-drive-account-9fdc67e6aaca

Animosity022 commented 4 years ago

@ncw - I know we've looked at this before, and I still don't see any way to do it through the API other than the long, complex approach described in a few posts online, like the one above. I can't see this coming into rclone unless Google exposes it in the API, so I'll mark this as an enhancement for Drive. Let me know what you want to do with it.

Zeioth commented 4 years ago

For some stupid reason, I need to get my OAuth credentials verified before I can delete files from Google Drive, which apparently takes 4-6 weeks, so I can't confirm that everything works yet. Still, this script should delete only orphaned files and directories.

The premise is that any file without a 'parents' field is an orphan:

from __future__ import print_function
import os.path
import pickle

from google.auth.transport.requests import Request
from google_auth_oauthlib.flow import InstalledAppFlow
from googleapiclient.discovery import build

# If modifying these scopes, delete the file token.pickle.
SCOPES = ['https://www.googleapis.com/auth/drive']

# Drive accepts at most 100 calls in a single batch request.
BATCH_SIZE = 100


def callback(request_id, response, exception):
    # Called once per request in a batch; report failures (e.g. 403s).
    if exception:
        print("Exception for request {0}: {1}".format(request_id, exception))


def main():
    """Use the Drive v3 API to find and delete orphaned files."""

    # --- CHECK CREDENTIALS ---
    creds = None
    # The file token.pickle stores the user's access and refresh tokens, and is
    # created automatically when the authorization flow completes for the first
    # time.
    if os.path.exists('token.pickle'):
        with open('token.pickle', 'rb') as token:
            creds = pickle.load(token)
    # If there are no (valid) credentials available, let the user log in.
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file(
                'credentials.json', SCOPES)
            creds = flow.run_local_server(port=0)
        # Save the credentials for the next run
        with open('token.pickle', 'wb') as token:
            pickle.dump(creds, token)

    # --- OPEN CONNECTION ---
    service = build('drive', 'v3', credentials=creds)

    print("LISTING ORPHAN FILES")
    print("-----------------------------")
    orphans = []
    page_token = None  # None means "first page"
    while True:
        # Only list files you own: files shared with you but never added to
        # your Drive also have no 'parents', and you cannot delete them.
        r = service.files().list(q="'me' in owners",
                                 pageToken=page_token,
                                 pageSize=1000,  # the maximum files.list allows
                                 fields="nextPageToken, files(id, name, parents)"
                                 ).execute()
        # NOTE: if the file has no 'parents' field, it is an orphan.
        for f in r.get('files', []):
            if not f.get('parents'):
                print("Orphan file found: {0} ({1})".format(f['name'], f['id']))
                orphans.append(f['id'])
        # Exit condition: no more pages
        page_token = r.get('nextPageToken')
        if page_token is None:
            break

    print("DELETING ORPHAN FILES")
    print("-----------------------------")
    # files().delete() removes files permanently, bypassing the trash.
    # Slicing per batch means the last, partial batch is handled correctly.
    for batch_counter, start in enumerate(range(0, len(orphans), BATCH_SIZE), 1):
        chunk = orphans[start:start + BATCH_SIZE]
        batch = service.new_batch_http_request(callback=callback)
        for file_id in chunk:
            print("File with id {0} queued for deletion.".format(file_id))
            batch.add(service.files().delete(fileId=file_id))
        batch.execute()
        # Individual failures are reported by callback(), so count as "sent".
        print("BATCH {0} SENT - {1} FILES QUEUED".format(batch_counter,
                                                        len(chunk)))


if __name__ == '__main__':
    main()

This should not delete files in the root directory (My Drive), because their 'parents' field contains the ID of the root folder rather than being missing.
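Batching the deletes means splitting the orphan IDs into groups no larger than the batch limit; a loop that computes one fixed batch size up front runs past the end of the list on the final, partial batch. A minimal sketch of a slicing helper, assuming Drive's limit of 100 calls per batch request (chunked and MAX_BATCH are illustrative names, not part of any library):

```python
from typing import Iterator, List, Sequence

# Assumed limit: Drive accepts at most 100 calls per batch request.
MAX_BATCH = 100


def chunked(ids: Sequence[str], size: int = MAX_BATCH) -> Iterator[List[str]]:
    """Yield consecutive slices of ids, each at most size long.

    Slicing makes the last chunk simply shorter, so there is no
    IndexError when len(ids) is not a multiple of size.
    """
    if size < 1:
        raise ValueError("size must be positive")
    for start in range(0, len(ids), size):
        yield list(ids[start:start + size])


if __name__ == "__main__":
    orphan_ids = ["id{0}".format(n) for n in range(250)]
    print([len(c) for c in chunked(orphan_ids)])  # [100, 100, 50]
```

Each chunk would then become one service.new_batch_http_request() with one files().delete() call added per ID.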

ncw commented 4 years ago

This appears to be quite tricky...

To make this work in rclone, we'd need a query meaning "this file has no parents", and I'm not sure there is one...

Assuming there isn't, you'd have to run the initial query without the "... in parents" filter, which would return every item. You'd then discard any item that has parents.

That would work, but it would be very slow for drives with lots of files.
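The client-side filtering described here can be sketched as a pure function over already-fetched listing pages. This is a hypothetical helper, not rclone code; it assumes each page is a Drive v3 files.list response requested with parents included in the fields (for example fields="nextPageToken, files(id, parents)"). It also counts how many items it had to scan, which is where the slowness comes from: the work grows with the size of the whole drive, not with the number of orphans.

```python
from typing import Dict, Iterable, List, Tuple


def find_orphans(pages: Iterable[Dict]) -> Tuple[List[str], int]:
    """Return (orphan_ids, items_scanned) from files.list response pages.

    Each page is assumed to look like
    {"files": [{"id": ..., "parents": [...]}, ...], "nextPageToken": ...}.
    """
    orphans: List[str] = []
    scanned = 0
    for page in pages:
        for f in page.get("files", []):
            scanned += 1
            # A missing or empty 'parents' field is treated as orphaned.
            if not f.get("parents"):
                orphans.append(f["id"])
    return orphans, scanned


if __name__ == "__main__":
    pages = [
        {"files": [{"id": "a", "parents": ["root-id"]}, {"id": "b"}],
         "nextPageToken": "t"},
        {"files": [{"id": "c", "parents": []}]},
    ]
    print(find_orphans(pages))  # (['b', 'c'], 3)
```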

If anyone wants to have a go at a --drive-orphans flag, here is the code to look at (it shows the trashed-only filter):

https://github.com/rclone/rclone/blob/8e91f83174cc4011158c596d0c8569cd226fc36d/backend/drive/drive.go#L679-L681

ncw commented 4 years ago

> On Google drive, I deleted a directory that was created by rclone, and now all the 40.000 files inside of it are orphan. There's some way to use RClone to delete all orphan files on my drive?

How did you delete that directory? Was it through rclone? If so, we need to fix that!

Zeioth commented 4 years ago

No, it wasn't through rclone: I manually deleted the directory created by rclone on the Google Drive website. Apparently doing that leaves the files inside it orphaned. It's in the Google Drive documentation.

It's Google's fault, not rclone's. Still, it's quite a common use case.

ncw commented 4 years ago

> No it wasn't through RClone: I deleted the directory created by RClone, manually on Google Drive website. Apparently when you do that, it generates orphan files. It's on Google Drive documentation.

Really! That is weird!

ncw commented 4 years ago

I've added the framework for backend commands now, so if someone wanted to make a "rescue-orphans" command for the drive backend, that would now be possible.

Ret2lib commented 3 years ago

Hi,

I'm willing to help contribute to this, but I'm not sure what is meant by "framework for backend commands". Is there somewhere I can read about it? What part of the code would I need to add this to? (I just found out about rclone today.)

ncw commented 3 years ago

@Ret2lib sorry for the delay responding.

The backend commands are defined here

https://github.com/rclone/rclone/blob/654f5309b041cbb65bba3b86423fcb75c0d84dab/backend/drive/drive.go#L3077

And implemented here

https://github.com/rclone/rclone/blob/654f5309b041cbb65bba3b86423fcb75c0d84dab/backend/drive/drive.go#L3211

So you'd write a function that does the work, parse the arguments in the Command function, and then call your function.

You'd use it something like this:

rclone backend rescueorphans drive:dir-to-rescue-files

Let me know if you'd like more help and I'll provide it :-)

dzg commented 1 month ago

Did anything ever get solved here? I have millions of orphaned files.

ncw commented 1 month ago

@dzg I had a go at implementing this

v1.68.0-beta.8105.96f964bbb.fix-4166-drive-rescue-orphans on branch fix-4166-drive-rescue-orphans (uploaded in 15-30 mins)

Use it like this:

rescue

Rescue or delete any orphaned files

rclone backend rescue remote: [options] [<arguments>+]

This command rescues or deletes any orphaned files or directories.

Sometimes files can get orphaned in Google Drive. This means that they are no longer in any folder in Google Drive.

This command finds those files and either rescues them to a directory you specify or deletes them.

Usage:

This can be used in 3 ways.

First, list all orphaned files:

rclone backend rescue drive:directory

Second, rescue all orphaned files to the directory indicated:

rclone backend rescue drive:directory "relative/path/to/rescue/directory"

Third, delete all orphaned files to the trash:

rclone backend rescue drive:directory -o delete
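For anyone who wants to try the same operations from the Python script earlier in this thread, here is a hypothetical sketch of the three modes (list, rescue, delete) using Drive v3 files.update calls. It is not rclone's implementation (rclone is written in Go); handle_orphans and rescue_folder_id are illustrative names, and it takes an existing folder ID rather than the relative path rclone accepts. As reported below, Drive may reject the rescue step with cannotAddParent and the delete step with 403, so expect those errors.

```python
from typing import List, Optional


def handle_orphans(service, orphan_ids: List[str], mode: str = "list",
                   rescue_folder_id: Optional[str] = None) -> List[str]:
    """Hypothetical analogue of the three usages: list, rescue, delete.

    service is assumed to be a googleapiclient Drive v3 resource, as
    returned by build('drive', 'v3', credentials=creds).
    Returns the IDs that were acted on.
    """
    if mode == "list":
        for fid in orphan_ids:
            print(fid)
        return list(orphan_ids)
    if mode == "rescue":
        if not rescue_folder_id:
            raise ValueError("rescue mode needs rescue_folder_id")
        for fid in orphan_ids:
            # Try to give the orphan a parent folder so it is visible again.
            service.files().update(fileId=fid, addParents=rescue_folder_id,
                                   fields="id, parents").execute()
        return list(orphan_ids)
    if mode == "delete":
        for fid in orphan_ids:
            # Move to the trash instead of deleting permanently.
            service.files().update(fileId=fid, body={"trashed": True}).execute()
        return list(orphan_ids)
    raise ValueError("unknown mode: {0!r}".format(mode))
```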

HOWEVER, this does not work for me. It finds the orphaned files just fine, but trying to delete them gives 403 permission denied, and trying to rescue them into a directory (by adding a parent) gives "Increasing the number of parents is not allowed, cannotAddParent". I tried many variations of adding and removing parents, but I couldn't get anything to work.

So I think that these files really are orphaned. The API no longer lets us add a parent to rescue them, and the API doesn't seem to let us delete them either.

It would be worth trying it, though, to see if you have the same experience.