iterative / PyDrive2

Google Drive API Python wrapper library. Maintained fork of PyDrive.
https://docs.iterative.ai/PyDrive2
Other
580 stars 69 forks source link

Implement mv method in GDriveFileSystem #217

Closed simone-viozzi closed 2 years ago

simone-viozzi commented 2 years ago

Hi, i'm addind a method to rename a GoogleDriveFile

the google api to rename a file is: https://developers.google.com/drive/api/v2/reference/files/patch

I saw you have a wrapper for files().patch: _FilesPatch do you prefer if I use the existing wrapper or the api directly?

shcheklein commented 2 years ago

@simone-viozzi this is from the docs:

Manipulate file metadata and contents from [GoogleDriveFile](https://docs.iterative.ai/PyDrive2/pydrive2/#pydrive2.files.GoogleDriveFile) object and call [Upload()](https://docs.iterative.ai/PyDrive2/pydrive2/#pydrive2.files.GoogleDriveFile.Upload)

Does it make sense to use the Upload for this?

E.g. like this https://docs.iterative.ai/PyDrive2/filemanagement/?highlight=metadata#update-file-metadata

simone-viozzi commented 2 years ago

Right now is possible to rename a file using existing methods:

file.FetchMetadata(fields="title")
file["title"] = "new_file1"
file.Upload()

and looking at the code for upload: https://github.com/iterative/PyDrive2/blob/64a50a1580b2d276f8c9563f5ba620f7e7706119/pydrive2/files.py#L499-L512

it uses _FilesPatch internally.

i discovered this way to rename a file with https://github.com/iterative/PyDrive2/issues/91

but it would be much more convenient and user friendly to just have a method that does that

shcheklein commented 2 years ago

@simone-viozzi could you check if fsspec interface had move / rename please? May be we should implement it there (keep this one as-is - low level).

one more question:

file.FetchMetadata(fields="title")
file["title"] = "new_file1"
file.Upload()

is it possible to do this w/o a call to Fetch? (if I know the file id). I wonder also if it's possible to do this if I know only a file name w/o any extra calls. That would be a good benefit of this I think.

simone-viozzi commented 2 years ago

is it possible to do this w/o a call to Fetch? (if I know the file id). I wonder also if it's possible to do this if I know only a file name w/o any extra calls. That would be a good benefit of this I think.

No, without fetch it doesn't do nothing.

Consider the minimal example:

file = drive.CreateFile({'id': "13nYjSBX7FmSUFnBcTuT6hKsFswXhqcpzpioei5r2eRQ"})
file["title"] = "new_file1"
file.Upload()

The stack-trace of Upload in this case is:

https://github.com/iterative/PyDrive2/blob/64a50a1580b2d276f8c9563f5ba620f7e7706119/pydrive2/apiattr.py#L92-L103

i think because the title was None and it doesn't count as a change, but i might be wrong on this one. This is also the same issue as https://github.com/iterative/PyDrive2/issues/91

With the rename method i implemented:

https://github.com/simone-viozzi/PyDrive2/blob/c55271427ace18d59019f03392e73cf99d46f9eb/pydrive2/files.py#L574-L587

you don't need to fetch it before because i specified it in the request: param["fields"] = "title"

@simone-viozzi could you check if fsspec interface had move / rename please? May be we should implement it there (keep this one as-is - low level).

there should be a move method (https://github.com/iterative/PyDrive2/pull/119) but i can't find it on the current version. Also there is a reference to it in one of the tests:

https://github.com/iterative/PyDrive2/blob/64a50a1580b2d276f8c9563f5ba620f7e7706119/pydrive2/test/test_fs.py#L90-L102

i suppose it's an old test and the method was dropped somewhere.

Anyway, there is no rename / move on the current version of fsspec

shcheklein commented 2 years ago

i discovered this way to rename a file with https://github.com/iterative/PyDrive2/issues/91

yep, thanks. I remember it now. I was trying to fix it, but it was not that trivial. Ideally, it should be fixed though.

there should be a move method (https://github.com/iterative/PyDrive2/pull/119) but i can't find it on the current version. Also there is a reference to it in one of the tests:

https://filesystem-spec.readthedocs.io/en/latest/api.html#fsspec.spec.AbstractFileSystem.rename

That's the default implementation of move:

https://github.com/fsspec/filesystem_spec/blob/master/fsspec/spec.py#L917-L920

That's the rename:

https://github.com/fsspec/filesystem_spec/blob/master/fsspec/spec.py#L1251

It's not terrible (if we optimize copy). But both can improved to use a single API call I think.


So, ideally, we can take a look and see if we can fix the #91 + optimize fsspec to provide a better interface for this. Then we can see if we need a separate wrapper on top of _Patch

Let me know wdyt.

simone-viozzi commented 2 years ago

oh, i get it now, thank you!

i think we can definitively optimize them, maybe for the rename we could use self.auth.service.files().patch(**param).execute() directly, so we don't need the fetch before the call. But this way we would need a function on the GoogleDriveFile level and another on the GDriveFileSystem level.

otherwise we can do just a function on the GDriveFileSystem level like this:

def rename(path, new_title)
    id = _get_item_id(path)
    file = drive.CreateFile({'id': id})
    file.FetchMetadata(fields="title")
    file["title"] = new_title
    file.Upload()

But the current implementation of rename is: def rename(self, path1, path2, **kwargs): and i don't think we can change the signature.

So maybe the best is to change the implementation of move in a way that:

This way we can leave the default implementation of rename.

The move method will become something like:

def move(self, path1, path2):
    path1_left = path1.split('/')[:-1]
    path1_right = path1.split('/')[-1]

    path2_left = path1.split('/')[:-1]
    path2_right = path1.split('/')[-1]

   if path1_left == path2_left:
        rename file from path1_right to path2_right
   else:
        move file from path1 to path2

what do you think?

This way even if someone use move to rename a file it would call the right API

shcheklein commented 2 years ago

Re moving files, check this https://developers.google.com/drive/api/guides/folder#move_files_between_folders

I feel it can be generalized to a single Patch call that updates parents + title if needed, wdyt? The only thing we'll need is to get ids for those parents, for the file, etc. It should be done carefully. The most annoying thing on Google Drive is that one path can resolve into multiple IDs.

simone-viozzi commented 2 years ago

The most annoying thing on Google Drive is that one path can resolve into multiple IDs.

check out: https://rclone.org/commands/rclone_dedupe/

i think for the move / rename method we can just throw an error if a path resolve in multiple IDs.

Re moving files, check this https://developers.google.com/drive/api/guides/folder#move_files_between_folders

Good! so we need patch to rename the file and update to move it. we can use GoogleDriveFile.Upload that does both, i will try it and get back

simone-viozzi commented 2 years ago

mh, funny. The rename part does work:

fs.mv('root/tmp/a.pdf', 'root/tmp/b.pdf')

but the move part does not.

Even if i change the parent array, the Upload method will call _FilesPatch instead of _FilesUpdate.

https://github.com/iterative/PyDrive2/blob/64a50a1580b2d276f8c9563f5ba620f7e7706119/pydrive2/files.py#L499-L512

I also tried on manually calling the privates methods:

if file1_name != file2_name:
    file1._FilesPatch()

if file1_parent != file2_parent:
    file1._FilesUpdate()

But param["body"] = self.GetChanges() in _FilesUpdate does not recognize that the parent array has changed and nothing get updated.

@shcheklein do you have any idea why?

if i did the fetch before it should get that the array has changed

simone-viozzi commented 2 years ago

Ok, now it does work.

both rename / move and rename and move at the same time.

i changed:

file1["parents"].append({"id": file2_parent_id})

to:

file1["parents"] = [{"id": file2_parent_id}]

This way when _FilesUpdate get the changes in the dict, the list is different ( different object ) and it gets updated.

i don't know if it's ok to just reset the parents array, on a normal drive it's not allowed to have a folder with multiple parents but in some other drives maybe is possible.

simone-viozzi commented 2 years ago

I also discovered that we don't need to use _FilesPatch, but _FilesUpdate can also rename the file.

Please take a look at the code, for a single file it should be ready for a review.

Than i will see how to implement recursive and maxdepth.

shcheklein commented 2 years ago

@simone-viozzi thanks! it's a good iteration, I put some comments to improve it.

i don't know if it's ok to just reset the parents array, on a normal drive it's not allowed to have a folder with multiple parents but in some other drives maybe is possible.

can you give more context please?

simone-viozzi commented 2 years ago

i don't know if it's ok to just reset the parents array, on a normal drive it's not allowed to have a folder with multiple parents but in some other drives maybe is possible.

can you give more context please?

if we print the metadata of a file:

file = univpm1.drive.CreateFile({'id': "13nYjSBX7FmSUFnBcTuT6hKsFswXhqcpzpioei5r2eRQ"})
file.FetchMetadata()
print(json.dumps(file, indent=4))
{
    "id": "13nYjSBX7FmSUFnBcTuT6hKsFswXhqcpzpioei5r2eRQ",
    "kind": "drive#file",
    "userPermission": {
        "id": "me",
        "type": "user",
        "role": "owner",
        "kind": "drive#permission",
        "selfLink": "https://www.googleapis.com/drive/v2/files/13nYjSBX7FmSUFnBcTuT6hKsFswXhqcpzpioei5r2eRQ/permissions/me",
        "etag": "\"_QYtqCZlWmDSoT348qPSAnswYHU\"",
        "pendingOwner": false
    },
    "selfLink": "https://www.googleapis.com/drive/v2/files/13nYjSBX7FmSUFnBcTuT6hKsFswXhqcpzpioei5r2eRQ",
    "ownerNames": [
        "user1"
    ],
    "lastModifyingUserName": "user1",
    "editable": true,
    "writersCanShare": true,
    "mimeType": "application/vnd.google-apps.document",
    "exportLinks": {
        "application/rtf": "https://docs.google.com/feeds/download/documents/export/Export?id=13nYjSBX7FmSUFnBcTuT6hKsFswXhqcpzpioei5r2eRQ&exportFormat=rtf",
        "application/vnd.oasis.opendocument.text": "https://docs.google.com/feeds/download/documents/export/Export?id=13nYjSBX7FmSUFnBcTuT6hKsFswXhqcpzpioei5r2eRQ&exportFormat=odt",
        "text/html": "https://docs.google.com/feeds/download/documents/export/Export?id=13nYjSBX7FmSUFnBcTuT6hKsFswXhqcpzpioei5r2eRQ&exportFormat=html",
        "application/pdf": "https://docs.google.com/feeds/download/documents/export/Export?id=13nYjSBX7FmSUFnBcTuT6hKsFswXhqcpzpioei5r2eRQ&exportFormat=pdf",
        "application/epub+zip": "https://docs.google.com/feeds/download/documents/export/Export?id=13nYjSBX7FmSUFnBcTuT6hKsFswXhqcpzpioei5r2eRQ&exportFormat=epub",
        "application/zip": "https://docs.google.com/feeds/download/documents/export/Export?id=13nYjSBX7FmSUFnBcTuT6hKsFswXhqcpzpioei5r2eRQ&exportFormat=zip",
        "application/vnd.openxmlformats-officedocument.wordprocessingml.document": "https://docs.google.com/feeds/download/documents/export/Export?id=13nYjSBX7FmSUFnBcTuT6hKsFswXhqcpzpioei5r2eRQ&exportFormat=docx",
        "text/plain": "https://docs.google.com/feeds/download/documents/export/Export?id=13nYjSBX7FmSUFnBcTuT6hKsFswXhqcpzpioei5r2eRQ&exportFormat=txt"
    },
    "parents": [
        {
            "selfLink": "https://www.googleapis.com/drive/v2/files/13nYjSBX7FmSUFnBcTuT6hKsFswXhqcpzpioei5r2eRQ/parents/1IETDYYj23PgGpInZofa9MyANyBlOoiyh",
            "id": "1IETDYYj23PgGpInZofa9MyANyBlOoiyh",
            "isRoot": false,
            "kind": "drive#parentReference",
            "parentLink": "https://www.googleapis.com/drive/v2/files/1IETDYYj23PgGpInZofa9MyANyBlOoiyh"
        }
    ],
    "thumbnailLink": "https://docs.google.com/feeds/vt?gd=true&id=13nYjSBX7FmSUFnBcTuT6hKsFswXhqcpzpioei5r2eRQ&v=2&s=AMedNnoAAAAAYuhkOb34StnrPTCnQ3wcBSVvE9alyRKz&sz=s220",
    "appDataContents": false,
    "iconLink": "https://drive-thirdparty.googleusercontent.com/16/type/application/vnd.google-apps.document",
    "shared": false,
    "lastModifyingUser": {
        "displayName": "user1",
        "kind": "drive#user",
        "isAuthenticatedUser": true,
        "permissionId": "16663488747090924602",
        "emailAddress": "user1@gmail.com",
        "picture": {
            "url": "https://lh3.googleusercontent.com/a/default-user=s64"
        }
    },
    "owners": [
        {
            "displayName": "user1",
            "kind": "drive#user",
            "isAuthenticatedUser": true,
            "permissionId": "16663488747090924602",
            "emailAddress": "user1@gmail.com",
            "picture": {
                "url": "https://lh3.googleusercontent.com/a/default-user=s64"
            }
        }
    ],
    "copyable": true,
    "etag": "\"MTY1ODg2MzYzNDQ2Nw\"",
    "alternateLink": "https://docs.google.com/document/d/13nYjSBX7FmSUFnBcTuT6hKsFswXhqcpzpioei5r2eRQ/edit?usp=drivesdk",
    "embedLink": "https://docs.google.com/document/d/13nYjSBX7FmSUFnBcTuT6hKsFswXhqcpzpioei5r2eRQ/preview?ouid=107468266404355780381",
    "fileSize": "1024",
    "copyRequiresWriterPermission": false,
    "spaces": [
        "drive"
    ],
    "title": "file1",
    "labels": {
        "viewed": true,
        "restricted": false,
        "starred": false,
        "hidden": false,
        "trashed": false
    },
    "explicitlyTrashed": false,
    "createdDate": "2022-07-26T19:13:25.265Z",
    "modifiedDate": "2022-07-26T19:27:14.467Z",
    "modifiedByMeDate": "2022-07-26T19:27:14.467Z",
    "lastViewedByMeDate": "2022-07-26T19:27:14.467Z",
    "markedViewedByMeDate": "1970-01-01T00:00:00.000Z",
    "quotaBytesUsed": "1024",
    "version": "13",
    "capabilities": {
        "canEdit": true,
        "canCopy": true
    }
}

We can see that parents is an array. So by design it should be possible for a file to have multiple parents.

But as we saw in https://github.com/iterative/PyDrive2/pull/188#discussion_r921783275, at least in my own drive, is not possible to have a file with multiple parents.

But if in other types of drives is possible for a file to have multiple parents, the line file1["parents"] = [{"id": file2_parent_id}] will be a big bug.

shcheklein commented 2 years ago

@simone-viozzi multiple parents are not possible since 2020 I think https://developers.google.com/drive/api/guides/ref-single-parent#:~:text=Beginning%20Sept.,exactly%20one%20parent%20folder%20location.

simone-viozzi commented 2 years ago

The way it works right now you always need to specify the target name. @shcheklein is this the same behavior as before?

Also the recursive part does not need almost any implementation, because on gdrive if you move a folder the content is moved automatically.

The max-depth part will be difficult though. Because we need to implement the recursive logic.

simone-viozzi commented 2 years ago

@shcheklein how you want to handle recursive and maxdepth? Right now is recursive by default, I think we can add a check so if the source is a folder and recursive=False it raises an exception. What do you think?

Regarding maxdepth I think is ok to leave it not implemented.

simone-viozzi commented 2 years ago

Hey @shcheklein could you please review this PR? I think it's ready to be merged

simone-viozzi commented 2 years ago

@shcheklein there is already a test: https://github.com/iterative/PyDrive2/blob/688459b9c70c2e398b6dcf6178217f80ef76ee78/pydrive2/test/test_fs.py#L90

shcheklein commented 2 years ago

@shcheklein there is already a test:

https://github.com/iterative/PyDrive2/blob/688459b9c70c2e398b6dcf6178217f80ef76ee78/pydrive2/test/test_fs.py#L90

Q: How did it work before if mv was not implemented?

simone-viozzi commented 2 years ago

@shcheklein regarding the recursive behavior, there is no actual need of recursion. If we change the parent of a folder, all the subfolders will still point to the same id. So they will move along with their parent.

So it is possible to move an entire folder structure with just one call.

simone-viozzi commented 2 years ago

@shcheklein there is already a test: https://github.com/iterative/PyDrive2/blob/688459b9c70c2e398b6dcf6178217f80ef76ee78/pydrive2/test/test_fs.py#L90

Q: How did it work before if mv was not implemented?

Before mv was implemented with copy + rm: https://github.com/fsspec/filesystem_spec/blob/2633445fc54797c79a7ac96d213bd24dfbdfcdc2/fsspec/spec.py#L917-L920

shcheklein commented 2 years ago

@shcheklein regarding the recursive behavior, there is no actual need of recursion.

okay, let's remove / ignore it for now

simone-viozzi commented 2 years ago

I removed the recursive flag. If a user use it, it will go into the kwargs and it will be ignored.

shcheklein commented 2 years ago

Thanks @simone-viozzi !