Closed simone-viozzi closed 2 years ago
@simone-viozzi this is from the docs:
Manipulate file metadata and contents from [GoogleDriveFile](https://docs.iterative.ai/PyDrive2/pydrive2/#pydrive2.files.GoogleDriveFile) object and call [Upload()](https://docs.iterative.ai/PyDrive2/pydrive2/#pydrive2.files.GoogleDriveFile.Upload)
Does it make sense to use the Upload for this?
E.g. like this https://docs.iterative.ai/PyDrive2/filemanagement/?highlight=metadata#update-file-metadata
Right now is possible to rename a file using existing methods:
file.FetchMetadata(fields="title")
file["title"] = "new_file1"
file.Upload()
and looking at the code for upload
:
https://github.com/iterative/PyDrive2/blob/64a50a1580b2d276f8c9563f5ba620f7e7706119/pydrive2/files.py#L499-L512
it uses _FilesPatch
internally.
i discovered this way to rename a file with https://github.com/iterative/PyDrive2/issues/91
but it would be much more convenient and user friendly to just have a method that does that
@simone-viozzi could you check if fsspec interface had move
/ rename
please? May be we should implement it there (keep this one as-is - low level).
one more question:
file.FetchMetadata(fields="title")
file["title"] = "new_file1"
file.Upload()
is it possible to do this w/o a call to Fetch? (if I know the file id). I wonder also if it's possible to do this if I know only a file name w/o any extra calls. That would be a good benefit of this I think.
is it possible to do this w/o a call to Fetch? (if I know the file id). I wonder also if it's possible to do this if I know only a file name w/o any extra calls. That would be a good benefit of this I think.
No, without fetch it doesn't do nothing.
Consider the minimal example:
file = drive.CreateFile({'id': "13nYjSBX7FmSUFnBcTuT6hKsFswXhqcpzpioei5r2eRQ"})
file["title"] = "new_file1"
file.Upload()
The stack-trace of Upload
in this case is:
Upload
_FilesPatch
GetChanges
-> this return a blank dicti think because the title was None
and it doesn't count as a change, but i might be wrong on this one.
This is also the same issue as https://github.com/iterative/PyDrive2/issues/91
With the rename method i implemented:
you don't need to fetch it before because i specified it in the request: param["fields"] = "title"
@simone-viozzi could you check if fsspec interface had move / rename please? May be we should implement it there (keep this one as-is - low level).
there should be a move method (https://github.com/iterative/PyDrive2/pull/119) but i can't find it on the current version. Also there is a reference to it in one of the tests:
i suppose it's an old test and the method was dropped somewhere.
Anyway, there is no rename
/ move
on the current version of fsspec
i discovered this way to rename a file with https://github.com/iterative/PyDrive2/issues/91
yep, thanks. I remember it now. I was trying to fix it, but it was not that trivial. Ideally, it should be fixed though.
there should be a move method (https://github.com/iterative/PyDrive2/pull/119) but i can't find it on the current version. Also there is a reference to it in one of the tests:
https://filesystem-spec.readthedocs.io/en/latest/api.html#fsspec.spec.AbstractFileSystem.rename
That's the default implementation of move
:
https://github.com/fsspec/filesystem_spec/blob/master/fsspec/spec.py#L917-L920
That's the rename
:
https://github.com/fsspec/filesystem_spec/blob/master/fsspec/spec.py#L1251
It's not terrible (if we optimize copy
). But both can improved to use a single API call I think.
So, ideally, we can take a look and see if we can fix the #91 + optimize fsspec to provide a better interface for this. Then we can see if we need a separate wrapper on top of _Patch
Let me know wdyt.
oh, i get it now, thank you!
i think we can definitively optimize them, maybe for the rename we could use self.auth.service.files().patch(**param).execute()
directly, so we don't need the fetch before the call.
But this way we would need a function on the GoogleDriveFile
level and another on the GDriveFileSystem
level.
otherwise we can do just a function on the GDriveFileSystem
level like this:
def rename(path, new_title)
id = _get_item_id(path)
file = drive.CreateFile({'id': id})
file.FetchMetadata(fields="title")
file["title"] = new_title
file.Upload()
But the current implementation of rename is: def rename(self, path1, path2, **kwargs):
and i don't think we can change the signature.
So maybe the best is to change the implementation of move in a way that:
This way we can leave the default implementation of rename.
The move method will become something like:
def move(self, path1, path2):
path1_left = path1.split('/')[:-1]
path1_right = path1.split('/')[-1]
path2_left = path1.split('/')[:-1]
path2_right = path1.split('/')[-1]
if path1_left == path2_left:
rename file from path1_right to path2_right
else:
move file from path1 to path2
what do you think?
This way even if someone use move to rename a file it would call the right API
Re moving files, check this https://developers.google.com/drive/api/guides/folder#move_files_between_folders
I feel it can be generalized to a single Patch call that updates parents + title if needed, wdyt? The only thing we'll need is to get ids for those parents, for the file, etc. It should be done carefully. The most annoying thing on Google Drive is that one path can resolve into multiple IDs.
The most annoying thing on Google Drive is that one path can resolve into multiple IDs.
check out: https://rclone.org/commands/rclone_dedupe/
i think for the move / rename method we can just throw an error if a path resolve in multiple IDs.
Re moving files, check this https://developers.google.com/drive/api/guides/folder#move_files_between_folders
Good!
so we need patch
to rename the file and update
to move it.
we can use GoogleDriveFile.Upload
that does both, i will try it and get back
mh, funny. The rename part does work:
fs.mv('root/tmp/a.pdf', 'root/tmp/b.pdf')
but the move part does not.
Even if i change the parent array, the Upload
method will call _FilesPatch
instead of _FilesUpdate
.
I also tried on manually calling the privates methods:
if file1_name != file2_name:
file1._FilesPatch()
if file1_parent != file2_parent:
file1._FilesUpdate()
But param["body"] = self.GetChanges()
in _FilesUpdate
does not recognize that the parent array has changed and nothing get updated.
@shcheklein do you have any idea why?
if i did the fetch before it should get that the array has changed
Ok, now it does work.
both rename / move and rename and move at the same time.
i changed:
file1["parents"].append({"id": file2_parent_id})
to:
file1["parents"] = [{"id": file2_parent_id}]
This way when _FilesUpdate
get the changes in the dict, the list is different ( different object ) and it gets updated.
i don't know if it's ok to just reset the parents array, on a normal drive it's not allowed to have a folder with multiple parents but in some other drives maybe is possible.
I also discovered that we don't need to use _FilesPatch
, but _FilesUpdate
can also rename the file.
Please take a look at the code, for a single file it should be ready for a review.
Than i will see how to implement recursive
and maxdepth
.
@simone-viozzi thanks! it's a good iteration, I put some comments to improve it.
i don't know if it's ok to just reset the parents array, on a normal drive it's not allowed to have a folder with multiple parents but in some other drives maybe is possible.
can you give more context please?
i don't know if it's ok to just reset the parents array, on a normal drive it's not allowed to have a folder with multiple parents but in some other drives maybe is possible.
can you give more context please?
if we print the metadata of a file:
file = univpm1.drive.CreateFile({'id': "13nYjSBX7FmSUFnBcTuT6hKsFswXhqcpzpioei5r2eRQ"})
file.FetchMetadata()
print(json.dumps(file, indent=4))
{
"id": "13nYjSBX7FmSUFnBcTuT6hKsFswXhqcpzpioei5r2eRQ",
"kind": "drive#file",
"userPermission": {
"id": "me",
"type": "user",
"role": "owner",
"kind": "drive#permission",
"selfLink": "https://www.googleapis.com/drive/v2/files/13nYjSBX7FmSUFnBcTuT6hKsFswXhqcpzpioei5r2eRQ/permissions/me",
"etag": "\"_QYtqCZlWmDSoT348qPSAnswYHU\"",
"pendingOwner": false
},
"selfLink": "https://www.googleapis.com/drive/v2/files/13nYjSBX7FmSUFnBcTuT6hKsFswXhqcpzpioei5r2eRQ",
"ownerNames": [
"user1"
],
"lastModifyingUserName": "user1",
"editable": true,
"writersCanShare": true,
"mimeType": "application/vnd.google-apps.document",
"exportLinks": {
"application/rtf": "https://docs.google.com/feeds/download/documents/export/Export?id=13nYjSBX7FmSUFnBcTuT6hKsFswXhqcpzpioei5r2eRQ&exportFormat=rtf",
"application/vnd.oasis.opendocument.text": "https://docs.google.com/feeds/download/documents/export/Export?id=13nYjSBX7FmSUFnBcTuT6hKsFswXhqcpzpioei5r2eRQ&exportFormat=odt",
"text/html": "https://docs.google.com/feeds/download/documents/export/Export?id=13nYjSBX7FmSUFnBcTuT6hKsFswXhqcpzpioei5r2eRQ&exportFormat=html",
"application/pdf": "https://docs.google.com/feeds/download/documents/export/Export?id=13nYjSBX7FmSUFnBcTuT6hKsFswXhqcpzpioei5r2eRQ&exportFormat=pdf",
"application/epub+zip": "https://docs.google.com/feeds/download/documents/export/Export?id=13nYjSBX7FmSUFnBcTuT6hKsFswXhqcpzpioei5r2eRQ&exportFormat=epub",
"application/zip": "https://docs.google.com/feeds/download/documents/export/Export?id=13nYjSBX7FmSUFnBcTuT6hKsFswXhqcpzpioei5r2eRQ&exportFormat=zip",
"application/vnd.openxmlformats-officedocument.wordprocessingml.document": "https://docs.google.com/feeds/download/documents/export/Export?id=13nYjSBX7FmSUFnBcTuT6hKsFswXhqcpzpioei5r2eRQ&exportFormat=docx",
"text/plain": "https://docs.google.com/feeds/download/documents/export/Export?id=13nYjSBX7FmSUFnBcTuT6hKsFswXhqcpzpioei5r2eRQ&exportFormat=txt"
},
"parents": [
{
"selfLink": "https://www.googleapis.com/drive/v2/files/13nYjSBX7FmSUFnBcTuT6hKsFswXhqcpzpioei5r2eRQ/parents/1IETDYYj23PgGpInZofa9MyANyBlOoiyh",
"id": "1IETDYYj23PgGpInZofa9MyANyBlOoiyh",
"isRoot": false,
"kind": "drive#parentReference",
"parentLink": "https://www.googleapis.com/drive/v2/files/1IETDYYj23PgGpInZofa9MyANyBlOoiyh"
}
],
"thumbnailLink": "https://docs.google.com/feeds/vt?gd=true&id=13nYjSBX7FmSUFnBcTuT6hKsFswXhqcpzpioei5r2eRQ&v=2&s=AMedNnoAAAAAYuhkOb34StnrPTCnQ3wcBSVvE9alyRKz&sz=s220",
"appDataContents": false,
"iconLink": "https://drive-thirdparty.googleusercontent.com/16/type/application/vnd.google-apps.document",
"shared": false,
"lastModifyingUser": {
"displayName": "user1",
"kind": "drive#user",
"isAuthenticatedUser": true,
"permissionId": "16663488747090924602",
"emailAddress": "user1@gmail.com",
"picture": {
"url": "https://lh3.googleusercontent.com/a/default-user=s64"
}
},
"owners": [
{
"displayName": "user1",
"kind": "drive#user",
"isAuthenticatedUser": true,
"permissionId": "16663488747090924602",
"emailAddress": "user1@gmail.com",
"picture": {
"url": "https://lh3.googleusercontent.com/a/default-user=s64"
}
}
],
"copyable": true,
"etag": "\"MTY1ODg2MzYzNDQ2Nw\"",
"alternateLink": "https://docs.google.com/document/d/13nYjSBX7FmSUFnBcTuT6hKsFswXhqcpzpioei5r2eRQ/edit?usp=drivesdk",
"embedLink": "https://docs.google.com/document/d/13nYjSBX7FmSUFnBcTuT6hKsFswXhqcpzpioei5r2eRQ/preview?ouid=107468266404355780381",
"fileSize": "1024",
"copyRequiresWriterPermission": false,
"spaces": [
"drive"
],
"title": "file1",
"labels": {
"viewed": true,
"restricted": false,
"starred": false,
"hidden": false,
"trashed": false
},
"explicitlyTrashed": false,
"createdDate": "2022-07-26T19:13:25.265Z",
"modifiedDate": "2022-07-26T19:27:14.467Z",
"modifiedByMeDate": "2022-07-26T19:27:14.467Z",
"lastViewedByMeDate": "2022-07-26T19:27:14.467Z",
"markedViewedByMeDate": "1970-01-01T00:00:00.000Z",
"quotaBytesUsed": "1024",
"version": "13",
"capabilities": {
"canEdit": true,
"canCopy": true
}
}
We can see that parents
is an array. So by design it should be possible for a file to have multiple parents.
But as we saw in https://github.com/iterative/PyDrive2/pull/188#discussion_r921783275, at least in my own drive, is not possible to have a file with multiple parents.
But if in other types of drives is possible for a file to have multiple parents, the line file1["parents"] = [{"id": file2_parent_id}]
will be a big bug.
@simone-viozzi multiple parents are not possible since 2020 I think https://developers.google.com/drive/api/guides/ref-single-parent#:~:text=Beginning%20Sept.,exactly%20one%20parent%20folder%20location.
The way it works right now you always need to specify the target name. @shcheklein is this the same behavior as before?
Also the recursive part does not need almost any implementation, because on gdrive if you move a folder the content is moved automatically.
The max-depth part will be difficult though. Because we need to implement the recursive logic.
@shcheklein how you want to handle recursive
and maxdepth
?
Right now is recursive by default, I think we can add a check so if the source is a folder and recursive=False
it raises an exception. What do you think?
Regarding maxdepth
I think is ok to leave it not implemented.
Hey @shcheklein could you please review this PR? I think it's ready to be merged
@shcheklein there is already a test: https://github.com/iterative/PyDrive2/blob/688459b9c70c2e398b6dcf6178217f80ef76ee78/pydrive2/test/test_fs.py#L90
@shcheklein there is already a test:
Q: How did it work before if mv was not implemented?
@shcheklein regarding the recursive behavior, there is no actual need of recursion. If we change the parent of a folder, all the subfolders will still point to the same id. So they will move along with their parent.
So it is possible to move an entire folder structure with just one call.
@shcheklein there is already a test: https://github.com/iterative/PyDrive2/blob/688459b9c70c2e398b6dcf6178217f80ef76ee78/pydrive2/test/test_fs.py#L90
Q: How did it work before if mv was not implemented?
Before mv
was implemented with copy
+ rm
:
https://github.com/fsspec/filesystem_spec/blob/2633445fc54797c79a7ac96d213bd24dfbdfcdc2/fsspec/spec.py#L917-L920
@shcheklein regarding the recursive behavior, there is no actual need of recursion.
okay, let's remove / ignore it for now
I removed the recursive
flag. If a user use it, it will go into the kwargs
and it will be ignored.
Thanks @simone-viozzi !
Hi, i'm addind a method to rename a
GoogleDriveFile
the google api to rename a file is: https://developers.google.com/drive/api/v2/reference/files/patch
I saw you have a wrapper for
files().patch
:_FilesPatch
do you prefer if I use the existing wrapper or the api directly?