Closed franticspider closed 7 years ago
No, not a known feature, so something is not right. Possibilities:
Is this Team Drive? We're still working on that. #101
Are you authenticating with the same Google identity with which the folder is shared? drive_user() will reveal who googledrive thinks you are.
Do you happen to know your exact status with respect to this folder? Did the owner grant you "Can organize, add, & edit" or "Can view only" status? And is there a chance their intent and actions do not match?
How do this folder and the files within show up for you in terms of "files shared with me"? In the browser, instructions are here: https://support.google.com/drive/answer/2375057. Via googledrive, this should reveal such files: drive_find(q = "sharedWithMe").
ok, can't tell you exactly what my status was, but I looked in "files shared with me", and then added the folder to my drive - now when I do drive_find(q = "fullText contains 'USPO'"), the files I'm looking for are listed. Awesome! Many thanks
Interesting and glad it worked!
I wouldn't have expected it to be necessary to add the folder to your Drive. But we are still in the very early days here, so learning lots by hearing how things go in the real world. I'm leaving this open so we can do some experiments and get a better understanding.
I was following this example in the Help documentation
drive_ls(path = "abc", pattern = "def", type = "spreadsheet")
But got the error:
* domain: global
* reason: internalError
* message: Internal Error
If I do drive_find(q = "fullText contains 'my_keyword'") then it does list the files I am looking for.
But I only want spreadsheets and the drive_find lists everything that it finds. Fetching files from a specific folder is what I am looking for.
I also tried adding ~/ to the path but no luck.
Got by with a little more effort using drive_find for now :smiley:
drive_find(q = "fullText contains 'my_keyword_'", pattern = "keyword_", type = "spreadsheet")
What was the exact drive_ls() call that you used? Do you want to filter for my_keyword_ in the name of the file or anywhere in its full text or both? What is the name of the folder you want to search inside?
Something like this should work to list spreadsheets with:
abc as direct parent
blah anywhere in the full text
drive_ls("abc", type = "spreadsheet", q = "fullText contains 'blah'")
BTW the error you saw may be an intermittent one on the server side. I assume the number is 500 or above?
This is what I tried
x <- drive_ls(path = "~/folder1/folder2",pattern = "filename_keyword", type = "spreadsheet")
I was trying to load all the files in a folder I know.
Forgot to paste the full error, yes it was 500, Error: HTTP error [500] Internal Error
I am now trying to read the file from the list I have; it looks like the googledrive package does not have file read functionality. Do I need to use the same old googlesheets package for that?
Thanks for responding so fast, googledrive is a great package and definitely filling that missing gap.
This
x <- drive_ls(path = "~/folder1/folder2",pattern = "filename_keyword", type = "spreadsheet")
should filter for files in ~/folder1/folder2 with filename_keyword in their name that are also Google Sheets. If it fails to do that consistently, I would be interested, because it's a bug.
I too have gotten the Error: HTTP error [500] Internal Error but I'm afraid it is intermittent and server side.
Correct re: reading those sheets into R. Right now you're stuck with current googlesheets for that, but once we get googledrive on CRAN, googlesheets gets a reboot that gets all this file handling + wraps the new Sheets v4 API.
It takes a long time and I have to stop it... it throws this error when stopped:
Error in curl::curl_fetch_memory(url, handle = handle) :
Operation was aborted by an application callback
By comparison, drive_find is very fast and returns its output in a few seconds.
This sounds strange. A few things that may help troubleshoot a bit:
Can you run the drive_find() that is working and use the drive_add_path() command to show where the files you are looking for live?
drive_find(q = "fullText contains 'my_keyword_'", pattern = "keyword_", type = "spreadsheet") %>%
drive_add_path()
This will just make sure these files actually live in ~/folder1/folder2 (I'm sure they do, but it just seems like a logical first step to double check).
If you try drive_ls(type = "spreadsheet") do you have the same problem? Or is it only when you try to search within a specific folder?
It is actually a lot more work whenever a file or folder is identified by path than id. We are placing more API calls in the background. It is usually not very slow, but I have noticed the Drive API has been unusually slow in the past couple of days and throwing more odd errors.
I don't want to recommend against specifying files via path, in general, because we do want to support that. But it is more vulnerable to API slowdown.
This would be the best / fastest workflow for what you want:
target <- drive_get("~/folder1/folder2/")
## or, the absolute fastest:
## visit the directory of interest in the browser and copy the URL
target <- drive_get(as_id("URL-goes-here"))
drive_ls(target, pattern = "filename_keyword", type = "spreadsheet")
@jennybc yess...that worked like charm, both options... thanks a lot for your help.
Hi Jenny! I know it's been a long time since this thread was created but hey, fingers crossed =)
I'm having trouble combining queries in order to combine "in parents" and "fullText contains". What I'm doing now is:
drive_find(q="'1ywgPdg9sm6Bi3Opbc65k4LlAIkscaOIg' in parents", q="fullText contains 'Grandes'")
The folder's ID is correct, but fullText doesn't perform as such; instead, it gives me a "name" equivalent. In other words, it doesn't show me every document with "Grandes" in its content, but it shows me every file with "Grandes" in its name. Any suggestions? Thank you very much!
According to the docs:
https://developers.google.com/drive/api/v3/ref-search-files
fullText searches "Full text of the file including name, description, content, and indexable text."
So I think you have to consider the possibility that 'Grandes' only appears in file names and you don't have any files with 'Grandes' in the text but not in the name.
I've intentionally put a document myself with "Grandes" in the text, so sadly it's not the case... I've been running several tests but I have the same result every time.
It works for me:
library(googledrive)
drive_auth(email = "jenny.f.bryan@gmail.com")
folder <- drive_mkdir("q-test")
#> Folder created:
#> * q-test
drive_example("chicken.txt") %>%
readLines()
#> [1] "A chicken whose name was Chantecler"
#> [2] "Clucked in iambic pentameter"
#> [3] "It sat on a shelf, reading Song of Myself"
#> [4] "And laid eggs with a perfect diameter."
#> [5] ""
#> [6] "—Richard Maxson"
drive_example("chicken.txt") %>%
drive_upload(path = folder)
#> Local file:
#> * /Users/jenny/resources/R/library_3.6/googledrive/extdata/chicken.txt
#> uploaded into Drive file:
#> * chicken.txt: 1e3iVhs30ggJVAe-Ecn_ruSjzBCtW3_0_
#> with MIME type:
#> * text/plain
drive_find(q = "fullText contains 'eggs'")
#> # A tibble: 10 x 3
#>    name                          id                          drive_resource
#>  * <chr>                         <chr>                       <list>
#>  1 Papa's Pancakes               1K9MAOHIngHOu_8yOsmmGFtJcT… <list [30]>
#>  2 manuscript.pdf                109WQdJU8tmIjHkqt-_0tajrbV… <list [37]>
#>  3 Community call v9             14DCZDj3OvIZm0NjVs0BWqyxCd… <list [30]>
#>  4 2016 Nutrition challenge (Re… 1M6Avyc8Y1eRJE6wjqa9BunC0p… <list [32]>
#>  5 MSc Bioinformatics Defense    1nkMP3H0gISIo2UpbLPJDWlicJ… <list [30]>
#>  6 chicken.txt                   13bXmweyE8xTOl1PnuepbGUj7Y… <list [38]>
#>  7 Copy of CANDY HIERARCHY 2015… 1yKj6W035bfG1I5z3R9xYG5aL7… <list [33]>
#>  8 Copy of CANDY HIERARCHY 2015… 1t5IKEZUX35LzbircxXiicO6o4… <list [33]>
#>  9 Copy of CANDY HIERARCHY 2015… 1469d7bsvJzRG3HcTAHoY0BlxB… <list [33]>
#> 10 CANDY HIERARCHY 2015 SURVEY … 1REZvjqv0lj3dEYb0CsGyDXkXr… <list [33]>
drive_find(
q = paste0("'", as_id(folder), "' in parents"),
q = "fullText contains 'eggs'"
)
#> # A tibble: 1 x 3
#>   name        id                                drive_resource
#> * <chr>       <chr>                             <list>
#> 1 chicken.txt 1e3iVhs30ggJVAe-Ecn_ruSjzBCtW3_0_ <list [39]>
Created on 2019-07-01 by the reprex package (v0.3.0)
Thanks Jenny! I'll use this as an extra guide to see if I can work my way around it.
Hi Jenny! Sorry to bother you again. I've found out the exact problem: it doesn't search for content in folders within folders.
`folder <- 'https://googledrive.fancyurl'
drive_ls(q=paste0("'", as_id(folder), "' in parents"), q="fullText contains 'semestre'", recursive = TRUE)`
The outcome is every file that contains "semestre" located directly inside that folder, but it doesn't return files that contain "semestre" inside another folder. Suppose I have "fancyurl/anothercoolfolder", it won't return any file inside anothercoolfolder. I've tried with recursive TRUE and FALSE and neither changes anything. Any thoughts? Thanks again for everything!
How about using drive_ls(path = YOUR FOLDER, recursive = TRUE, q="fullText contains 'semestre'")?
Oh sorry, I see you tried that. Maybe open a new issue for this conversation? Otherwise it will get lost.
I think we're learning something about how Google implements multiple q clauses, such as, perhaps, the order in which they are implemented.
Actually your drive_ls() call is different from what I suggest. Try using my syntax. Specify the target folder via path, not via a DIY q clause re: parent.
That worked like a charm! Thank you Jenny, you've been super kind.
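For anyone landing here later, a minimal sketch of the two forms discussed above. The helpers q_in_parents() and q_full_text() are hypothetical (they are not googledrive functions) and only build the clause strings; "FOLDER-ID-GOES-HERE" is a placeholder. The Drive calls sit inside if (FALSE) because they need an authenticated session.

```r
# Hypothetical helpers that build Drive search clauses as plain strings.
# No API call happens here.
q_in_parents <- function(folder_id) {
  sprintf("'%s' in parents", folder_id)
}

q_full_text <- function(term) {
  sprintf("fullText contains '%s'", term)
}

clauses <- c(
  q_in_parents("FOLDER-ID-GOES-HERE"),  # placeholder id
  q_full_text("semestre")
)
print(clauses)

if (FALSE) {
  library(googledrive)

  # DIY parent clause: in this thread it only matched files sitting
  # directly inside the folder
  drive_find(q = clauses[1], q = clauses[2])

  # Folder given via `path`: the form that worked for subfolders here
  drive_ls(
    path = as_id("FOLDER-ID-GOES-HERE"),
    q = clauses[2],
    recursive = TRUE
  )
}
```

The point of the second call is that drive_ls() takes care of the parent folder itself, so the only q clause you write by hand is the fullText one.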
Hi Jenny!! I am trying to fetch the list of files/folders from a specific location in the drive.
product_folders_list <- drive_ls(path = as_dribble("https://drive.google.com/drive/folders/folderId")) %>%
as.data.frame()
But I am getting an error message.
Can I know how to fix this issue?
I would first do a drive_get() on this folder to make sure you've got the id right and you've logged in as a user with the right permissions. The error suggests either the file (folder) does not exist or you don't have permission to read it.
Hi Jenny. I get the point you are mentioning. I used oAuth token for authenticating the drive and I am able to access the required folder. But I am unable to achieve this when I tried to authenticate using service token.
I created a JSON file and called it
drive_auth(path = "gdrive-b1000ec72d8a.json")
And when I tried drive_ls, I'm getting the error message mentioned above. I'm not sure why the drive is authenticated. Can you provide a solution for accessing drive using a service token?
But I am unable to achieve this when I tried to authenticate using service token.
It sounds like the service account does not have the necessary permissions.
Hi, I am trying to access files with the extension .csv.gz from a shared Google Drive. I have multiple files with similar names inside the Google Drive folder and want to access the latest one and read it through R. Can someone help here?
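One possible approach, as a rough sketch rather than a tested recipe: list the folder, pick the row with the newest modifiedTime from the drive_resource column, download it, and read it with gzfile(). The helper pick_latest() is hypothetical, "FOLDER-URL-GOES-HERE" is a placeholder, and the sketch assumes that "latest" means most recently modified and that the timestamps are UTC strings that sort chronologically as text.

```r
# Hypothetical helper: keep the row with the newest modifiedTime.
# `files` is expected to look like a dribble: a data frame with a list-column
# `drive_resource` whose elements each carry a `modifiedTime` string.
pick_latest <- function(files) {
  modified <- vapply(
    files$drive_resource,
    function(x) x[["modifiedTime"]],
    character(1)
  )
  # Assumption: RFC 3339 UTC timestamps sort chronologically as plain text
  newest <- order(modified, decreasing = TRUE, method = "radix")[1]
  files[newest, ]
}

# Stand-in for a drive_ls() result, so the helper can be tried without auth
listing <- data.frame(
  name = c("export_a.csv.gz", "export_b.csv.gz"),
  stringsAsFactors = FALSE
)
listing$drive_resource <- list(
  list(modifiedTime = "2023-03-01T08:00:00.000Z"),
  list(modifiedTime = "2023-03-07T09:30:00.000Z")
)
print(pick_latest(listing)$name)

if (FALSE) {
  library(googledrive)

  folder <- drive_get(as_id("FOLDER-URL-GOES-HERE"))  # placeholder URL
  files <- drive_ls(folder, pattern = "\\.csv\\.gz$")
  latest <- pick_latest(files)

  local_path <- file.path(tempdir(), latest$name)
  drive_download(latest, path = local_path, overwrite = TRUE)
  dat <- read.csv(gzfile(local_path))
}
```

If "latest" is encoded in the file names instead (a date suffix, say), sorting on name would replace the modifiedTime step.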
I was trying to view the contents of a sub folder with the following command:
I kept getting this error
Error: Parent specified via path is invalid:
Based on reading the comments above, I tried the following and was successful:
I thought I would share here.
Hi!
I try to access a shared google folder. My user has read permissions for sure (I can access it via web browser for instance), but when I try googledrive::drive_ls(googledrive::asid(
Please see this part of the docs for drive_ls():
path
Specifies a single folder on Google Drive whose contents you want to list. Can be an actual path (character), a file id or URL marked with as_id(), or a dribble. If it is a shared drive or is a folder on a shared drive, it must be passed as a dribble. If path is a shortcut to a folder, it is automatically resolved to its target folder
(~Although I would expect a marked ID to work as well.~ no, I guess I really mean that it must be a dribble.)
If you want to go any further, I'll need to see actual code that runs, not prose, as the disconnect / misunderstanding is often in the particulars that people leave out.
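A small sketch for telling the two cases apart, in case it helps others: the Drive API only populates driveId for items that live on a shared drive, so its presence in a file's drive_resource is a reasonable signal. The helper lives_on_shared_drive() is hypothetical, and the two lists below are stand-ins shaped like one element of a dribble's drive_resource column, not real Drive metadata.

```r
# Hypothetical helper: TRUE when a Drive file resource carries a driveId,
# which is only present for items that live on a shared drive.
lives_on_shared_drive <- function(drive_resource) {
  !is.null(drive_resource[["driveId"]])
}

# Stand-in resources (made-up values)
my_drive_folder <- list(name = "q-test", parents = list("PARENT-ID"))
shared_drive_folder <- list(name = "some-folder", driveId = "SHARED-DRIVE-ID")

print(lives_on_shared_drive(my_drive_folder))
print(lives_on_shared_drive(shared_drive_folder))

if (FALSE) {
  library(googledrive)

  folder <- drive_get(as_id("FOLDER-ID-GOES-HERE"))  # placeholder id
  lives_on_shared_drive(folder$drive_resource[[1]])
}
```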
Hello Jenny and thank you for your prompt response!
So I tried with a dribble: folderDribble <- googledrive::as_dribble(googledrive::as_id('...')) this command works and successfully finds my shared folder
ls <- googledrive::drive_ls(folderDribble) this command fails with the same 404 error, shared drive not found.
One note: this is a shared folder, not a shared drive; not sure if this info helps in any way.
We've definitely gotten to this point. I can't just reason through this from a description.
If you want to go any further, I'll need to see actual code that runs, not prose, as the disconnect / misunderstanding is often in the particulars that people leave out.
I have run into this and I believe this occurs when a user is in a shared drive, but not a member of that shared drive. I created a shared drive with one folder and one spreadsheet. I did not add anyone but myself to the members of the shared drive.
I am on GCP, and I'm using a service account (my use case), so using gargle to get my token. I added a service account to the folder using Sharing (Content Manager Role) and shared the sheet using Sharing (Editor Role). This was the result (going to post again when adding email as a member of the drive):
library(googledrive)
library(gargle)
#>
#> Attaching package: 'gargle'
#> The following object is masked from 'package:googledrive':
#>
#> request_make
library(googlesheets4)
#>
#> Attaching package: 'googlesheets4'
#> The following object is masked from 'package:gargle':
#>
#> request_make
#> The following objects are masked from 'package:googledrive':
#>
#> request_generate, request_make
options(httr_oauth_cache = FALSE, gargle_oauth_cache = FALSE)
id_sheet = as_id("1FJpa4f2_OKqR0cbwSmWhyn5N3oJ17kZu3HGytGZk4J4")
id_folder = as_id("1AMKXrcB8X4H2V7lf0HqOVS4L54LJd3KB")
id_shared_drive = as_id("0AIpcp-4qOK7CUk9PVA")
token = gargle::credentials_gce(scopes = "https://www.googleapis.com/auth/drive")
googledrive::drive_auth(token = token)
Showing that we can, in fact, access the sheet (directly via download, but later we can show listing too):
df = read_sheet(id_sheet)
#> ✔ Reading from "EXAMPLE_Demo_Main_Sheet".
#> ✔ Range 'provider_basic_info'.
head(df)
#> # A tibble: 6 × 8
#>   id_provider  specialty    `Date Start` Date St…¹ Inclu…² Notes Decis…³ Inclu…⁴
#>   <chr>        <chr>        <lgl>        <lgl>     <chr>   <lgl> <lgl>   <chr>
#> 1 Dr. Mindt    Primary Care NA           NA        Yes     NA    NA      Yes
#> 2 Dr. Phenicie OB/GYN       NA           NA        Yes     NA    NA      Yes
#> 3 Dr. Cabrera  Primary Care NA           NA        Yes     NA    NA      Yes
#> 4 Dr. Higgins  Psychiatry   NA           NA        Yes     NA    NA      Yes
#> 5 Dr. Moin     Primary Care NA           NA        Yes     NA    NA      Yes
#> 6 Dr. Stafford Primary Care NA           NA        Yes     NA    NA      Yes
#> # … with abbreviated variable names ¹`Date Stop`, ²`Include on Platform`,
#> #   ³Decision, ⁴`Include on Checks`
dribble_folder = drive_get(id_folder)
print(dribble_folder)
#> # A dribble: 1 × 3
#>   name                id                                drive_resource
#>   <chr>               <drv_id>                          <list>
#> 1 Example_Demo_Folder 1AMKXrcB8X4H2V7lf0HqOVS4L54LJd3KB <named list [31]>
drive_ls
We expect this to fail when given a bare ID rather than a dribble, given the comment at: https://github.com/tidyverse/googledrive/issues/154#issuecomment-1290766313
try({
drive_ls(id_folder)
})
#> Error in map(as_id(id), get_one_shared_drive_id) : ℹ In index: 1.
#> Caused by error in `gargle::response_process()`:
#> ! Client error: (404) Not Found
#> Shared drive not found: 0AIpcp-4qOK7CUk9PVA
#> • message: Shared drive not found: 0AIpcp-4qOK7CUk9PVA
#> • domain: global
#> • reason: notFound
#> • location: driveId
#> • locationType: parameter
try({
drive_ls(id_folder, shared_drive = id_shared_drive)
})
#> Error in map(as_id(id), get_one_shared_drive_id) : ℹ In index: 1.
#> Caused by error in `gargle::response_process()`:
#> ! Client error: (404) Not Found
#> Shared drive not found: 0AIpcp-4qOK7CUk9PVA
#> • message: Shared drive not found: 0AIpcp-4qOK7CUk9PVA
#> • domain: global
#> • reason: notFound
#> • location: driveId
#> • locationType: parameter
We see no authorization issue with drive_get:
dribble_folder = drive_get(id_folder)
print(dribble_folder)
#> # A dribble: 1 × 3
#>   name                id                                drive_resource
#>   <chr>               <drv_id>                          <list>
#> 1 Example_Demo_Folder 1AMKXrcB8X4H2V7lf0HqOVS4L54LJd3KB <named list [31]>
dr = dribble_folder$drive_resource[[1]]
dr$driveId == as.character(id_shared_drive)
#> [1] TRUE
dr$teamDriveId == as.character(id_shared_drive)
#> [1] TRUE
dr$parents[[1]] == as.character(id_shared_drive)
#> logical(0)
I think this should work, given that we pass a dribble, we have access to the sheet, and the service account has the Content Manager role on the folder:
try({
drive_ls(dribble_folder)
})
#> Error in map(as_id(id), get_one_shared_drive_id) : ℹ In index: 1.
#> Caused by error in `gargle::response_process()`:
#> ! Client error: (404) Not Found
#> Shared drive not found: 0AIpcp-4qOK7CUk9PVA
#> • message: Shared drive not found: 0AIpcp-4qOK7CUk9PVA
#> • domain: global
#> • reason: notFound
#> • location: driveId
#> • locationType: parameter
try({
drive_ls(dribble_folder, shared_drive = id_shared_drive)
})
#> Error in map(as_id(id), get_one_shared_drive_id) : ℹ In index: 1.
#> Caused by error in `gargle::response_process()`:
#> ! Client error: (404) Not Found
#> Shared drive not found: 0AIpcp-4qOK7CUk9PVA
#> • message: Shared drive not found: 0AIpcp-4qOK7CUk9PVA
#> • domain: global
#> • reason: notFound
#> • location: driveId
#> • locationType: parameter
try({
dribble_drive = drive_get(id_shared_drive)
})
#> Error in map(as_id(id), get_one_file_id) : ℹ In index: 1.
#> Caused by error in `gargle::response_process()`:
#> ! Client error: (404) Not Found
#> File not found: 0AIpcp-4qOK7CUk9PVA.
#> • message: File not found: 0AIpcp-4qOK7CUk9PVA.
#> • domain: global
#> • reason: notFound
#> • location: fileId
#> • locationType: parameter
Here we show that the sheet's parent is the folder ID, just to confirm that the sheet is indeed in the folder I specified above.
dribble_sheet = drive_get(id_sheet)
dr = dribble_sheet$drive_resource[[1]]
dr$driveId == as.character(id_shared_drive)
#> [1] TRUE
dr$teamDriveId == as.character(id_shared_drive)
#> [1] TRUE
dr$parents[[1]] == as.character(id_folder)
#> [1] TRUE
Created on 2023-03-07 with reprex v2.0.2
I added the service account as "Viewer" to the overall Shared Drive (member), and ran the same code as in the last post, and it all works. So I believe one of the underlying issues is that you may need to add users to the drive as viewers to be able to use drive_find (and therefore drive_ls).
I can add you to the sheets @jennybc if you want (and then add you to the drive) to replicate. Please email me if you would rather chat quickly with a demo instead of replicating the Drive creation.
library(googledrive)
library(gargle)
#>
#> Attaching package: 'gargle'
#> The following object is masked from 'package:googledrive':
#>
#> request_make
library(googlesheets4)
#>
#> Attaching package: 'googlesheets4'
#> The following object is masked from 'package:gargle':
#>
#> request_make
#> The following objects are masked from 'package:googledrive':
#>
#> request_generate, request_make
options(httr_oauth_cache = FALSE, gargle_oauth_cache = FALSE)
id_sheet = as_id("1FJpa4f2_OKqR0cbwSmWhyn5N3oJ17kZu3HGytGZk4J4")
id_folder = as_id("1AMKXrcB8X4H2V7lf0HqOVS4L54LJd3KB")
id_shared_drive = as_id("0AIpcp-4qOK7CUk9PVA")
token = gargle::credentials_gce(scopes = "https://www.googleapis.com/auth/drive")
googledrive::drive_auth(token = token)
Showing that we can, in fact, access the sheet:
df = read_sheet(id_sheet)
#> ✔ Reading from "EXAMPLE_Demo_Main_Sheet".
#> ✔ Range 'provider_basic_info'.
head(df)
#> # A tibble: 6 × 8
#>   id_provider  specialty    `Date Start` Date St…¹ Inclu…² Notes Decis…³ Inclu…⁴
#>   <chr>        <chr>        <lgl>        <lgl>     <chr>   <lgl> <lgl>   <chr>
#> 1 Dr. Mindt    Primary Care NA           NA        Yes     NA    NA      Yes
#> 2 Dr. Phenicie OB/GYN       NA           NA        Yes     NA    NA      Yes
#> 3 Dr. Cabrera  Primary Care NA           NA        Yes     NA    NA      Yes
#> 4 Dr. Higgins  Psychiatry   NA           NA        Yes     NA    NA      Yes
#> 5 Dr. Moin     Primary Care NA           NA        Yes     NA    NA      Yes
#> 6 Dr. Stafford Primary Care NA           NA        Yes     NA    NA      Yes
#> # … with abbreviated variable names ¹`Date Stop`, ²`Include on Platform`,
#> #   ³Decision, ⁴`Include on Checks`
dribble_folder = drive_get(id_folder)
print(dribble_folder)
#> # A dribble: 1 × 3
#>   name                id                                drive_resource
#>   <chr>               <drv_id>                          <list>
#> 1 Example_Demo_Folder 1AMKXrcB8X4H2V7lf0HqOVS4L54LJd3KB <named list [32]>
drive_ls
We expect this to fail when given a bare ID rather than a dribble, given the comment at: https://github.com/tidyverse/googledrive/issues/154#issuecomment-1290766313
try({
drive_ls(id_folder)
})
#> # A dribble: 1 × 3
#>   name                    id                                        drive_reso…¹
#>   <chr>                   <drv_id>                                  <list>
#> 1 EXAMPLE_Demo_Main_Sheet 1FJpa4f2_OKqR0cbwSmWhyn5N3oJ17kZu3HGytGZ… <named list>
#> # … with abbreviated variable name ¹drive_resource
try({
drive_ls(id_folder, shared_drive = id_shared_drive)
})
#> # A dribble: 1 × 3
#>   name                    id                                        drive_reso…¹
#>   <chr>                   <drv_id>                                  <list>
#> 1 EXAMPLE_Demo_Main_Sheet 1FJpa4f2_OKqR0cbwSmWhyn5N3oJ17kZu3HGytGZ… <named list>
#> # … with abbreviated variable name ¹drive_resource
We see no issue auth-wise on this:
dribble_folder = drive_get(id_folder)
print(dribble_folder)
#> # A dribble: 1 × 3
#>   name                id                                drive_resource
#>   <chr>               <drv_id>                          <list>
#> 1 Example_Demo_Folder 1AMKXrcB8X4H2V7lf0HqOVS4L54LJd3KB <named list [32]>
dr = dribble_folder$drive_resource[[1]]
dr$driveId == as.character(id_shared_drive)
#> [1] TRUE
dr$teamDriveId == as.character(id_shared_drive)
#> [1] TRUE
dr$parents[[1]] == as.character(id_shared_drive)
#> [1] TRUE
I think this should work given dribble and have access to the sheet, and the drive has
try({
drive_ls(dribble_folder)
})
#> # A dribble: 1 × 3
#>   name                    id                                        drive_reso…¹
#>   <chr>                   <drv_id>                                  <list>
#> 1 EXAMPLE_Demo_Main_Sheet 1FJpa4f2_OKqR0cbwSmWhyn5N3oJ17kZu3HGytGZ… <named list>
#> # … with abbreviated variable name ¹drive_resource
try({
drive_ls(dribble_folder, shared_drive = id_shared_drive)
})
#> # A dribble: 1 × 3
#>   name                    id                                        drive_reso…¹
#>   <chr>                   <drv_id>                                  <list>
#> 1 EXAMPLE_Demo_Main_Sheet 1FJpa4f2_OKqR0cbwSmWhyn5N3oJ17kZu3HGytGZ… <named list>
#> # … with abbreviated variable name ¹drive_resource
try({
dribble_drive = drive_get(id_shared_drive)
})
dribble_sheet = drive_get(id_sheet)
dr = dribble_sheet$drive_resource[[1]]
dr$driveId == as.character(id_shared_drive)
#> [1] TRUE
dr$teamDriveId == as.character(id_shared_drive)
#> [1] TRUE
dr$parents[[1]] == as.character(id_folder)
#> [1] TRUE
Created on 2023-03-07 with reprex v2.0.2
It sounds like the drive_ls() call needs the corpus to be specified.
Hi, I'm trying to use googledrive to access some data that has been shared to me in a folder. When I do a drive_ls() command, I get the folder name, but I can't seem to get a list of the contents of the folder. Is this a known feature?
thanks!