AakashBasu closed this issue 5 years ago.
Hi, perhaps you could do it in a loop, e.g.:

```python
import sys

from office365.sharepoint.file import File

# `ctx` is assumed to be an authenticated ClientContext for the site
listTitle = "Documents"
site = "abc"

def fncPrintLibraryContents(ctx, listTitle):
    """Return the files in the root folder of the given library."""
    try:
        list_object = ctx.web.lists.get_by_title(listTitle)
        folder = list_object.root_folder
        ctx.load(folder)
        ctx.execute_query()
        files = folder.files
        ctx.load(files)
        ctx.execute_query()
        return files
    except Exception as e:
        print('Problem printing out library contents:', e)
        sys.exit(1)

def downloadFile(ctx, fileName):
    """Download one file from the library root into the current directory."""
    try:
        relativeUrl = '/sites/{0}/Shared%20Documents/{1}'.format(site, fileName)
        response = File.open_binary(ctx, relativeUrl)
        # the with block closes the file, so no explicit close() is needed
        with open(fileName, "wb") as localFile:
            localFile.write(response.content)
    except Exception as e:
        print('Problem downloading file:', fileName, e)
        sys.exit(1)

myfiles = fncPrintLibraryContents(ctx, listTitle)
for myfile in myfiles:
    print("Downloading file: {0}".format(myfile.properties["Name"]))
    downloadFile(ctx, myfile.properties["Name"])
```
m.
Please indent the last two lines so they sit inside the for loop; I couldn't get the formatting to work. m.
Hey,
Thanks for such a quick reply. I can download files successfully, provided I give the full path down to the file name. But to download all the files recursively, I first need to list everything in a particular folder, and after several attempts I keep getting Not Found errors. Maybe I am going wrong somewhere because I don't understand what Title refers to: whenever I try to list a subfolder by passing its name as the title, it fails. I will go through your code and see whether I can get it to work.
Meanwhile, here is my current code. Downloading works fine, and listing the folders and files at the root works, but whenever I pass any folder name other than Documents as the title, it fails:
```python
from office365.runtime.auth.authentication_context import AuthenticationContext
from office365.sharepoint.client_context import ClientContext
from office365.sharepoint.file import File
from office365.sharepoint.file_creation_information import FileCreationInformation


def read_folder_and_files(context, list_title):
    """Read a folder example"""
    list_obj = context.web.lists.get_by_title(list_title)
    folder = list_obj.root_folder
    context.load(folder)
    context.execute_query()
    print("List url: {0}".format(folder.properties["ServerRelativeUrl"]))

    files = folder.files
    context.load(files)
    context.execute_query()
    for cur_file in files:
        print("File name: {0}".format(cur_file.properties["Name"]))

    folders = context.web.folders
    context.load(folders)
    context.execute_query()
    for folder in folders:
        print("Folder name: {0}".format(folder.properties["Name"]))


def download_file(context):
    response = File.open_binary(context, "/sites/new/Shared Documents/2011-A/file1.csv")
    print(response)
    print(response.content)
    with open(r"C:\Users\aakashb\Downloads\test\file1.csv", "wb") as local_file:
        local_file.write(response.content)


ctx = None
url = 'https://company.sharepoint.com/sites/new'
ctx_auth = AuthenticationContext(url=url)
if ctx_auth.acquire_token_for_user(username='name.surname@company.com', password='12345'):
    ctx = ClientContext(url, ctx_auth)
    read_folder_and_files(ctx, 'Documents')

print('exiting function')
```
1) Sorry for the broken formatting of the code I posted.
2) I just ran your code: for listing, it does exactly what mine does. It lists the files in the root (not inside any folder), but I want to do the same for folders.
3) I also want to list the folders. When I use @vgrem's code for listing folders, it doesn't show the folders of Documents; instead it shows folders like:
```
Folder name: SitePages
Folder name: Style Library
Folder name: _catalogs
Folder name: FormServerTemplates
Folder name: _private
Folder name: Sharing Links
Folder name: SiteAssets
Folder name: images
Folder name: Shared Documents
Folder name: Lists
Folder name: _cts
```
None of these are the folders I have in the SharePoint document library.
So, in short: how can I list the folders in a document library, and the files in each of them, so that I can download them?
Hi,
please look at the issue here: https://github.com/vgrem/Office365-REST-Python-Client/issues/91
specifically at the line that goes like this:
```python
folder = ctx.web.get_folder_by_server_relative_url(app_settings['urlrel'])
```
If that doesn't help, I'll get back to you with more details. m.
... what I meant was using the `get_folder_by_server_relative_url` method instead of `get_by_title`, e.g.:
```python
import sys

app_settings = {'urlrel': '/sites/abc/Shared Documents/TEST'}

def printFolderContents(ctx, listTitle):
    try:
        # get the folder by its server-relative URL instead of via
        # ctx.web.lists.get_by_title(listTitle).root_folder (listTitle is unused here)
        folder = ctx.web.get_folder_by_server_relative_url(app_settings['urlrel'])
        ctx.load(folder)
        ctx.execute_query()
        files = folder.files
        ctx.load(files)
        ctx.execute_query()
        for myfile in files:
            print("File name: {0}".format(myfile.properties["Name"]))
    except Exception as e:
        print('Problem printing out library contents:', e)
        sys.exit(1)
```
Let me know if that helps ...
To download the files inside the TEST folder of the Shared Documents library, you could, for instance, turn the code above into a function that returns the files:
```python
import sys

def fncGetFolderContents(ctx, listTitle):
    """Return the files in the folder at app_settings['urlrel'] (listTitle is unused)."""
    try:
        folder = ctx.web.get_folder_by_server_relative_url(app_settings['urlrel'])
        ctx.load(folder)
        ctx.execute_query()
        files = folder.files
        ctx.load(files)
        ctx.execute_query()
        return files
    except Exception as e:
        print('Problem printing out library contents:', e)
        sys.exit(1)
```
and adjust the download function a little, e.g.:
```python
import sys

from office365.sharepoint.file import File

# placeholder: the folder inside Shared Documents, matching app_settings['urlrel']
yourFolder = "TEST"

def downloadFolderFile(ctx, fileName):
    """Download one file from Shared Documents/<yourFolder> into the current directory."""
    try:
        relativeUrl = '/sites/{0}/Shared%20Documents/{1}/{2}'.format(site, yourFolder, fileName)
        response = File.open_binary(ctx, relativeUrl)
        with open(fileName, "wb") as localFile:
            localFile.write(response.content)
    except Exception as e:
        print('Problem downloading file:', fileName, e)
        sys.exit(1)

myfiles = fncGetFolderContents(ctx, listTitle)
for myfile in myfiles:
    print("Downloading file: {0}".format(myfile.properties["Name"]))
    downloadFolderFile(ctx, myfile.properties["Name"])
```
Thanks a lot man! The two of you are really prompt in replies, as well as the API is absolutely awesome!
I'll go through it ASAP and try to replicate it. But is there a way to list the folders? The latest code works when I know the folder name, but if I automate the process and a new folder is created with files in it, it won't pick up the new folder, right? That's why I also wanted to list the folders, just in case. Anyway, the present solution should work for my use case.
Thanks a lot to both of you. I'll update here once I've run the experiment.
Don't thank me, @vgrem is to blame :) ... and I'm not sure; maybe there are other ways of achieving the same thing ...
Right, to list all the folders inside the Shared Documents library, you could try:
```python
list_object = ctx.web.lists.get_by_title(listTitle)
folder = list_object.root_folder
ctx.load(folder)
ctx.execute_query()
# subfolders of the library's root folder (ctx.web.folders lists site-level folders instead)
folders = folder.folders
ctx.load(folders)
ctx.execute_query()
for myfolder in folders:
    print("Folder name: {0}".format(myfolder.properties["Name"]))
```
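The loop above lists only the direct subfolders of the library root, so files in deeper folders (including ones created later) would still be missed. As a sketch only, not a helper provided by the library and not tested against a live site, the same `files`/`folders` collections can be walked recursively. `walk_folder` is a hypothetical name; it uses only calls that already appear in this thread, assumes `ctx` is an authenticated `ClientContext`, and assumes each item exposes a `ServerRelativeUrl` property. It skips the library's internal `Forms` folder, which would also skip any user folder that happens to be named Forms.

```python
def walk_folder(ctx, folder_url):
    """Hypothetical helper: yield the server-relative URL of every file
    under folder_url, descending into subfolders depth-first."""
    folder = ctx.web.get_folder_by_server_relative_url(folder_url)
    files = folder.files
    ctx.load(files)
    ctx.execute_query()
    subfolders = folder.folders
    ctx.load(subfolders)
    ctx.execute_query()
    for f in files:
        yield f.properties["ServerRelativeUrl"]
    for sub in subfolders:
        # "Forms" holds the library's view pages, not user documents
        if sub.properties["Name"] == "Forms":
            continue
        yield from walk_folder(ctx, sub.properties["ServerRelativeUrl"])
```

Usage, assuming the site and library from the examples above: `for url in walk_folder(ctx, "/sites/abc/Shared Documents"): print(url)`.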
m.
Fantastic. Iterative folder content printing and download worked!
Thank you,
This code downloads corrupted PDF files. They are essentially empty (156 bytes). Any ideas why?
I am also getting corrupted PDF files of only 1 KB with the code above. Any idea?
I figured it out: for me, the cause was the relative URL. When I list folder contents, I don't need to add /sites/sitename/library etc.; it just has to be /library. But when I download the files, I do need to add /sites/sitename/folder/file.
This is really weird, because I can still access and download files without adding /sites/sitename/, but then the content is corrupted. At the same time, if I add /sites/sitename/ when getting folder contents, it throws an error; it only works if the relative URL starts with the library.
It is also weird that every resource I found suggests adding /sites/sitename to the relative URL for both folder contents and file contents.
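The small corrupted files reported above (156 bytes, 1 KB) are consistent with the server returning a short error response instead of the file, which then gets written to disk as-is; that is an inference, not something confirmed in this thread. A cheap guard, sketched below under that assumption, is to build the download URL in one place and check the bytes before writing. `server_relative_url` and `looks_like_pdf` are hypothetical helpers, and the `%PDF-` signature check applies to PDFs only.

```python
def server_relative_url(site, library, *parts):
    """Hypothetical helper: build '/sites/<site>/<library>/<part>/...'.

    Per the comment above, downloads needed the '/sites/<site>' prefix;
    whether folder listing does may depend on your setup or library version."""
    segments = ["sites", site, library, *parts]
    return "/" + "/".join(s.strip("/") for s in segments)


def looks_like_pdf(content):
    """Return True if the bytes start with the PDF signature '%PDF-'.

    An HTML or JSON error body written to disk would fail this check."""
    return content[:5] == b"%PDF-"
```

For example, fetch `File.open_binary(ctx, server_relative_url("abc", "Shared Documents", "TEST", "report.pdf"))` and refuse to write the file if `looks_like_pdf(response.content)` is false. Also checking `response.status_code` assumes `open_binary` returns a `requests.Response`, which the `.content` usage in this thread suggests but does not confirm.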
Thanks for the suggestion. Can you share the final working code? If we want to download all the contents of a subfolder like /sites/sitename/Documents/somefolder, what would the final code be?
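One possible approach, offered as a sketch rather than confirmed working code: given the server-relative URLs of the files under a folder (from whatever recursive listing you use), recreate the same subfolder structure locally and download each file with `File.open_binary`, as the snippets in this thread do. `local_path_for` and `download_all` are hypothetical names; the `File` import path is the one used earlier in this thread, and newer releases of the library may have moved it.

```python
import os
from pathlib import PurePosixPath


def local_path_for(file_url, folder_url, local_root):
    """Map a file URL under folder_url to the same relative path under local_root,
    e.g. '<folder_url>/sub/b.csv' -> '<local_root>/sub/b.csv'."""
    rel = PurePosixPath(file_url).relative_to(PurePosixPath(folder_url))
    return os.path.join(local_root, *rel.parts)


def download_all(ctx, file_urls, folder_url, local_root):
    """Download every URL in file_urls, mirroring the SharePoint folder structure."""
    # import path as used earlier in this thread; it may differ in newer releases
    from office365.sharepoint.file import File

    for url in file_urls:
        target = local_path_for(url, folder_url, local_root)
        os.makedirs(os.path.dirname(target), exist_ok=True)
        response = File.open_binary(ctx, url)
        with open(target, "wb") as fh:
            fh.write(response.content)
```

Keeping the path mapping separate from the network calls makes it easy to check where each file will land before downloading anything.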
Thanks, guys. This helps solve a lot of the problems and issues faced while using the SharePoint package.
Hi Friends,
Do you have any idea how to download large CSV files (larger than 10 GB) in small chunks? AWS Lambda can't handle files that large.
If possible, please share a code snippet as well.
Thanks in advance!
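A general pattern for keeping memory use bounded, sketched here without claiming it is what Office365-REST-Python-Client itself provides, is to stream the HTTP response and copy it to the destination in fixed-size chunks. `copy_in_chunks` is a hypothetical helper; `source` can be any object with `read(n)`, for example the `.raw` attribute of a `requests` response opened with `stream=True`, but obtaining an authenticated streaming request for the SharePoint file is assumed and not shown. Keep in mind that Lambda has a 15-minute maximum run time and limited `/tmp` space, so for 10 GB files the chunks may need to go straight to an S3 multipart upload rather than to local disk.

```python
def copy_in_chunks(source, sink, chunk_size=8 * 1024 * 1024):
    """Copy from file-like `source` to `sink` without holding it all in memory.

    Reads at most chunk_size bytes at a time and returns the total copied."""
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # empty bytes means end of stream
            break
        sink.write(chunk)
        total += len(chunk)
    return total
```

The standard library's `shutil.copyfileobj(source, sink, length)` performs the same copy; the version above only adds the byte count, which can be handy for progress logging.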
My Python 3 code:
```python
from office365.runtime.auth.authentication_context import AuthenticationContext
from office365.sharepoint.client_context import ClientContext

url = 'https://company.sharepoint.com/sites/abc'
ctx_auth = AuthenticationContext(url=url)
if ctx_auth.acquire_token_for_user(username='abcd.xyz@company.com', password='12345'):
    ctx = ClientContext(url, ctx_auth)
    lists = ctx.web.lists
    ctx.load(lists)
    ctx.execute_query()
    for l in lists:
        print(l.properties['Title'])
```
With the code above I can list the lists in the site. But my plan is to run this whole module in AWS Lambda (Python), download files from SharePoint Documents, and store them in AWS S3.
A folder can contain multiple files, and I want to download the entire folder with all its files. Has anyone done this? Any help? Working code would be a great help, as I am totally new to web scraping!
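For the S3 side, a minimal sketch, assuming `boto3` is available (it is normally bundled with the AWS Lambda Python runtime) and that you already have each file's server-relative URL and content: map the SharePoint path under the folder to an S3 key and upload the bytes with the S3 client's `upload_fileobj`. `s3_key_for` and `upload_to_s3` are hypothetical names, and the bucket and prefix are placeholders. This keeps the whole file in memory, so it suits small and medium files only.

```python
import io
from pathlib import PurePosixPath


def s3_key_for(file_url, folder_url, prefix=""):
    """Hypothetical helper: turn '<folder_url>/sub/b.csv' into '<prefix>/sub/b.csv'."""
    rel = PurePosixPath(file_url).relative_to(PurePosixPath(folder_url))
    return "/".join(p for p in (prefix.strip("/"), rel.as_posix()) if p)


def upload_to_s3(content, bucket, key):
    """Upload an in-memory bytes object to s3://bucket/key."""
    import boto3  # imported lazily so the key logic works without AWS installed

    boto3.client("s3").upload_fileobj(io.BytesIO(content), bucket, key)
```

Usage, with placeholder names: `upload_to_s3(response.content, "my-bucket", s3_key_for(url, "/sites/abc/Shared Documents/TEST", "sharepoint"))`.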