googlearchive / PyDrive

Google Drive API Python wrapper library
Other
1.31k stars 273 forks source link

Losing file content during upload() #64

Closed cogerk closed 8 years ago

cogerk commented 8 years ago

Hi All,

This is also an open question on stack, but after no real response there I thought I ought to take it to the source.

I currently have a 34x22 .xlsx spreadsheet. I am downloading it via pydrive, filling in some of the blank values, and uploading the file back via pydrive. When I upload the file back, all cells with formulas are blank (any cell that starts with =). I have a local copy of the file I want to upload, and it looks fine so I'm pretty sure the issue must be with pydrive.

My code:

def upload_r1masterfile(filename='temp.xlsx'):
        """
        Upload a given file to drive as our master file
        :param filename: name of local file to upload
        :return:
        """
        # Get the file we want
        master_file = find_r1masterfile()
        try:
            master_file.SetContentFile(filename)
            master_file.Upload()
            print 'Master file updated. ' + str(datetime.datetime.now())
        except Exception, e:
            print "Warning: Something wrong with file R1 Master File."
            print str(e)
            return e

The only hint I have is that if I add the param={'convert': True}tag to Upload, then there is no loss. However, that means I am now working in google sheets format, and I would rather not do that. Not only because it's not the performed format to work with here, but also because if I try to master_file.GetContentFile(filename) I get the error: No downloadLink/exportLinks for mimetype found in metadata

Any hints? Is there another attribute on upload that I am not aware of?

I haven't found any clues in the documentation, so my recommendation would be to add something describing possible parameters in the Upload function, as I found that only after some googling and no where in the documentation.

Thanks!

RNabel commented 8 years ago

Thanks, @cogerk for posting this here. 1) When you upload the file, did you close Excel first? IIRC MS Office writes a lot of the content to a temporary file, so that may explain why some parts are missing. If you tried the non converting upload first, the full file may have been saved to disk between the two tries, and thus the second converting upload attempt worked. 2) GetContentFile takes a second argument called mimetype, which should allow you to download the file. Could you try .GetContentFile(filename, mimetype="application/vnd.ms-excel")? If that mimetype doesn't work as anticipated, there is a great StackOverflow post here which lists a bunch of different types you can try.

cogerk commented 8 years ago

Thanks for your response @RNabel! This was so helpful!

1) I tried again keeping excel closed and that worked. The one hiccup is when I preview the file the cells with formulas are blank and when I open the file excel says it needs to repair the file. However, once it repairs the file, everything is OK. The repair happens locally and on the drive, so the issue is now in how I am writing to cells, so I'll look into that next.

2) When I saw that cells were still blank in the preview mode, I tried this route too. This also worked! I ended up using 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet')` since I'm working in .xlsx. Thanks for the tip and the StackOverflow link.

For future googlers looking for code examples, with that change my functions look like this:

def save_r1masterfile(csv, rows_to_skip=12, filename='temp.xlsx'):
    """
    Save the R1 master file from google drive as local file or dataframe.
    :param csv: boolean, if true, save as file, else return a dataframe
    :param rows_to_skip: int, number of rows to skip (if returning dataframe)
    :param filename: string, file to save as (if returning the file)
    :return:
    """
    # Get the file we want
    master_file = find_r1masterfile()
    # Try downloading the file and return an error if that doesn't work
    try:
        # This mimetype allows downloading google sheets file as a .xlsx
        master_file.GetContentFile(filename,
                                   mimetype='application/vnd.openxmlformats-officedocument.spreadsheetml.sheet')
    except Exception, e:
        print "Warning: Something wrong with downloading R1 Master File save."
        print str(e)
        return e
        # TODO: add an email alarm to responsible user
    if csv: # If we want as a csv...
        return master_file
    else: # If we want as a dataframe...
        df = pd.read_excel(filename,
                           skiprows=rows_to_skip,
                           sep='\t',
                           index_col='Date')
        remove_file(filename)
        # TODO: Is this broken?
        return df

def upload_r1masterfile(filename='temp.xlsx'):
    """
    Upload a given file to drive as our master file
    :param filename: name of local file to upload
    :return:
    """
    # Get the file we want from drive
    master_file = find_r1masterfile()
    # Try uploading the file and if it doesn't work return an error
    try:
        master_file.SetContentFile(filename)
        # Convert parameter will convert the .xlsx file to a google sheets file
        master_file.Upload(param={'convert': True})
        print 'Master file updated. ' + str(datetime.datetime.now())
        return
    except Exception, e:
        print "Warning: Something wrong with uploading R1 Master File."
        print str(e)
        return e

This is what the full code looks like today

RNabel commented 8 years ago

@cogerk glad to hear it worked, and thank you for sharing the working code. Re: 2) I'll add this mimetype into the FAQ section for future reference.

FYI: When including blocks of code in Markdown, use `` to delimit the code block instead of a single.

cogerk commented 8 years ago

Thanks :) My bad.