pnp / PnP

SharePoint / Office 365 Developer Patterns and Practices - Archived older solutions. Please see https://aka.ms/m365pnp for updated guidance
https://aka.ms/m365pnp
Other
1.9k stars 3.31k forks source link

UploadFile method is bloated, needs refactoring #309

Closed sumanchakrabarti closed 9 years ago

sumanchakrabarti commented 9 years ago

The FileAndFolderExtensions.UploadFile method is pretty bloated. It has 322 lines of code and another 150 lines for overloaded methods. I'm also finding that it fails with a 403 Forbidden in certain cases (not sure, why, yet). Furthermore, there are a few foreach loops and if the file hash check requires a download, the file size could be 2GB, which would mean the transmission of data is double for every call. We want to stay simple, simple, simple in the framework, but offer all the rich value that was added here.

I was hoping to simplify the UploadFile by making it a very simple function and only perform the file upload. Then take the rich parts of the method (checkout, checkHashBeforeUpload, setFileProperties, upload via webdav) and put them in extra methods:

How does the community feel about this? @sgryphon @erwinvanhunen @pbjorklund

erwinvanhunen commented 9 years ago

Makes a lot of sense to me. Do it :-)

sumanchakrabarti commented 9 years ago

Good. I'm going to put code review comments into the current repo to explain why as well.

sumanchakrabarti commented 9 years ago

Nevermind. I forgot that's a feature of TFS, not available in GitHub. :(

sumanchakrabarti commented 9 years ago

Update is in the dev branch.

jansenbe commented 9 years ago

Thanks!

sgryphon commented 9 years ago

It depends on what the objective of the API is, and how it is used, i.e. design the API from the client side experience.

The current methods are optimised to the sorts of things that I have used it for -- mostly for provisioning of branding files into SharePoint. For these functions I want to write one line of code per item uploaded, which brings the file up to the required published state irrespective of what features are turn on or current state.

As I may run this many, many times in development (multiple times per day), I don't want to create new versions if the files have not changed (hence the download check). Obviously the "correct" API for a bunch of 2 kB branding files is very different than for a 2 GB file.

(Still, depending on the frequency of change if download speed is faster than upload it may be quicker to download and check the file hash; and if you know the 2 GB file is always going to be different you can skip the hash check.)

I'm using it mostly for repeatedly deployment a relatively large number of branding files, and so would like to retain a single line method that does everything needed to bring the file up to the desired state.

I think you should check what other scenarios people are using the API in, and design for what is most common (i.e. design for the clients, rather than just focussing on the internals).

sumanchakrabarti commented 9 years ago

This method is a core method called UploadFile which happens to be referenced by UploadThemeFile. It was doing several other operations and not without the option to choose otherwise. In unit testing this, we would have to write unit tests to cover every case several times for one function and may not hit every scenario. If we have to write so many test cases, there's likely a problem with the size of the function.

I did additional analysis of the code and found that the function's complexity was very high and maintainability a bit low. Breaking it up eliminates the potential for failure.

Let's keep things simple, this is a framework, not an application. Everything has to be completely solid so that it can be used everywhere.

Also, keep in mind the number of loops and execute query calls. Several loops within a function as well as within loops causes the function to go from a linear performance to parabolic--the way it was coded, it was actually exponential. With the number of loops in that function, as well as the potential to download a 2GB file using that function, perform a hash check on it, then upload the file, it's just too much risk to bring down performance. I know it's the cloud, but we still have to be responsible with resources.

I broke it up into a few functions where you can do the same exactly logic. It can be done optionally by the dev wanting to go that route. I highly recommend this approach to ensure performance for every file uploaded, not just theme files.

sumanchakrabarti commented 9 years ago

Furthermore, it's great to use extension methods, but we shouldn't need to have 4 overloads atop the extension methods. There are so many signatures per method, it's getting out of hand. Remember, this is for all to use. Think generically and try to cover smaller cases. This will enable developers to be creative with how they approach solving a problem rather than call one do-it-all method.

sumanchakrabarti commented 9 years ago

I updated the UploadThemeFile method and the ProvisionFileInternal method. The ProvisionFileInternal method required that all the new methods to match what the old method was doing. However, there are no unit tests for ProvisioningExtensions.

@sgryphon Can you add some unit tests for this code--it's primarily your contribution.