koltyakov / sppull

📎 Download files from SharePoint document libraries using Node.js without hassles
MIT License

Download from SAML SP when not connected to company internet #40

Closed acrovitic closed 4 years ago

acrovitic commented 4 years ago

I just want to start out by saying that good lord this module is amazing! When I'm connected to my company's wifi, sppull works extremely quickly and is absolutely phenomenal. What takes my current Python ETL script 60-70 seconds, sppull completes in under 7 seconds. I'd like to switch all of our ETL pipelines over to sppull if I can get this authentication issue figured out.

The issue is that when I try to use the exact same script (and same config.json) when not connected to my company's wifi (in this particular case, on a server hosted by Azure), I keep getting the following access denied error:

Error in operations.downloadFile: <s:Fault>
  <s:Code>
    <s:Value>s:Sender</s:Value>
    <s:Subcode>
      <s:Value xmlns:a="http://docs.oasis-open.org/ws-sx/ws-trust/200512">a:FailedAuthentication</s:Value>
    </s:Subcode>
  </s:Code>
  <s:Reason>
    <s:Text xml:lang="en-US">MSIS7068: Access denied.</s:Text>
  </s:Reason>
  <s:Detail>
    <IssuanceAuthorizationFault xmlns="http://schemas.microsoft.com/ws/2009/12/identityserver/" xmlns:i="http://www.w3.org/2001/XMLSchema-instance"/>
  </s:Detail>
</s:Fault>
Core error has happened Error: <s:Fault>
  <s:Code>
    <s:Value>s:Sender</s:Value>
    <s:Subcode>
      <s:Value xmlns:a="http://docs.oasis-open.org/ws-sx/ws-trust/200512">a:FailedAuthentication</s:Value>
    </s:Subcode>
  </s:Code>
  <s:Reason>
    <s:Text xml:lang="en-US">MSIS7068: Access denied.</s:Text>
  </s:Reason>
  <s:Detail>
    <IssuanceAuthorizationFault xmlns="http://schemas.microsoft.com/ws/2009/12/identityserver/" xmlns:i="http://www.w3.org/2001/XMLSchema-instance"/>
  </s:Detail>
</s:Fault>
    at config_1.request.post.then.xmlResponse (C:\Users\Name\node_modules\node-sp-auth\lib\src\utils\AdfsHelper.js:31:23)
    at tryCatcher (C:\Users\Name\node_modules\bluebird\js\release\util.js:16:23)
    at Promise._settlePromiseFromHandler (C:\Users\Name\node_modules\bluebird\js\release\promise.js:547:31)
    at Promise._settlePromise (C:\Users\Name\node_modules\bluebird\js\release\promise.js:604:18)
    at Promise._settlePromise0 (C:\Users\Name\node_modules\bluebird\js\release\promise.js:649:10)
    at Promise._settlePromises (C:\Users\Name\node_modules\bluebird\js\release\promise.js:729:18)
    at _drainQueueStep (C:\Users\Name\node_modules\bluebird\js\release\async.js:93:12)
    at _drainQueue (C:\Users\Name\node_modules\bluebird\js\release\async.js:86:9)
    at Async._drainQueues (C:\Users\Name\node_modules\bluebird\js\release\async.js:102:5)
    at Immediate.Async.drainQueues (C:\Users\Name\node_modules\bluebird\js\release\async.js:15:14)
    at runCallback (timers.js:789:20)
    at tryOnImmediate (timers.js:751:5)
    at processImmediate [as _immediateCallback] (timers.js:722:5)

I thought maybe I needed to change my siteUrl to match the one my current Python ETL uses for authentication, since the Python ETL works fine on the aforementioned server that isn't on the company network. However, that attempt led to the same access denied error above.

I looked through your code and auth examples for any additional options I might need to pass for SAML authentication (my company uses what I assume is SAML to give instant access to any company asset without having to log in again after logging into a company-issued laptop), but couldn't find anything. I'd appreciate any help, as I'm not sure where to even start with this issue. My code is as follows:

config.json

{
    "siteUrl":"https://company.sharepoint.com/teams/teamname",
    "creds":{
            "username":"email@company.com",
            "password":"password"
            }
}

sharepoint_etl.js

var sppull = require("sppull").sppull;
const fs = require("fs");

let rawdata = fs.readFileSync("config.json");
let context = JSON.parse(rawdata);

var array = [
    "/Shared%20Documents/A%20Folder/Sub%20Folder/Some%20Excel%20File.xlsm",
    "/Shared%20Documents/A%20Folder/Different%20Sub%20Folder/Another%20Excel%20File.xlsx",
    "/Shared%20Documents/A%20Folder/Yellow%20Sub%20Folder/Yet%20Another%20Excel%20File.xlsx",
    "/Shared%20Documents/A%20Folder/Sub%20Ways%20Folder/We%20Really%20Need%20A%20SQLDB.xlsx",
];

array.forEach(function(path){
    // pass each relative url+file name above in, split, and form input dictionary for sppull
    var splitpath = path.split("/");
    var option = {
        spRootFolder: splitpath.slice(0, -1).join("/"),
        dlRootFolder: "./output/",
        strictObjects: [splitpath[splitpath.length -1]]
    }
    sppull(context, option)
    .then(function(downloadResults) {
        console.log("Files are downloaded");
        console.log("For more, please check the results", JSON.stringify(downloadResults));
    })
    .catch(function(err) {
        console.log("Core error has happened", err);
    });
});
koltyakov commented 4 years ago

Hi @acrovitic, thanks for the kind words and for using the library.

Many factors can differ when accessing SP from an internal network versus an external one. It's best to reach out to your admins and ask exactly what they have configured.

It could also be proxy settings. When a Node.js application runs behind a network proxy, process.env.HTTP_PROXY, process.env.HTTPS_PROXY, etc. need to be configured. But I don't think that's your case.
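A quick way to confirm what the Node.js process actually sees is to read those variables before calling sppull. The helper below, describeProxyEnv, is a hypothetical sketch: it only reports what is set and configures nothing. Checking the lowercase spellings (https_proxy, etc.) is an assumption based on common convention; whether a particular HTTP client honors them depends on that client.

```javascript
// Hypothetical diagnostic helper: reports which proxy-related environment
// variables are visible to this Node.js process. It only reads them.
const PROXY_VARS = ["HTTP_PROXY", "HTTPS_PROXY", "NO_PROXY"];

function describeProxyEnv(env = process.env) {
  const found = {};
  for (const name of PROXY_VARS) {
    // Many tools accept both spellings; prefer the uppercase one if present.
    const value = env[name] ?? env[name.toLowerCase()];
    // Skip variables that are unset or set to an empty string.
    if (value !== undefined && value !== "") {
      found[name] = value;
    }
  }
  return found;
}
```

For example, logging describeProxyEnv() at startup on the Azure server and on a machine inside the company network would show whether the two environments differ here at all.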

Since you know that auth in Python works in both scenarios, the cause may be something else.

The message you've provided is an ADFS response. Again, it's best to tell your admins that the app you use in Python can authenticate, but another one receives MSIS7068: Access denied from ADFS. Maybe on an external network you have to provide a 2FA code? Are you able to log in with that account in a browser under the same circumstances?
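When passing this along to admins, it can help to pull the key fields out of the SOAP fault instead of forwarding the whole stack trace. A minimal sketch, assuming the fault text has the shape shown in the log above; summarizeAdfsFault is a hypothetical helper, and its regex matching is a logging convenience, not a real XML parser.

```javascript
// Hypothetical helper: extracts the WS-Trust subcode and the ADFS message
// (e.g. "MSIS7068: Access denied.") from a SOAP fault string.
// Regex-based and intended only for logging, not general XML parsing.
function summarizeAdfsFault(faultText) {
  const text = String(faultText);
  // Matches <s:Subcode><s:Value ...>a:FailedAuthentication</s:Value>
  const subcode = /<s:Subcode>\s*<s:Value[^>]*>([^<]+)<\/s:Value>/.exec(text);
  // Matches <s:Text xml:lang="en-US">MSIS7068: Access denied.</s:Text>
  const reason = /<s:Text[^>]*>([^<]+)<\/s:Text>/.exec(text);
  // ADFS event codes look like "MSIS" followed by digits.
  const msis = reason ? /MSIS\d+/.exec(reason[1]) : null;
  return {
    subcode: subcode ? subcode[1].trim() : null,
    reason: reason ? reason[1].trim() : null,
    msisCode: msis ? msis[0] : null,
  };
}
```

Assuming the thrown error's message carries the fault XML (as the "Error: <s:Fault>" line in the stack trace suggests), the catch handler could log summarizeAdfsFault(err.message), yielding something like subcode a:FailedAuthentication with code MSIS7068.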

You can try this helper project for troubleshooting: it not only prompts for credentials interactively but also outputs a detailed error message, if there is one. Try the ADFS and SAML options, etc. With SPO (SharePoint Online), though, it's better to use Add-In Only auth. More about supported auth strategies.
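If Add-In Only auth is an option, the sppull context would carry app credentials instead of a username and password, which also keeps the password out of config.json. The sketch below is hypothetical: the environment variable names (SP_SITE_URL, SP_CLIENT_ID, SP_CLIENT_SECRET, SP_USERNAME, SP_PASSWORD) are illustrative, and the { clientId, clientSecret } shape is assumed from node-sp-auth's Add-In Only credentials, so check it against the node-sp-auth docs for your version. Add-In Only also requires an app registration that your admins would need to set up.

```javascript
// Hypothetical sketch: build the sppull context from environment variables
// rather than a config.json with a plain-text password.
// Variable names are illustrative assumptions, not part of sppull.
function buildContext(env = process.env) {
  const siteUrl = env.SP_SITE_URL;
  if (!siteUrl) {
    throw new Error("SP_SITE_URL is not set");
  }
  if (env.SP_CLIENT_ID && env.SP_CLIENT_SECRET) {
    // Add-In Only (app-only) credentials; assumed node-sp-auth shape.
    // These are not tied to a user's federated (ADFS) sign-in.
    return {
      siteUrl,
      creds: { clientId: env.SP_CLIENT_ID, clientSecret: env.SP_CLIENT_SECRET },
    };
  }
  if (env.SP_USERNAME && env.SP_PASSWORD) {
    // User credentials, same shape as the original config.json.
    return {
      siteUrl,
      creds: { username: env.SP_USERNAME, password: env.SP_PASSWORD },
    };
  }
  throw new Error("Set SP_CLIENT_ID/SP_CLIENT_SECRET or SP_USERNAME/SP_PASSWORD");
}
```

With this in place, sppull(buildContext(), option) would replace the readFileSync/JSON.parse lines in the original script.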

Regarding your sample, it would be faster to pass explicit objects in a single sppull session. Authentication can be expensive (1-2 sec); it's cached, but in any case strict objects are downloaded efficiently. Also, forEach fires all the promises simultaneously, which is not always desirable.
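One way to fetch all the files in a single session is to use their longest shared folder as spRootFolder and list each file relative to it in strictObjects. This sketch assumes strictObjects accepts paths relative to spRootFolder with a leading "/", and that the %20-encoded names from the original script work there too; verify both against the sppull README. commonFolder, buildOptions, downloadAll and runSequentially are hypothetical helper names. runSequentially shows the alternative of awaiting separate calls one at a time instead of starting them all at once with forEach.

```javascript
// Hypothetical sketch: one sppull call for many files, so auth runs once.
// Assumption: strictObjects takes paths relative to spRootFolder.

// Returns the longest folder path shared by all file paths.
function commonFolder(paths) {
  if (paths.length === 0) return "";
  const folders = paths.map((p) => p.split("/").slice(0, -1));
  const first = folders[0];
  let len = first.length;
  for (const parts of folders.slice(1)) {
    let i = 0;
    while (i < len && i < parts.length && parts[i] === first[i]) i++;
    len = i;
  }
  return first.slice(0, len).join("/");
}

// Builds a single options object covering every file in `paths`.
function buildOptions(paths, dlRootFolder) {
  const root = commonFolder(paths);
  return {
    spRootFolder: root,
    dlRootFolder,
    strictObjects: paths.map((p) => p.slice(root.length)),
  };
}

// Runs one sppull session for all files. sppull is required lazily so the
// helpers above can be used without it installed.
function downloadAll(context, paths) {
  const { sppull } = require("sppull");
  return sppull(context, buildOptions(paths, "./output/"));
}

// If separate calls are unavoidable, await them one after another
// instead of starting them all at once as forEach does.
async function runSequentially(tasks) {
  const results = [];
  for (const task of tasks) {
    results.push(await task());
  }
  return results;
}
```

With the original script's context and array, downloadAll(context, array) would replace the forEach loop; for the four sample paths, spRootFolder becomes "/Shared%20Documents/A%20Folder".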

koltyakov commented 4 years ago

Going to close this one.