njosefbeck / gatsby-source-stripe

Gatsby source plugin for building websites using Stripe as a data source
74 stars 17 forks source link

Stop trying to download the file object URL #31

Closed Francesco-Lanciana closed 5 years ago

Francesco-Lanciana commented 5 years ago

Description

Currently we try to use the url property of the Stripe file Object when downloading files is set too true. This never works as it doesn't actually point to any files. @brxck said previously that we don't use the urls contained in the objects that make up payload.links.data because:

  1. We want to avoid downloading files multiple times if it has multiple links
  2. It only contains links to publicly accessable files.

The more I think about it the less this makes sense to me. My arguments for no longer using the url property and instead switching to links are:

  1. The url property will never work (at least I don't currently know how to make it work)
  2. I'm not aware of any way to provide multiple links to the same file in Stripe 2a. Even if there were it's still better to download multiple than nothing at all
  3. I'm currently chasing this up with Stripe support but I don't actually think it's possible to make files private. All Stripe hosted files seem to be public.

Steps to Reproduce

  1. Upload an image to Stripe (SKU's let you upload an image to Stripe)
  2. Add "File" to options.objects of this plugin
  3. Run "npm run start" and you will see that the URLs associated with the File object will fail

Additional Information

I am currently chasing up the following two questions with Stripe support and will get post back when I have answers:

  1. What is the URL property of the File object actually used for? Can we make some use of it?
  2. Are all images hosted by Stripe public or is it possible to have private images/files. 2a. In this case what HTTP headers would need to be provided to pull down these files.

A reference to the File object: https://stripe.com/docs/api/files/object

brxck commented 5 years ago

The url property will never work (at least I don't currently know how to make it work)

I'd be interested to hear how Stripe support responds. In my testing, downloading Files using the url field & authentication has functioned properly, though I haven't tested this since the refactor.

I'm not aware of any way to provide multiple links to the same file in Stripe. ... I'm currently chasing this up with Stripe support but I don't actually think it's possible to make files private.

Files are not publicly accessible unless they have File Links associated with them. It is possible and can be necessary to create multiple File Links because each can have it's own expiration date and metadata. It would be problematic if all Files were publicly accessible because they can contain sensitive information, ex. addresses in a dispute document.

Francesco-Lanciana commented 5 years ago

Ahhhhh right that makes sense, although I'm still not really sure what use case it fits... I'm also confused on three other points:

  1. I never created any file links manually which means Stripe made them for me when I uploaded the image, which is a strange security stance to take...
  2. The SKU objects returned for me have a URL that corresponds to one of the links... why would it not use the URL that requires authorisation. Or does it just figure that you made it public so lets use the public one.
  3. If you provide an API key when accessing an image that doesn't need authentication (i.e. a link url) the request actually fails (even if it's the correct API key). Whyyyyyyyy? That's just confusing and would mean that to really be robust we should be checking for the existance of links to determine if an SKU image requires authentication or not.
njosefbeck commented 5 years ago

A couple of thoughts...

I can see why letting files not be immediately publicly accessible would make sense, as it could contain sensitive information (as Brock mentioned). I can also see why you might want to make that file publicly accessible for a time. For example, a customer disputes some aspect of a sale, you could send them a link to some file showing what they paid for, their credit card number, whatever. You'd of course want this link to be short-lived, to hopefully prevent it from getting out beyond the customer you're in communication with. (This is just a scenario I came up with on the fly.)

1., 2. For SKU images, maybe Stripe makes the file links by default because it figures you're going to be using this image in some kind of online store, and so most likely it will need to be public by default.

  1. Isn't it always the case that a SKU image doesn't require authentication, since it's a public image? The fact that it has a File Link would seem to suggest that. So I guess Stripe doesn't like that you pass an API key unnecessarily?

Let us know what Stripe support says! And thanks for bringing up these things.

njosefbeck commented 5 years ago

@Francesco-Lanciana Checking in, any updates on this from Stripe support?

Also, I'm planning on digging into the code that supports this feature this week, so that I have a better understanding of how it works. As far as I understand it, these are the use cases we're looking to support:

As a plugin user, I want to:

  1. Download the Stripe-hosted image for each of my Stripe Skus.
  2. Download the image for each of my Stripe Skus, hosted outside of Stripe, but added to the Sku object under the "image" key.
  3. Download the images for each of my Products, hosted outside of Stripe, but added to the Product object under the "images" key. The Product object is a top-level object as well as can live under a Sku.
  4. Download a private File via the url field, passing in my "live secret API key".
  5. Download a public File via File Link, which exists on the File object under the "links" key.

If that is representative of the use cases we're trying to cover, then I feel like the first two are the most important to get consistently working, as I think they're the most common use cases. If we have (1) and (2) consistently working, then I feel like it wouldn't be tough to get (4) working as File Links are publicly accessible and thus don't require any type of auth. And then, maybe we can add a "download private files" option or something for (3) once the others are taken care of.

Any thoughts on the above? CC: @brxck

Francesco-Lanciana commented 5 years ago

Hey sorry for the slow reply! Totally agree with using this thread as the main one, I was starting to get confused with my own posts. Regarding 1 and 2 I don't think SKUs have multiple images, just one that stripe hosts and is on the SKU object under the image key. I agree with the order of importance though.

Also I think you are right about Stripe automatically creating the File Link for SKU images. What's weird is that my SKU image has about 5 or so File Links, this is a completely unproven hunch but I have deleted and remade that SKU multiple times and I'm wondering if it somehow keeps the images around even when you delete the associated SKU object. Not super important just find it weird.

As for the pull request I created, I'm totally happy if you want to close it for now. The functionality I removed is still necessary to support downloading the private files so it was a bit of overkill. I'm planning to take another crack at it in the coming week and improve things a bit. I was currently making the distinction between remote hosted files and stripe hosted files because I thought stripe hosted files were always private. Now that I know they aren't the distinction makes less sense. I still don't think we need to give devs the auth option though. We should just pass the auth headers where necessary and not when they aren't.

njosefbeck commented 5 years ago

I updated the use cases above so that 1 and 2 were clearer. I also added the use case related to downloading Product images.

This evening I spent some good amount of time walking through the code in LocalFile.js, testing the functionality specifically around use case 1 above. I've determined that it's possible to fix the plugin so that it's possible to download Stripe-hosted Sku images. The following would need to be done:

-- When inside downloadStripeHostedFile remove the auth key from createRemoteArgs, so that you don't pass it to createRemoteFileNode.

-- Once we've successfully downloaded the images, they exist in the gatsby-source-filesystem cache (found at .cache/gatsby-source-filesystem) and their request headers are cached in .cache/caches/gatsby-source-stripe. So, in downloadStripeHostedFile we need to return early if a cached header exists for the image. Adding the below code will prevent us logging misleading errors.

So, the start of downloadStripeHostedFile should look something like:

async downloadStripeHostedFile(url, type, parentNodeId) {
    if (!url) return null;

    // Return early if the file is already in the cache
    const cachedHeaders = await this.createRemoteArgs.cache.get(`create-remote-file-node-${url}`)
    if (cachedHeaders && cachedHeaders.etag) {
      return null
    }

   const { auth, ...createRemoteArgsWithoutAuth } = this.createRemoteArgs; // eslint-disable-line no-unused-vars

-- Switch order of if statements in convertToArray. Currently, if an object is passed into that function, !entity || !entity.length will evaluate to true, and thus the function will return an empty array when it shouldn't. Swapping the !Array.isArray check with that one fixes this issue.

-- In my use case, none of my Products had images, so when going into downloadRemoteHostedFiles, the urls param is empty. We should return null early if that's the case.

The above should at least fix (1). To test the other use cases, I'll need to add some additional images to my Stripe test data.

For now though, @Francesco-Lanciana would you be able to move forward with the above changes? I'd like to get these into the plugin as soon as possible. Let me know, as otherwise I may have some time this week to knock it out.

Also, as more of a note for myself, when testing this functionality, I had to go into .cache and .cache/caches and delete the gatsby-source-filesystem and gatsby-source-stripe caches after each gatsby build.

Let me know if you have any questions! Thanks!

Francesco-Lanciana commented 5 years ago

Hey so I have addressed everything except for the cache issue. I have:

  async downloadHostedFiles(urls, useAuth, parentNodeId, type) {
    if (!urls) return;

    const urlsArray = convertToArray(urls).filter(url => url);

    // We want to check if each file is in the cache
    const checkCachePromises = urlsArray.map((url) => {
      const cacheHeaders = this.createRemoteArgs.cache.get(`create-remote-file-node-${url}`);
      return cacheHeaders;
    });
    const checkCacheResults = await Promise.all(checkCachePromises);

    // This works because map and Promise.all are strictly ordered
    const urlsNotCached = urlsArray.filter((urls, i) => {
        if (checkCacheResults[i] && checkCacheResults[i].etag) return false;
        return true;
    })

    // If all files have been cached then we don't have to do anything
    if (urlsNotCached.length === 0) return null;

    const { auth, ...createRemoteArgsWithoutAuth } = this.createRemoteArgs; // eslint-disable-line no-unused-vars
    const ext = type && `.${type}`;

    try {
      const fileNodePromises = urlsNotCached.map(url => {
        const createRemoteArgs = useAuth
          ? { url, parentNodeId, ext, ...this.createRemoteArgs }
          : { url, parentNodeId, ext, ...createRemoteArgsWithoutAuth };

          return createRemoteFileNode(createRemoteArgs);
      });
      const fileNodes = await Promise.all(fileNodePromises);

      return this.validateFileNodes(fileNodes);
    } catch (e) {
      const URLStrings = urlsNotCached.reduce(
        (URLString, url, i) => URLString + `URL ${i + 1}: ` + url + "\n",
        ""
      );
      console.log(
        "\x1b[1;31m\u2715\x1b[0m We were unable to download the following files:\n" +
          URLStrings +
          `Error: ${e.message}\n`
      );
      return null;
    }
  }

Here is the error:

Screen Shot 2019-05-01 at 10 03 13 pm

Is this something that I should just be wrapping in a try catch and calling a day or is there something wrong?

I am going to be a bit short of time in the next day or two so I have made a pull request with all the changes except those relating to the cache. #35 This way you can take a crack at the cache issue, you will probably be able to fix it quicker than I can.

Let me know if there is something wrong with the PR, I can spare time for a quick fix. Also my contact with Stripe support didn't really go anywhere, they more or less said they weren't developers by trade and their work flow didn't allow them time to help with granular developer assistance.

njosefbeck commented 5 years ago

Thanks for this quick turnaround! I'll take a look over the next couple of days and let you know :)

njosefbeck commented 5 years ago

Closing this for now, due to rewriting of this functionality which is now on NPM under v3.0.1.