w3c / FileAPI

File API
https://w3c.github.io/FileAPI/

Support `Response` in `URL.createObjectURL` #97

Closed bmeck closed 5 years ago

bmeck commented 6 years ago

For streaming or asynchronous content, and for content that needs specific headers other than the defaults in a Blob-based URL.createObjectURL workflow, it would be nice to support Response objects as the backing resource for URL.createObjectURL.
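For context, here is a minimal sketch of the Blob-based workflow that exists today. The helper name `responseToObjectURL` is made up for illustration. The point is that the body has to be fully buffered before a URL exists, and only the Content-Type survives, as the Blob's type.

```javascript
// Hypothetical helper: turn a Response into a blob: URL with today's APIs.
// The whole body is buffered first, and every header except Content-Type is lost.
async function responseToObjectURL(response) {
  const blob = await response.blob(); // waits for the complete body
  return URL.createObjectURL(blob);   // blob.type carries the Content-Type only
}

// Usage: the custom header below cannot be recovered from the resulting URL.
const exampleResponse = new Response("export default 1;", {
  headers: { "Content-Type": "text/javascript", "X-Custom": "dropped" },
});
const objectURLPromise = responseToObjectURL(exampleResponse);
```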

wanderview commented 6 years ago

Personally I would much rather make it possible to set the Response on various .src or .srcobj properties.

bmeck commented 6 years ago

@wanderview that won't work for my uses which are putting the URL inside of JS which requires a string:

import "blob: ..."

bmeck commented 6 years ago

After some talks with @domenic and @wanderview it seems like this might need to also have some flags setup to make 2 conditions available:

This is intended to allow garbage collecting the response body on use.

Microsoft has an options bag with oneTimeUse which seems sane to keep in sync with.

I could not find any prior implementation that is doing this. I do not have a personal use for this, but the concern seems to lie in having other documents able to load URLs.

Chat Log Reference: https://freenode.logbot.info/whatwg/20180212#c1432816

We could split these flags into separate PRs is my general thinking, and if that seems fine I can do so. There also seems to be a request for which environments are seeking to add this functionality. I am coming from Node so I don't know if that counts?

mkruisselbrink commented 6 years ago

I don't know if it makes sense to group this in with existing blob: URLs if the semantics are going to be so drastically different. Either way I'd really really like to avoid extending object URLs if at all possible, since for the vast majority of use cases they are a terrible hack.

But okay, the three proposals mentioned here:

So since what you're describing is very different in pretty much every way from existing object URLs does it really make sense to even have this in the same API/scheme? Would it possibly make more sense to define a new scheme for this, have API on Response (or Body, but that wouldn't have access to the headers) that generates one of these new type of URLs. And for this new type of URL we could then do the: auto-revoke on first fetch (and don't resolve until they are fetched, which should be fine since we wouldn't support manual revoke), and limit scope to the realm/document that created them?
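As a toy model of the semantics floated here (resolve at fetch time, auto-revoke on first fetch, visible only where created), something like the following could serve as a mental picture. Everything in it is an assumption: the `response-url:` scheme and both function names are invented, and this is plain JavaScript rather than a browser API.

```javascript
// Toy model only: "response-url:" is a made-up scheme and this registry is
// ordinary module-level state, so it is naturally limited to one realm.
const pendingResponses = new Map();
let nextId = 0;

// Hypothetical counterpart to createObjectURL for a Response.
function createOneTimeURL(response) {
  const url = `response-url:${++nextId}`;
  pendingResponses.set(url, response);
  return url;
}

// Resolution happens at fetch time, and the first fetch revokes the entry.
function fetchOneTime(url) {
  const response = pendingResponses.get(url);
  pendingResponses.delete(url); // auto-revoke on first fetch
  return response ?? Response.error(); // later fetches see a network error
}
```

Because there is no manual revoke in this model, there is no need to resolve the URL early at parse time, which is the simplification being argued for.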

That seems like it would address your usecases, without making object URLs more powerful/complicated than they already are, and without having to support the explosion of possible configurations that introducing multiple separate flags would cause.

bmeck commented 6 years ago

@mkruisselbrink

Revoking the URL on use. This one is very tricky too. How do you define use? The specs as currently written resolve blob URLs when the URL is parsed. Would it be that parsing that revokes the URL? Or if not, and it is the fetching that revokes the URL, how do you deal with other fetches that already resolved the URL but haven't fetched it yet. Should those start to fail? But that would be inconsistent with explicit revocation which doesn't have that behavior. So this also seems like it's given these URLs very different semantics than existing object URLs.

Auto revocation is in the spec already and is left up to the host to define: https://github.com/w3c/FileAPI/blob/3b94136ef959a191f8953d435280008cc7ea8d76/TR.html#L1761

Limit the scope to a single realm/document: seems sort of like a reasonable thing to have, but again very different semantics than existing object URLs (although to be fair, this is how chrome implements object URLs for media sources).

This doesn't seem to be a + or -?

So since what you're describing is very different in pretty much every way from existing object URLs does it really make sense to even have this in the same API/scheme?

IDK, but this seemed like the place to do it rather than try to make an entire new spec. The object URLs already have somewhat unstable lifetimes and using them after revocation can cause odd effects, this doesn't seem to add too much on top of that in my opinion.

Would it possibly make more sense to define a new scheme for this, have API on Response (or Body, but that wouldn't have access to the headers) that generates one of these new type of URLs. And for this new type of URL we could then do the: auto-revoke on first fetch (and don't resolve until they are fetched, which should be fine since we wouldn't support manual revoke), and limit scope to the realm/document that created them?

What is the significant difference with this and extending the existing blob: scheme? This seems to just be wanting to make a new scheme with different defaults? I'm fine keeping the defaults of the existing scheme.

That seems like it would address your usecases, without making object URLs more powerful/complicated than they already are, and without having to support the explosion of possible configurations that introducing multiple separate flags would cause.

Wouldn't this just move the explosion to this new scheme? I'm not sure the idea of making a new scheme reduces overall complexity in any way.

mkruisselbrink commented 6 years ago

Auto revocation is in the spec already and is left up to the host to define:

Auto revocation was in the spec, was never implemented as spec'ed, because it was never very well defined. Please refer to the latest version of the spec rather than reading old TR versions, as those are pretty much universally meaningless.

Wouldn't this just move the explosion to this new scheme? I'm not sure the idea of making a new scheme reduces overall complexity in any way.

Not exactly. Specifying (and implementing) auto-revoking in a way that works with URLs that are limited to a single realm, and have different revocation logic than existing URLs is much easier than trying to somehow do the same with the existing blob URLs. So this way we don't support three different flags that could all be turned on or off, but exactly two options: Either we have the current behavior with createObjectURL, or we have this completely different behavior with a completely different API, and not have to worry about how it interacts across realms, how it interacts with explicit revocation, how it interacts with the early resolution blob URLs need because of explicit revocation etc.

So since the described behavior is different from the blob: behavior in pretty much every way, it seems to me to make a lot more sense to not try to paper over these differences and instead just keep them completely separate. That way there is no confusion as to why certain blob: URLs behave one way while others behave completely differently.

bmeck commented 6 years ago

@mkruisselbrink I have use cases that are not using some of these flags so that throws a problem up with the idea that we don't need to have flags. In particular I want to create URLs off the main thread but let the main thread load them. These are both cross realm and may not be revoked on use.

mkruisselbrink commented 6 years ago

So let's figure out what use cases there are that aren't currently supported before we jump to trying to design solutions?

bmeck commented 6 years ago

@mkruisselbrink I tried to bring this up in previous discussions in various places including a few on the previous issue and feel like I have made 0 progress in over 6 months. I would request you tell me what to do if you want me to do something in particular.

mkruisselbrink commented 6 years ago

I understand you have some use case where you for some reason need to be able to create self-referential blob URLs (or at least mutual self-referential pairs of blob URLs). I'm not sure what the motivating use case for that requirement is.

And from your earlier comments in this issue "it seems like this might need to also have some flags setup to make 2 conditions available:" it sounded like for some reason it turned out that the request in this issue required those two other options as well, so from that it seemed like it might make sense to just treat them all as one big option. But maybe I misunderstood what you were trying to say in that comment.

But in either case, I don't see here or in the other issue what the use case is where you actually need these mutually recursive blob URLs. So without compelling use cases I'd be hesitant to massively complicate an already hard-to-understand part of the spec.

bmeck commented 6 years ago

@mkruisselbrink my use case is generally the creation of ESM in memory. I have mentioned that several times and even spent quite a bit of effort trying to see how to make ESM able to be instrumented in Service Workers and HTTP Servers, and seeing if I could get any hooks into the HTML specification. In general, all of this effort results in responses similar to the comment you had above about "what is the use case". The use cases are many, but in general they can be summarized as having a way to generate any non-trivial transformation of either specifier location or response body for ESM records. My annoyance mostly comes from that not being seen as valid, per your phrasing:

But in either case, I don't see here or in the other issue what the use case is where you actually need these mutually recursive blob URLs.

Which for this issue it isn't purely about mutually recursive URLs, unlike the previous one.

mkruisselbrink commented 6 years ago

Okay, so reading some of those linked issues etc, what I understand is that the use case is the desire to work around some limitations in how ES modules are resolved/loaded, and since the discussion around how to improve those limitations on the loader/html side has stalled, you came up with some way to work around the limitations on the client side, but doing that requires changes like the ones proposed in this issue? Or in other words, if the loader/html side issues get resolved somehow, would there still be a use case for this?

bmeck commented 6 years ago

@mkruisselbrink even if the loader/html hook gets added you would still need this to create a few scenarios. Namely any sort of source code transformation cannot be fully solved by the request data hook being added. So things like instrumenting code to have coverage/do local code transformations would not work unless you have a URL it can resolve to. Let's assume we have a transform of some kind that is attempting to allow import of JSON by creating one of these Synthetic ESM. This would turn JSON responses ~=

import json from './foo.json';

To redirect to some newly created ESM record:

// foo.json
export default JSON.parse(...)

We have a few things going on here.

  1. fetching ./foo.json
     i. this Response could be a failure / not ok
  2. Creating a URL for the ESM facade so that it can be imported
  3. Populating that ESM facade URL
  4. Redirecting ./foo.json specifier to a new ESM Response URL
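
Steps 1 to 3 can be sketched with existing APIs, under some assumptions: `jsonFacadeSource` and `createJsonFacadeURL` are hypothetical helper names, and the facade URL is a plain blob: URL. Step 4, redirecting the specifier, is not shown because it needs a hook of the kind being discussed.

```javascript
// Hypothetical helpers; step numbers refer to the list above.

// Step 3: source text of an ESM facade whose default export is the parsed JSON.
function jsonFacadeSource(jsonText) {
  JSON.parse(jsonText); // fail early if the body is not valid JSON
  return `export default JSON.parse(${JSON.stringify(jsonText)});\n`;
}

// Steps 1 to 3: fetch the JSON, reject failed responses, return a facade URL.
async function createJsonFacadeURL(jsonURL) {
  const response = await fetch(jsonURL);
  if (!response.ok) {
    // Step 1.i: the Response could be a failure / not ok.
    throw new Error(`Failed to fetch ${jsonURL}: ${response.status}`);
  }
  const source = jsonFacadeSource(await response.text());
  // Step 2: a blob: URL gives the facade something importable.
  return URL.createObjectURL(new Blob([source], { type: "text/javascript" }));
}
```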

The problems I am currently facing are:

Only one aspect of this would be solved by that request data hook being added to the HTML spec.

For more complex problems like source code transformation for JSX / Babel / code coverage / etc. doing transformations off the main UI thread is also ideal. However, doing this on the UI thread is possible.

I also am coming at this from a Node.js implementer perspective and am seeking to create a parity between the two platforms while we work on our hooks for ESM. Node does not currently have a blob: scheme in our loader, but we are trying to figure out the best way to create a sane semi-consistent workflow across the two environments. If that requires a new API/scheme, that seems doable but I remain hesitant to think that a new scheme is going to be any less leaky.

mkruisselbrink commented 6 years ago

I'm not sure why just having a service worker wouldn't work for almost all of those cases? I understood the main issue with a service worker was the run-on-first-load case. But that shouldn't really matter for things like code coverage etc (and might be addressed by other issues anyway).

I cannot rely on the life of blob: URLs from Service Workers / browsers don't let me use URL.createObjectURL in SW. This seems to be a combination of browser implementations not implementing this part of spec,

Not sure what part of the spec you think implementations aren't implementing? The spec explicitly does not allow createObjectURL in SW, since it would be both useless (the URL would expire when the SW gets killed), and not provide much benefit (you can invent your own custom URLs that you can resolve to whatever you want in your service worker).
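
A rough sketch of the "invent your own custom URLs" approach, assuming a made-up `/__synthetic__/` path prefix and an in-memory `syntheticModules` map (neither comes from the thread):

```javascript
// Assumptions: the "/__synthetic__/" prefix and the `syntheticModules` map are
// invented for this sketch.
const SYNTHETIC_PREFIX = "/__synthetic__/";
const syntheticModules = new Map([
  ["foo.json.js", 'export default JSON.parse("{}");'],
]);

// Returns a Response for a synthetic URL, or undefined to fall through to the network.
function respondToSynthetic(request) {
  const { pathname } = new URL(request.url);
  if (!pathname.startsWith(SYNTHETIC_PREFIX)) return undefined;
  const source = syntheticModules.get(pathname.slice(SYNTHETIC_PREFIX.length));
  if (source === undefined) return new Response("Not found", { status: 404 });
  return new Response(source, { headers: { "Content-Type": "text/javascript" } });
}

// Register the handler only when actually running inside a service worker.
if (typeof ServiceWorkerGlobalScope !== "undefined" && self instanceof ServiceWorkerGlobalScope) {
  self.addEventListener("fetch", (event) => {
    const response = respondToSynthetic(event.request);
    if (response) event.respondWith(response);
  });
}
```

Note that the in-memory map disappears whenever the service worker is killed, so a real version would need to rebuild or persist it.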

bmeck commented 6 years ago

@mkruisselbrink

I'm not sure why just having a service worker wouldn't work for almost all of those cases. I understood the main issue with a service worker was the run-on-first-load case

There is also missing data, hence why https://bmeck.github.io/node-sw-compat-loader-test/dist/test.js is changing the specifiers from simple things to extra network requests to add more data that is not available in a fetch event. Service workers also don't let you rewrite the contents imports that live in data: or blob: schemes so you need the HTML change to get a hook to handle those.

But that shouldn't really matter for things like code coverage etc (and might be addressed by other issues anyway).

Then let's ignore that case and just talk about code transforms and facades for now.

you can invent your own custom URLs that you can resolve to whatever you want in your service worker

You can but you get into bad situations quickly, hence why https://github.com/bmeck/esm-http-server has some prefixing going on. Though that starts to leak the mutation into the client and requires the client manually do this for any entry point not coming from inside the SW (through things like people using HTML tags to load ESM).

mkruisselbrink commented 6 years ago

Then let's ignore that case and just talk about code transforms and facades for now.

But that gets us back to use cases. Code transforms and facades are not use cases. They might serve to address other use cases, but without knowing those use cases it is impossible to figure out if they are the correct solution for the use case at hand.

bmeck commented 6 years ago

@mkruisselbrink what defines a use case then? You are putting basically all of transforms like JSX / transpilers / loading new content-types using facades as non-use cases.

guest271314 commented 6 years ago

Not gathering what the issue is or what Response in URL.createObjectURL means. What are you trying to achieve?

guest271314 commented 6 years ago

We have a few things going on here.

  1. fetching ./foo.json
     i. this Response could be a failure / not ok
  2. Creating a URL for the ESM facade so that it can be imported
  3. Populating that ESM facade URL
  4. Redirecting ./foo.json specifier to a new ESM Response URL

If interpret the issue and use case correctly are you trying to export and import a Blob URL?

bmeck commented 6 years ago

@guest271314

Not gathering what the issue is or what Response in URL.createObjectURL means. What are you trying to achieve?

I am trying to implement code transformation for a variety of use cases in the browser. Adding Response as a backing entry for blob: URLs would expand the capabilities of blob: URLs to include a variety of new ways, but the expansion itself is not the main goal.

If interpret the issue and use case correctly are you trying to export and import a Blob URL?

I am importing a blob: URL in order to work around limitations of preprocessing steps and service workers in order to get the code transformation use case satisfied. I am not tied specifically to using blob:, but this use case of transforming ESM loading beyond service worker capabilities seems to not be seen as a use case by others.

I think there is opposition in the comments above about what a use case is and I am frustrated because there is no clear place to move forwards, so I will just have to keep opening issues in various places to see if the use case is valid given the context of where I open feature requests.

guest271314 commented 6 years ago

What is "code transformation"?

Have you tried using a data URL, which are served with Content-Type header, at <script type="module">?

E.g., by hand

<script type="module">
  import {Mod} from "data:application/javascript,const%20Mod={abc:123};export%20{Mod};";
  console.log(Mod);
</script>

guest271314 commented 6 years ago

What is the actual code that you are trying to implement?

bmeck commented 6 years ago

@guest271314 data: is not well suited since if I have 2 modules with the same source text it would collide.

I am roughly trying to implement the resolve() hook that Node has for ESM in the browser using whatever works. If it is not possible to implement such compatibility we may end up changing the hook in Node to be more minimal, as it seems adding an equivalent platform level hook has stalled.
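
To make "a resolve() hook" concrete, here is a deliberately simplified sketch. It is not Node's actual hook signature; the `resolve(specifier, referrer)` shape, the `redirects` table and the example.test URLs are all assumptions for illustration.

```javascript
// Hypothetical redirect table: resolved URL -> replacement URL.
const redirects = new Map([
  ["https://example.test/app/foo.json", "https://example.test/__synthetic__/foo.json.js"],
]);

// Resolve `specifier` relative to `referrer`, then apply any redirect.
function resolve(specifier, referrer) {
  const resolved = new URL(specifier, referrer).href;
  return redirects.get(resolved) ?? resolved;
}
```

For example, `resolve("./foo.json", "https://example.test/app/main.js")` would return the facade URL from the table, while specifiers without an entry resolve as usual.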

guest271314 commented 6 years ago

Not following

2 modules with the same source text it would collide.

nor what the actual use case is.

Do you mean something like

(async() => {
  const currentScript = document.currentScript;

  const requestModule = async({
    url, dataURL = true
  }) => {
    let request, blob, reader, response;
    try {
      request = await fetch(url);
      blob = await request.blob();
      console.log(blob, dataURL);
      reader = new FileReader();
      response = new Promise((resolve, reject) => {
        reader.addEventListener("loadend", () => resolve(reader.result));
      });

      reader[dataURL ? "readAsDataURL" : "readAsText"](blob);
    } catch (err) {
      console.error(err);
    }

    return response;
  }

  let moduleNames = `abc, def`;
  // get module
  let moduleRequest = await requestModule({
    url: "exports.js",
    dataURL: true
  });
  // do stuff with `abc, def`; e.g., `console.log(abc, def)`
  let moduleBody = await requestModule({
    url: "exportsBody.js",
    dataURL: false
  });

  let scriptModule = `import {${moduleNames}} from "${moduleRequest}"; ${moduleBody}`;
  let script = document.createElement("script");
  script.type = "module";
  script.textContent = scriptModule;

  currentScript.insertAdjacentElement("afterend", script);

})();

guest271314 commented 6 years ago

Note, the above pattern was primarily composed to test import and export at file: protocol.

bmeck commented 6 years ago

@guest271314

Not following

2 modules with the same source text it would collide.

data: does not make new records whenever you import them, so you get into a strange situation where things get shared across potentially unrelated graphs:

<script type="module">
  import data from "data:application/javascript,export%20default%20{};";
  // prints 1
  setTimeout(() => console.log(data.x), 1000);
</script>
<script type="module">
  import data from "data:application/javascript,export%20default%20{};";
  data.x = 1;
</script>

So you have to use some different scheme for generic transforms that might generate the same source text.
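
The difference can be seen at the URL level. In this sketch (the helper names are made up), identical source text always produces the same data: URL, while each createObjectURL call produces a fresh blob: URL:

```javascript
const source = "export default {};";

// Hypothetical helper: the URL is a pure function of the source text.
function dataURLFor(text) {
  return `data:text/javascript,${encodeURIComponent(text)}`;
}

// Hypothetical helper: every call registers a new Blob under a new URL.
function blobURLFor(text) {
  return URL.createObjectURL(new Blob([text], { type: "text/javascript" }));
}

const sameDataURL = dataURLFor(source) === dataURLFor(source); // true
const sameBlobURL = blobURLFor(source) === blobURLFor(source); // false
```

Since module records are looked up by URL, the two data: imports above land on one shared record, whereas two blob: URLs with the same text would not.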


Your code does exhibit the use case to some degree, but doesn't have the ability to intercept specifiers and referrers like https://github.com/bmeck/esm-http-server/blob/public/instrument.js is doing. Also, transforming the moduleBody inline is unnecessary when you can redirect to a newly generated URL, as blob: allows, that serves the transformed moduleBody.

guest271314 commented 6 years ago

Have not tried babel. Have minimal experience using ServiceWorker. Am not familiar with what "transform", as-applied, means. The code at previous comment is closest can fathom, so far, as to what you are trying to achieve.

guest271314 commented 6 years ago

What does

data: does not make new records whenever you import them, so you get into a strange situation where things get shared across potentially unrelated graphs:

<script type="module">
  import data from "data:application/javascript,export%20default%20{};";
  // prints 1
  setTimeout(() => console.log(data.x), 1000);
</script>
<script type="module">
  import data from "data:application/javascript,export%20default%20{};";
  data.x = 1;
</script>

intend to demonstrate?

Note that data.x is a property of data, not data itself.

guest271314 commented 6 years ago

exports.js

export default {abc:123}

JavaScript

<script type="module">
  import data from "./exports.js";
  setTimeout(() => console.log(data), 1000);
</script>
<script type="module">
  import data from "./exports.js";
  data.abc = 456;
</script>

provides the same results as using a data URL.

guest271314 commented 6 years ago

Assigning a value to a property of a const variable is not the same as attempting to assign a different value to the const variable itself.