w3c / FileAPI

File API
https://w3c.github.io/FileAPI/
Other
105 stars 44 forks source link

Don't update FileReader's result until "LOAD" #79

Closed annevk closed 5 years ago

annevk commented 7 years ago

Otherwise you get a giant memory leak. Reportedly Firefox already does this. Chrome allocates strings/buffers and will quickly get slow.

jakearchibald commented 7 years ago

Chrome bug https://bugs.chromium.org/p/chromium/issues/detail?id=674903

jakearchibald commented 7 years ago

We could provide incremental updates for arraybuffers by allocating new ArrayBuffer(file.size) as soon as readAsArrayBuffer is called, then update the buffer for each progress event. This would avoid the performance issues Chrome is hitting.

Unfortunately it means that reader.result.byteLength will not reflect progress, which may be considered a backwards incompatible change.

jakearchibald commented 7 years ago

However, XHR doesn't offer an arraybuffer until the request is complete, so aligning with that might be better. Then, create a streams-based API to do it properly.

Again, this may be backwards incompatible.

mkruisselbrink commented 7 years ago

@jakearchibald that chrome issue was actually for a completely different bug (until you added comments about this issue to it). I do agree that it might make sense to get rid of result during progress events if that is what other browsers are already doing anyway. As you mention, should be relatively easy to add a use-counter for.

mkruisselbrink commented 7 years ago

Actually, the spec already sort of implies that result shouldn't be available until LOAD time. All the various read method algorithms don't assign any value to result until the "process read EOF" steps. It's probably still a good idea to write some tests to figure out what browsers are actually doing here though (for example does result keep the result of a previous read operation until a new load has finished, or is it reset back to null when the new read starts?)

And I added the use counter to chrome to see how often this (non-spec-compliant) behavior of returning result before load is finished is actually triggered.

mkruisselbrink commented 7 years ago

Chrome bug: https://bugs.chromium.org/p/chromium/issues/detail?id=768972

mkruisselbrink commented 7 years ago

And some tests to figure out the current behavior in https://github.com/w3c/web-platform-tests/pull/7494

In summary, currently Firefox seems to be the odd one out, with Chrome, Safari and Edge all having result available during progress events (at least during the last progress event; it's hard to come up with a cross-browser test that triggers multiple progress events). Firefox on the other hand seems to more closely follow the spec, with two exceptions: 1) when starting a new read it resets result to null, and 2) for readAsBinaryString Firefox actually does match the other three browsers and the result is available during the progress event. It's only for the other three read methods were it doesn't.

So maybe changing the spec to match Chrome/Safari/Edge rather than trying to change almost all the implementations to match the spec would make more sense...

(Edge does actually have slightly odd behavior: reader.result actually already contains the full result even during the loadstart event...)

annevk commented 7 years ago

How does https://w3c.github.io/FileAPI/#filedata-attr not return anything before LOAD? That doesn't match up with the requirements.

Also, what about the memory leak? I hope you don't want to enshrine that.

mkruisselbrink commented 7 years ago

It's definitely ambiguous in the current spec what result should return after calling a method but before the algorithm in the method sets result to something, so no matter what solution we pick the spec should be clarified.

I'm not sure what the supposed memory leak is though?

annevk commented 7 years ago

Allocating new strings or ArrayBuffer objects while loading, each slightly larger than the last.

mkruisselbrink commented 7 years ago

Well yeah, that could be bad, but it still wouldn't be any kind of leak unless the javascript code explicitly holds on to the created objects. And even then it's just bad javascript code, not a leak... Also (at least chrome) won't actually allocate these partial strings/arraybuffers unless javascript actually tries to access the result attribute (although I guess that might be observable somehow?)

But yeah, it definitely seems of limited usefulness to be able to inspect result before reading is complete.

annevk commented 7 years ago

Even if you don't hold onto the created objects you still end up allocating lots of ever increasing objects only to let them GC again shortly after. It's a bad design.

mkruisselbrink commented 7 years ago

Yeah, there definitely is little benefit in having result be usable during loading. So if it wasn't for the fact that most browsers/methods currently do so I would have no problem changing the spec and chrome implementation (well, pending outcome of the use counter I added). I'm just not sure if it's worth changing this given that most browsers actually agree on a behavior.

mkruisselbrink commented 6 years ago

(did some more digging into this), the chrome implementation at least seems to match the spec as it was when it was implemented (in 2010) until around 2013. https://www.w3.org/TR/2012/WD-FileAPI-20121025/#filedata-attr still explicitly says " the result attribute should return partial Blob data " for readAsText and readAsArrayBuffer (but not for readAsDataURL), which exactly matches chromes behavior. The next version of the spec no longer has that language and instead queues tasks at load complete to update the result attributes once and only once.

Also this subject was brought up in http://lists.w3.org/Archives/Public/public-webapps/2013JanMar/0289.html (as point 4), but it doesn't seem like anybody ever replied to that point.

Ah, later that year another thread was started specifically about this subject, http://lists.w3.org/Archives/Public/public-webapps/2013JulSep/0357.html, which doesn't seem to have gotten very far either, but it seems like the spec was updated to reflect that change anyway.

So yeah, it would make sense to make the spec a bit more explicit here, even though it already sort of implies that only final result should be available. Fortunately it doesn't seem like reading partial results is used a lot (https://www.chromestatus.com/metrics/feature/timeline/popularity/2158), so seems like this should be fairly safe to change.

jakearchibald commented 6 years ago

@mkruisselbrink good investigative work sir!