Web initiated downloads not possible to test for

cbentzel commented 6 years ago

Downloads initiated from web content are not possible to detect or test for in WebPlatformTests.

These could be downloads triggered in a variety of ways:

Click event on an element

Navigation of a frame to a non-web-renderable MIME type resource

Navigation of a frame to a resource with a Content-Disposition header

There are no APIs for this available on the web platform. It also looks like there is currently no WebDriver support for detecting whether a download has happened. Approaches that only look at the local filesystem would not work for remote testing.

The likely solution therefore is to add support for this case in WebDriver, and then add wpt support on top of that.

guest271314 commented 6 years ago

One approach would be to create a unique identifier to set as the filename of the file offered for download. Following download of offered file prompt user to upload file which was downloaded to verify that the downloaded filename matches the unique identifier set as filename.
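A minimal sketch of the unique-filename idea, assuming a timestamp is an acceptable identifier. The helper names `uniqueFilename` and `isSameDownload` are hypothetical and only illustrate the naming and comparison steps, not the download itself:

```javascript
// Hypothetical helper: insert an identifier before the final extension,
// e.g. "file.txt" with id 42 becomes "file-42.txt".
function uniqueFilename(name, id = Date.now()) {
  const dot = name.lastIndexOf(".");
  // No extension (or a leading-dot name such as ".bashrc"): append the id.
  if (dot <= 0) {
    return name + "-" + id;
  }
  return name.slice(0, dot) + "-" + id + name.slice(dot);
}

// Hypothetical check: the later confirmation step only compares names.
function isSameDownload(expectedName, uploadedName) {
  return expectedName === uploadedName;
}
```

The name offered for download would be the return value of `uniqueFilename()`, and the name of the file the user selects afterwards would be passed to `isSameDownload()`.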

gsnedders commented 6 years ago

Following download of offered file prompt user to upload file which was downloaded to verify that the downloaded filename matches the unique identifier set as filename.

We don't have any way to automate the file upload prompt either, though. :)

guest271314 commented 6 years ago

@gsnedders

We do not need to automate the file upload prompt. There would need to be user complicity within the testing process.

Given that user is, or should be aware that file should be downloaded before next step in process, user should expect some form of confirmation that file has been downloaded to occur.

You can create a unique identifier or timestamp to include within the downloaded file name by utilizing an <a> element with the download attribute set to the modified file name.

At click event of <button> element call .click() on <a> element with href set to a Blob URL of file. In the <a> element click handler, call .click() on an <input type="file"> element; at its attached change event the user should select the same file that was downloaded by the user action which started the download.

Note the chaining of calls to .click() beginning with user action.

If the file selected from user filesystem is equal to modified downloaded file name, call function, else notify user that file download has not been confirmed.

This is the closest approach achieved so far to meeting the requirement of unique file download verification:

<!DOCTYPE html>
<html>

<head>
  <script>
    window.addEventListener("load", function() {

  let id, filename, url, file; 
  let confirmed = false;
  const a = document.querySelector("a");
  const button = document.querySelector("button");
  const confirm = document.querySelector("input[type=file]");
  const label = document.querySelector("label");

  function confirmDownload(filename) {
    if (confirmed) {
      filename = filename.replace(/(-\d+)/, "");
      label.innerHTML = "download of " + filename + " confirmed";
    } else {
      confirmed = false;
      label.innerHTML = "download not confirmed";
    }
    URL.revokeObjectURL(url);
    id = url = filename = void 0;
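    // Blob.close() is not available in all browsers; call it only where it exists
    if (file && typeof file.close === "function" && !file.isClosed) {
      file.close();
    }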
  }

  function handleAnchor(event) {
    confirm.click();
    label.innerHTML = "";
    confirm.value = "";
    window.addEventListener("focus", handleCancelledDownloadConfirmation);
  }

  function handleFile(event) {
    if (confirm.files.length && confirm.files[0].name === filename) {
      confirmed = true;      
    } else {
      confirmed = false;
    }
    confirmDownload(filename);
  }

  function handleDownload(event) {
    // file
    file = new File(["abc"], "file.txt", {
      type: "text/plain",
      lastModified: new Date().getTime()
    });
    id = new Date().getTime();
    filename = file.name.match(/[^.]+/g);
    filename = filename.slice(0, filename.length - 1).join("")
               .concat("-", id, ".", filename[filename.length - 1]);
    file = new File([file], filename, {
      type: file.type,
      lastModified: id
    });
    a.download = filename;
    url = URL.createObjectURL(file);
    a.href = url;
    alert("confirm download after saving file");
    a.click();
  }

  function handleCancelledDownloadConfirmation(event) {
    if (confirmed === false && !confirm.files.length) {
      confirmDownload(filename);
    }
    window.removeEventListener("focus", handleCancelledDownloadConfirmation);
  }

  a.addEventListener("click", handleAnchor);

  confirm.addEventListener("change", handleFile);

  button.addEventListener("click", handleDownload);

});
  </script>
</head>

<body>
  <button>download file</button>
  <a hidden>download file</a>
  <input type="file" hidden/><label></label>
</body>

</html>

plnkr

foolip commented 6 years ago

@cbentzel, do we have any tests for this in Chromium? Are they LayoutTests using internals.*, browser tests, unit tests, or something else?

What would a suitable API be? I think that files won't always be downloaded without user interaction, so perhaps an API to check if any downloads are "pending", and a way to allow the download and get some kind of handle to the result? Or would just the first part be enough?
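To make the question concrete, one possible shape for such an API is sketched below. Everything here is hypothetical: `DownloadTracker`, `request()`, `pending()` and `allow()` are illustrative names, not existing WebDriver or testdriver commands, and the in-memory class only models the two-step flow (see that a download is pending, then allow it and get a handle).

```javascript
// Hypothetical model of a harness-side download API; not a real WebDriver interface.
class DownloadTracker {
  constructor() {
    this.nextId = 1;
    this.pendingDownloads = new Map();
  }

  // Would be called by the (imaginary) browser when content requests a download.
  request(url, suggestedFilename) {
    const id = this.nextId++;
    this.pendingDownloads.set(id, { id, url, suggestedFilename });
    return id;
  }

  // First part of the question: is anything pending?
  pending() {
    return [...this.pendingDownloads.values()];
  }

  // Second part: allow a pending download and get a handle to the result.
  allow(id) {
    const download = this.pendingDownloads.get(id);
    if (!download) {
      throw new Error("no pending download with id " + id);
    }
    this.pendingDownloads.delete(id);
    return { ...download, state: "allowed" };
  }
}
```

If only the first part turned out to be enough, a test would stop after inspecting `pending()`.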

Trying to decide how to prioritize this, how many tests would be affected by this? I guess everything around <a download> and the Content-Disposition header. Which were you trying to test yourself?

foolip commented 6 years ago

Hmm, I see that we have LayoutTests/http/tests/security/anchor-download-allow-blob.html that does testRunner.waitUntilExternalURLLoad(), which is also used by the <video controls> download button tests.

I've tentatively called this the backlog priority since it looks like a small number of tests and a fair bit of work to get it done. But if it is blocking important interop testing, especially of new features that could be made more interoperable out of the gate with good testing, please shout.

anforowicz commented 5 years ago

A slightly different (but probably not different enough to warrant a separate bug) scenario is triggering a download through browser UI (e.g. right click -> Save Link As...) - in this case the resource request should be browser initiated and therefore should be 1) treated as same-origin by Cross-Origin-Resource-Policy and SameSite cookies and 2) should specify Sec-Fetch-Site: none (see here). This scenario is motivated by https://crbug.com/952834#c13.
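A test server could record what it receives and let the test assert on it. The sketch below assumes Node-style lower-cased header names; the function names `secFetchSite` and `isBrowserInitiated` are hypothetical, and only the expected `Sec-Fetch-Site: none` value comes from the description above.

```javascript
// Illustrative server-side helper, assuming header names are already lower-cased.
function secFetchSite(headers) {
  const value = headers["sec-fetch-site"];
  return typeof value === "string" ? value.toLowerCase() : null;
}

// Expectation described above: a "Save Link As..." request is browser-initiated,
// so the request should carry Sec-Fetch-Site: none.
function isBrowserInitiated(headers) {
  return secFetchSite(headers) === "none";
}
```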

guest271314 commented 5 years ago

@anforowicz Not gathering what the issue is at the linked Chromium bug? How to reproduce without a server? For example, at plnkr? What is the JavaScript and HTML used?

anforowicz commented 5 years ago

@guest271314

Not gathering what the issue is at the linked Chromium bug?

Does my comment in https://bugs.chromium.org/p/chromium/issues/detail?id=952834#c30 help?

How to reproduce without a server?

I think this is not possible - the problem is closely related to the fact that the response from the http server includes a Cross-Origin-Resource-Policy: same-origin response header.

For example, at plnkr?

Does plnkr (and/or jsfiddle / jsbin / etc.) allow specifying HTTP response headers?

What is the JavaScript and HTML used?

HTML: <a href="resource.txt">link</a>. No JavaScript.
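For reference, a server-side sketch of that setup in Node.js. The function name `handleResource`, the port, and the response body are illustrative assumptions; only the `Cross-Origin-Resource-Policy: same-origin` header and the `resource.txt` path come from the discussion.

```javascript
// Illustrative request handler, usable with Node's http.createServer().
function handleResource(req, res) {
  if (req.url === "/resource.txt") {
    res.writeHead(200, {
      "Content-Type": "text/plain",
      // The response header that the scenario depends on.
      "Cross-Origin-Resource-Policy": "same-origin"
    });
    res.end("resource body");
    return;
  }
  res.writeHead(404, { "Content-Type": "text/plain" });
  res.end("not found");
}

// Usage (not run here):
// require("node:http").createServer(handleResource).listen(8000);
```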

guest271314 commented 5 years ago

@anforowicz Is there an example URL in the wild that was tested? Does the <a> element at HTML have a download attribute?

anforowicz commented 5 years ago

@guest271314

Is there an example URL in the wild that was tested?

No.

Does the <a> element at HTML have a download attribute?

No.

guest271314 commented 5 years ago

@anforowicz How was this tested if not in the wild?

anforowicz commented 5 years ago

@guest271314

How was this tested if not in the wild?

Why does an answer to this question matter for the WPT framework issue at hand?

The bottom line is that the WPT framework doesn't support testing this kind of scenario today, and the issue we are discussing here (https://github.com/web-platform-tests/wpt/issues/8790) tracks adding this kind of support to the WPT framework, right?

guest271314 commented 5 years ago

@anforowicz If gather your comment at https://github.com/web-platform-tests/wpt/issues/8790#issuecomment-484246813 correctly from your perspective a resource served with header Cross-Origin-Resource-Policy: same-origin should be downloadable?

Am still trying to determine what the Chromium bug report is actually describing and how that bug is related to this issue?

In general, it is not possible to observe or verify if a download has occurred client-side programmatically.

With user complicity, that is, for example, a manual test, it is possible to test the download process from several different technical perspectives; see Detect when user accepts to download a file https://stackoverflow.com/questions/41334881/detect-when-user-accepts-to-download-a-file.

guest271314 commented 5 years ago

@anforowicz See the code at https://github.com/web-platform-tests/wpt/issues/8790#issuecomment-354032643. Essentially the test would create a unique file name for a File object (lastModified attribute value or other means of creating a unique value can be used), which can be programmatically set in JavaScript (at <input type="file"> element, if necessary, see How to set File objects and length property at FileList object where the files are also reflected at FormData object? https://stackoverflow.com/q/47119426), user downloads file then immediately uploads file to check if the filename is the same as the downloaded file. That can be composed as a manual wpt test. If the goal is to create an automated test, that might be possible as well by adjusting the linked proof of concept code.

guest271314 commented 5 years ago

@anforowicz The immediacy of the download <-> upload procedure is imperative, that is, use of the lastModified attribute of a File instance/object, as a File object name can be changed using FormData.set() (see https://github.com/w3c/FileAPI/issues/126#issuecomment-480971351), though the lastModified property value will remain the same (https://github.com/w3c/FileAPI/issues/126#issuecomment-480976057).

httpbin appears to provide a means to set the response headers described

<a href="https://httpbin.org/response-headers?cross-origin-resource-policy=same-site" download="resource.txt">click</a> though still not clear what the expected result is? If not download, navigation to the URL?

anforowicz commented 5 years ago

@guest271314

If gather your comment at #8790 (comment) correctly from your perspective a resource served with header Cross-Origin-Resource-Policy: same-origin should be downloadable?

Correct - a resource served with Cross-Origin-Resource-Policy: same-origin should be downloadable.

Am still trying to determine what the Chromium bug report is actually describing

Can you clarify which part of https://crbug.com/952834#c30 is unclear?

and how that bug is related to this issue?

Ideally the regression test for https://crbug.com/952834 should be part of WPT. Currently this is not possible. Having a regression test within WPT is desirable so that all browsers can benefit from the test (right now I have to author a Chrome-specific test instead).

In general, it is not possible to observe or verify if a download has occurred client-side programmatically.

I agree that it is not possible nor desirable to observe downloads from web pages. OTOH, I think that ability to verify downloads is a desirable feature of WPT. AFAIU https://github.com/web-platform-tests/wpt/issues/8790 issue tracks adding such feature to WPT (and/or WebDriver).

If the goal is to create an automated test

Indeed, creating an automated WPT test is my goal. I want to be able to author a test that 1) triggers a download via Save Link As context menu and 2) verifies that the body of the saved/local file is correct.

that might be possible as well by adjusting the linked proof of concept code.

I may be missing something, but I fail to see how the code in https://github.com/web-platform-tests/wpt/issues/8790#issuecomment-354032643 would help with verifying that a http resource was downloaded - the code in the other comment:

is not reading/verifying contents of a file in a local ~/Downloads directory
is not downloading a http resource - it is instead downloading a File object

guest271314 commented 5 years ago

@guest271314

If gather your comment at #8790 (comment) correctly from your perspective a resource served with header Cross-Origin-Resource-Policy: same-origin should be downloadable?

Correct - a resource served with Cross-Origin-Resource-Policy: same-origin should be downloadable.

If read the Fetch specification correctly https://fetch.spec.whatwg.org/ the resource should NOT be downloadable?

Am still trying to determine what the Chromium bug report is actually describing

Can you clarify which part of https://crbug.com/952834#c30 is unclear?

Which specification states that a resource served with Cross-Origin-Resource-Policy: same-origin should be downloadable? If a specification states that, why does the HTML above and at the linked Chromium bug not have a download attribute? The linked Chromium bug appears to attempt to download a PDF file embedded in Chromium browser version of PDF viewer, not a <a> element with download attribute set.

There are other examples of resources not being capable of being downloaded, e.g., MediaSource specification essentially states that a Blob URL as described in File API should perform the same as a Blob URL that is not based on a MediaSource, though that is not the case. A Blob URL which is based on a MediaSource cannot be downloaded either, made explicitly clear by a Chromium implementation that removes the ability to right-click and select "Save as.." from a context menu.

and how that bug is related to this issue?

Ideally the regression test for https://crbug.com/952834 should be part of WPT. Currently this is not possible. Having a regression test within WPT is desirable so that all browsers can benefit from the test (right now I have to author a Chrome-specific test instead).

Not gathering the benefit of such a test to be performed in an automated fashion.

In general, it is not possible to observe or verify if a download has occurred client-side programmatically.

I agree that it is not possible nor desirable to observe downloads from web pages. OTOH, I think that ability to verify downloads is a desirable feature of WPT. AFAIU #8790 issue tracks adding such feature to WPT (and/or WebDriver).

If the goal is to create an automated test

Indeed, creating an automated WPT test is my goal. I want to be able to author a test that 1) triggers a download via Save Link As context menu and 2) verifies that the body of the saved/local file is correct.

Even where a right-click and "Save as.." presents a prompt offering a resource to be downloaded, the user does not have to save the file; can leave the prompt open for an indeterminate period of time without performing any user action; can change the name of the file; etc.

Also, the contextmenu event does not necessarily specify which command the user selects to run.

It is not clear what you mean by "verifies that the body of the saved/local file is correct". "correct" in what measurable ways?

If a user downloads a file without reading the contents of the file and storing the contents of the file first, what is verifiable or verified? Again, user complicity in the procedure is a prerequisite.

that might be possible as well by adjusting the linked proof of concept code.

I may be missing something, but I fail to see how the code in #8790 (comment) would help with verifying that a http resource was downloaded - the code in the other comment:

is not reading/verifying contents of a file in a local ~/Downloads directory

Why do you want to read the contents of the file? A user can select any local directory to download a file to, not only ~/Downloads.

is not downloading a http resource - it is instead downloading a File object

An HTTP resource can be set as the value of a File object.

guest271314 commented 5 years ago

@anforowicz

OTOH, I think that ability to verify downloads is a desirable feature of WPT.

2) verifies that the body of the saved/local file is correct

verifying that a http resource was downloaded

is not reading/verifying contents of a file in a local ~/Downloads directory

How do you propose to achieve the requirements? An API for read/write access to user local filesystem (https://github.com/WICG/native-file-system)?

Can you clearly define "verify" within the scope of the prospective wpt tests?

anforowicz commented 5 years ago

@guest271314

@anforowicz

OTOH, I think that ability to verify downloads is a desirable feature of WPT.

verifies that the body of the saved/local file is correct

verifying that a http resource was downloaded

is not reading/verifying contents of a file in a local ~/Downloads directory

How do you propose to achieve the requirements?

WebDriver probably.

An API for read/write access to user local filesystem (https://github.com/WICG/native-file-system)?

No, as I said before, I think that it is not desirable to observe downloads from web pages.

Can you clearly define "verify" within the scope of the prospective wpt tests?

Verify that the contents of the local file (i.e. the file resulting from the download) match test expectations. For example - compare that the contents of the file match a given string.
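As a rough illustration of that definition, a harness-side check could look like the sketch below. This is Node.js code that would run in the test harness, not in the page; `downloadMatches` is a hypothetical name, and how the harness learns `downloadPath` (for example through a future WebDriver command) is an assumption that nothing in WPT provides today.

```javascript
const fs = require("node:fs");

// Hypothetical harness-side check: compare a downloaded file with the expected text.
// `downloadPath` would have to be supplied by the harness; a web page cannot learn it.
function downloadMatches(downloadPath, expectedText) {
  let actual;
  try {
    actual = fs.readFileSync(downloadPath, "utf8");
  } catch (err) {
    // A missing or unreadable file means the download cannot be verified.
    return false;
  }
  return actual === expectedText;
}
```

A test would then pass when `downloadMatches(path, "expected body")` returns `true`.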

guest271314 commented 5 years ago

@anforowicz

WebDriver probably.

The question about the term "verify" or "verification" was not asking which code would be used to verify, but rather what the criteria for verification are; that is, === comparing two strings? How can any individual conclusively determine the difference, if any, between two or more files having the same metadata and content data?

A rudimentary template for testing download of a file follows. Some considerations: the original file set for download may be mutated after the prompt to download a file is shown; one or more shell scripts (potentially having direct messaging capabilities with JavaScript via Native Messaging or similar functionality) may actively scan directories for "new" files and change a file in some manner that might not be immediately apparent, depending on the methods used for reading the file; shell scripts may change the file in a way that is apparent yet renders any "verification" not possible, due to setting the permission of the downloaded file to "read-only" immediately upon being written to disk; and, unless the file is transferred, the downloaded file will be a copy of the initial file stored in memory at the browser, so the comparison would be of the likeness of one or more copies of a file.

<!DOCTYPE html>
<html>

<head>
  <title>Test file download at browser - Manual</title>
</head>

<body>
  <p id="fileContents" contentEditable="">File contents</p>
  <input id="fileInput" type="file" accept="text/plain">
  <a id="fileDownload" href="" download="">download</a>

  <script>
    (async() => {
      // it should take less than 10 seconds to download a file and upload the same file
      // which should theoretically provide _some_ measure of integrity as to the
      // tests being intercepted; e.g., a native application which _changes_ files in 
      // a directory selected for download target by user action; though not impossible
      // to intercept using native shell scripts and/or Native Messaging with native shell scripts
      const fileContents = document.getElementById("fileContents");
      const fileInput = document.getElementById("fileInput");
      const fileDownload = document.getElementById("fileDownload");
      const fileName = "resource";
      const lastModified = Date.now();
      const fileData = fileContents.textContent;
      const reader = new FileReader();
      // use `data URL` representation of a file 
      const dataURL = await new Promise(async resolve => {
        reader.addEventListener("load", ({target:{result}}) => {
          resolve(result);
        });
        reader.readAsDataURL(new Blob([fileData], {type: "text/plain"}));
      });

      // get MIME type of data URL
      // using a Blob URL has issue of user clicking <a>,
      // prompt to download file is displayed for an indeterminate time
      // which changes duration between `lastModified` date of a
      // programmatically created `File` and the `File` object instance 
      // at `change` event with respect to setting or using the default 
      // `lastModified` property of a `File` object instance for comparison
      const fileDownloadType = dataURL.replace(/^.+[:]|;.+$/g, "");

      fileDownload.href = dataURL;

      fileDownload.download = fileName + "_" + lastModified + ".txt";

      fileDownload.addEventListener("click", e => {
        const downloadTime = Date.now();
        console.log("download clicked", dataURL);
        fileInput.addEventListener("change", async e => {
          const uploadTime = Date.now();
          // the assertions below SHOULD FAIL, which means the tests PASS: expected result
          const uploadedFile = await new Response(e.target.files[0]).text();
          console.assert(e.target.files[0].type !== fileDownloadType, {uploadedFileType: e.target.files[0].type, fileDownloadType, assertionShouldFail: true});
          console.assert(uploadedFile !== fileData, {uploadedFile, fileData, assertionShouldFail: true});
          console.assert(e.target.files[0].name !== fileDownload.download, {uploadedFileName: e.target.files[0].name, downloadedFileName: fileDownload.download, assertionShouldFail: true});
          // it should be possible to download and upload a file within 10 seconds; depends on
          // file size, available RAM and disk space of the machine used; varies widely
          console.assert((uploadTime - downloadTime) > 10000, {uploadTime, downloadTime, uploadTimeMinusDownloadTime: uploadTime - downloadTime, assertionShouldFail: true});
        }, {once: true});
      }, {once: true});
    })();
  </script>
</body>
</html>

plnkr https://plnkr.co/edit/GuSERd?p=preview
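The template above extracts the media type from the data URL by stripping everything around it with a regular expression. A slightly more explicit alternative is sketched here; `dataURLMimeType` is a hypothetical helper, not part of the linked code, and it only handles the `data:<type>[;params],<data>` form.

```javascript
// Hypothetical helper: return the media type of a data URL, or null if the input
// does not look like one.
function dataURLMimeType(dataURL) {
  const match = /^data:([^;,]*)[;,]/.exec(dataURL);
  if (!match) {
    return null;
  }
  // An omitted media type defaults to text/plain (RFC 2397).
  return match[1] || "text/plain";
}
```

The result could then be compared with the `type` of the `File` selected at the `change` event.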

guest271314 commented 5 years ago

@anforowicz As to relying on an automated tool to perform this type of test, it is nearly impossible to conceive of and program to a machine the actual actions of a user within the context of downloading a file, as machines, RAM, disk space, OS, etc. vary widely. Testing with an automated tool exercises the tool, not the user, and thus can provide false positives as to the actual behaviour of users within the context. This is similar to testing whether the volume of an audio sample perceptibly changes given different values, where the analysis is of the sampled audio output itself; similarly with colors (pixels) rendered for display at the browser. Would suggest that a human user is best suited to perform file download tests manually.

anforowicz commented 5 years ago

@guest271314

it is nearly impossible to conceive of and program to a machine the actual actions of a user within the context of downloading a file

DownloadTest in Chrome seems to work just fine, so it seems reasonable to me that similar test support should be possible in WPT.

guest271314 commented 5 years ago

@anforowicz Drag and drop type downloads would also need to be tested.

guest271314 commented 5 years ago

@anforowicz Is FilePath the actual path component https://w3c.github.io/html/sec-forms.html#path-components of a user directory or a temporary location in memory created for the procedure https://cs.chromium.org/chromium/src/content/browser/download/download_browsertest.cc?rcl=fda44632071bd2c4feb54afc3004e5166b6eb4e9&l=2629 ? The path component of a user filesystem location should not be exposed to JavaScript, unless there is a bug in the API.

guest271314 commented 5 years ago

@anforowicz FWIW this is an example of a similar concept, that is, checking that each byte of an uploaded file is written to the server, which appears to be, at least in part, what the requirement is: How to read and echo file size of uploaded file being written at server in real time without blocking at both server and client? https://stackoverflow.com/q/42475492. If the Chromium test does achieve the expected test results, it should be possible to utilize WebAssembly to port the code to JavaScript, though that would still have the issue of granting read access to a user directory. The Chromium/Chrome app API chrome.fileSystem provides that functionality (see JavaScript/Ajax Write to File https://stackoverflow.com/q/42460493), and the above linked native-file-system should as well; in pertinent part https://github.com/WICG/native-file-system/issues/44#issuecomment-477664639:

Limiting this to a sandboxed filesystem is explicitly not the goal of the API. We want this API so web applications can integrate and interact better with native applications. Limiting things to one directory wouldn't help for that.

Getting access to the whole native file system is indeed not likely something we'll be doing, but giving access to more or less arbitrarily user picked files and directories, at least for reading, is really not that different from what existing APIs already let you do today. We're still working out exact details for how we think we can do this safely, but I think we will be able to minimize the risk of users doing things accidentally and users not being aware of the data they send.
