Open machawk1 opened 10 years ago
The image content is corrupted compared to an Archive-It WARC. Something's not right in the JS code that is storing the image data. Encoding, maybe?
Hex 89 is becoming hex EFBFBD, which is the UTF-8 encoding of U+FFFD (the replacement character). This sounds waaay too familiar, like a BOM issue.
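A minimal sketch of how 89 can turn into EFBFBD, assuming the cause is that the binary data passes through a UTF-8 string somewhere along the way (that is a guess, not something confirmed in the extension code). It uses the standard `TextDecoder` and `TextEncoder`; the `toHex` helper is made up for the illustration.

```javascript
// First four bytes of a PNG file: 0x89 'P' 'N' 'G'.
const original = new Uint8Array([0x89, 0x50, 0x4e, 0x47]);

// Decode the bytes as if they were UTF-8 text. 0x89 is a continuation
// byte with no lead byte, so the decoder substitutes U+FFFD for it.
const asText = new TextDecoder("utf-8").decode(original);

// Encode the string back to UTF-8. U+FFFD is written as EF BF BD.
const roundTripped = new TextEncoder().encode(asText);

// Hypothetical helper: render bytes as a lowercase hex string.
function toHex(bytes) {
  return Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join("");
}

console.log(toHex(original));     // 89504e47
console.log(toHex(roundTripped)); // efbfbd504e47
```

Note that the round-tripped data is two bytes longer, so under this assumption the content length would drift further the more high bytes the image contains.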
Part of the problem is that the Ajax call that fetches the image data has to be synchronous for the string building. Otherwise an ArrayBuffer or a Blob (see https://developer.mozilla.org/en-US/docs/Web/API/XMLHttpRequest/Sending_and_Receiving_Binary_Data ) could be used, except the W3C spec says that these data types must be fetched via Ajax asynchronously.
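For reference, a rough sketch of what the asynchronous variant could look like if the string building were reworked to tolerate a callback. The function name `fetchBytes` and the optional `XhrCtor` parameter (there only so the function can be exercised outside a browser) are hypothetical and not part of the extension.

```javascript
// Fetch a URL asynchronously and hand the raw bytes to a callback.
// XhrCtor is an optional, hypothetical injection point for testing;
// in a browser it falls back to the built-in XMLHttpRequest.
function fetchBytes(url, onDone, XhrCtor) {
  const Xhr = XhrCtor || XMLHttpRequest;
  const xhr = new Xhr();
  xhr.open("GET", url, true); // third argument true = asynchronous
  xhr.responseType = "arraybuffer"; // not allowed on synchronous page requests
  xhr.onload = function () {
    // xhr.response is an ArrayBuffer; wrap it for byte-level access.
    onDone(null, new Uint8Array(xhr.response));
  };
  xhr.onerror = function () {
    onDone(new Error("request failed for " + url));
  };
  xhr.send();
}
```

The bytes arrive untouched because no text decoding happens at any point, but every caller then has to wait for the callback before assembling the record.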
An alternative might be to capture the image data using the Chrome facilities when it first comes in, but the response handlers don't seem to have access to this data.
Woo, created a basic solution! Now, to scale it.
var hexValue = 0x89; // first byte of the PNG signature
var png = "PNG";
// A typed array keeps the byte as-is instead of re-encoding it as text
var hexValueArrayBuffer = new ArrayBuffer(1);
var hexValueUint8Ary = new Uint8Array(hexValueArrayBuffer);
hexValueUint8Ary[0] = hexValue;
var blob = new Blob([hexValueUint8Ary, png]);
saveAs(blob, "out.txt");
Content length is now correct for the simple case (mkdc) but not for large cases (e.g., CNN.com, FB).
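One way the single-byte fix might scale to a whole image while keeping the fetch synchronous is the binary-string technique described on the MDN page linked above: request the data with `overrideMimeType("text/plain; charset=x-user-defined")` and mask each character code down to one byte. This is only a sketch under that assumption; `binaryStringToBytes` is a hypothetical name, and whether it resolves the length mismatch on the large cases is untested.

```javascript
// Convert a string fetched with charset=x-user-defined into raw bytes.
// That charset maps bytes 0x80-0xFF to U+F780-U+F7FF, so masking with
// 0xff recovers the original byte value for every character.
function binaryStringToBytes(str) {
  const bytes = new Uint8Array(str.length);
  for (let i = 0; i < str.length; i++) {
    bytes[i] = str.charCodeAt(i) & 0xff;
  }
  return bytes;
}

// Example input: what 0x89 'P' 'N' 'G' looks like after an
// x-user-defined fetch (0x89 arrives as U+F789).
const responseText = "\uf789PNG";
const imageBytes = binaryStringToBytes(responseText);

console.log(imageBytes.length); // 4
console.log(imageBytes[0].toString(16)); // 89
```

The resulting `Uint8Array` can be passed to `new Blob([imageBytes])` in place of the one-byte array, and its `length` is the value the content length should match.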
Fixing this would probably fix a few other issues down the line.