intesso / connect-livereload

connect middleware for adding the livereload script to the response
MIT License
305 stars 53 forks source link

Corrupting non-html content #39

Open kuhnza opened 10 years ago

kuhnza commented 10 years ago

When livereload middleware is installed I'm finding that non-html content such as generated PDFs are being corrupted. I believe this is because livereload is trying to inject the reload script snippet into the content.

I notice that the code currently checks the request accept header however wouldn't it be more appropriate to check the response content-type header before injecting the script instead?

nukulb commented 10 years ago

+1 just wasted 1/2 a day tracking this down.

daywiss commented 10 years ago

i ran into thus bug. It made me very depressed for several hours trying to figure out why I could not stream a pdf file.

nukulb commented 10 years ago

depressed is a good word to describe how I felt as well. This one is brutal to track down.

andineck commented 10 years ago

checking the response content-type is not really of any help because it is not set yet, when this middleware does it's job. Any help on this issue is appreciated. A workaround for the time being would be to add these file types to the ignore list: ignore: [ '.pdf', ....]

corinnaSchultz commented 10 years ago

I ran into this problem with jar files, too. I was trying to view a page which loads an applet, and some (but not all) jar files were being corrupted, with unicode replacement characters everywhere. It's an incredibly difficult bug to track down.

I think the default ignore list should include all non-text file types.

Instead of looking at the response content-type, why can't you look at the file "magic number" at the beginning of the stream?

Climax777 commented 9 years ago

This is also a problem for me. Kind of a blocker. Been wasting 2 days trying to figure out the reason. Any workarounds? My URL's don't always contain the extension so Content-Type will work better for me, and the ignore list won't work for my use case.

Climax777 commented 9 years ago

The problem is not as much the missing of the content-type, the encoding gets mixed up, leaving all binary file downloads (which are not in exception list) corrupt.

What should happen, I propose, is that if no <body> is found, no encoding should be applied further.

The vanilla occurrence gives the following response:

HTTP/1.1 200 OK
X-Powered-By: Express
Content-Type: application/pdf
Date: Thu, 23 Oct 2014 10:13:18 GMT
Connection: keep-alive
Transfer-Encoding: chunked

With the middleware:

HTTP/1.1 200 OK
X-Powered-By: Express
Content-Type: application/pdf
content-length: 1089
Date: Thu, 23 Oct 2014 10:57:14 GMT
Connection: keep-alive

and the data is corrupted in the latter.

In essence, if no <body> tags are found, the same should happen as what happens with urls from the ignore list.

drawveloper commented 9 years ago

Hey, Pieter, does this still happens in v0.5.0? The new algorithm is a little more conservative and waits for the entire response before judging it is HTML. Or so it should! On Oct 23, 2014 8:58 AM, "Pieter Jordaan" notifications@github.com wrote:

The problem is not as much the missing of the content-type, the encoding gets mixed up, leaving all binary file downloads (which are not in exception list) corrupt.

What should happen, I propose, is that if no is found, no encoding should be applied further.

The vanilla occurrence gives the following response:

HTTP/1.1 200 OK X-Powered-By: Express Content-Type: application/pdf Date: Thu, 23 Oct 2014 10:13:18 GMT Connection: keep-alive Transfer-Encoding: chunked

With the middleware:

HTTP/1.1 200 OK X-Powered-By: Express Content-Type: application/pdf content-length: 1089 Date: Thu, 23 Oct 2014 10:57:14 GMT Connection: keep-alive

and the data is corrupted in the latter.

— Reply to this email directly or view it on GitHub https://github.com/intesso/connect-livereload/issues/39#issuecomment-60222593 .

Climax777 commented 9 years ago

Yes, just started a blank project just to test.

var express = require('express');
var app = express();
var PDFDocument = require('pdfkit');
var compression = require('compression');
var flash = require('express-flash');
app.use(compression());
app.use(flash());
app.use(require('connect-livereload')());
var router = express.Router();
router.get('/',function (req, res) {
    var doc = new PDFDocument();
    res.type('application/pdf');
    res.status(200);
    doc.pipe(res); // TODO why socket??
    //doc.pipe(fs.createWriteStream('file.pdf'));
    doc.info.Title = 'My First PDF';
    doc.info.Author = 'Petrus';
    doc.info.Subject = 'My subject';

    doc.text('Hello World');
    doc.end();
});
app.use('/', router);
var server = app.listen(3000, function () {

  var host = server.address().address;
  var port = server.address().port;

  console.log('Example app listening at http://%s:%s', host, port);
});

the same occurs when calling res.sendFile or res.download pointing to an actual pdf file.

andineck commented 9 years ago

Hey I eventually got the time to look further into this problem, and I made a draft version on the branch binary-files. I also added tests in there to check the binary Buffers whether they are equal and it looks good. Would you mind testing for now with this branch?

Climax777 commented 9 years ago

@andi-neck Thanks. Just tested your branch. With my example above, which pipes data to the response, it doesn't work. However the response headers are correct now, but no data is sent.

With my revised code to send a pdf file:

var express = require('express');
var app = express();
var PDFDocument = require('pdfkit');
var compression = require('compression');
var flash = require('express-flash');
app.use(compression());
app.use(flash());
app.use(require('connect-livereload')());
var router = express.Router();
router.get('/',function (req, res) {
   /* var doc = new PDFDocument();
    res.type('application/pdf');
    res.status(200);
    doc.pipe(res); // TODO why socket??
    //doc.pipe(fs.createWriteStream('file.pdf'));
    doc.info.Title = 'My First PDF';
    doc.info.Author = 'Petrus';
    doc.info.Subject = 'My subject';

    doc.text('Hello World');
    doc.end();*/
   res.sendFile('/home/pieter/AbsoluteNetwork.pdf');
});
app.use('/', router);
var server = app.listen(3000, function () {

  var host = server.address().address;
  var port = server.address().port;

  console.log('Example app listening at http://%s:%s', host, port);
});

this works.

Also using a read stream as below works with yours. Perhaps pdfkit is now the culprit, although it works perfectly without connect-livereload

   fs.createReadStream('/home/pieter/AbsoluteNetwork.pdf').pipe(res);
sonicoder86 commented 9 years ago

The new version 0.5 solved the issue for static content...still wasted half a day before upgrading...dynamic content still corrupted

gsikorski commented 9 years ago

Me too :( One day lost. I use dynamic content (tar), so your fix does not work.

andineck commented 9 years ago

@gsikorski which version of connect-livereload are you using? Could you provide some code in order to reproduce the error?

gsikorski commented 9 years ago

I am using latest version (0.5.3). My pseudo code is:

var tar = require('tar-fs');
var download = function (req, res) {
            res.set({
                    'Content-Description': 'File Transfer',
                    'Content-Type': 'application/x-tar',
                    'Content-Length': 512*152,
                    'Content-Disposition': 'attachment; filename="awesome.tar"'
                })
                .writeHead(200);

            tar.pack('/tmp/awesomeDir').pipe(res);
}
andineck commented 9 years ago

hmm, might be a problem when piping stuff. the current implementation is a bit of a hack . . . a clean stream implementation would help i think.

foxx commented 9 years ago

Any idea on when this is going to be fixed?

icompuiz commented 9 years ago

In my connect-livereload configuration, I passed the paths or extensions I want to ignore

require('connect-livereload')({ ignore: ['/api/files', 'download'] })

After several months of working around this issue, this single, non-hacky change fixed it.

adityab commented 9 years ago

I'm another person who spent about 4 hours trying to figure out why streaming out of GridFS was failing. Of course, it was not. ;)

@icompuiz Thanks for your fix. I wasn't aware of the ignore option. Blacklisting /api fixed it for me.

andineck commented 9 years ago

The ignore option might be helpful, the problem is though that the paths are user specific, and every one would have to adjust it according to their project setup.

I think we should work toward a solution that is save for binary files as well, so that they don't get corrupted.

Can someone provide a PR with a test case that is corrupting binary files?

Also any stream oriented solution would be welcomed, I would be happy to merge it.

x-cray commented 9 years ago

Lost several hours tracking down the issue. What I do is simply: request('https://www.google.com/images/srpr/logo11w.png').pipe(res); while using connect-livereload middleware.

eekboom commented 9 years ago

So I, too, was bitten by this and spent quite some time on it. For me it were Excel files that got corrupted. It works fine in 0.4.1 and is broken in 0.5.0 as well as in 0.5.3.

Here are the request details. Note that the URL does not contain any indication of the file type, but the "Content-Disposition" header does.

General
Remote Address:127.0.0.1:9001
Request URL:http://localhost:9001/cxf-services/batch/job-executions/9943ab85-bfc3-4870-89e5-ced8d66863cf/details
Request Method:GET
Status Code:200 OK

Response Headers
HTTP/1.1 200 OK
server: Apache-Coyote/1.1
set-cookie: JSESSIONID=64D49FD361BE01234569789CCD111C069; Path=/acme-web/; HttpOnly
x-ua-compatible: IE=edge
content-disposition: attachment; filename=booking.xls
date: Tue, 06 Oct 2015 11:06:34 GMT
expires: -1
content-type: application/vnd.ms-excel
transfer-encoding: chunked
connection: close
fondberg commented 8 years ago

Hi! I should have looked at this before... Using the angular-fullstack there were quite a few modules to go through.

Here is a full example usinge the demo from excelbuilderjs-node to build and serve an excel sheet from express.

Steps to reproduce:

  1. npm install excelbuilderjs-node
  2. download testdata.json
  3. Run the following app.js:
'use strict';

var express = require('express');
var app = express();
var testData = require('./testdata.json');
var morgan = require('morgan');
var Excel = require('excelbuilderjs-node');

app.use(require('connect-livereload')({
    ignore: ['.xlsx']
  }));
app.use(morgan('dev'));

//This works but not without the ending... app.get('/getexcel.xlsx', function(req, res) {
app.get('/getexcel', function(req, res) {
  var demoWorkbook = Excel.createWorkbook();
  var stylesheet = demoWorkbook.getStyleSheet();

  var themeColor = stylesheet.createFormat({
    font: {
      bold: true,
      color: {theme: 3}
    }
  });

  var columns = [
    {id: 'id', name: 'ID', type: 'number', width: 10},
    {id: 'name', name:'Name', type: 'string', width: 50},
    {id: 'number', name: 'Number', type: 'number', width: 50},
    {id: 'address', name: 'Address', type: 'string', width: 50},
    {id: 'lat', name: 'Lat', type: 'number', width: 15}
  ];

  var worksheetData = [
    [
      {value: 'ID', metadata: {style: themeColor.id, type: 'string'}}, 
      {value: 'Name', metadata: {style: themeColor.id, type: 'string'}}, 
      {value: 'Number', metadata: {style: themeColor.id, type: 'string'}}, 
      {value: 'Address', metadata: {style: themeColor.id, type: 'string'}}, 
      {value: 'Lat', metadata: {style: themeColor.id, type: 'string'}}
    ]
  ].concat(testData);

  var demoList = demoWorkbook.createWorksheet({name: 'Demo List'});

  demoList.setRowInstructions({
    height: 20
  });

  demoList.setData(worksheetData);
  demoList.setColumns(columns);

  demoWorkbook.addWorksheet(demoList);

  var result = Excel.createFile(demoWorkbook);
  var data = new Buffer(result, 'base64');

  res.setHeader('Content-Type', 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet;base64');
  res.setHeader('Content-Disposition', 'attachment; filename=' + 'demo.xlsx');
  res.end(data);
});

app.listen(3300);

console.log('Listening on port 3300');
romain10009 commented 8 years ago

This issue drove me crazy for two days while working on zip streaming! Is there no way to block the injection of the script to html pages? I will try ignore "/api"

PaulMougel commented 8 years ago

@romain10009 You can use the ignore option, as described in this thread and in the README:

ignore gives you the possibility to ignore certain files or url's from being handled by connect-livereload.

The parameter is a list of strings or Regex:

  app.use(require('connect-livereload')({
    port: 35729,
    ignore: ['.js', '.svg']
  }));

The default is:

ignore: [
    /\.js(\?.*)?$/, /\.css(\?.*)?$/, /\.svg(\?.*)?$/, /\.ico(\?.*)?$/, /\.woff(\?.*)?$/,
    /\.png(\?.*)?$/, /\.jpg(\?.*)?$/, /\.jpeg(\?.*)?$/, /\.gif(\?.*)?$/, /\.pdf(\?.*)?$/
  ]

If you want to ignore /api, you can add /^\/api\/(.*)/ to this list.

romain10009 commented 8 years ago

Yes sure, i can use this feature. I just dont understand what the point is to try to inject the script tag inside a zip file. Would it not be better to use a white-list allowing only html files to be injected instead of a black-list to prevent such kind of trouble i ran into? On 7 Dec 2015 17:18, "Paul Mougel" notifications@github.com wrote:

@romain10009 https://github.com/romain10009 You can use the ignore option, as described in this thread and in the README https://github.com/intesso/connect-livereload#use:

ignore gives you the possibility to ignore certain files or url's from being handled by connect-livereload.

The parameter is a list of strings or Regex:

app.use(require('connect-livereload')({ port: 35729, ignore: ['.js', '.svg'] }));

The default is:

ignore: [ /.js(\?.)?$/, /.css(\?.)?$/, /.svg(\?.)?$/, /.ico(\?.)?$/, /.woff(\?.)?$/, /.png(\?.)?$/, /.jpg(\?.)?$/, /.jpeg(\?.)?$/, /.gif(\?.)?$/, /.pdf(\?.)?$/ ]

If you want to ignore /api, you can add /^\/api\/(.*)/ to this list.

— Reply to this email directly or view it on GitHub https://github.com/intesso/connect-livereload/issues/39#issuecomment-162456858 .

treejanitor commented 8 years ago

+1 Glad someone is documenting this one. Killed a chunk of time for me as well.

jasine commented 8 years ago

+1

diwu1989 commented 7 years ago

this is screwing us over on .xlsx files, would like to see microsoft document files also be in the default blacklist please

fkarlsson commented 7 years ago

Wow after HOURS of debugging chunked Transfer-Encoding I find this issue. I was trying to send a chunked response of a csv file.

I tried adding ignore: ['.csv'] to the middleware setup but that did not solve my problem. I have disabled the middleware in Express for now and use the LiveReload Chrome extension instead.

The problem is that connect-livereload hijacks the response.write() function and causes the response to buffer until the end. Anyone have any ideas? Maybe this should be a new issue.

gagan-bansal commented 2 years ago

I was having same issue.

I was using compression middleware. By making it applicable only for 'production', I could resolve the issue.

if (process.env.NODE_ENV === 'production') {
  app.use(compression({}));
}