veliovgroup / jazeee-meteor-spiderable

Fork of Meteor Spiderable with longer timeout, caching, better server handling
https://atmospherejs.com/jazeee/spiderable-longer-timeout

gzip encoding header breaks cached pages #23

Closed: strikeout closed 8 years ago

strikeout commented 8 years ago

Thank you for this wonderful package (!!!)

I noticed a strange error when we deployed this package on our production system, which runs nginx + Phusion Passenger with gzip encoding enabled.

The header Content-Encoding: gzip gets set on the cached page in the collection. Accessing the cached page then returns an ERR_CONTENT_DECODING_FAILED error in Chrome until I manually remove the Content-Encoding header from the cached document in the DB.

As a quick (and working) fix I did the following:

if result.headers?.length > 0
    for header in result.headers
        res.setHeader header.name, header.value if (header.value != 'gzip')
else
    res.setHeader 'Content-Type', 'text/html'
res.writeHead result.status
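
The quick fix above skips any header whose value is 'gzip'. A slightly stricter variant could filter by header name instead. The following is a minimal sketch in plain JavaScript, not the package's actual code: the helper names `replayableHeaders` and `sendCachedHeaders` are hypothetical, and also dropping Content-Length and Transfer-Encoding is an assumption (they may no longer match a body that is served uncompressed).

```javascript
// Hypothetical helpers; the skipped-header list is an assumption, not the package's behavior.
const SKIPPED_HEADERS = new Set(['content-encoding', 'content-length', 'transfer-encoding']);

// Return the cached headers that are safe to replay with an uncompressed cached body.
function replayableHeaders(cachedHeaders) {
  const headers = (cachedHeaders || []).filter(
    (header) => !SKIPPED_HEADERS.has(String(header.name).toLowerCase())
  );
  // Fall back to a plain HTML content type when nothing usable was cached.
  return headers.length > 0 ? headers : [{ name: 'Content-Type', value: 'text/html' }];
}

// Write the filtered headers and the cached status to a Node-style response object.
function sendCachedHeaders(res, result) {
  for (const header of replayableHeaders(result.headers)) {
    res.setHeader(header.name, header.value);
  }
  res.writeHead(result.status);
}
```

Unlike the quick fix, this version also applies the 'text/html' fallback when every cached header was filtered out, not only when the cached list was empty.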

You may want to take a look at this problem as I did not have enough time for a thorough investigation.

cheers //s

dr-dimitru commented 8 years ago

Could you post a sample of a DB record with gzip encoding?

dr-dimitru commented 8 years ago

@strikeout I can't reproduce this. Could you please explain at which step the content became gzipped, and where the headers mismatch?

acemtp commented 8 years ago

I had exactly the same issue. I use PhantomJS 1.9.8.

In the spiderable cache collection, there's a Content-Encoding: gzip header, but the content is clearly not gzipped.

@strikeout's patch fixes the problem.

Here is a gist of a document of the cache collection (from mongo):

https://gist.github.com/acemtp/a9ab03a85031a29f2c2b
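
One way to confirm the mismatch described in this comment is to compare the cached headers against the first bytes of the cached body, since every gzip stream starts with the magic bytes 0x1f 0x8b. This is only a diagnostic sketch in plain JavaScript: the function names are hypothetical, and it assumes the cached document stores headers as a list of name/value pairs, as in the patch above.

```javascript
// Hypothetical diagnostic; not part of the package.
function looksGzipped(body) {
  const bytes = Buffer.isBuffer(body) ? body : Buffer.from(String(body), 'binary');
  // Every gzip stream starts with the magic bytes 0x1f 0x8b (RFC 1952).
  return bytes.length >= 2 && bytes[0] === 0x1f && bytes[1] === 0x8b;
}

// True when the headers claim gzip but the body does not start like a gzip stream.
function hasEncodingMismatch(headers, body) {
  const claimsGzip = (headers || []).some(
    (h) =>
      String(h.name).toLowerCase() === 'content-encoding' &&
      String(h.value).toLowerCase().includes('gzip')
  );
  return claimsGzip && !looksGzipped(body);
}
```

A cached document for which `hasEncodingMismatch` returns true would be expected to trigger the decoding error in the browser if its headers are replayed unchanged.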

dr-dimitru commented 8 years ago

@acemtp Thanks for the data dump. Now I see: it has both content-type headers. We will release an update soon.

acemtp commented 8 years ago

Glad it helps. I spent hours figuring out why og:image wasn't handled by Facebook. It turned out that the default MDG spiderable tries to "phantom" the image generated by CollectionFS.

With your cool package, at least I can add an ignore rule for /cfs/
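
The idea behind such an ignore rule can be sketched as a simple path-prefix check that runs before a request is handed to PhantomJS. This is an illustration in plain JavaScript only: `ignoredPrefixes` and `shouldSkipRendering` are hypothetical names, and the package's real option name and matching logic should be taken from its README.

```javascript
// Hypothetical names; illustrative only, not the package's API.
const ignoredPrefixes = ['/cfs/'];

function shouldSkipRendering(requestUrl, prefixes = ignoredPrefixes) {
  // Compare only the path, so query strings such as ?_escaped_fragment_= do not matter.
  const path = String(requestUrl).split('?')[0];
  return prefixes.some((prefix) => path.startsWith(prefix));
}
```

With a rule like this, a crawler request for a file under /cfs/ would be served directly instead of being rendered as a page.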