veliovgroup / jazeee-meteor-spiderable

Fork of Meteor Spiderable with longer timeout, caching, better server handling
https://atmospherejs.com/jazeee/spiderable-longer-timeout

204 - No content #14

Closed: Buom01 closed this issue 9 years ago

Buom01 commented 9 years ago

Hi,

I have a problem with my website: whatever I try, my pages return nothing, with a 204 HTTP status code, when requested with ?_escaped_fragment_=. The 404 page does send the correct 404 HTTP status code, but it is also empty.

Notes:

if(Meteor.isServer){
  Spiderable.debug = true;
  Spiderable.customQuery = true;
}
Router.configure({
  notFoundTemplate: '_404'
});
Router.plugin('dataNotFound', {notFoundTemplate: '_404'});

Router.route('/', function () {
  this.render('main');
});

Router.onAfterAction(function(){
  if(this.ready()){ 
    Meteor.isReadyForSpiderable = true;
  }
});

console.log('Scripting...');

spiderable-test.html

<head>
  <title>spiderable-test</title>
</head>

<body>
</body>

<template name="main">
  <h1>Page d'accueil</h1>
</template>
<template name="_404">
  <h1>404</h1>
  <h3>Oops, page not found</h3>
  <p>Sorry, the page you requested does not exist or was deleted</p>
</template>

Mongo content:

Complete information

[test-buom01.rhcloud.com ******]\> mongo $OPENSHIFT_MONGODB_URLtest
MongoDB shell version: 2.4.9
connecting to: 127.*.*.*:27017/admin
Welcome to the MongoDB shell.
For interactive help, type "help".
For more comprehensive documentation, see
        http://docs.mongodb.org/
Questions? Try the support group
        http://groups.google.com/group/mongodb-user
> use test
switched to db test
> db.SpiderableCacheCollection.find({})
{ "_id" : "yLhw4TicXDzEeT8E5", "hash" : "f807c97ecbb6087b674f75cc45e714675b8cdd0b56959f7600c89ebc860f9678", "url" : "http://test-buom01.rhcloud.com/?___isRunningPhantomJS___=true", "headers" : [      { "name" : "Date",         "value" : "Mon, 03 Aug 2015 13:41:14 GMT" },    {       "name" : "Vary",        "value" : "Accept-Encoding" },  {       "name" : "Content-Type",        "value" : "text/html; charset=utf-8" },    {       "name" : "Content-Encoding",    "value" : "gzip" },     {       "name" : "Keep-Alive",  "value" : "timeout=15, max=100" },      {       "name" : "Connection",  "value" : "Keep-Alive" } ], "content" : "<!DOCTYPE html><html><head>\n  <link rel=\"stylesheet\" type=\"text/css\" class=\"__meteor-css__\" href=\"/20ae2c8d51b2507244e598844414ecdec2615ce3.css\">\n\n\n\n\n  \n\n\n\n\n<title>spiderable-test</title>\n</head>\n<body>\n\n\n\n<h1>Page d'accueil</h1></body></html>", "status" : 204, "createdAt" : ISODate("2015-08-03T13:41:15.360Z") }
> 

Just the document itself (pretty-printed):

{
   "_id":"yLhw4TicXDzEeT8E5",
   "hash":"f807c97ecbb6087b674f75cc45e714675b8cdd0b56959f7600c89ebc860f9678",
   "url":"http://test-buom01.rhcloud.com/?___isRunningPhantomJS___=true",
   "headers":[
      {
         "name":"Date",
         "value":"Mon, 03 Aug 2015 13:41:14 GMT"
      },
      {
         "name":"Vary",
         "value":"Accept-Encoding"
      },
      {
         "name":"Content-Type",
         "value":"text/html; charset=utf-8"
      },
      {
         "name":"Content-Encoding",
         "value":"gzip"
      },
      {
         "name":"Keep-Alive",
         "value":"timeout=15, max=100"
      },
      {
         "name":"Connection",
         "value":"Keep-Alive"
      }
   ],
   "content":"<!DOCTYPE html><html><head>\n  <link rel=\"stylesheet\" type=\"text/css\" class=\"__meteor-css__\" href=\"/20ae2c8d51b2507244e598844414ecdec2615ce3.css\">\n\n\n\n\n  \n\n\n\n\n<title>spiderable-test</title>\n</head>\n<body>\n\n\n\n<h1>Page d'accueil</h1></body></html>",
   "status":204,
   "createdAt":ISODate("2015-08-03T13:41:15.360Z")
}
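Worth noting about this entry: the cached `status` is 204 while `content` holds the full rendered page. HTTP does not allow a body on 204 (or 304) responses, so a server replaying this entry would be expected to send the headers with no body, which would explain the empty page described above. A minimal sketch of a sanity check for such entries, assuming the field names shown in the dump; `checkCacheEntry` is a hypothetical helper, not part of the package:

```javascript
// Hypothetical sanity check for a SpiderableCacheCollection document.
// Field names (status, content) are copied from the dump above.
const BODYLESS_STATUSES = new Set([204, 304]);

function checkCacheEntry(entry) {
  const problems = [];
  const hasContent = typeof entry.content === "string" && entry.content.length > 0;
  if (BODYLESS_STATUSES.has(entry.status) && hasContent) {
    // HTTP forbids a body on 204/304, so the stored HTML cannot be delivered.
    problems.push(`status ${entry.status} cannot carry the ${entry.content.length}-character body`);
  }
  if (!hasContent) {
    problems.push("empty content");
  }
  return problems;
}

// Trimmed copy of the entry shown above.
const entry = {
  url: "http://test-buom01.rhcloud.com/?___isRunningPhantomJS___=true",
  status: 204,
  content: "<!DOCTYPE html><html><head>...<h1>Page d'accueil</h1></body></html>",
};

console.log(checkCacheEntry(entry));
```

Entries flagged this way were rendered under a misleading status and would need to be re-rendered once the status detection is fixed.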

My home-made build scripts:

.build2.sh

./.compress2.sh
export MINI="./.mini/nodejs"
echo "{{{{BUILDING}}}}"
cd $MINI
echo "Remove output"
chmod -R 777 ./.demeteorized
rm -R ./.demeteorized
echo "Build and demeteorization..."
demeteorizer
echo "Fix permissions again..."
chmod -R 777 ./.demeteorized

cd ./.demeteorized

echo "Adding env vars..."

# settings.json is actually empty
echo "process.env.METEOR_SETTINGS = '$(echo $(cat ../settings.json))';
$(cat ./main.js)" > ./main.js 

#sed -i '1i process.env.MAIL_URL = "smtp://*********/";' ./main.js
#sed -i '1i process.env.ROOT_URL = ("http://" + process.env.OPENSHIFT_APP_DNS) || "http://localhost:8000"' ./main.js
#sed -i '1i process.env.MONGO_URL = (process.env.OPENSHIFT_MONGODB_DB_URL + process.env.OPENSHIFT_APP_NAME) || "mongodb://localhost:27017/meteor";' ./main.js
#sed -i '1i process.env.PORT = process.env.OPENSHIFT_NODEJS_PORT || 8000;' ./main.js
#sed -i '1i process.env.BIND_IP = process.env.OPENSHIFT_NODEJS_IP || "127.0.0.1";' ./main.js

#sed -i '1i process.env.MAIL_URL = "smtp://*******/";' ./main.js
sed -i '1i process.env.ROOT_URL = "http://" + (process.env.OPENSHIFT_APP_DNS || "localhost:8000");' ./main.js
#sed -i '1i process.env.ROOT_URL = "http://"+ process.env.OPENSHIFT_NODEJS_IP + ":" + process.env.OPENSHIFT_NODEJS_PORT;' ./main.js
sed -i '1i process.env.MONGO_URL = (process.env.OPENSHIFT_MONGODB_DB_URL + process.env.OPENSHIFT_APP_NAME) || "mongodb://localhost:27017/meteor";' ./main.js
sed -i '1i process.env.PORT = process.env.OPENSHIFT_NODEJS_PORT || 8000;' ./main.js
sed -i '1i process.env.BIND_IP = process.env.OPENSHIFT_NODEJS_IP || "127.0.0.1";' ./main.js

echo "Copying into git directory..."
cd ../../..
rm -R ./.end/*
#mkdir ./.end
#cp -R ./.static-files/.git ./.end/
#cp -R ./.static-files/.openshift ./.end/
cp -R $MINI/.demeteorized/* ./.end/
echo "Done"
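A side note on `.build2.sh`: in the prepended MONGO_URL line, `(A + B) || fallback` only falls back when both OpenShift variables are unset (`undefined + undefined` is `NaN`, which is falsy); if only one is set, the result is a string ending in `undefined`. A sketch of the same defaults with explicit checks; `buildEnv` is a hypothetical helper for illustration, with the variable names and fallbacks copied from the script:

```javascript
// Hypothetical helper mirroring the defaults that the sed lines prepend to
// main.js, but testing each OpenShift variable explicitly.
function buildEnv(env) {
  const dbUrl = env.OPENSHIFT_MONGODB_DB_URL;
  const appName = env.OPENSHIFT_APP_NAME;
  return {
    BIND_IP: env.OPENSHIFT_NODEJS_IP || "127.0.0.1",
    PORT: env.OPENSHIFT_NODEJS_PORT || 8000,
    // Only concatenate when both parts exist; otherwise use the local fallback
    // instead of producing a URL ending in "undefined".
    MONGO_URL: dbUrl && appName ? dbUrl + appName : "mongodb://localhost:27017/meteor",
    ROOT_URL: "http://" + (env.OPENSHIFT_APP_DNS || "localhost:8000"),
  };
}

console.log(buildEnv({}));
console.log(buildEnv({ OPENSHIFT_APP_DNS: "test-buom01.rhcloud.com", OPENSHIFT_NODEJS_PORT: "8080" }));
```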

.compress2.sh

echo "{{{{COMPRESSION}}}}"
export MINI="./.mini/nodejs"
echo "Fix permissions..."
chmod -R 777 $MINI
echo "Removing..."
rm -R $MINI
echo "Copying..."
mkdir $MINI
cp -R * $MINI
cp -R ./.meteor $MINI
echo "Minifying..."
#htmlminify -o $MINI/spiderable-test.html $MINI/spiderable-test.html
echo "Removing newlines..."
#echo  $(cat $MINI/spiderable-test.html)>$MINI/spiderable-test.html
echo "Done"

.deploy2.sh

cd .end
git add .
git commit -m "Update"
git push

Packages

http://test-buom01.rhcloud.com/ http://test-buom01.rhcloud.com/?_escaped_fragment_=
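To compare what users and crawlers receive for other routes, it can help to derive the crawler-style URL programmatically. A minimal sketch, assuming pushState-style routes (no `#!` fragments) and the WHATWG `URL` class built into Node.js; `escapedFragmentUrl` is a hypothetical helper:

```javascript
// Hypothetical helper: build the URL a crawler requests under the
// _escaped_fragment_ scheme for a page without a #! fragment.
function escapedFragmentUrl(pageUrl) {
  const url = new URL(pageUrl);
  url.hash = ""; // fragments are never sent to the server anyway
  url.searchParams.set("_escaped_fragment_", "");
  return url.toString();
}

console.log(escapedFragmentUrl("http://test-buom01.rhcloud.com/"));
// http://test-buom01.rhcloud.com/?_escaped_fragment_=
```

Existing query parameters are kept, and the empty parameter is appended after them.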

Conclusion

Can anybody help me? What can I do?


Thank you for reading, and sorry for my bad English.

jazeee commented 9 years ago

For now, revert to jazeee:spiderable-longer-timeout 1.2.2 which works, but does not handle 404s. We will have to look at it. @dr-dimitru have you seen this issue?

dr-dimitru commented 9 years ago

@Buom01 First, try to access the Node app directly, without the proxy. Could you provide a link (it should include the port, like 3000; I actually tried that port but got no response)? BTW (see the screenshot below), something is wrong with your setup: your server is not supposed to respond with 204.

[Screenshot: 2015-08-03 at 7:46:15 PM]

dr-dimitru commented 9 years ago

@jazeee I've never encountered such an issue before.

Buom01 commented 9 years ago

Unfortunately, when I mentioned a proxy, I meant a reverse proxy set up by my host to expose port 80 (shared hosting...).

BTW (see image below) it is something wrong with your setup, your server not supposed to response with 204

... Now I remember that my host was recently under maintenance, and I think they changed their proxy config (...).

Solution found! I changed ROOT_URL to point directly at the Node.js server, like this:

process.env.ROOT_URL = "http://" + process.env.OPENSHIFT_NODEJS_IP + ":" + process.env.OPENSHIFT_NODEJS_PORT;

instead of this (i.e., not to http://test-buom01.rhcloud.com/):

process.env.ROOT_URL = "http://" + (process.env.OPENSHIFT_APP_DNS || "localhost:8000");

One idea: get it from the $PORT and $BIND_IP environment variables instead of $ROOT_URL. (I don't know the default value of $ROOT_URL.)
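A minimal sketch of that suggestion, assuming the server could resolve the URL that PhantomJS loads against its own BIND_IP/PORT (so no reverse proxy is involved) and fall back to ROOT_URL otherwise. `internalUrlFor` is hypothetical and not part of the package; mapping `0.0.0.0` to loopback is an added assumption:

```javascript
// Hypothetical helper: resolve a route path against the Node.js process
// itself when BIND_IP/PORT are known, otherwise against ROOT_URL.
function internalUrlFor(path, env = process.env) {
  if (env.BIND_IP && env.PORT) {
    // 0.0.0.0 means "listen on all interfaces"; connect via loopback instead.
    const host = env.BIND_IP === "0.0.0.0" ? "127.0.0.1" : env.BIND_IP;
    return new URL(path, `http://${host}:${env.PORT}`).toString();
  }
  // Fallback: resolve against ROOT_URL (throws if ROOT_URL is missing or invalid).
  return new URL(path, env.ROOT_URL).toString();
}

console.log(internalUrlFor("/", { BIND_IP: "127.0.0.1", PORT: "8080" }));
// http://127.0.0.1:8080/
```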

I'm keeping this app in this state so I can notify my host that there are bugs in their Apache proxy config.

Thank you very much :+1:

jazeee commented 9 years ago

Ok, sounds like it was a configuration/maintenance issue. Thanks.

jazeee commented 9 years ago

I found that this issue does exist, and is due to a bug in the phantomjs script.

It appears to depend on whether one uses a proxy or similar, for example nginx in front of Meteor to provide SSL.

One way to test: create a Meteor server with an nginx wrapper for SSL, then run the PhantomJS script directly:

phantomjs --load-images=no --ssl-protocol=TLSv1 --ignore-ssl-errors=true --web-security=false jazeee-meteor-spiderable/lib/phantom_script.js https://your-server

This returns a 204. The reason is that https://github.com/jazeee/jazeee-meteor-spiderable/blob/master/lib/phantom_script.js#L60 is not correct.

page.onResourceReceived(...) fires for every resource received, including image assets and the Meteor WebSocket connection. The first part of that function correctly checks the URL; the remaining part, however, accepts any status. Since this seems unnecessary, I am removing that part to fix this bug.

EDIT: When I test the current script, I see the URL: https://.../sockjs/383/l2de9dnl/xhr_send, with status 204, which is the wrong status for the page. If I remove the problem code, I get the right result.
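The URL-only check described above can be sketched as a small tracker fed by PhantomJS's `page.onResourceReceived`. The `url`, `status`, and `stage` fields are those of PhantomJS's response object; `createStatusTracker` is a hypothetical helper, not the package's actual code, and it assumes the tracked URL matches PhantomJS's normalized form (e.g. with a trailing slash):

```javascript
// Only responses for the page URL itself may set the reported status;
// sub-resources (images, scripts, SockJS xhr_send polls returning 204)
// are ignored.
function createStatusTracker(pageUrl) {
  let status = null;
  return {
    onResourceReceived(response) {
      if (response.url !== pageUrl) return; // not the document itself
      if (response.stage === "start") return; // PhantomJS reports "start" and "end"; use "end"
      status = response.status;
    },
    status: () => status,
  };
}

// Simulated sequence resembling the one described in this thread.
const tracker = createStatusTracker("https://your-server/");
[
  { url: "https://your-server/", status: 200, stage: "end" },
  { url: "https://your-server/app.js", status: 200, stage: "end" },
  { url: "https://your-server/sockjs/383/l2de9dnl/xhr_send", status: 204, stage: "end" },
].forEach((r) => tracker.onResourceReceived(r));
console.log(tracker.status()); // 200
```

In a PhantomJS script it would be wired up with `page.onResourceReceived = tracker.onResourceReceived;` before calling `page.open(...)`.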

dr-dimitru commented 9 years ago

Without this conditional, you will not receive the status code on redirects.

dr-dimitru commented 9 years ago

I believe the developer should work around this in the proxy server config, because we use nginx and SSL in production with this package without any issues.

jazeee commented 9 years ago

I don't think it makes sense to rely on the response code of any arbitrary resource. For example, during processing I see a response code of 204 for the URL https://.../sockjs/383/l2de9dnl/xhr_send

This is the last URL processed before PhantomJS finishes. These issues are quite hard to debug, and other users will hit serious problems that appear random: for example, it works locally but fails because of some arbitrary nginx configuration, version, or other server setup, load, etc.

Since this issue affects the majority of normal cases, we will have to look at other options for redirects. For example, if that code exists specifically to handle redirects, we should capture redirects in a different way and handle them explicitly.

dr-dimitru commented 9 years ago

I believe it should work with any kind of redirect. BTW, HTTP redirects are handled via response.redirectURL. But I partly agree with you, so we should explicitly support "complete" responses, like 200, 302, 404, 400, 500, etc., and ignore "non-complete" ones, like 206, 204, etc. And add all supported codes to the docs. What do you think?
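A minimal sketch of the whitelist idea, assuming the statuses have already been collected from `onResourceReceived`. The set of "complete" codes is taken from the examples in the comment above and is an assumption; `lastCompleteStatus` is hypothetical:

```javascript
// Only "complete" statuses may become the page's status; partial or empty
// ones such as 204 and 206 are skipped.
const COMPLETE_STATUSES = new Set([200, 301, 302, 400, 404, 500]);

function lastCompleteStatus(statuses, fallback = 200) {
  let result = fallback;
  for (const s of statuses) {
    if (COMPLETE_STATUSES.has(s)) result = s;
  }
  return result;
}

console.log(lastCompleteStatus([200, 200, 204, 200, 204])); // 200
console.log(lastCompleteStatus([200, 404, 204])); // 404
```

Without a URL check, a 404 from a missing image would still win in the second case, which is the concern about missing assets raised in the reply that follows.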

jazeee commented 9 years ago

Quite possibly. My changes also seem to break 404s, but I think we will have to be very careful about these issues (in a separate ticket). I wouldn't want PhantomJS to respond with a 404 just because an image file or asset is missing.

Of course, the primary goal is to make the successful paths work, in most people's scenarios.

jazeee commented 9 years ago

Just a note: if I log all response statuses within onResourceReceived during a redirect test, I get about 15x200, then a 204, then 4x200, then a 204, then a 200. These are most likely due to the various JavaScript files or WebSocket connections loading during the page load. I don't see any redirect status codes. If I run the same test directly against Meteor, bypassing nginx, I see about 12x200.

Not sure if it is a PhantomJS issue, a polling issue, or something else. In any case, I doubt that we can count on the intermediate status codes being representative of the page's final code.

In reality, I think we have to specify the code some other way, such as setting Spiderable.responseCode = 404 in the Iron Router/Meteor code.
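A minimal sketch of how an explicitly declared code could take precedence over anything observed on the network. `Spiderable.responseCode` is only the property proposed here, not an existing API; reading it from PhantomJS (e.g. via `page.evaluate`) and the `resolveResponseCode` helper are assumptions:

```javascript
// Client side (hypothetical): the _404 route would set
//   Spiderable.responseCode = 404;
// and the PhantomJS side would read it after the page is ready, then call:
function resolveResponseCode(appResponseCode, observedStatus) {
  const isHttpStatus = (c) => Number.isInteger(c) && c >= 100 && c <= 599;
  if (isHttpStatus(appResponseCode)) return appResponseCode; // explicit app value wins
  if (isHttpStatus(observedStatus)) return observedStatus; // else the document's own status
  return 200; // default when nothing usable was reported
}

console.log(resolveResponseCode(404, 200)); // 404
console.log(resolveResponseCode(undefined, 200)); // 200
```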

dr-dimitru commented 9 years ago

It is easily solved at this statement. The line you removed (which may have been incorrect) shouldn't be removed; it should be replaced with something more complex that handles redirects (when the page's URL is not equal to the originally requested URL). Now you have broken correct redirects and all kinds of responses except 200 (which is set by default).
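One way to sketch that redirect check, assuming it runs once the page is ready and compares PhantomJS's current `page.url` with the URL originally requested, ignoring the `___isRunningPhantomJS___` marker seen in the cache entry. Reporting a 302 with the final URL as the location is an assumption, the `/old` and `/new` paths are made up for illustration, and `normalizeUrl`/`redirectAwareResult` are hypothetical:

```javascript
// Drop the marker parameter and any fragment so only meaningful differences count.
function normalizeUrl(u) {
  const url = new URL(u);
  url.searchParams.delete("___isRunningPhantomJS___");
  url.hash = "";
  return url.toString();
}

// finalUrl would come from page.url after the page signals it is ready.
function redirectAwareResult(requestedUrl, finalUrl, observedStatus) {
  if (finalUrl && normalizeUrl(finalUrl) !== normalizeUrl(requestedUrl)) {
    // The app navigated elsewhere (e.g. a router redirect): report a redirect.
    return { status: 302, location: normalizeUrl(finalUrl) };
  }
  return { status: observedStatus, location: null };
}

console.log(redirectAwareResult(
  "http://test-buom01.rhcloud.com/old?___isRunningPhantomJS___=true",
  "http://test-buom01.rhcloud.com/new",
  200
));
```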