veliovgroup / jazeee-meteor-spiderable

Fork of Meteor Spiderable with longer timeout, caching, better server handling
https://atmospherejs.com/jazeee/spiderable-longer-timeout

Error serving static file Error: Requested Range Not Satisfiable #4

Closed dr-dimitru closed 9 years ago

dr-dimitru commented 9 years ago

@jazeee Hi, sometimes we're experiencing this issue:

Error serving static file Error: Requested Range Not Satisfiable
Error: Meteor code must always run within a Fiber. Try wrapping callbacks that you pass to non-Meteor libraries with Meteor.bindEnvironment.
    at Object.Meteor._nodeCodeMustBeInFiber (packages/meteor/dynamics_nodejs.js:9:1)
    at [object Object]._.extend.get (packages/meteor/dynamics_nodejs.js:21:1)
    at [object Object].RouteController.lookupOption (packages/iron:router/lib/route_controller.js:66:1)
    at new Controller.extend.constructor (packages/iron:router/lib/route_controller.js:26:1)
    at [object Object].ctor (packages/iron:core/lib/iron_core.js:88:1)
    at Function.Router.createController (packages/iron:router/lib/router.js:201:1)
    at Function.Router.dispatch (packages/iron:router/lib/router_server.js:39:1)
    at Object.router (packages/iron:router/lib/router.js:15:1)
    at next (/Users/dmitriygolev/.meteor/packages/webapp/.1.2.0.xohm6p++os+web.browser+web.cordova/npm/node_modules/connect/lib/proto.js:190:15)
    at packages/jazeee:spiderable-longer-timeout/spiderable_server.js:148:1

But we cannot figure out how to fix it.

jazeee commented 9 years ago

I can look at this.

dr-dimitru commented 9 years ago

I've found how to reproduce the error:

curl http://127.0.0.1:3000 -A googlebot && curl http://127.0.0.1 -A googlebot && curl http://127.0.0.1:3000 -A googlebot && curl http://127.0.0.1 -A googlebot && curl http://127.0.0.1:3000 -A googlebot && curl http://127.0.0.1:3000 -A googlebot && curl http://127.0.0.1:3000 -A googlebot && curl http://127.0.0.1:3000 -A googlebot && curl http://127.0.0.1:3000 -A googlebot && curl http://127.0.0.1 -A googlebot && curl http://127.0.0.1:3000 -A googlebot && curl http://127.0.0.1:3000 -A googlebot && curl http://127.0.0.1:3000 -A googlebot && curl http://127.0.0.1:3000 -A googlebot && curl http://127.0.0.1:3000 -A googlebot && curl http://127.0.0.1 -A googlebot && curl http://127.0.0.1:3000 -A googlebot && curl http://127.0.0.1:3000 -A googlebot && curl http://127.0.0.1:3000 -A googlebot && curl http://127.0.0.1:3000 -A googlebot && curl http://127.0.0.1:3000 -A googlebot && curl http://127.0.0.1 -A googlebot && curl http://127.0.0.1:3000 -A googlebot && curl http://127.0.0.1:3000 -A googlebot && curl http://127.0.0.1:3000 -A googlebot && curl http://127.0.0.1:3000 -A googlebot && curl http://127.0.0.1:3000 -A googlebot && curl http://127.0.0.1 -A googlebot && curl http://127.0.0.1:3000 -A googlebot && curl http://127.0.0.1:3000 -A googlebot && curl http://127.0.0.1:3000 -A googlebot && curl http://127.0.0.1:3000 -A googlebot && curl http://127.0.0.1:3000 -A googlebot && curl http://127.0.0.1 -A googlebot && curl http://127.0.0.1:3000 -A googlebot && curl http://127.0.0.1:3000 -A googlebot && curl http://127.0.0.1:3000 -A googlebot && curl http://127.0.0.1:3000 -A googlebot && curl http://127.0.0.1:3000 -A googlebot && curl http://127.0.0.1:3000 -A googlebot && curl http://127.0.0.1:3000 -A googlebot && curl http://127.0.0.1:3000 -A googlebot && curl http://127.0.0.1:3000 -A googlebot && curl http://127.0.0.1:3000 -A googlebot && curl http://127.0.0.1:3000 -A googlebot && curl http://127.0.0.1 -A googlebot

The errors I got were many instances of:

(webapp_server.js:419) Error serving static file Error: Requested Range Not Satisfiable

and (this one looks like it's almost fixed on our end, so I'll send a PR soon):

spiderable: phantomjs failed: { [Error: Command failed: [CRITICAL] QNetworkReplyImpl: backend error: caching was enabled after some bytes had been written [CRITICAL] QNetworkReplyImpl: backend error: caching was enabled after some bytes had been written] killed: true, code: null, signal: 'SIGTERM' } 
stderr: [CRITICAL] QNetworkReplyImpl: backend error: caching was enabled after some bytes had been written [CRITICAL] QNetworkReplyImpl: backend error: caching was enabled after some bytes had been written

(STDERR) Error: Meteor code must always run within a Fiber. Try wrapping callbacks that you pass to non-Meteor libraries with Meteor.bindEnvironment.
(STDERR)     at Object.Meteor._nodeCodeMustBeInFiber (packages/meteor/dynamics_nodejs.js:9:1)
(STDERR)     at [object Object]._.extend.get (packages/meteor/dynamics_nodejs.js:21:1)
(STDERR)     at [object Object].RouteController.lookupOption (packages/iron:router/lib/route_controller.js:66:1)
(STDERR)     at new Controller.extend.constructor (packages/iron:router/lib/route_controller.js:26:1)
(STDERR)     at [object Object].ctor (packages/iron:core/lib/iron_core.js:88:1)
(STDERR)     at Function.Router.createController (packages/iron:router/lib/router.js:201:1)
(STDERR)     at Function.Router.dispatch (packages/iron:router/lib/router_server.js:39:1)
(STDERR)     at Object.router (packages/iron:router/lib/router.js:15:1)
(STDERR)     at next (/Users/dmitriygolev/.meteor/packages/webapp/.1.2.0.15p5axz++os+web.browser+web.cordova/npm/node_modules/connect/lib/proto.js:190:15)
(STDERR)     at Object.handle (packages/ostrio:cookies/cookies.coffee:15:7)
(webapp_server.js:419) Error serving static file Error: Requested Range Not Sati
jazeee commented 9 years ago

This suggests that the system is overloaded. I saw another spiderable fork that provides a cache mechanism. After the first curl request, it returns the cached version, and it rebuilds the cache after some expiration time. Otherwise we have a denial of service, with many instances of PhantomJS.

dr-dimitru commented 9 years ago

@jazeee Can we use the same cache mechanism? Could you leave a link to that fork here?

jazeee commented 9 years ago

This is it; however, I recall it didn't work with the latest Meteor. https://github.com/chfritz/meteor-spiderable-cached/

In looking at that code, I think we could easily adapt its concepts.

If Meteor.wrapAsync works, I'd prefer it over the Bind code you committed, primarily because it would be more readable, and perhaps more standard.

Obviously, if you add the cache mechanisms, you will make it harder to see the bug you demonstrated with repeated curl requests, so we'd probably want to test wrapAsync first.

dr-dimitru commented 9 years ago

I think we can implement the cache mechanism by hashing the requested URL and using the hash as the file name (at line 130). If the file exists and was created within the last 30 minutes (the default; we'd let users set a custom TTL), return the file contents; if no file exists, or it was created earlier than the TTL allows, run phantomjs. If you're okay with this idea, I can push these changes into the open PR.

jazeee commented 9 years ago

I think we should get the other PR complete, and do this as a separate PR. It will keep the changes cleaner.

I like the idea of using a hash for the filename; however, we may see two fibers attempting to update the same file. For example: one request comes in to http://a.com/test. Spiderable creates the file, and before PhantomJS starts, a second request comes in to the same URL. Spiderable starts overwriting the file, and the first PhantomJS run starts with a malformed file.

In addition, relying on the file system to report that a file exists can cause problems across threads. Both threads may think the file is non-existent before they both start writing to the file:

if (!fs.existsSync(file)) {
  fs.writeFileSync(file, contents);
}

Since there is no way to do this atomically, we could see concurrency issues and random file lock errors.

On a heavily loaded server we may see concurrency issues like this. It will likely be more complicated to work through the scenarios.

dr-dimitru commented 9 years ago

fs.createWriteStream() is stable under concurrency, but I agree it may bring some trouble. So the other options are in-memory caching and/or MongoDB; do you see any others? Alternatively, we could offer all three ways to work with the cache.

Let me know what you think

jazeee commented 9 years ago

I'd probably stick to a single cache option. Keep it simple. The other fork used MongoDB, and Meteor is built on that. It seems like a good choice for this idea.

jazeee commented 9 years ago

I believe your changes should have fixed this issue.

dr-dimitru commented 9 years ago

@jazeee I didn't get you..

dr-dimitru commented 9 years ago

Seems to be solved; I didn't see any of those errors during the last week.