solidusjs / solidus

A simple server that generates pages from JSON and Templates
MIT License

Handling resource 404s #117

Closed pushred closed 9 years ago

pushred commented 9 years ago

While Solidus has 404 middleware for handling missing views, this isn’t exposed to developers for use when a resource is missing content. Combined with dynamic resources, this limitation negatively impacts both SEO and exception tracking.

In the case of SEO, many empty pages can end up in a search engine index, since any missing resource will simply render an empty or incomplete page. This is likely producing quite a few junk landing pages that could be skewing various metrics, such as bounce rate.

The more likely result, however, is a preprocessor error, which leads to an exception that translates to the wrong status for crawlers and a poor user experience. This also makes exception tracking very noisy, so it is nearly impossible to distinguish problematic code from 404s. The amount of noise largely depends on how many old links are still out there and how transitions from older platforms were handled, or not handled.

In some cases we should actually receive a 404 status from a resource when requesting something that doesn’t exist. Examples of how some typical resources respond to such requests:

The older WordPress JSON API plugin returns a 200 (necessary for clientside use), with this body:

{
    "status": "error",
    "error": "Not found."
}

Likewise for Universe, with this body:

{
    "status": "error",
    "messages": [
        "No event found"
    ]
}

and YouTube’s v3 API:

{
    "status": "error",
    "messages": [
        "invalid_json"
    ],
    "response": "Invalid id"
}

and YouTube v2:

{
    "status": "error",
    "messages": [
        "connection_error",
        "Content-decoder error"
    ],
    "response": null
}

and WordPress.com:

{
    "status": "error",
    "messages": [
        "invalid_status"
    ],
    "response": "{\"error\":\"unknown_post\",\"message\":\"Unknown post\"}"
}

and Instagram:

{
    "status": "error",
    "messages": [
        "invalid_status"
    ],
    "response": "{\"meta\":{\"error_type\":\"APINotFoundError\",\"code\":400,\"error_message\":\"this location does not exist\"}}"
}

and Twitter:

{
    "status": "error",
    "messages": [
        "invalid_status"
    ],
    "response": "{\"errors\":[{\"message\":\"Sorry, that page does not exist\",\"code\":34}]}"
}

and Facebook:

{
    "status": "error",
    "messages": [
        "invalid_status"
    ],
    "response": "{\"error\":{\"message\":\"(#803) Some of the aliases you requested do not exist: bubbalubabubbasubba\",\"type\":\"OAuthException\",\"code\":803}}"
}

and 500px:

{
    "status": "error",
    "messages": [
        "invalid_status"
    ],
    "response": "{\"error\":\"Photo with ID 1231455129912912929211 not found.\",\"status\":404}"
}

So the vast majority of resources are going to return 200 statuses because they’re designed to work with both browser and server-based requests. For anything that does return a 404, I think we should trigger a 404 in the Solidus view, even if other resources are successful, because chances are the resource triggering that 404 is the one needed to render the page’s content. If we want to make this a bit smarter, we could apply that logic only to resources with dynamic parameters.
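A minimal sketch of that rule, assuming each fetched resource is summarized as an object with a `status` and a `dynamic` flag. The function name, the `onlyDynamic` option, and the result shape are all hypothetical and not part of Solidus:

```javascript
// Hypothetical shape for a fetched resource result (not a Solidus API):
//   { name: 'news', status: 404, dynamic: true }
// `dynamic` marks resources whose URL is built from route parameters.

// Decide the page status from the HTTP statuses of its resources.
// A single 404 is enough to turn the whole page into a 404.
function pageStatus(results, options) {
  const onlyDynamic = Boolean(options && options.onlyDynamic);
  const missing = results.some(function (result) {
    // Optionally ignore 404s from resources without dynamic parameters.
    if (onlyDynamic && !result.dynamic) return false;
    return result.status === 404;
  });
  return missing ? 404 : 200;
}
```

With `onlyDynamic` set, a 404 from a static resource no longer affects the page status, which is the "smarter" variant mentioned above.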

For every other resource we’ll need to resort to duck typing to make the 404 determination. Luckily, it seems that "status": "error" is provided very consistently, so we can at the very least use that. Since all of the resources I’ve provided examples for here do just that, perhaps we start there and expand this over time, if needed, to cover edge cases.
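As a rough illustration of that duck-typing check, assuming the resource body has already been parsed from JSON (the function name is made up for this sketch):

```javascript
// Returns true when a parsed resource body looks like an error report,
// based only on the "status": "error" convention seen in the examples.
function looksLikeError(body) {
  return body !== null &&
    typeof body === 'object' &&
    body.status === 'error';
}
```

This check alone can’t tell a missing item from an upstream failure such as the `connection_error` example above, so treating every match as a 404 would be a simplification.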

pushred commented 9 years ago

Something else that could be helpful for handling edge cases is a way to return something other than a context from a preprocessor, e.g. a 404 status or maybe even a redirect. One use case for the latter is pages that are populated with content from a feed rather than an API. Feeds have a finite amount of content, so as content drops off, any dependent pages become 404s. We could redirect the visitor back to the source or elsewhere instead.
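One way this could look, purely as a sketch: the convention below (a number means "respond with this status", a string means "redirect here", anything else is the context) is an assumption for illustration, as are the function names and the `feed` resource:

```javascript
// Hypothetical convention for preprocessor return values:
//   number -> respond with that status (e.g. 404)
//   string -> redirect to that location
//   other  -> treat as the context and render the view
function interpretResult(result) {
  if (typeof result === 'number') {
    return { action: 'status', status: result };
  }
  if (typeof result === 'string') {
    return { action: 'redirect', status: 302, location: result };
  }
  return { action: 'render', context: result };
}

// Example preprocessor for a feed-backed page. The `feed` resource
// and its `items` array are made up for this example.
function preprocessor(context) {
  const feed = context.resources && context.resources.feed;
  const items = (feed && feed.items) || [];
  // The content has dropped off the feed: send the visitor elsewhere.
  if (items.length === 0) return '/';
  return context;
}
```

The server would then call `interpretResult(preprocessor(context))` and either render, set the status, or redirect accordingly.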

joanniclaborde commented 9 years ago

Also TODO: handle those kinds of errors in SolidusClient and Universe.js.

joanniclaborde commented 9 years ago

I'm thinking of returning either a 404 or 500 in case of error, as you are suggesting. For the .json views, I wonder if I should include the whole context, for easier debugging. Something like:

// 404
{
  "status": "error",
  "error": {
    "message": "'news': resource not found",
    "data": {
      "code": "ENOTFOUND",
      "errno": "ENOTFOUND",
      "syscall": "getaddrinfo"
    }
  },
  "url": {...},
  "resources": {...},
  ...
}

// 500
{
  "status": "error",
  "error": {
    "message": "'index.js': preprocessor error",
    "data": {...}
  },
  "url": {...},
  "resources": {...},
  ...
}

joanniclaborde commented 9 years ago

Also TODO: find a way to handle this properly https://github.com/visionmedia/superagent/issues/450