structured-data / linter

Structured Data linter

Run serverless SDL on AWS Lambda #47

Open jaygray0919 opened 5 years ago

jaygray0919 commented 5 years ago

Hey Gregg. As you'll recall, we have tried to visualize some large graphs on your server. When we ran into problems, your point was that we should run them on our own server, because yours is not set up to, and was never intended to, process large graphs.

An example is here: https://gist.github.com/jaygray0919/00247a76f6f902fd936e8e98a8666d20

Per your suggestion, we've implemented SDL on AWS, following your installation instructions; it is running on a spot instance here: http://54.147.126.75:5000/

Our next thought is to modify your code to run serverless on AWS Lambda, which does support recent versions of Ruby. Our serverless goals are to avoid the expense of an always-on spot instance and to visualize very large graphs. For example, we have a fish database, currently in .n3 format, that is 43 GB. We might convert a subset to schema.org + JSON-LD and visualize it in SDL.
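
For concreteness, the conversion step we have in mind would look roughly like the sketch below, using your linkeddata gem. It is untested, the file names are placeholders, and a 43 GB source would have to be chunked or streamed rather than loaded whole:

```ruby
# convert_subset.rb — untested sketch: load a small N3 subset and
# re-serialize it as JSON-LD for the linter. File names are placeholders.
require 'linkeddata'   # pulls in RDF.rb plus the N3/Turtle and JSON-LD gems

# Load a (small) extract of the fish database; RDF::Graph.load picks the
# reader from the file extension.
graph = RDF::Graph.load('fish-subset.n3')

# Serialize as JSON-LD; mapping terms to schema.org and compacting against
# its context would be a further step.
File.write('fish-subset.jsonld', graph.dump(:jsonld, standard_prefixes: true))
```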

May we ask you questions about a path to serverless?

  1. Is it realistic to modify SDL to run serverless under AWS Lambda?
  2. What combination of your products and related services would you recommend we implement? For example, is there some combination of gems + services (e.g. sinatra, puma, shotgun, etc.) that we should be using?
  3. Has someone already done this, and do they have a roadmap that we could follow?

You also may have comments/suggestions about our current AWS AMI implementation, so please share those ideas if we are missing something or have done something wrong.

Thanks for your help here, Gregg.

/jay gray

gkellogg commented 5 years ago

> Hey Gregg. As you'll recall, we have tried to visualize some large graphs on your server. When we ran into problems, your point was that we should run them on our own server, because yours is not set up to, and was never intended to, process large graphs.

Indeed, the SDL has been running on a free Heroku dyno; AWS would be cool.

> An example is here: https://gist.github.com/jaygray0919/00247a76f6f902fd936e8e98a8666d20

> Per your suggestion, we've implemented SDL on AWS, following your installation instructions; it is running on a spot instance here: http://54.147.126.75:5000/

> Our next thought is to modify your code to run serverless on AWS Lambda, which does support recent versions of Ruby. Our serverless goals are to avoid the expense of an always-on spot instance and to visualize very large graphs. For example, we have a fish database, currently in .n3 format, that is 43 GB. We might convert a subset to schema.org + JSON-LD and visualize it in SDL.

> May we ask you questions about a path to serverless?

> 1. Is it realistic to modify SDL to run serverless under AWS Lambda?

I don't know enough about AWS Lambda, but I don't see why not. It's a "pure" server, in the sense that it doesn't use any persistent storage other than the local example files. It would be easy enough to create a version without the examples.
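
Off the top of my head, wrapping the Rack app in a Lambda handler, assuming API Gateway's proxy integration in front, might look something like the sketch below. It is untested; the config.ru path, the event field names, and the omissions (content type/length, base64 bodies, multi-value headers) are all simplifications:

```ruby
# handler.rb — untested sketch of adapting an API Gateway proxy event to a
# Rack request and handing it to the linter's Rack app.
require 'rack'
require 'stringio'

# Build the Rack app once per container so warm invocations reuse it.
# (Rack 2's parse_file returns [app, options]; the trailing comma keeps
# only the app.)
APP, = Rack::Builder.parse_file('config.ru')

def handler(event:, context:)
  # Map the proxy event onto a minimal Rack environment.
  # (CONTENT_TYPE/CONTENT_LENGTH and base64-encoded bodies are omitted.)
  env = {
    'REQUEST_METHOD'  => event['httpMethod'],
    'PATH_INFO'       => event['path'] || '/',
    'QUERY_STRING'    => Rack::Utils.build_query(event['queryStringParameters'] || {}),
    'SERVER_NAME'     => 'localhost',
    'SERVER_PORT'     => '443',
    'rack.url_scheme' => 'https',
    'rack.input'      => StringIO.new(event['body'] || ''),
    'rack.errors'     => $stderr
  }
  (event['headers'] || {}).each do |name, value|
    env["HTTP_#{name.upcase.tr('-', '_')}"] = value
  end

  status, headers, body = APP.call(env)

  # Collect the body into a single string for the Lambda proxy response.
  text = +''
  body.each { |chunk| text << chunk }
  body.close if body.respond_to?(:close)

  { statusCode: status, headers: headers, body: text }
end
```

The Lambda handler setting would then be `handler.handler`; cold-start time loading the linkeddata stack is probably the first thing to measure.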

It could benefit from some sort of server-side cache to reduce network requests for contexts, and such, but it already runs with baked-in contexts for certain well-known vocabularies, such as schema.org.
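
If I remember the API correctly, that baking-in goes through the json-ld gem's preloaded-context hook (the json-ld-preloaded gem uses the same mechanism for schema.org and friends). Registering an extra context ahead of time would look roughly like this; the URL and local file path are made up for illustration:

```ruby
# preload_context.rb — rough sketch: register a context at process start so
# it is never fetched over the network at request time.
# The URL and local file path are hypothetical.
require 'json/ld'
require 'json'

JSON::LD::Context.add_preloaded('https://example.org/contexts/fish.jsonld') do
  # The block is evaluated lazily the first time the URL would be dereferenced.
  doc = JSON.parse(File.read('contexts/fish.jsonld'))
  JSON::LD::Context.parse(doc['@context'])
end
```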

> 2. What combination of your products and related services would you recommend we implement? For example, is there some combination of gems + services (e.g. sinatra, puma, shotgun, etc.) that we should be using?

The work is done by the linkeddata gem, and the lightweight server just uses the CLI interface into the libraries. sinatra, puma, and shotgun are only there to facilitate the Rack container, so really anything could work. I often just use rackup locally, but sinatra can help with various client assets. It should be easy enough to adapt to some other container.
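
As a data point, running it under plain rackup only needs a config.ru along these lines (the require path and application constant here are placeholders, not necessarily the linter's actual entry point):

```ruby
# config.ru — minimal sketch for `rackup` or any other Rack container.
# The require path and application constant are placeholders.
require_relative 'lib/rdf/linter'

run RDF::Linter::Application
```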

(I made an attempt with Docker, but didn't have enough time to finish.)

> 3. Has someone already done this, and do they have a roadmap that we could follow?

No, you're the first. Please feel free to make PRs.

> You also may have comments/suggestions about our current AWS AMI implementation, so please share those ideas if we are missing something or have done something wrong.

The instance you have running looks great!