vantage-sh / ec2instances.info

Amazon EC2 instance comparison site
https://ec2instances.info
MIT License

Site generation fails after latest Amazon site redesign #75

Closed powdahound closed 10 years ago

powdahound commented 10 years ago

In #37 a tool was introduced to update the contents of this site automatically by scraping the AWS site. Unfortunately, Amazon's latest site redesign no longer provides all the necessary info.

Specifically:

The instance types page used to contain a nice table of all the instance types and their basic stats, but now only has a few less-descriptive tables for the newer instance types. The previous generation instance types page has some of this info, but not everything.

I'm still looking for a place to get this data. Unfortunately they don't have an API exposing it.

imran2140 commented 10 years ago

The 12-column-wide Instance Type Matrix section on the new instance types page is quite informative. Did you check it out? The section is collapsed by default and expands only when you click the section title.

powdahound commented 10 years ago

Unfortunately that one's missing many of the instance types as well as # of cores per instance type. It does seem to be the closest thing they still have, though.

imran2140 commented 10 years ago

Yeah, the legacy instances have been moved into a separate page. Also Amazon stopped advertising ECU for the newer instances.

Are you thinking of maintaining 2 tables/pages, or normalizing both datasets into a single table?

powdahound commented 10 years ago

It'd be nice to keep it all on one table, which would require finding a source of data for the old instances (that page you linked doesn't have all the same info). I'll try reaching out to Amazon and see if they're thinking of providing anything else before committing to scraping all these new locations.

shandrew commented 10 years ago

I've sent in a request through our AWS account rep to have a single source of the instance specs, and for the future, some more structured data.

(you probably don't remember me but I used to work at Affinity Circles, across from your old hipchat office in downtown SNV :)

jbylund commented 10 years ago

What about one of the json sources? http://stackoverflow.com/questions/7334035/get-ec2-pricing-programmatically I think http://aws-assets-pricing-prod.s3.amazonaws.com/pricing/ec2/linux-od.js might be a good choice.

shandrew commented 10 years ago

Nice. Could be useful, but the r3 instances are missing.

psanford commented 10 years ago

I have a script I use which was using the old linux-od.js api. Now I have to use

Unfortunately these are not valid JSON resources (they're essentially JSONP, but using JavaScript syntax that isn't compatible with standards-compliant JSON parsers). I run them through a node script to generate valid JSON before consuming the data.
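For readers who would rather stay in Python than use a node script, the conversion described above can be roughly sketched as follows. This is not psanford's script. It assumes the payload is a single `callback(...)` call, possibly preceded by a `/* ... */` comment, whose argument is an object literal with unquoted keys. The `SAMPLE` payload and the `jsonp_to_json` name are made up for illustration, and the key-quoting regex is naive: it would misfire on string values that happen to contain something like `,word:`.

```python
import json
import re

# Hypothetical payload in the style described above: a leading comment,
# a callback wrapper, and unquoted object keys. Not real AWS data.
SAMPLE = (
    '/* this file may change */ '
    'callback({vers:0.01,config:{rate:"perhr",'
    'regions:[{region:"us-east",instanceTypes:[]}]}});'
)


def jsonp_to_json(text):
    """Convert a JSONP-style payload with bare keys into a Python object."""
    # Drop /* ... */ comments so they cannot confuse the steps below.
    text = re.sub(r"/\*.*?\*/", "", text, flags=re.DOTALL)
    # Keep only what sits between the first "(" and the last ")".
    body = text[text.index("(") + 1 : text.rindex(")")]
    # Quote bare keys that directly follow "{" or ",".
    body = re.sub(r'([{,]\s*)([A-Za-z_][A-Za-z0-9_]*)\s*:', r'\1"\2":', body)
    return json.loads(body)


if __name__ == "__main__":
    print(json.dumps(jsonp_to_json(SAMPLE), indent=2))
```

A real JavaScript parser is the safer choice if the files ever use syntax this regex does not anticipate.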

powdahound commented 10 years ago

Those look promising, though the comment at the top is concerning. Feels like we're likely to end up in this situation again.

I've reached out to our main AWS contact to see if they are able to provide any more info.

jonathanwcrane commented 10 years ago

I would think AWS would cooperate, seeing as I learned about this website in a class given by AWS. Though they were careful to say it's not guaranteed to keep working, yadda, yadda. It's a great resource and very widely used.

powdahound commented 10 years ago

I'm in touch with AWS and seeing what they can do. Hang tight. :)

ReneBZ commented 10 years ago

I am also very interested in having all the EC2 pricing in a table or XML. I will keep checking here for any news from you, powdahound. Thanks.

RexGibson commented 10 years ago

status?

powdahound commented 10 years ago

After talking with AWS a bit it seems like the right path forward is to scrape what we can from the new resources. Unfortunately there isn't any secret endpoint or table that has all the same data.

I don't have time to update the scraping code for at least a week, so if anyone wants to help it would be appreciated. :)

jonathanwcrane commented 10 years ago

Thanks for looking into this. It seems that AWS is discouraging people from using the "Previous Generation" instance types. They are doing it by segmenting them off from the "main" types.

Garret, have you given any thought to separating out ec2instances.info in the same way? Like have the default page be the "current generation" (whatever that may be; the URL for that will presumably remain the same even when we get up to m5, m7, etc.). And then have a link for "previous generation" and then another link for "see them all on one page"?

Just a thought, j

On Wed, Apr 30, 2014 at 1:06 PM, Garret Heaton notifications@github.com wrote:

After talking with AWS a bit it seems like the right path forward is to scrape what we can from the new resources. Unfortunately there isn't any secret endpoint or table that has all the same data.

I don't have time to update the scraping code for at least a week, so if anyone wants to help it would be appreciated. :)

powdahound commented 10 years ago

I don't see any value in splitting them up, especially since you can filter them easily. Perhaps the default filter could hide them in the future when they're less used.

matthewbogner commented 10 years ago

I've been using these links successfully for a while:

https://a0.awsstatic.com/pricing/1/deprecated/ec2/linux-od.json https://a0.awsstatic.com/pricing/1/deprecated/ec2/previous-generation/linux-od.json

Fool-proof? No. Better than nothing? Yep. It's got vCPUs, storage, prices, etc., and it is real JSON, so you can parse it with Python.

EDIT: Both of those links together contain the old and new generations of instances (m1/m2 & m3/r3/c3)
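As a rough sketch of combining the two files into one table, something like the following could work. The thread does not show the JSON layout, so the nesting used here (`config` → `regions` → `instanceTypes` → `sizes`) and the field names (`size`, `vCPU`, `memoryGiB`, `storageGB`) are assumptions, and the two sample documents are invented stand-ins rather than real AWS data. Fetching the URLs is left out on purpose, since the endpoints may change or disappear.

```python
import json

# Invented stand-ins for the two documents; layout and field names are assumed.
CURRENT_SAMPLE = json.loads("""
{"config": {"regions": [{"region": "us-east", "instanceTypes": [
  {"type": "generalCurrentGen", "sizes": [
    {"size": "m3.medium", "vCPU": "1", "memoryGiB": "3.75", "storageGB": "1 x 4 SSD"}
  ]}
]}]}}
""")

PREVIOUS_SAMPLE = json.loads("""
{"config": {"regions": [{"region": "us-east", "instanceTypes": [
  {"type": "generalPreviousGen", "sizes": [
    {"size": "m1.small", "vCPU": "1", "memoryGiB": "1.7", "storageGB": "1 x 160"}
  ]}
]}]}}
""")


def flatten_pricing(doc):
    """Turn one nested pricing document into a flat list of row dicts."""
    rows = []
    for region in doc["config"]["regions"]:
        for family in region["instanceTypes"]:
            for size in family["sizes"]:
                rows.append({
                    "region": region["region"],
                    "family": family["type"],
                    "size": size["size"],
                    # .get() tolerates files that omit a column.
                    "vCPU": size.get("vCPU"),
                    "memoryGiB": size.get("memoryGiB"),
                    "storageGB": size.get("storageGB"),
                })
    return rows


def merge_generations(current, previous):
    """Concatenate the rows of the current- and previous-generation files."""
    return flatten_pricing(current) + flatten_pricing(previous)


if __name__ == "__main__":
    for row in merge_generations(CURRENT_SAMPLE, PREVIOUS_SAMPLE):
        print(row)
```

If the real files differ from this assumed layout, only `flatten_pricing` should need adjusting.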

powdahound commented 10 years ago

The "deprecated" bit of that URL is a bit scary. I've asked someone at Amazon if they can speak to the future of these endpoints.

jonathanwcrane commented 10 years ago

That one doesn't contain any of the previous generation stuff or what type of CPUs we're talking about.

On Mon, May 12, 2014 at 8:19 PM, Derek Kulinski notifications@github.com wrote:

What about this one?

https://a0.awsstatic.com/pricing/1/ec2/linux-od.min.js

Looks like it is up to date and does not look deprecated.

takeda commented 10 years ago

Understood.

BTW: I removed my comment when I realized that this link was already posted by @psanford and I did not want to pollute this thread more. I forgot that people were still notified about my response by e-mail.

jbylund commented 10 years ago

I did a quick sketch of reprocessing the JSON for my personal use, if someone wants to flesh it out: https://gist.github.com/jbylund/a8b00676328e481a6a48 Right now a fair few things are hardcoded.

rlpowell commented 10 years ago

I'll note that the "choose an instance type" table shown when you actually go to launch an instance through the web UI is really very complete. Perhaps you could get the data there, or from a related API call?

xonder commented 10 years ago

Have you tried looking at the calculator page? http://calculator.s3.amazonaws.com/index.html

[Screenshot, 2014-06-16 13:12:33]

ReneBZ commented 10 years ago

I just wrote a Python script to scan all the js files with EC2 and RDS pricing and output a clean CSV file that can be imported into any spreadsheet. It is at https://github.com/ReneBZ/AWSrawprices Let me know if it is useful to you or if you can improve it.
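For anyone curious what the CSV output step might look like, here is a minimal sketch using only the standard library. It is not taken from the AWSrawprices repository. The `rows_to_csv` helper, the column names, and the sample row (including its price) are all hypothetical.

```python
import csv
import io


def rows_to_csv(rows, fieldnames):
    """Serialize a list of dicts to CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf,
        fieldnames=fieldnames,
        extrasaction="ignore",  # silently skip keys not listed in fieldnames
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    # Hypothetical row; the values are placeholders, not real pricing.
    sample_rows = [
        {"region": "us-east", "size": "m3.medium", "price_usd": "0.070"},
    ]
    print(rows_to_csv(sample_rows, ["region", "size", "price_usd"]))
```

Writing to a file instead of a string only requires passing an open file object (opened with `newline=""`) to `csv.DictWriter` in place of the `StringIO` buffer.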

jonathanwcrane commented 10 years ago

Awesome! Thanks!