thegreenwebfoundation / greencheck-api

The green web foundation API
https://www.thegreenwebfoundation.org/
Apache License 2.0

Add public IP ranges for cloud giants - AWS #18

Closed mrchrisadams closed 2 years ago

mrchrisadams commented 5 years ago

At the moment, we rely on Amazon being nice enough to update their green regions themselves.

This rarely happens, but they do expose their IP ranges for each region at the URL below:

https://ip-ranges.amazonaws.com/ip-ranges.json

More info here:

https://docs.aws.amazon.com/general/latest/gr/aws-ip-ranges.html

Amazon have different green and non-green regions, so we might represent the green regions as separate green hosters, or as one huge hoster with an absolutely massive set of IP ranges.

https://aws.amazon.com/about-aws/sustainability/

{
  "syncToken": "1559746744",
  "createDate": "2019-06-05-14-59-04",
  "prefixes": [
    {
      "ip_prefix": "18.208.0.0/13",
      "region": "us-east-1",
      "service": "AMAZON"
    },
    {
      "ip_prefix": "52.95.245.0/24",
      "region": "us-east-1",
      "service": "AMAZON"
    },
    {
      "ip_prefix": "52.194.0.0/15",
      "region": "ap-northeast-1",
      "service": "AMAZON"
    }]
}
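As a rough sketch of how this file could be read, the Python below groups the prefixes by their region key. It uses the abridged sample above rather than the live file, and the function name `prefixes_by_region` is made up for illustration.

```python
import json
from collections import defaultdict

# Abridged sample in the same shape as ip-ranges.json (values from the snippet above).
SAMPLE = """
{
  "syncToken": "1559746744",
  "createDate": "2019-06-05-14-59-04",
  "prefixes": [
    {"ip_prefix": "18.208.0.0/13", "region": "us-east-1", "service": "AMAZON"},
    {"ip_prefix": "52.95.245.0/24", "region": "us-east-1", "service": "AMAZON"},
    {"ip_prefix": "52.194.0.0/15", "region": "ap-northeast-1", "service": "AMAZON"}
  ]
}
"""


def prefixes_by_region(document):
    """Group the prefixes of an ip-ranges.json document by region code."""
    grouped = defaultdict(list)
    for entry in document["prefixes"]:
        grouped[entry["region"]].append(entry["ip_prefix"])
    return dict(grouped)


ranges = prefixes_by_region(json.loads(SAMPLE))
print(ranges["us-east-1"])  # ['18.208.0.0/13', '52.95.245.0/24']
```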

https://docs.aws.amazon.com/general/latest/gr/aws-ip-ranges.html

AWS's own docs say these change a few times a week, so we'd likely need this running on a cronjob to stay accurate.
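One way a scheduled job could avoid needless work is to compare the file's `syncToken` with the one seen on the previous run. This is only a sketch: it assumes the token changes whenever AWS republishes the file, and `fetch_ip_ranges` and `needs_import` are hypothetical helper names, not anything in our codebase.

```python
import json
import urllib.request

IP_RANGES_URL = "https://ip-ranges.amazonaws.com/ip-ranges.json"


def fetch_ip_ranges(url=IP_RANGES_URL):
    """Download and parse the published IP ranges file."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


def needs_import(document, last_sync_token):
    """Return True when the file differs from the one we last imported.

    Assumption: syncToken changes each time the file is republished.
    """
    return document["syncToken"] != last_sync_token
```

A cron job would call `fetch_ip_ranges()`, skip the run when `needs_import` is false, and otherwise store the new token after a successful import.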

mrchrisadams commented 5 years ago

Hey @arendjantetteroo can I sanity check my approach for this with you?

How we'd automate this

We have the notion of HostingProviders in our system, who have either IP ranges or ASNs allocated to them.

In our case, because we have public IP ranges given here, we'd check the list of Hosting Providers we have against the region key in this JSON blob from AWS.

So let's say we wanted to update the AWS Oregon regions (and let's assume for this example that us-west-2 really is all Oregon).

If we had this info here:

{
  "ip_prefix": "52.95.255.112/28",
  "region": "us-west-2",
  "service": "AMAZON"
}

We'd find the corresponding hosting provider for Amazon US West (which we do have), and add the IP range 52.95.255.112/28 to that hosting provider entity.

Every few days, we'd check against this file and add any new IP ranges if it has changed, and we'd run through this process for all the regions that are marked as green/sustainable according to Amazon's own info.
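To make the idea concrete, here is a minimal sketch of the region-to-provider step. The mapping dictionary and the function name are hypothetical; in practice the provider names would come from our own HostingProvider records.

```python
# Hypothetical mapping from AWS region code to the name of our hosting provider.
GREEN_REGION_PROVIDERS = {
    "us-west-2": "Amazon US West",
}


def ranges_for_green_providers(prefixes, region_to_provider):
    """Collect the published prefixes for each provider we track as green."""
    result = {name: [] for name in region_to_provider.values()}
    for entry in prefixes:
        provider = region_to_provider.get(entry["region"])
        if provider is not None:  # regions we don't track are skipped
            result[provider].append(entry["ip_prefix"])
    return result


published = [
    {"ip_prefix": "52.95.255.112/28", "region": "us-west-2", "service": "AMAZON"},
    {"ip_prefix": "18.208.0.0/13", "region": "us-east-1", "service": "AMAZON"},
]
green = ranges_for_green_providers(published, GREEN_REGION_PROVIDERS)
print(green)  # {'Amazon US West': ['52.95.255.112/28']}
```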

Sound about right?

It obviously would be nicer if they updated this stuff themselves, but given their size, this seems a way to at least keep our data as close as possible to what their own data is saying.

arendjantetteroo commented 5 years ago

Yep, if we make one hosting provider entry per Amazon region, we can look up all its IP addresses, verify they match up with the file, and add any entries we are missing or remove any that are no longer active.

It does mean that people will see Amazon US West as the provider and not only Amazon, but I think that's a fine tradeoff right now.
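The add/remove check described above boils down to a set difference per provider. A small sketch, with a made-up function name and documentation-only example ranges rather than real AWS data:

```python
def plan_changes(stored, published):
    """Return (to_add, to_remove) so that stored ends up matching published."""
    stored_set, published_set = set(stored), set(published)
    to_add = sorted(published_set - stored_set)      # in the file, missing from us
    to_remove = sorted(stored_set - published_set)   # in our data, gone from the file
    return to_add, to_remove


# Example values are reserved documentation ranges, not real AWS prefixes.
to_add, to_remove = plan_changes(
    stored=["52.95.255.112/28", "192.0.2.0/24"],
    published=["52.95.255.112/28", "198.51.100.0/24"],
)
print(to_add)     # ['198.51.100.0/24']
print(to_remove)  # ['192.0.2.0/24']
```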

mrchrisadams commented 5 years ago

This is an acceptable trade-off, as Amazon publicly say different regions run on different kinds of power - doing it this way would allow us to distinguish between them transparently.

boxabirds commented 4 years ago

Hi guys, I saw this project in the Climate Action Tech newsletter and I'd like to help here, but I'm missing the piece around where Amazon shares information about which areas are green, and how that maps to the IP ranges. Can someone help out with this?

mrchrisadams commented 4 years ago

Hi @boxabirds ! In the opening issue I mention this bit here - where Amazon basically say which regions are sustainable (which is about as good as we can get from them):

Amazon have different green and non green regions (snip)

https://aws.amazon.com/about-aws/sustainability

They list 5 regions - US-West (Oregon), GovCloud (US-West, again), Frankfurt, Ireland, and Canada.

I outline above where Amazon list IP ranges for each region, and I've shared the abridged snippet below:

{
  "prefixes": [
    {
      "ip_prefix": "18.208.0.0/13",
      "region": "us-east-1",
      "service": "AMAZON"
    },
    {
      "ip_prefix": "52.95.245.0/24",
      "region": "us-east-1",
      "service": "AMAZON"
    },
    {
      "ip_prefix": "52.194.0.0/15",
      "region": "ap-northeast-1",
      "service": "AMAZON"
    }]
}

Under the hood at the Green Web Foundation, we represent each hosting region as an entity with a known set of IP ranges. So, with the IP ranges above, we can have these regions updating automatically.

If you can work out the mapping between the region names and the region codes, then you know enough to audit your own infrastructure against a list of regions where Amazon at least make public claims that they are sustainable.
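A minimal sketch of such an audit is below. The region codes are an assumed mapping for the five regions named above, based on AWS's published region names, and should be double-checked before relying on them; the function names are invented for this example.

```python
import ipaddress

# Assumed mapping of region code -> region name for the regions listed above.
GREEN_REGIONS = {
    "us-west-2": "US West (Oregon)",
    "us-gov-west-1": "GovCloud (US-West)",
    "eu-central-1": "Frankfurt",
    "eu-west-1": "Ireland",
    "ca-central-1": "Canada",
}


def region_for_ip(ip, prefixes):
    """Return the region of the first published prefix containing ip, or None."""
    address = ipaddress.ip_address(ip)
    for entry in prefixes:
        if address in ipaddress.ip_network(entry["ip_prefix"]):
            return entry["region"]
    return None


def is_in_green_region(ip, prefixes):
    """True when ip falls inside a prefix whose region is in GREEN_REGIONS."""
    return region_for_ip(ip, prefixes) in GREEN_REGIONS


published = [
    {"ip_prefix": "52.95.255.112/28", "region": "us-west-2", "service": "AMAZON"},
    {"ip_prefix": "18.208.0.0/13", "region": "us-east-1", "service": "AMAZON"},
]
print(is_in_green_region("52.95.255.113", published))  # True
print(is_in_green_region("18.208.0.1", published))     # False
```

You would feed it the public IP addresses of your own servers and the `prefixes` list from the live file.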

Why this matters

You can see a thread here in more detail about why, but the TLDR version is at the bottom. https://twitter.com/mrchrisadams/status/1184854192428605441

[Image: move-to-green-regions]

You might wonder why Amazon push this migration cost onto you, whereas others like GCP and MS take on the cost themselves by just running on clean power across the board.

That's one to bring up with your AWS rep.

janeklb commented 4 years ago

Hi @mrchrisadams & @boxabirds I'm keen on helping out here too, if there's still something to do. From what I understand, we need to:

My questions would be:

jonathan-s commented 4 years ago

Basically, the database is MySQL, there is a fair amount of legacy code in PHP (the API is written in PHP), and we recently rewrote the admin part in Python. Both the API and the admin access the same underlying database.

You can find the admin code here https://github.com/thegreenwebfoundation/greenwebfoundation-admin

arendjantetteroo commented 4 years ago

Both the API and the admin are OK places to get this set up, so go with whichever fits your choice of programming language :)

Eventually the goal would be to have an API for the admin system that other tools could also access to update their information programmatically instead of by hand.

Depending on the amount of time and challenge you want, you could fit an API on top of the admin system in Python and write a script that uses this API to update Amazon's IP ranges (we would need some kind of token to make sure the script can update the specific hosting provider). Or just write a simple script that directly updates the MySQL database with insert/update queries.
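For the "simple script" option, a rough sketch could look like the following. It uses Python's built-in sqlite3 as a stand-in for MySQL so it runs without a server, and the table and column names (`ip_range`, `provider_id`, `ip_prefix`) are invented for illustration and are not the real schema. With a MySQL driver the placeholders would be `%s` rather than `?`.

```python
import sqlite3


def sync_ranges(conn, provider_id, published):
    """Insert missing prefixes and delete stale ones for a single provider."""
    rows = conn.execute(
        "SELECT ip_prefix FROM ip_range WHERE provider_id = ?", (provider_id,)
    )
    stored = {row[0] for row in rows}
    wanted = set(published)
    to_add = sorted(wanted - stored)
    to_remove = sorted(stored - wanted)
    with conn:  # commits on success, rolls back if a statement fails
        conn.executemany(
            "INSERT INTO ip_range (provider_id, ip_prefix) VALUES (?, ?)",
            [(provider_id, prefix) for prefix in to_add],
        )
        conn.executemany(
            "DELETE FROM ip_range WHERE provider_id = ? AND ip_prefix = ?",
            [(provider_id, prefix) for prefix in to_remove],
        )
    return len(to_add), len(to_remove)


# Hypothetical schema with one stale row (a reserved documentation range).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ip_range (provider_id INTEGER, ip_prefix TEXT)")
conn.execute("INSERT INTO ip_range VALUES (1, '192.0.2.0/24')")

added, removed = sync_ranges(conn, 1, ["52.95.255.112/28"])
print(added, removed)  # 1 1
```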

Feel free to post any questions you have here and we'll try to answer them.

janeklb commented 4 years ago

Thanks @jonathan-s @arendjantetteroo

I do have thoughts / questions, but I'd prefer to move the conversation into a chat (slack, gitter, whatever works for you). Not for the sake of realtime communication, but just because the questions will span wider than this specific github issue :)

Do you use any chat platform for this project?

arendjantetteroo commented 4 years ago

@mrchrisadams I guess we can close this now with #64 done?

mrchrisadams commented 2 years ago

hey @arendjantetteroo, Yes - we totally should have closed this a while back.

I'm closing this as we have this working on a daily GitHub Action now, and there's an example of the daily import running and being updated at the link below:

https://github.com/thegreenwebfoundation/admin-portal/runs/5531520377?check_suite_focus=true