thegreenwebfoundation / grid-intensity-go

A tool written in go to help you factor carbon intensity into decisions about where and when to run computing jobs.
Apache License 2.0

For the Prometheus exporter, the values based on the ElectricityMaps provider are estimations and not the real values #81

Closed locomundo closed 5 months ago

locomundo commented 6 months ago

We've been trying out the grid-intensity-go project, specifically the prometheus exporter. It looks great!

We are using it with the ElectricityMaps provider, since it seems the most complete and has the coverage we need right now. We see that the last value the ElectricityMaps API returns is an estimation; see for example:

$ curl 'https://api.electricitymap.org/v3/carbon-intensity/latest?zone=NL' | jq
{
  "zone": "NL",
  "carbonIntensity": 124,
  "datetime": "2024-05-17T10:00:00.000Z",
  "updatedAt": "2024-05-17T09:48:04.869Z",
  "createdAt": "2024-05-14T10:46:29.171Z",
  "emissionFactorType": "lifecycle",
  "isEstimated": true,
  "estimationMethod": "TIME_SLICER_AVERAGE"
}

You can see the property isEstimated has value true. This is because the real values are calculated and returned by the API after a few hours.
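A minimal Go sketch of checking this flag on the client side, assuming only the field names visible in the response above. The `latestResponse` type and `parseLatest` function are illustrative names, not code from grid-intensity-go.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// latestResponse mirrors a subset of the fields in the JSON shown above.
// The type name is hypothetical and not taken from grid-intensity-go.
type latestResponse struct {
	Zone             string    `json:"zone"`
	CarbonIntensity  float64   `json:"carbonIntensity"`
	Datetime         time.Time `json:"datetime"`
	IsEstimated      bool      `json:"isEstimated"`
	EstimationMethod *string   `json:"estimationMethod"` // nil when the JSON value is null
}

// parseLatest decodes one carbon-intensity object from a response body.
func parseLatest(body []byte) (latestResponse, error) {
	var r latestResponse
	err := json.Unmarshal(body, &r)
	return r, err
}

func main() {
	body := []byte(`{
	  "zone": "NL",
	  "carbonIntensity": 124,
	  "datetime": "2024-05-17T10:00:00.000Z",
	  "isEstimated": true,
	  "estimationMethod": "TIME_SLICER_AVERAGE"
	}`)
	r, err := parseLatest(body)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Printf("zone=%s intensity=%v estimated=%t\n", r.Zone, r.CarbonIntensity, r.IsEstimated)
}
```

Using a pointer for `estimationMethod` keeps the distinction between a named method and `null`, which the API appears to use for final values.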

If you use the history API endpoint you can see that the third most recent value is not estimated any more:

$ curl 'https://api.electricitymap.org/v3/carbon-intensity/history?zone=NL' | jq
{

...

    {
      "zone": "NL",
      "carbonIntensity": 104,
      "datetime": "2024-05-21T11:00:00.000Z",
      "updatedAt": "2024-05-21T12:48:27.994Z",
      "createdAt": "2024-05-18T11:48:23.717Z",
      "emissionFactorType": "lifecycle",
      "isEstimated": false,
      "estimationMethod": null
    },
    {
      "zone": "NL",
      "carbonIntensity": 99,
      "datetime": "2024-05-21T12:00:00.000Z",
      "updatedAt": "2024-05-21T12:48:27.994Z",
      "createdAt": "2024-05-18T12:52:49.606Z",
      "emissionFactorType": "lifecycle",
      "isEstimated": true,
      "estimationMethod": "TIME_SLICER_AVERAGE"
    },
    {
      "zone": "NL",
      "carbonIntensity": 109,
      "datetime": "2024-05-21T13:00:00.000Z",
      "updatedAt": "2024-05-21T12:48:27.994Z",
      "createdAt": "2024-05-18T13:50:55.980Z",
      "emissionFactorType": "lifecycle",
      "isEstimated": true,
      "estimationMethod": "TIME_SLICER_AVERAGE"
    }
  ]
}

The first object shown in this list has isEstimated set to false. (This request was done at around 13:00 UTC of 21 May).

So it seems that the exporter is exporting the estimated values, but not the real (final) ones.

This could be solved by calling the EM history endpoint instead of the latest one, taking the most recent "real" value and sending that to Prometheus. As @rossf7 suggested, in that case it could be sent as a histogram metric instead of a gauge.

Any thoughts? @mrchrisadams

mrchrisadams commented 6 months ago

hi @locomundo! I think this is a good idea.

> This is because the real values are calculated and returned by the API after a few hours.

I'll be honest: I wasn't that familiar with the history endpoint, and I didn't know the updated data was available so quickly afterwards. There's a good case for doing this, but I'm a little unsure how best to support both estimated figures and the later confirmed ones. So far we've assumed the exporter would always show the latest value, because it was giving a figure for 'now'.

If we know after a few hours we will have a matching "real" value then we could backfill, but I'm less clear on how to send that to prometheus.

Is there a well-supported backfill API for sending data in this fashion, or a way of marking in an exporter that a given set of values really refers to a time an hour or two earlier than when it was scraped?

mrchrisadams commented 6 months ago

hey @rossf7 - this sounds like a really cool idea. Is there an obvious way that springs to mind? I've also asked in the prometheus channel in the CNCF slack for pointers too.

rossf7 commented 6 months ago

Yes, this is a cool idea but @locomundo did you see the disableEstimations param on the latest endpoint?

It looks like this does the same but without needing to call the history endpoint.

curl 'https://api.electricitymap.org/v3/carbon-intensity/latest?zone=NL&disableEstimations=true'
{
  "zone": "NL",
  "carbonIntensity": 183,
  "datetime": "2024-05-22T16:00:00.000Z",
  "updatedAt": "2024-05-22T17:47:45.915Z",
  "createdAt": "2024-05-19T16:48:39.043Z",
  "emissionFactorType": "lifecycle",
  "isEstimated": false,
  "estimationMethod": null
}

We could add a flag for this to the exporter?

We could then keep returning the intensity using the current gauge metric. Although we should probably add a label for whether the value is estimated or not.
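As a sketch of what that could look like, assuming a boolean exporter option (hypothetical, passed here as `disableEstimations`) and a label value derived from the response's `isEstimated` field. The function names are illustrative and not part of grid-intensity-go.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

const latestEndpoint = "https://api.electricitymap.org/v3/carbon-intensity/latest"

// latestURL builds the request URL for a zone. The disableEstimations
// argument would be driven by a hypothetical exporter flag.
func latestURL(zone string, disableEstimations bool) string {
	q := url.Values{}
	q.Set("zone", zone)
	if disableEstimations {
		q.Set("disableEstimations", "true")
	}
	// Encode sorts the parameters by key.
	return latestEndpoint + "?" + q.Encode()
}

// estimatedLabel turns the isEstimated field into a metric label value.
func estimatedLabel(isEstimated bool) string {
	return strconv.FormatBool(isEstimated)
}

func main() {
	fmt.Println(latestURL("NL", true))
	fmt.Println("label value:", estimatedLabel(false))
}
```

Keeping the label alongside the flag means a dashboard can still tell the two kinds of value apart if the flag is later switched off.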

@mrchrisadams thank you for reaching out to the prometheus folks. I thought a histogram might help but I don't have experience working with those.

locomundo commented 6 months ago

@rossf7 I didn't know about that flag, interesting! It would be the simplest solution with least changes.

The only downside in that case is that we would be reporting the real values shifted one or two hours, unless we also send the timestamp to Prometheus. We are doing some tests on this, it seems we can send the timestamp next to the value, so then it's stored associated with the specific time it refers to (basically using NewMetricWithTimestamp instead of MustNewConstMetric). That seems to work.

If this works, we can apply this in a generic way to all the providers, since adding the timestamp shouldn't hurt in the other cases if it's the current time, right? I'm not very experienced with Prometheus, though, so perhaps you see something I don't.

@mrchrisadams One thought for keeping the estimations available as well (since they are valuable info) would be to have them as part of the forecasts. They are, after all, a kind of forecast applied to the last couple of hours. Let me know if you think this is a crazy idea :).

Looking at the code, the providers can return multiple data points. Maybe we can integrate the forecasts there, including the estimations.

So in summary, alternative solutions I suggest:

Let me know if any of this makes sense to you @rossf7 @mrchrisadams

rossf7 commented 6 months ago

Hi @locomundo, yes this is not straightforward and it's also pushing at my Prometheus knowledge, but I think what you're proposing makes sense!

> it seems we can send the timestamp next to the value, so then it's stored associated with the specific time it refers to (basically using NewMetricWithTimestamp instead of MustNewConstMetric). That seems to work.
>
> If this works, we can apply this in a generic way to all the providers, since adding the timestamp shouldn't hurt in the other cases if it's the current time, right?

Using NewMetricWithTimestamp looks like the way to go. https://pkg.go.dev/github.com/prometheus/client_golang/prometheus#NewMetricWithTimestamp

> This is only useful in rare cases as the timestamp of a Prometheus metric should usually be set by the Prometheus server during scraping. Exceptions include mirroring metrics with given timestamps from other metric sources.

Since the source of the metrics is always a 3rd party API I think the exception applies to this.

> Looking at the code, the providers can return multiple data points. Maybe we can integrate the forecasts there, including the estimations.

Yes, multiple data points are supported. We use this for WattTime, where we return the relative carbon intensity, which is always present, and the actual value if it's available, as it's a paid feature.

My suggestion would be we call the history endpoint and return 2 values with timestamps for the most recent estimate and most recent actual. WDYT? @mrchrisadams Do you think this makes sense?

mrchrisadams commented 6 months ago

Hi Ross

My two cents:

> NewMetricWithTimestamp returns a new Metric wrapping the provided Metric in a way that it has an explicit timestamp set to the provided Time.

I didn't know this was supported, and I think this basically gives us what I thought we'd need to do a bunch of backfill faff to achieve. Being able to set it explicitly offers a nice way to support both kinds of values 💯

> Yes, multiple data points are supported. We use this for WattTime, where we return the relative carbon intensity, which is always present, and the actual value if it's available, as it's a paid feature.
>
> My suggestion would be we call the history endpoint and return 2 values with timestamps for the most recent estimate and most recent actual. WDYT? @mrchrisadams Do you think this makes sense?

Yes, I think this makes sense. I think this is worth implementing, as having both the estimated and actual values would be valuable.

rossf7 commented 5 months ago

@locomundo Thanks so much for contributing this!

Going to close as your changes are in the 0.7.0 release that just shipped. https://github.com/thegreenwebfoundation/grid-intensity-go/releases/tag/v0.7.0

locomundo commented 5 months ago

Great!! Thank you for handling it so fast! :)