usegalaxy-eu / environmental_impact

Environmental Impact
https://usegalaxy-eu.github.io/environmental_impact/
MIT License

Time-Shifting computation #5

Open bgruening opened 1 year ago

bgruening commented 1 year ago
nsoranzo commented 1 year ago

Can this be simplified if we define a "reduced environmental impact" destination which job/workflows can be assigned to?

AnneHartebrodt commented 1 year ago

Hi guys, Paul suggested I comment on this issue. I have started developing an API/service that allows shifting computation in time and location, and I have some ideas designed specifically for bioinformatics pipelines. Right now we have a prototype implementation of a web API.

The API allows users to request an optimal time frame for a computation given certain parameters (% renewable energy required at run time, estimated run time of the job, hard deadline). We are currently working on extending the prediction horizons, which are not yet great for commercial tools (less than 24h, depending on the data source). Based on the time, we also have a prediction endpoint for the location.
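To make the request parameters above concrete, here is a minimal sketch of the kind of window search such a service could perform. It is not the prototype API described in this comment: the function name `best_start_hour`, the hourly forecast list, and the "highest mean renewable share" criterion are all assumptions made for illustration.

```python
from typing import Optional, Sequence


def best_start_hour(
    renewable_pct: Sequence[float],  # hypothetical hourly forecast, 0..100
    runtime_h: int,                  # estimated run time of the job, in hours
    deadline_h: int,                 # job must finish by this hour offset
    min_pct: float,                  # required mean renewable share (%)
) -> Optional[int]:
    """Return the start hour whose window has the highest mean renewable
    share, or None if no window meets min_pct before the deadline."""
    if runtime_h <= 0:
        raise ValueError("runtime_h must be positive")

    # The job cannot run past the deadline or past the forecast horizon.
    last_start = min(deadline_h, len(renewable_pct)) - runtime_h

    best: Optional[int] = None
    best_mean = -1.0
    for start in range(0, last_start + 1):
        window = renewable_pct[start:start + runtime_h]
        mean = sum(window) / runtime_h
        # Keep the earliest window among those with the highest mean.
        if mean >= min_pct and mean > best_mean:
            best, best_mean = start, mean
    return best
```

With a forecast shorter than 24h, as mentioned above, the search space is simply the forecast length, so long jobs or tight deadlines can easily yield no valid window.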

I'd be extremely happy to get involved and talk about the specific requirements for Galaxy.

There are also several similar projects, like electricitymaps.com, which report the carbon intensity, a score for how 'dirty' the electricity is.

There is also the Carbon Aware SDK from Microsoft, which has a new version with an API (https://github.com/Green-Software-Foundation/carbon-aware-sdk), although it interfaces with electricitymaps (24h) and with WattTime for the US.

I think in terms of logic for the scheduling there are several options and it would make sense.

Price is more volatile, and therefore harder to predict, than the percentage of renewable energy.

bgruening commented 1 year ago

@AnneHartebrodt awesome.

> I think in terms of logic for the scheduling there are several options and it would make sense.

  • Have a specific queue for green jobs (that is probably a proper scheduling problem).
  • Upscale and downscale the number of available nodes depending on the energy availability.

We could do both. Upscaling and downscaling might have an impact on the hardware, at least if we really power off the nodes, but we have no experience with the actual impact.
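For the second option, a minimal sketch of how a node count could follow energy availability. The linear mapping, the function name `target_node_count`, and the idea of keeping a fixed floor of always-on nodes (to limit power cycling) are assumptions for illustration, not an existing Galaxy or scheduler feature.

```python
def target_node_count(renewable_pct: float, min_nodes: int, max_nodes: int) -> int:
    """Scale the number of active nodes linearly with the renewable share.

    min_nodes stay on regardless of the energy mix; the remaining
    capacity is enabled in proportion to renewable_pct (0..100).
    """
    if min_nodes > max_nodes:
        raise ValueError("min_nodes must not exceed max_nodes")
    # Clamp out-of-range forecasts instead of failing.
    share = min(max(renewable_pct, 0.0), 100.0) / 100.0
    return min_nodes + round(share * (max_nodes - min_nodes))
```

A real policy would probably add hysteresis so that nodes are not powered on and off at every small change in the forecast.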

We do have a carbon footprint prediction for jobs (https://galaxyproject.org/news/2023-07-11-carbon-emissions-reporting/), and one of the next steps would be to tie this to scheduling. Any input is super valuable for us :)

If you want to visit us, we have a small meeting in October: https://galaxyproject.org/events/2023-10-egd/. Otherwise, we can have a small telco :)

bgruening commented 1 year ago

related: https://htcondor.readthedocs.io/en/latest/users-manual/time-scheduling-for-job-execution.html
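As an illustration of the page linked above: HTCondor submit descriptions support `deferral_time` (a Unix epoch timestamp) and `deferral_window` (seconds after that time during which the job may still start). The sketch below only builds such a description as text; the helper name `deferred_submit_description` and the default window are assumptions, and nothing here is tied to how Galaxy submits jobs.

```python
from datetime import datetime, timezone


def deferred_submit_description(
    executable: str,
    start: datetime,       # must be timezone-aware
    window_s: int = 3600,  # assumed default: allow a start up to 1h late
) -> str:
    """Build an HTCondor submit description that defers execution to `start`."""
    if start.tzinfo is None:
        # Naive datetimes would be interpreted in local time by timestamp().
        raise ValueError("start must be timezone-aware")
    epoch = int(start.timestamp())
    return "\n".join([
        f"executable = {executable}",
        f"deferral_time = {epoch}",
        f"deferral_window = {window_s}",
        "queue",
    ])


if __name__ == "__main__":
    start = datetime.fromtimestamp(1700000000, tz=timezone.utc)
    print(deferred_submit_description("run_tool.sh", start))
```

The start time could come from any forecast-based choice of a low-carbon window; the executable name `run_tool.sh` is a placeholder.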

AnneHartebrodt commented 1 year ago

@bgruening. Excellent. I will join the meeting!

bgruening commented 1 year ago

Great news! See you soon!

kysrpex commented 1 year ago

Hi! I just wanted to let you know that I am organizing a BoF on this topic during the European Galaxy Days. Just get in touch with the organizers or drop me a message or email to get more info when the dates get closer.

sanjaysrikakulam commented 1 year ago

@sebastian-luna-valero shared this GitHub repo in WP4: https://github.com/GreenScheduler/cats

sebastian-luna-valero commented 6 months ago

FYI, another interesting option: https://github.com/mlco2/codecarbon