What happened + What you expected to happen
I am seeing performance problems when running Flower's simulation, which uses Ray under the hood (https://github.com/adap/flower). The workload is machine-learning model training in a federated fashion. What is crucial here is that it is an iterative process that, unfortunately, slows down over time (it spills more and more with 2.2 and earlier versions). There are two ways to train the models, one on CPU and the other on GPU. The nightly 3.0.0 release fixed the CPU case (thanks to https://github.com/ray-project/ray/pull/31488), but the problem persists on GPU: there are no log messages about spilling, yet memory usage stays high and the whole process takes much longer than the CPU run or than training without Ray.
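For reference (not code from the report itself): in Flower's simulation API the CPU-vs-GPU choice maps to the `client_resources` argument of `start_simulation`. Below is a minimal, self-contained sketch of the two configurations; the dummy client and all parameter values are illustrative assumptions, not the tutorial's code.

```python
import flwr as fl
import numpy as np

class DummyClient(fl.client.NumPyClient):
    """Minimal stand-in client so the sketch is self-contained."""
    def get_parameters(self, config):
        return [np.zeros(1)]
    def fit(self, parameters, config):
        return parameters, 1, {}
    def evaluate(self, parameters, config):
        return 0.0, 1, {}

def client_fn(cid: str):
    return DummyClient()

# CPU run: no GPU slice is requested per virtual client.
fl.simulation.start_simulation(
    client_fn=client_fn,
    num_clients=10,
    config=fl.server.ServerConfig(num_rounds=3),
    client_resources={"num_cpus": 1, "num_gpus": 0.0},
)

# GPU run is identical except for the per-client resource request, e.g.:
# client_resources={"num_cpus": 1, "num_gpus": 0.25}
```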
Versions / Dependencies
2.2 and 3.0.0 (nightly)
Reproduction script
pip install -q flwr[simulation] torch torchvision matplotlib (also tested with the Ray 3.0.0 nightly)
The code below is a shortened Python version of this tutorial: https://colab.research.google.com/github/adap/flower/blob/main/doc/source/tutorial/Flower-1-Intro-to-FL-PyTorch.ipynb
```python
from collections import OrderedDict
from typing import List, Tuple

import flwr as fl
import numpy as np
import matplotlib.pyplot as plt
import torch
import torch.nn as nn
import torchvision
import torch.nn.functional as F
import torchvision.transforms as transforms
from flwr.common import Metrics
from torch.utils.data import DataLoader, random_split
from torchvision.datasets import CIFAR10

if __name__ == "__main__":
    DEVICE = torch.device("cuda")  # Try "cuda" to train on GPU
    print(f"Training on {DEVICE} using PyTorch {torch.__version__} and Flower {fl.__version__}")
```
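Not part of the original report, but one way to quantify the memory growth between rounds (since the GPU run shows no spill logs) is to sample Ray's own resource view once per federated round. The snippet below is a hedged sketch; the helper name is made up, and the reported value is in bytes on recent Ray versions.

```python
import ray

ray.init(ignore_reinit_error=True)

def object_store_free_gb() -> float:
    """Free object-store memory as reported by Ray's resource view (bytes on recent Ray)."""
    return ray.available_resources().get("object_store_memory", 0.0) / 1e9

# Call e.g. once per federated round to see whether usage keeps shrinking:
print(f"Object store free: {object_store_free_gb():.2f} GB")
```

The `ray memory --stats-only` CLI gives a similar per-snapshot summary from outside the driver process.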
Issue Severity
High: It blocks me from completing my task.