ray-project / ray_lightning

Pytorch Lightning Distributed Accelerators using Ray
Apache License 2.0

Support PyTorch Lightning 1.6 #163

Closed JiahaoYao closed 1 year ago

JiahaoYao commented 2 years ago

This CI error does not reproduce locally: the test passes on my machine but fails on the CI.

image

@amogkam

JiahaoYao commented 2 years ago

@sxjscience please use the latest update!

JiahaoYao commented 2 years ago

Tune with DDP works for me √

JiahaoYao commented 2 years ago

@amogkam

Why does Tune sometimes report the key as config/max_epochs? Is that a version issue?

Error in the CI:

image

The test mixes up 'config/max_epochs' and 'config.max_epochs'.

This is the code:

def tune_test(dir, strategy):
    callbacks = [TuneReportCallback(on="validation_end")]
    analysis = tune.run(
        train_func(dir, strategy, callbacks=callbacks),
        config={"max_epochs": tune.choice([1, 2, 3])},
        resources_per_trial=get_tune_resources(
            num_workers=strategy.num_workers, use_gpu=strategy.use_gpu),
        num_samples=2)
    assert all(analysis.results_df["training_iteration"] ==
               analysis.results_df["config.max_epochs"])

Does the CI install the latest Ray?


  test_linux_ray_master_2:
    runs-on: ubuntu-latest
    timeout-minutes: 40
    steps:
      - uses: actions/checkout@v2
      - name: Set up Python 3.7
        uses: actions/setup-python@v2
        with:
          python-version: 3.7
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          python -m pip install --upgrade setuptools
          python -m pip install codecov
          python -m pip install -U https://s3-us-west-2.amazonaws.com/ray-wheels/latest/ray-3.0.0.dev0-cp37-cp37m-manylinux2014_x86_64.whl
          if [ -f requirements-test.txt ]; then python -m pip install -r requirements-test.txt; fi
          HOROVOD_WITH_GLOO=1 HOROVOD_WITHOUT_MPI=1 HOROVOD_WITHOUT_MXNET=1 pip install git+https://github.com/horovod/horovod.git
      - name: Install package
        run: |
          python -m pip install -e .
      - name: Test with Pytest
        run: |
          pushd ray_lightning/tests
          python -m pytest -v --durations=0 -x test_horovod.py
          python -m pytest -v --durations=0 -x test_tune.py

JiahaoYao commented 2 years ago

Is that due to this warning?

UserWarning: Dataframes will use '/' instead of '.' to delimit nested result keys in future versions of Ray. For forward compatibility, set the environment variable TUNE_RESULT_DELIM='/'
    "Dataframes will use '/' instead of '.' to delimit "

amogkam commented 2 years ago

Yeah, it seems this was changed recently. Can you change the test to use config/max_epochs?
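
If the test has to pass on both older and newer Ray versions, one option is to look the column up instead of hard-coding the delimiter. This is only a rough sketch: `resolve_result_column` is a hypothetical helper, not part of Ray or ray_lightning, and it assumes the dataframe column is the nested key path joined by either '/' or '.'.

```python
from typing import Iterable, Sequence


def resolve_result_column(
    columns: Iterable[str],
    path: Sequence[str],
    delimiters: Sequence[str] = ("/", "."),
) -> str:
    """Return the column that matches `path` joined by a known delimiter.

    Hypothetical helper: tries each delimiter in order and returns the
    first joined name that is present in `columns`.
    """
    available = set(columns)
    for delim in delimiters:
        candidate = delim.join(path)
        if candidate in available:
            return candidate
    raise KeyError(f"no column found for nested key {list(path)!r}")


# Example with plain lists standing in for `analysis.results_df.columns`.
new_style = ["training_iteration", "config/max_epochs"]
old_style = ["training_iteration", "config.max_epochs"]
print(resolve_result_column(new_style, ["config", "max_epochs"]))
print(resolve_result_column(old_style, ["config", "max_epochs"]))
```

In the test, the assert could then compare `training_iteration` against `analysis.results_df[resolve_result_column(analysis.results_df.columns, ["config", "max_epochs"])]`.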

JiahaoYao commented 2 years ago

This seems to confirm it: https://github.com/ray-project/ray/blob/master/python/ray/tune/analysis/experiment_analysis.py#L321-L322
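
The warning quoted earlier in the thread also names a second option: pinning the delimiter through the environment. A minimal sketch, assuming the installed Ray version still honors TUNE_RESULT_DELIM and that the variable is set before Tune builds the results dataframe (`EXPECTED_COLUMN` is just an illustrative name):

```python
import os

# Assumption: the installed Ray version reads TUNE_RESULT_DELIM, as the
# UserWarning states. Set it before Tune builds any results dataframe.
os.environ["TUNE_RESULT_DELIM"] = "/"

# With the delimiter pinned, the nested config key is expected to show up
# under this column name rather than "config.max_epochs".
EXPECTED_COLUMN = "/".join(["config", "max_epochs"])
print(EXPECTED_COLUMN)
```

This keeps the test's column name fixed, at the cost of depending on an environment variable that later Ray releases may stop reading.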

JiahaoYao commented 2 years ago

Sure

JiahaoYao commented 2 years ago

Thanks

JiahaoYao commented 2 years ago

The test on the GPU also passed!

JiahaoYao commented 2 years ago

ready for review

JiahaoYao commented 1 year ago

Local test on GPU passed.

JiahaoYao commented 1 year ago

image image

amogkam commented 1 year ago

Also, let's make sure to follow up on the previous review by adding comments that cover the following:

  1. Which methods override PyTorch Lightning methods vs. which are brand new.
  2. Which methods run remotely vs. which run on the driver.

Can we do this for both the Launchers and the Strategies?
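
One possible shape for those comments, as a rough sketch only. `ExampleLauncher`, its methods, and the `runs_on` marker are all hypothetical and are not the actual ray_lightning classes; the point is just the convention of stating the override status and the execution location in every docstring.

```python
def runs_on(location):
    """Hypothetical marker: record where a method is expected to execute."""

    def decorator(func):
        func.runs_on = location
        return func

    return decorator


class ExampleLauncher:
    """Hypothetical launcher, used only to illustrate the comment style."""

    @runs_on("driver")
    def launch(self, function, *args):
        """Override of a PyTorch Lightning method (assumed for illustration).

        Runs on: driver.
        """
        return self._wrapping_function(function, *args)

    @runs_on("worker")
    def _wrapping_function(self, function, *args):
        """New method, not an override of anything in PyTorch Lightning.

        Runs on: remote Ray worker (here it is simply called locally).
        """
        return function(*args)


launcher = ExampleLauncher()
print(launcher.launch(lambda x: x + 1, 1))
print(ExampleLauncher.launch.runs_on, ExampleLauncher._wrapping_function.runs_on)
```

Plain docstring lines alone would satisfy the request; the marker attribute is optional and only makes the execution location checkable in tests.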