ray-project / ray_lightning

Pytorch Lightning Distributed Accelerators using Ray
Apache License 2.0

no training starts although flag is running #216

Closed jakubMitura14 closed 2 years ago

jakubMitura14 commented 2 years ago

Hello, I have a PyTorch Lightning module running on a Google Cloud VM with 24 CPU cores and 2 NVIDIA A100 GPUs. I use Ray for hyperparameter tuning.

First, the status output reports 0 of 1 A100 accelerators requested:

Resources requested: 16.0/24 CPUs, 2.0/2 GPUs, 0.0/107.2 GiB heap, 0.0/49.93 GiB objects (0.0/1.0 accelerator_type:A100)

Then the data loading code executes, and I constantly get the following status output:

== Status ==
Current time: 2022-09-20 19:07:03 (running for 00:07:39.81)
Memory usage on this node: 17.7/167.0 GiB
PopulationBasedTraining: 0 checkpoints, 0 perturbs
Resources requested: 16.0/24 CPUs, 2.0/2 GPUs, 0.0/107.2 GiB heap, 0.0/49.93 GiB objects (0.0/1.0 accelerator_type:A100)
Result logdir: /home/sliceruser/ray_results/picai-hyperparam-search-30
Number of trials: 1/1 (1 RUNNING)
+-----------------------+----------+----------------+
| Trial name            | status   | loc            |
|-----------------------+----------+----------------|
| mainTrain_578ab_00000 | RUNNING  | 10.164.0.3:923 |
+-----------------------+----------+----------------+

== Status ==
Current time: 2022-09-20 19:07:08 (running for 00:07:44.83)
Memory usage on this node: 17.7/167.0 GiB
PopulationBasedTraining: 0 checkpoints, 0 perturbs
Resources requested: 16.0/24 CPUs, 2.0/2 GPUs, 0.0/107.2 GiB heap, 0.0/49.93 GiB objects (0.0/1.0 accelerator_type:A100)
Result logdir: /home/sliceruser/ray_results/picai-hyperparam-search-30
Number of trials: 1/1 (1 RUNNING)
+-----------------------+----------+----------------+
| Trial name            | status   | loc            |
|-----------------------+----------+----------------|
| mainTrain_578ab_00000 | RUNNING  | 10.164.0.3:923 |
+-----------------------+----------+----------------+

The Tune configuration is defined as below:


            config = {
                "lr": 1e-3,
                "dropout": 0.2,
                "accumulate_grad_batches": 3,
                "spacing_keyword": "_one_spac_c",  # or "_med_spac_b"
                "gradient_clip_val": 10.0,  # {"type": "discrete", "values": [0.0, 0.2, 0.5, 2.0, 100.0]}
                "RandGaussianNoised_prob": 0.01,  # {"type": "float", "min": 0.0, "max": 0.5}
                "RandAdjustContrastd_prob": 0.4,  # {"type": "float", "min": 0.3, "max": 0.8}
                "RandGaussianSmoothd_prob": 0.01,  # {"type": "discrete", "values": [0.0]}
                "RandRicianNoised_prob": 0.4,  # {"type": "float", "min": 0.2, "max": 0.7}
                "RandFlipd_prob": 0.4,  # {"type": "float", "min": 0.3, "max": 0.7}
                "RandAffined_prob": 0.2,  # {"type": "float", "min": 0.0, "max": 0.5}
                "RandCoarseDropoutd_prob": 0.01,  # {"type": "discrete", "values": [0.0]}
                "RandomElasticDeformation_prob": 0.1,  # {"type": "float", "min": 0.0, "max": 0.3}
                "RandomAnisotropy_prob": 0.1,  # {"type": "float", "min": 0.0, "max": 0.3}
                "RandomMotion_prob": 0.1,  # {"type": "float", "min": 0.0, "max": 0.3}
                "RandomGhosting_prob": 0.1,  # {"type": "float", "min": 0.0, "max": 0.3}
                "RandomSpike_prob": 0.1,  # {"type": "float", "min": 0.0, "max": 0.3}
                "RandomBiasField_prob": 0.1,  # {"type": "float", "min": 0.0, "max": 0.3}
            }

            pb2_scheduler = PB2(
                    time_attr="training_iteration",
                    metric='avg_val_acc',
                    mode='max',
                    perturbation_interval=10.0,
                    hyperparam_bounds={
                        "lr": [1e-5, 1e-2],  # bounds are given as [min, max]
                        "gradient_clip_val": [0.0,100.0] ,#{"type": "discrete", "values": [0.0, 0.2,0.5,2.0,100.0]},#,2.0, 0.2,0.5
                        "RandGaussianNoised_prob": [0.0,1.0],#{"type": "float", "min": 0.0, "max": 0.5},
                        "RandAdjustContrastd_prob": [0.0,1.0],#{"type": "float", "min": 0.3, "max": 0.8},
                        "RandGaussianSmoothd_prob": [0.0,1.0],#{"type": "discrete", "values": [0.0]},
                        "RandRicianNoised_prob": [0.0,1.0],#{"type": "float", "min": 0.2, "max": 0.7},
                        "RandFlipd_prob":[0.0,1.0],#{"type": "float", "min": 0.3, "max": 0.7},
                        "RandAffined_prob": [0.0,1.0],#{"type": "float", "min": 0.0, "max": 0.5},
                        "RandCoarseDropoutd_prob": [0.0,1.0],# {"type": "discrete", "values": [0.0]},
                        "RandomElasticDeformation_prob":[0.0,1.0],#{"type": "float", "min": 0.0, "max": 0.3},
                        "RandomAnisotropy_prob": [0.0,1.0],# {"type": "float", "min": 0.0, "max": 0.3},
                        "RandomMotion_prob":  [0.0,1.0],#{"type": "float", "min": 0.0, "max": 0.3},
                        "RandomGhosting_prob":[0.0,1.0],# {"type": "float", "min": 0.0, "max": 0.3},
                        "RandomSpike_prob": [0.0,1.0],# {"type": "float", "min": 0.0, "max": 0.3},
                        "RandomBiasField_prob": [0.0,1.0],# {"type": "float", "min": 0.0, "max": 0.3},
                        "dropout": [0.0,0.6],# {"type": "float", "min": 0.0, "max": 0.3},
                    })

            experiment_name="picai-hyperparam-search-30"
            # Three_chan_baseline.mainTrain(options,df,experiment_name,dummyDict)
            num_gpu=2
            cpu_num=8 #per gpu
            default_root_dir='/home/sliceruser/data/lightning'
            checkpoint_dir='/home/sliceruser/data/tuneCheckpoints1'
            num_cpus_per_worker=cpu_num

            tuner = tune.Tuner(
                tune.with_resources(
                    tune.with_parameters(
                        Three_chan_baseline.mainTrain,
                        df=df,
                        experiment_name=experiment_name
                        ,dummyDict=dummyDict
                        ,num_gpu=num_gpu
                        ,cpu_num=cpu_num
                         ,default_root_dir=default_root_dir
                         ,checkpoint_dir=checkpoint_dir
                         ,options=options
                         ,num_cpus_per_worker=num_cpus_per_worker            
                        ),
                    resources=tune.PlacementGroupFactory(
                            [{'CPU': num_cpus_per_worker, 'GPU': 1.0}] + [{'CPU': num_cpus_per_worker, 'GPU': 1.0}]
                        )
                ),
                tune_config=tune.TuneConfig(
                    # metric="avg_val_acc",
                    # mode="max",
                    scheduler=pb2_scheduler,
                    #num_samples=1#num_gpu,
                ),
                run_config=air.RunConfig(
                    name=experiment_name,
                    # progress_reporter=reporter,
                ),
                param_space=config,
                #reuse_actors=True
            )
            results = tuner.fit()
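
PB2's hyperparam_bounds are documented as [min, max] pairs, so a quick sanity check before building the scheduler can catch reversed pairs or starting values that fall outside their range. The following is a minimal sketch in plain Python; check_bounds is a hypothetical helper (not part of Ray), and the example values are illustrative only.

```python
from typing import Dict, List, Sequence


def check_bounds(config: Dict[str, object],
                 bounds: Dict[str, Sequence[float]]) -> List[str]:
    """Return human-readable problems found in a PB2-style bounds dict.

    Hypothetical helper: Ray does not ship this function.
    """
    problems = []
    for name, pair in bounds.items():
        if len(pair) != 2:
            problems.append(f"{name}: expected [min, max], got {list(pair)}")
            continue
        low, high = pair
        if low > high:
            problems.append(f"{name}: bounds are reversed ({low} > {high})")
            continue
        # Only compare when the key also has a starting value in the config.
        if name in config and not (low <= config[name] <= high):
            problems.append(
                f"{name}: initial value {config[name]} is outside [{low}, {high}]"
            )
    return problems


# Illustrative values only: "lr" is deliberately written as [max, min].
example_config = {"lr": 1e-3, "dropout": 0.2}
example_bounds = {"lr": [1e-2, 1e-5], "dropout": [0.0, 0.6]}

for problem in check_bounds(example_config, example_bounds):
    print(problem)
```

Running such a check on the real config and bounds dictionaries before calling tuner.fit() costs nothing and rules out one class of silent misconfiguration.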

The PyTorch Lightning trainer is defined as:

    checkPointCallback = TuneReportCheckpointCallback(
        metrics={
            "loss": "avg_val_loss",
            "mean_accuracy": "avg_val_acc"
        },
        filename="checkpoint",
        on="validation_end")

    strategy = RayShardedStrategy(num_workers=num_gpu, num_cpus_per_worker=num_cpus_per_worker, use_gpu=True)

    callbacks=[checkPointCallback]
    kwargs = {
        #"accelerator":'auto',
        "max_epochs": max_epochs,
        "callbacks" :callbacks,
        "logger" : comet_logger,
        "default_root_dir" : default_root_dir,
        "auto_lr_find" : False,
        "check_val_every_n_epoch" : 10,
        "accumulate_grad_batches" : accumulate_grad_batches,
        "gradient_clip_val" :gradient_clip_val,
        "log_every_n_steps" :2,
        "strategy" :strategy
        }

    if checkpoint_dir:
        kwargs["resume_from_checkpoint"] = os.path.join(
            checkpoint_dir, "checkpoint")

    trainer = pl.Trainer(**kwargs)

However, no GPU is used; the nvidia-smi result is below:

![image](https://user-images.githubusercontent.com/53857487/191343357-8b146055-b657-4bac-b75b-2468d1055fc8.png)

The full Docker container definition is available at https://github.com/jakubMitura14/forPicaiDocker. The full code (quite long) is available at https://github.com/jakubMitura14/piCaiCode.
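
Since nvidia-smi reports on the whole machine, it can also help to check what an individual process is allowed to see. Ray restricts GPU visibility for its worker processes through the CUDA_VISIBLE_DEVICES environment variable, so printing it from inside the training function is a cheap diagnostic. The sketch below is plain Python; visible_gpu_ids is a hypothetical helper name, and interpreting an empty value as "no GPUs visible" reflects how CUDA treats that variable.

```python
import os
from typing import List, Mapping, Optional


def visible_gpu_ids(env: Optional[Mapping[str, str]] = None) -> Optional[List[str]]:
    """Interpret CUDA_VISIBLE_DEVICES for the current (or a given) environment.

    Returns None when the variable is unset (no restriction is applied),
    an empty list when it is set but empty (all GPUs hidden from the process),
    and otherwise the list of device identifiers.
    """
    env = os.environ if env is None else env
    raw = env.get("CUDA_VISIBLE_DEVICES")
    if raw is None:
        return None
    return [part.strip() for part in raw.split(",") if part.strip()]


# Call this from inside the trainable to see what that process was given.
ids = visible_gpu_ids()
print(f"pid={os.getpid()} CUDA_VISIBLE_DEVICES ->", "unset" if ids is None else ids)
```

If the process that constructs the trainer reports an empty list, that process has no GPU assigned to it, regardless of what nvidia-smi shows for the host.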

amogkam commented 2 years ago

Hey @jakubMitura14, are you also seeing messages about Actor not able to be scheduled? It would help if you can also share the full stdout.

To use Ray Lightning with Tune you should use the get_tune_resources utility function (like in this example: https://github.com/ray-project/ray_lightning#hyperparameter-tuning-with-ray-tune) rather than passing in the PlacementGroupFactory yourself. For your case it would be get_tune_resources(num_workers=2, use_gpu=True). With your current code, there are not enough resources available to schedule all the actors, so the hang is expected.
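
The resource shortfall can be illustrated with some simplified bookkeeping. This is only a sketch under stated assumptions, not Ray's scheduler: it assumes the first bundle of a trial's placement group hosts the trainable function itself and that each Ray Lightning worker needs its own remaining bundle. The layout produced by sketch_tune_bundles (one 1-CPU head bundle plus one bundle per worker) is an assumption about what get_tune_resources builds; both function names here are hypothetical.

```python
from typing import Dict, List

Bundle = Dict[str, float]


def sketch_tune_bundles(num_workers: int,
                        num_cpus_per_worker: int = 1,
                        use_gpu: bool = False) -> List[Bundle]:
    """Assumed layout: a small head bundle, then one bundle per worker."""
    head = {"CPU": 1.0}
    worker = {"CPU": float(num_cpus_per_worker), "GPU": 1.0 if use_gpu else 0.0}
    return [head] + [dict(worker) for _ in range(num_workers)]


def worker_slots(bundles: List[Bundle], worker_req: Bundle) -> int:
    """Count bundles after the first (head) one that can host a worker."""
    return sum(
        1
        for bundle in bundles[1:]
        if all(bundle.get(key, 0.0) >= need for key, need in worker_req.items())
    )


# Layout from the original post: two identical 8-CPU / 1-GPU bundles.
posted = [{"CPU": 8.0, "GPU": 1.0}, {"CPU": 8.0, "GPU": 1.0}]
# Layout assumed for get_tune_resources(num_workers=2, use_gpu=True).
suggested = sketch_tune_bundles(num_workers=2, use_gpu=True)

print("posted layout, worker slots:", worker_slots(posted, {"CPU": 8.0, "GPU": 1.0}))
print("suggested layout, worker slots:", worker_slots(suggested, {"CPU": 1.0, "GPU": 1.0}))
print("suggested totals:",
      sum(b.get("CPU", 0.0) for b in suggested), "CPUs,",
      sum(b.get("GPU", 0.0) for b in suggested), "GPUs")
```

Under these assumptions the posted layout leaves room for only one of the two requested GPU workers, while the suggested layout has room for both. Its totals of 3 CPUs and 2 GPUs also agree with the "Resources requested: 3.0/24 CPUs, 2.0/2 GPUs" line in the follow-up log.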

jakubMitura14 commented 2 years ago

Thank you for the response. I changed the requested resources as you suggested and removed the RayShardedStrategy, but the problem still persists:

jakubmituraaa@vm-nvidia-gpu-v100-vm2-vm:~$ sudo docker run --init --gpus all --ipc host --privileged --net host -p 8888:8888 -p49053:49053 -v /mnt/disks/sde:/home/sliceruser/data -it  slicerpicai:latest
WARNING: Published ports are discarded when using host network mode
staaarting 
remote: Enumerating objects: 8, done.
remote: Counting objects: 100% (8/8), done.
remote: Compressing objects: 100% (4/4), done.
remote: Total 6 (delta 4), reused 4 (delta 2), pack-reused 0
Unpacking objects: 100% (6/6), 558 bytes | 186.00 KiB/s, done.
From https://github.com/jakubMitura14/piCaiCode
   958d152..36bab58  main       -> origin/main
Updating 958d152..36bab58
Fast-forward
 ThreeChanNoExperiment.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
Matplotlib created a temporary config/cache directory at /tmp/matplotlib-0twadvgt because the default path (/home/sliceruser/.config/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.
[KeOps] Warning : Cuda libraries were not detected on the system ; using cpu only mode
warning in stationary: failed to import cython module: falling back to numpy
warning in choleskies: failed to import cython module: falling back to numpy
aaa
If you have questions or suggestions, feel free to open an issue at https://github.com/DIAGNijmegen/picai_eval

Please cite the following paper when using Report Guided Annotations:

Bosma, J.S., et al. "Semi-supervised learning with report-guided lesion annotation for deep learning-based prostate cancer detection in bpMRI" to be submitted

If you have questions or suggestions, feel free to open an issue at https://github.com/DIAGNijmegen/Report-Guided-Annotation

sizz (192, 160, 96) 
sizz (64, 320, 176) 
2022-09-21 17:07:15,515 - Initializing Ray automatically.For cluster usage or custom Ray initialization, call `ray.init(...)` before `tune.run`.
2022-09-21 17:07:17,997 INFO worker.py:1509 -- Started a local Ray instance. View the dashboard at http://127.0.0.1:8265 
 /usr/local/lib/python3.8/dist-packages/ray/tune/trainable/function_trainable.py:642: DeprecationWarning:`checkpoint_dir` in `func(config, checkpoint_dir)` is being deprecated. To save and load checkpoint in trainable functions, please use the `ray.air.session` API:

from ray.air import session

def train(config):
    # ...
    session.report({"metric": metric}, checkpoint=checkpoint)

For more information please see https://docs.ray.io/en/master/ray-air/key-concepts.html#session

2022-09-21 17:07:19,083 WARNING trial_runner.py:1575 -- You are trying to access _search_alg interface of TrialRunner in TrialScheduler, which is being restricted. If you believe it is reasonable for your scheduler to access this TrialRunner API, please reach out to Ray team on GitHub. A more strict API access pattern would be enforced starting 1.12s.0
(pid=924) If you have questions or suggestions, feel free to open an issue at https://github.com/DIAGNijmegen/picai_eval
(pid=924) 
(pid=924) 
(pid=924) 
(pid=924) Please cite the following paper when using Report Guided Annotations:
(pid=924) 
(pid=924) Bosma, J.S., et al. "Semi-supervised learning with report-guided lesion annotation for deep learning-based prostate cancer detection in bpMRI" to be submitted
(pid=924) 
(pid=924) 
(pid=924) If you have questions or suggestions, feel free to open an issue at https://github.com/DIAGNijmegen/Report-Guided-Annotation
(pid=924) 
== Status ==
Current time: 2022-09-21 17:07:26 (running for 00:00:07.78)
Memory usage on this node: 4.5/167.0 GiB
PopulationBasedTraining: 0 checkpoints, 0 perturbs
Resources requested: 3.0/24 CPUs, 2.0/2 GPUs, 0.0/107.2 GiB heap, 0.0/49.93 GiB objects (0.0/1.0 accelerator_type:A100)
Result logdir: /home/sliceruser/ray_results/picai-hyperparam-search-30
Number of trials: 1/1 (1 RUNNING)
+-----------------------+----------+----------------+
| Trial name            | status   | loc            |
|-----------------------+----------+----------------|
| mainTrain_d9a41_00000 | RUNNING  | 10.164.0.3:924 |
+-----------------------+----------+----------------+

(mainTrain pid=924) aaaaaa  img_size (192, 160, 96)  <class 'tuple'>
(mainTrain pid=924) aaaaaaaaaaaaaaaaaaa dropout 0.2
(mainTrain pid=924) self.allSubjects 100  self.onlyPositiveSubjects 100
(mainTrain pid=924) Train data set: 90
(mainTrain pid=924) Test data set: 5
(mainTrain pid=924) Valid data set: 5
(mainTrain pid=924) Train data set: 90
(mainTrain pid=924) Test data set: 5
(mainTrain pid=924) Valid data set: 5
(mainTrain pid=924) 
(mainTrain pid=924) A value is trying to be set on a copy of a slice from a DataFrame.
(mainTrain pid=924) Try using .loc[row_indexer,col_indexer] = value instead
(mainTrain pid=924) 
(mainTrain pid=924) See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
(mainTrain pid=924) CometLogger will be initialized in online mode
Loading dataset:   0%|          | 0/90 [00:00<?, ?it/s]
Loading dataset:   1%|          | 1/90 [00:01<02:40,  1.81s/it]
Loading dataset:   2%|▏         | 2/90 [00:02<01:18,  1.12it/s]
Loading dataset:   3%|▎         | 3/90 [00:02<00:52,  1.67it/s]
Loading dataset:   4%|▍         | 4/90 [00:02<00:40,  2.11it/s]
Loading dataset:   6%|▌         | 5/90 [00:02<00:33,  2.57it/s]
Loading dataset:   7%|▋         | 6/90 [00:03<00:28,  2.99it/s]
Loading dataset:   8%|▊         | 7/90 [00:03<00:25,  3.23it/s]
Loading dataset:   9%|▉         | 8/90 [00:03<00:22,  3.60it/s]
Loading dataset:  10%|█         | 9/90 [00:03<00:21,  3.80it/s]
Loading dataset:  11%|█         | 10/90 [00:04<00:21,  3.78it/s]
Loading dataset:  12%|█▏        | 11/90 [00:04<00:19,  4.06it/s]
== Status ==
Current time: 2022-09-21 17:07:31 (running for 00:00:12.78)
Memory usage on this node: 5.5/167.0 GiB
PopulationBasedTraining: 0 checkpoints, 0 perturbs
Resources requested: 3.0/24 CPUs, 2.0/2 GPUs, 0.0/107.2 GiB heap, 0.0/49.93 GiB objects (0.0/1.0 accelerator_type:A100)
Result logdir: /home/sliceruser/ray_results/picai-hyperparam-search-30
Number of trials: 1/1 (1 RUNNING)
+-----------------------+----------+----------------+
| Trial name            | status   | loc            |
|-----------------------+----------+----------------|
| mainTrain_d9a41_00000 | RUNNING  | 10.164.0.3:924 |
+-----------------------+----------+----------------+

Loading dataset:  13%|█▎        | 12/90 [00:04<00:20,  3.78it/s]
Loading dataset:  14%|█▍        | 13/90 [00:04<00:19,  3.92it/s]
Loading dataset:  16%|█▌        | 14/90 [00:05<00:18,  4.01it/s]
Loading dataset:  17%|█▋        | 15/90 [00:05<00:17,  4.24it/s]
Loading dataset:  18%|█▊        | 16/90 [00:05<00:17,  4.29it/s]
Loading dataset:  19%|█▉        | 17/90 [00:05<00:17,  4.18it/s]
Loading dataset:  20%|██        | 18/90 [00:05<00:16,  4.24it/s]
Loading dataset:  21%|██        | 19/90 [00:06<00:17,  4.13it/s]
Loading dataset:  22%|██▏       | 20/90 [00:06<00:16,  4.36it/s]
Loading dataset:  23%|██▎       | 21/90 [00:06<00:16,  4.17it/s]
Loading dataset:  24%|██▍       | 22/90 [00:06<00:15,  4.35it/s]
Loading dataset:  26%|██▌       | 23/90 [00:07<00:15,  4.23it/s]
Loading dataset:  27%|██▋       | 24/90 [00:07<00:14,  4.44it/s]
Loading dataset:  28%|██▊       | 25/90 [00:07<00:15,  4.22it/s]
Loading dataset:  29%|██▉       | 26/90 [00:07<00:15,  4.11it/s]
Loading dataset:  30%|███       | 27/90 [00:08<00:14,  4.30it/s]
Loading dataset:  31%|███       | 28/90 [00:08<00:14,  4.15it/s]
Loading dataset:  32%|███▏      | 29/90 [00:08<00:15,  4.05it/s]
Loading dataset:  33%|███▎      | 30/90 [00:08<00:14,  4.15it/s]
Loading dataset:  34%|███▍      | 31/90 [00:08<00:13,  4.36it/s]
Loading dataset:  36%|███▌      | 32/90 [00:09<00:13,  4.36it/s]
== Status ==
Current time: 2022-09-21 17:07:36 (running for 00:00:17.78)
Memory usage on this node: 7.0/167.0 GiB
PopulationBasedTraining: 0 checkpoints, 0 perturbs
Resources requested: 3.0/24 CPUs, 2.0/2 GPUs, 0.0/107.2 GiB heap, 0.0/49.93 GiB objects (0.0/1.0 accelerator_type:A100)
Result logdir: /home/sliceruser/ray_results/picai-hyperparam-search-30
Number of trials: 1/1 (1 RUNNING)
+-----------------------+----------+----------------+
| Trial name            | status   | loc            |
|-----------------------+----------+----------------|
| mainTrain_d9a41_00000 | RUNNING  | 10.164.0.3:924 |
+-----------------------+----------+----------------+

Loading dataset:  37%|███▋      | 33/90 [00:09<00:14,  4.07it/s]
Loading dataset:  38%|███▊      | 34/90 [00:09<00:13,  4.28it/s]
Loading dataset:  39%|███▉      | 35/90 [00:09<00:13,  4.19it/s]
Loading dataset:  40%|████      | 36/90 [00:10<00:12,  4.39it/s]
Loading dataset:  41%|████      | 37/90 [00:10<00:11,  4.42it/s]
Loading dataset:  42%|████▏     | 38/90 [00:10<00:11,  4.45it/s]
Loading dataset:  43%|████▎     | 39/90 [00:10<00:11,  4.43it/s]
Loading dataset:  44%|████▍     | 40/90 [00:11<00:12,  3.96it/s]
Loading dataset:  46%|████▌     | 41/90 [00:11<00:11,  4.18it/s]
Loading dataset:  47%|████▋     | 42/90 [00:11<00:11,  4.07it/s]
Loading dataset:  48%|████▊     | 43/90 [00:11<00:11,  4.15it/s]
Loading dataset:  49%|████▉     | 44/90 [00:12<00:11,  3.97it/s]
Loading dataset:  50%|█████     | 45/90 [00:12<00:10,  4.09it/s]
Loading dataset:  51%|█████     | 46/90 [00:12<00:11,  3.75it/s]
Loading dataset:  52%|█████▏    | 47/90 [00:12<00:11,  3.63it/s]
Loading dataset:  53%|█████▎    | 48/90 [00:13<00:10,  3.92it/s]
Loading dataset:  54%|█████▍    | 49/90 [00:13<00:10,  3.94it/s]
Loading dataset:  56%|█████▌    | 50/90 [00:13<00:09,  4.06it/s]
Loading dataset:  57%|█████▋    | 51/90 [00:13<00:09,  4.01it/s]
Loading dataset:  58%|█████▊    | 52/90 [00:14<00:08,  4.24it/s]
Loading dataset:  59%|█████▉    | 53/90 [00:14<00:09,  3.87it/s]
== Status ==
Current time: 2022-09-21 17:07:41 (running for 00:00:22.79)
Memory usage on this node: 8.3/167.0 GiB
PopulationBasedTraining: 0 checkpoints, 0 perturbs
Resources requested: 3.0/24 CPUs, 2.0/2 GPUs, 0.0/107.2 GiB heap, 0.0/49.93 GiB objects (0.0/1.0 accelerator_type:A100)
Result logdir: /home/sliceruser/ray_results/picai-hyperparam-search-30
Number of trials: 1/1 (1 RUNNING)
+-----------------------+----------+----------------+
| Trial name            | status   | loc            |
|-----------------------+----------+----------------|
| mainTrain_d9a41_00000 | RUNNING  | 10.164.0.3:924 |
+-----------------------+----------+----------------+

Loading dataset:  60%|██████    | 54/90 [00:14<00:09,  3.85it/s]
Loading dataset:  61%|██████    | 55/90 [00:14<00:09,  3.78it/s]
Loading dataset:  62%|██████▏   | 56/90 [00:15<00:08,  3.92it/s]
Loading dataset:  63%|██████▎   | 57/90 [00:15<00:08,  3.91it/s]
Loading dataset:  64%|██████▍   | 58/90 [00:15<00:07,  4.03it/s]
Loading dataset:  66%|██████▌   | 59/90 [00:15<00:07,  4.13it/s]
Loading dataset:  67%|██████▋   | 60/90 [00:16<00:07,  4.01it/s]
Loading dataset:  68%|██████▊   | 61/90 [00:16<00:07,  3.79it/s]
Loading dataset:  69%|██████▉   | 62/90 [00:16<00:06,  4.09it/s]
Loading dataset:  70%|███████   | 63/90 [00:16<00:06,  4.09it/s]
Loading dataset:  71%|███████   | 64/90 [00:17<00:06,  4.07it/s]
Loading dataset:  72%|███████▏  | 65/90 [00:17<00:05,  4.24it/s]
Loading dataset:  73%|███████▎  | 66/90 [00:17<00:05,  4.16it/s]
Loading dataset:  74%|███████▍  | 67/90 [00:17<00:05,  4.05it/s]
Loading dataset:  76%|███████▌  | 68/90 [00:18<00:05,  4.01it/s]
Loading dataset:  77%|███████▋  | 69/90 [00:18<00:04,  4.27it/s]
Loading dataset:  78%|███████▊  | 70/90 [00:18<00:04,  4.18it/s]
Loading dataset:  79%|███████▉  | 71/90 [00:18<00:04,  4.14it/s]
Loading dataset:  80%|████████  | 72/90 [00:19<00:04,  4.22it/s]
Loading dataset:  81%|████████  | 73/90 [00:19<00:04,  3.88it/s]
== Status ==
Current time: 2022-09-21 17:07:46 (running for 00:00:27.79)
Memory usage on this node: 9.7/167.0 GiB
PopulationBasedTraining: 0 checkpoints, 0 perturbs
Resources requested: 3.0/24 CPUs, 2.0/2 GPUs, 0.0/107.2 GiB heap, 0.0/49.93 GiB objects (0.0/1.0 accelerator_type:A100)
Result logdir: /home/sliceruser/ray_results/picai-hyperparam-search-30
Number of trials: 1/1 (1 RUNNING)
+-----------------------+----------+----------------+
| Trial name            | status   | loc            |
|-----------------------+----------+----------------|
| mainTrain_d9a41_00000 | RUNNING  | 10.164.0.3:924 |
+-----------------------+----------+----------------+

Loading dataset:  82%|████████▏ | 74/90 [00:19<00:04,  3.98it/s]
Loading dataset:  83%|████████▎ | 75/90 [00:19<00:03,  3.96it/s]
Loading dataset:  84%|████████▍ | 76/90 [00:20<00:03,  4.10it/s]
Loading dataset:  86%|████████▌ | 77/90 [00:20<00:03,  3.76it/s]
Loading dataset:  87%|████████▋ | 78/90 [00:20<00:03,  3.85it/s]
Loading dataset:  88%|████████▊ | 79/90 [00:20<00:02,  4.01it/s]
Loading dataset:  89%|████████▉ | 80/90 [00:21<00:02,  3.98it/s]
Loading dataset:  90%|█████████ | 81/90 [00:21<00:02,  4.11it/s]
Loading dataset:  91%|█████████ | 82/90 [00:21<00:02,  3.96it/s]
Loading dataset:  92%|█████████▏| 83/90 [00:21<00:01,  4.16it/s]
Loading dataset:  93%|█████████▎| 84/90 [00:22<00:01,  4.24it/s]
Loading dataset:  94%|█████████▍| 85/90 [00:22<00:01,  4.08it/s]
Loading dataset:  96%|█████████▌| 86/90 [00:22<00:00,  4.16it/s]
Loading dataset:  97%|█████████▋| 87/90 [00:22<00:00,  4.23it/s]
Loading dataset:  98%|█████████▊| 88/90 [00:23<00:00,  4.23it/s]
Loading dataset:  99%|█████████▉| 89/90 [00:23<00:00,  4.29it/s]
Loading dataset: 100%|██████████| 90/90 [00:23<00:00,  3.83it/s]
Loading dataset:   0%|          | 0/10 [00:00<?, ?it/s]
Loading dataset:  10%|█         | 1/10 [00:00<00:01,  4.71it/s]
Loading dataset:  20%|██        | 2/10 [00:00<00:02,  3.93it/s]
Loading dataset:  30%|███       | 3/10 [00:00<00:01,  4.25it/s]
== Status ==
Current time: 2022-09-21 17:07:51 (running for 00:00:32.79)
Memory usage on this node: 11.0/167.0 GiB
PopulationBasedTraining: 0 checkpoints, 0 perturbs
Resources requested: 3.0/24 CPUs, 2.0/2 GPUs, 0.0/107.2 GiB heap, 0.0/49.93 GiB objects (0.0/1.0 accelerator_type:A100)
Result logdir: /home/sliceruser/ray_results/picai-hyperparam-search-30
Number of trials: 1/1 (1 RUNNING)
+-----------------------+----------+----------------+
| Trial name            | status   | loc            |
|-----------------------+----------+----------------|
| mainTrain_d9a41_00000 | RUNNING  | 10.164.0.3:924 |
+-----------------------+----------+----------------+

Loading dataset:  40%|████      | 4/10 [00:00<00:01,  4.13it/s]
Loading dataset:  50%|█████     | 5/10 [00:01<00:01,  3.94it/s]
Loading dataset:  60%|██████    | 6/10 [00:01<00:00,  4.11it/s]
Loading dataset:  70%|███████   | 7/10 [00:01<00:00,  4.36it/s]
Loading dataset:  80%|████████  | 8/10 [00:01<00:00,  4.37it/s]
Loading dataset:  90%|█████████ | 9/10 [00:02<00:00,  4.32it/s]
Loading dataset: 100%|██████████| 10/10 [00:02<00:00,  4.19it/s]
Loading dataset:   0%|          | 0/90 [00:00<?, ?it/s]
Loading dataset:   1%|          | 1/90 [00:00<00:22,  3.93it/s]
Loading dataset:   2%|▏         | 2/90 [00:00<00:19,  4.40it/s]
Loading dataset:   3%|▎         | 3/90 [00:00<00:21,  4.10it/s]
Loading dataset:   4%|▍         | 4/90 [00:00<00:20,  4.14it/s]
Loading dataset:   6%|▌         | 5/90 [00:01<00:21,  4.03it/s]
Loading dataset:   7%|▋         | 6/90 [00:01<00:19,  4.30it/s]
Loading dataset:   8%|▊         | 7/90 [00:01<00:19,  4.30it/s]
Loading dataset:   9%|▉         | 8/90 [00:01<00:19,  4.16it/s]
Loading dataset:  10%|█         | 9/90 [00:02<00:18,  4.40it/s]
Loading dataset:  11%|█         | 10/90 [00:02<00:19,  4.00it/s]
Loading dataset:  12%|█▏        | 11/90 [00:02<00:18,  4.27it/s]
Loading dataset:  13%|█▎        | 12/90 [00:02<00:18,  4.20it/s]
Loading dataset:  14%|█▍        | 13/90 [00:03<00:18,  4.08it/s]
Loading dataset:  16%|█▌        | 14/90 [00:03<00:17,  4.28it/s]
== Status ==
Current time: 2022-09-21 17:07:56 (running for 00:00:37.79)
Memory usage on this node: 12.4/167.0 GiB
PopulationBasedTraining: 0 checkpoints, 0 perturbs
Resources requested: 3.0/24 CPUs, 2.0/2 GPUs, 0.0/107.2 GiB heap, 0.0/49.93 GiB objects (0.0/1.0 accelerator_type:A100)
Result logdir: /home/sliceruser/ray_results/picai-hyperparam-search-30
Number of trials: 1/1 (1 RUNNING)
+-----------------------+----------+----------------+
| Trial name            | status   | loc            |
|-----------------------+----------+----------------|
| mainTrain_d9a41_00000 | RUNNING  | 10.164.0.3:924 |
+-----------------------+----------+----------------+

Loading dataset:  17%|█▋        | 15/90 [00:03<00:17,  4.17it/s]
Loading dataset:  18%|█▊        | 16/90 [00:03<00:16,  4.39it/s]
Loading dataset:  19%|█▉        | 17/90 [00:04<00:17,  4.25it/s]
Loading dataset:  20%|██        | 18/90 [00:04<00:16,  4.27it/s]
Loading dataset:  21%|██        | 19/90 [00:04<00:15,  4.48it/s]
Loading dataset:  22%|██▏       | 20/90 [00:04<00:16,  4.32it/s]
Loading dataset:  23%|██▎       | 21/90 [00:04<00:15,  4.48it/s]
Loading dataset:  24%|██▍       | 22/90 [00:05<00:15,  4.32it/s]
Loading dataset:  26%|██▌       | 23/90 [00:05<00:14,  4.52it/s]
Loading dataset:  27%|██▋       | 24/90 [00:05<00:15,  4.34it/s]
Loading dataset:  28%|██▊       | 25/90 [00:05<00:15,  4.32it/s]
Loading dataset:  29%|██▉       | 26/90 [00:06<00:14,  4.34it/s]
Loading dataset:  30%|███       | 27/90 [00:06<00:14,  4.41it/s]
Loading dataset:  31%|███       | 28/90 [00:06<00:13,  4.45it/s]
Loading dataset:  32%|███▏      | 29/90 [00:06<00:13,  4.48it/s]
Loading dataset:  33%|███▎      | 30/90 [00:06<00:13,  4.48it/s]
Loading dataset:  34%|███▍      | 31/90 [00:07<00:13,  4.35it/s]
Loading dataset:  36%|███▌      | 32/90 [00:07<00:12,  4.58it/s]
Loading dataset:  37%|███▋      | 33/90 [00:07<00:12,  4.53it/s]
Loading dataset:  38%|███▊      | 34/90 [00:07<00:12,  4.52it/s]
Loading dataset:  39%|███▉      | 35/90 [00:08<00:12,  4.52it/s]
Loading dataset:  40%|████      | 36/90 [00:08<00:11,  4.51it/s]
== Status ==
Current time: 2022-09-21 17:08:01 (running for 00:00:42.80)
Memory usage on this node: 13.9/167.0 GiB
PopulationBasedTraining: 0 checkpoints, 0 perturbs
Resources requested: 3.0/24 CPUs, 2.0/2 GPUs, 0.0/107.2 GiB heap, 0.0/49.93 GiB objects (0.0/1.0 accelerator_type:A100)
Result logdir: /home/sliceruser/ray_results/picai-hyperparam-search-30
Number of trials: 1/1 (1 RUNNING)
+-----------------------+----------+----------------+
| Trial name            | status   | loc            |
|-----------------------+----------+----------------|
| mainTrain_d9a41_00000 | RUNNING  | 10.164.0.3:924 |
+-----------------------+----------+----------------+

Loading dataset:  41%|████      | 37/90 [00:08<00:12,  4.18it/s]
Loading dataset:  42%|████▏     | 38/90 [00:08<00:12,  4.24it/s]
Loading dataset:  43%|████▎     | 39/90 [00:08<00:11,  4.46it/s]
Loading dataset:  44%|████▍     | 40/90 [00:09<00:11,  4.47it/s]
Loading dataset:  46%|████▌     | 41/90 [00:09<00:11,  4.42it/s]
Loading dataset:  47%|████▋     | 42/90 [00:09<00:11,  4.29it/s]
Loading dataset:  48%|████▊     | 43/90 [00:09<00:10,  4.51it/s]
Loading dataset:  49%|████▉     | 44/90 [00:10<00:10,  4.32it/s]
Loading dataset:  50%|█████     | 45/90 [00:10<00:10,  4.34it/s]
Loading dataset:  51%|█████     | 46/90 [00:10<00:09,  4.53it/s]
Loading dataset:  52%|█████▏    | 47/90 [00:10<00:09,  4.35it/s]
Loading dataset:  53%|█████▎    | 48/90 [00:11<00:09,  4.54it/s]
Loading dataset:  54%|█████▍    | 49/90 [00:11<00:09,  4.32it/s]
Loading dataset:  56%|█████▌    | 50/90 [00:11<00:09,  4.42it/s]
Loading dataset:  57%|█████▋    | 51/90 [00:11<00:08,  4.41it/s]
Loading dataset:  58%|█████▊    | 52/90 [00:11<00:08,  4.26it/s]
Loading dataset:  59%|█████▉    | 53/90 [00:12<00:08,  4.43it/s]
Loading dataset:  60%|██████    | 54/90 [00:12<00:08,  4.28it/s]
Loading dataset:  61%|██████    | 55/90 [00:12<00:07,  4.49it/s]
Loading dataset:  62%|██████▏   | 56/90 [00:12<00:07,  4.48it/s]
Loading dataset:  63%|██████▎   | 57/90 [00:13<00:07,  4.33it/s]
Loading dataset:  64%|██████▍   | 58/90 [00:13<00:07,  4.53it/s]
== Status ==
Current time: 2022-09-21 17:08:06 (running for 00:00:47.80)
Memory usage on this node: 15.4/167.0 GiB
PopulationBasedTraining: 0 checkpoints, 0 perturbs
Resources requested: 3.0/24 CPUs, 2.0/2 GPUs, 0.0/107.2 GiB heap, 0.0/49.93 GiB objects (0.0/1.0 accelerator_type:A100)
Result logdir: /home/sliceruser/ray_results/picai-hyperparam-search-30
Number of trials: 1/1 (1 RUNNING)
+-----------------------+----------+----------------+
| Trial name            | status   | loc            |
|-----------------------+----------+----------------|
| mainTrain_d9a41_00000 | RUNNING  | 10.164.0.3:924 |
+-----------------------+----------+----------------+

Loading dataset:  66%|██████▌   | 59/90 [00:13<00:06,  4.48it/s]
Loading dataset:  67%|██████▋   | 60/90 [00:13<00:07,  4.06it/s]
Loading dataset:  68%|██████▊   | 61/90 [00:14<00:07,  3.83it/s]
Loading dataset:  69%|██████▉   | 62/90 [00:14<00:07,  3.61it/s]
Loading dataset:  70%|███████   | 63/90 [00:14<00:07,  3.47it/s]
Loading dataset:  71%|███████   | 64/90 [00:15<00:07,  3.30it/s]
Loading dataset:  72%|███████▏  | 65/90 [00:15<00:07,  3.37it/s]
Loading dataset:  73%|███████▎  | 66/90 [00:15<00:07,  3.35it/s]
Loading dataset:  74%|███████▍  | 67/90 [00:16<00:07,  3.19it/s]
Loading dataset:  76%|███████▌  | 68/90 [00:16<00:06,  3.40it/s]
Loading dataset:  77%|███████▋  | 69/90 [00:16<00:05,  3.61it/s]
Loading dataset:  78%|███████▊  | 70/90 [00:16<00:05,  3.70it/s]
Loading dataset:  79%|███████▉  | 71/90 [00:16<00:04,  4.01it/s]
Loading dataset:  80%|████████  | 72/90 [00:17<00:04,  4.09it/s]
Loading dataset:  81%|████████  | 73/90 [00:17<00:04,  4.19it/s]
Loading dataset:  82%|████████▏ | 74/90 [00:17<00:03,  4.25it/s]
Loading dataset:  83%|████████▎ | 75/90 [00:17<00:03,  4.13it/s]
Loading dataset:  84%|████████▍ | 76/90 [00:18<00:03,  4.34it/s]
Loading dataset:  86%|████████▌ | 77/90 [00:18<00:03,  4.20it/s]
== Status ==
Current time: 2022-09-21 17:08:11 (running for 00:00:52.80)
Memory usage on this node: 16.6/167.0 GiB
PopulationBasedTraining: 0 checkpoints, 0 perturbs
Resources requested: 3.0/24 CPUs, 2.0/2 GPUs, 0.0/107.2 GiB heap, 0.0/49.93 GiB objects (0.0/1.0 accelerator_type:A100)
Result logdir: /home/sliceruser/ray_results/picai-hyperparam-search-30
Number of trials: 1/1 (1 RUNNING)
+-----------------------+----------+----------------+
| Trial name            | status   | loc            |
|-----------------------+----------+----------------|
| mainTrain_d9a41_00000 | RUNNING  | 10.164.0.3:924 |
+-----------------------+----------+----------------+

Loading dataset:  87%|████████▋ | 78/90 [00:18<00:02,  4.32it/s]
Loading dataset:  88%|████████▊ | 79/90 [00:18<00:02,  4.21it/s]
Loading dataset:  89%|████████▉ | 80/90 [00:19<00:02,  4.43it/s]
Loading dataset:  90%|█████████ | 81/90 [00:19<00:02,  4.28it/s]
Loading dataset:  91%|█████████ | 82/90 [00:19<00:01,  4.44it/s]
Loading dataset:  92%|█████████▏| 83/90 [00:19<00:01,  4.43it/s]
Loading dataset:  93%|█████████▎| 84/90 [00:19<00:01,  4.43it/s]
Loading dataset:  94%|█████████▍| 85/90 [00:20<00:01,  4.45it/s]
Loading dataset:  96%|█████████▌| 86/90 [00:20<00:00,  4.30it/s]
Loading dataset:  97%|█████████▋| 87/90 [00:20<00:00,  4.45it/s]
Loading dataset:  98%|█████████▊| 88/90 [00:20<00:00,  4.41it/s]
Loading dataset:  99%|█████████▉| 89/90 [00:21<00:00,  4.29it/s]
(mainTrain pid=924) 2022-09-21 17:08:14,625 - Created a temporary directory at /tmp/tmp_xar33m4
(mainTrain pid=924) 2022-09-21 17:08:14,625 - Writing /tmp/tmp_xar33m4/_remote_module_non_scriptable.py
(mainTrain pid=924) Training started at 2022-09-21 17:08:14.667090
Loading dataset: 100%|██████████| 90/90 [00:21<00:00,  4.22it/s]
(mainTrain pid=924) GPU available: False, used: False
(mainTrain pid=924) TPU available: False, using: 0 TPU cores
(mainTrain pid=924) IPU available: False, using: 0 IPUs
(mainTrain pid=924) HPU available: False, using: 0 HPUs
(mainTrain pid=924) The `on_keyboard_interrupt` callback hook was deprecated in v1.5 and will be removed in v1.7. Please use the `on_exception` callback hook instead.
(mainTrain pid=924) The `on_init_start` callback hook was deprecated in v1.6 and will be removed in v1.8.
(mainTrain pid=924) The `on_init_end` callback hook was deprecated in v1.6 and will be removed in v1.8.
(mainTrain pid=924) The `Callback.on_batch_start` hook was deprecated in v1.6 and will be removed in v1.8. Please use `Callback.on_train_batch_start` instead.
(mainTrain pid=924) The `Callback.on_batch_end` hook was deprecated in v1.6 and will be removed in v1.8. Please use `Callback.on_train_batch_end` instead.
(mainTrain pid=924) The `Callback.on_epoch_start` hook was deprecated in v1.6 and will be removed in v1.8. Please use `Callback.on_<train/validation/test>_epoch_start` instead.
(mainTrain pid=924) The `Callback.on_epoch_end` hook was deprecated in v1.6 and will be removed in v1.8. Please use `Callback.on_<train/validation/test>_epoch_end` instead.
(mainTrain pid=924) self.allSubjects 100  self.onlyPositiveSubjects 100
(mainTrain pid=924) Train data set: 90
(mainTrain pid=924) Test data set: 5
(mainTrain pid=924) Valid data set: 5
(mainTrain pid=924) Train data set: 90
(mainTrain pid=924) Test data set: 5
(mainTrain pid=924) Valid data set: 5
Loading dataset:   0%|          | 0/90 [00:00<?, ?it/s]
Loading dataset:   1%|          | 1/90 [00:00<00:19,  4.62it/s]
Loading dataset:   2%|▏         | 2/90 [00:00<00:20,  4.20it/s]
Loading dataset:   3%|▎         | 3/90 [00:00<00:22,  3.93it/s]
Loading dataset:   4%|▍         | 4/90 [00:00<00:21,  4.05it/s]
Loading dataset:   6%|▌         | 5/90 [00:01<00:20,  4.15it/s]
Loading dataset:   7%|▋         | 6/90 [00:01<00:19,  4.20it/s]
Loading dataset:   8%|▊         | 7/90 [00:01<00:19,  4.27it/s]
Loading dataset:   9%|▉         | 8/90 [00:01<00:19,  4.31it/s]
== Status ==
Current time: 2022-09-21 17:08:16 (running for 00:00:57.80)
Memory usage on this node: 18.0/167.0 GiB
PopulationBasedTraining: 0 checkpoints, 0 perturbs
Resources requested: 3.0/24 CPUs, 2.0/2 GPUs, 0.0/107.2 GiB heap, 0.0/49.93 GiB objects (0.0/1.0 accelerator_type:A100)
Result logdir: /home/sliceruser/ray_results/picai-hyperparam-search-30
Number of trials: 1/1 (1 RUNNING)
+-----------------------+----------+----------------+
| Trial name            | status   | loc            |
|-----------------------+----------+----------------|
| mainTrain_d9a41_00000 | RUNNING  | 10.164.0.3:924 |
+-----------------------+----------+----------------+

Loading dataset:  10%|█         | 9/90 [00:02<00:19,  4.25it/s]
Loading dataset:  11%|█         | 10/90 [00:02<00:18,  4.31it/s]
Loading dataset:  12%|█▏        | 11/90 [00:02<00:19,  4.16it/s]
Loading dataset:  13%|█▎        | 12/90 [00:02<00:18,  4.12it/s]
Loading dataset:  14%|█▍        | 13/90 [00:03<00:18,  4.05it/s]
Loading dataset:  16%|█▌        | 14/90 [00:03<00:17,  4.24it/s]
Loading dataset:  17%|█▋        | 15/90 [00:03<00:18,  4.10it/s]
Loading dataset:  18%|█▊        | 16/90 [00:03<00:17,  4.28it/s]
Loading dataset:  19%|█▉        | 17/90 [00:04<00:17,  4.16it/s]
Loading dataset:  20%|██        | 18/90 [00:04<00:16,  4.28it/s]
Loading dataset:  21%|██        | 19/90 [00:04<00:16,  4.20it/s]
Loading dataset:  22%|██▏       | 20/90 [00:04<00:17,  4.08it/s]
Loading dataset:  23%|██▎       | 21/90 [00:05<00:16,  4.13it/s]
Loading dataset:  24%|██▍       | 22/90 [00:05<00:15,  4.28it/s]
Loading dataset:  26%|██▌       | 23/90 [00:05<00:16,  4.15it/s]
Loading dataset:  27%|██▋       | 24/90 [00:05<00:15,  4.37it/s]
Loading dataset:  28%|██▊       | 25/90 [00:05<00:15,  4.23it/s]
Loading dataset:  29%|██▉       | 26/90 [00:06<00:14,  4.27it/s]
Loading dataset:  30%|███       | 27/90 [00:06<00:14,  4.45it/s]
Loading dataset:  31%|███       | 28/90 [00:06<00:14,  4.22it/s]
Loading dataset:  32%|███▏      | 29/90 [00:06<00:13,  4.42it/s]
== Status ==
Current time: 2022-09-21 17:08:21 (running for 00:01:02.81)
Memory usage on this node: 19.4/167.0 GiB
PopulationBasedTraining: 0 checkpoints, 0 perturbs
Resources requested: 3.0/24 CPUs, 2.0/2 GPUs, 0.0/107.2 GiB heap, 0.0/49.93 GiB objects (0.0/1.0 accelerator_type:A100)
Result logdir: /home/sliceruser/ray_results/picai-hyperparam-search-30
Number of trials: 1/1 (1 RUNNING)
+-----------------------+----------+----------------+
| Trial name            | status   | loc            |
|-----------------------+----------+----------------|
| mainTrain_d9a41_00000 | RUNNING  | 10.164.0.3:924 |
+-----------------------+----------+----------------+

Loading dataset:  33%|███▎      | 30/90 [00:07<00:14,  4.25it/s]
Loading dataset:  34%|███▍      | 31/90 [00:07<00:13,  4.31it/s]
Loading dataset:  36%|███▌      | 32/90 [00:07<00:13,  4.33it/s]
Loading dataset:  37%|███▋      | 33/90 [00:07<00:13,  4.22it/s]
Loading dataset:  38%|███▊      | 34/90 [00:08<00:13,  4.24it/s]
Loading dataset:  39%|███▉      | 35/90 [00:08<00:12,  4.37it/s]
Loading dataset:  40%|████      | 36/90 [00:08<00:12,  4.24it/s]
Loading dataset:  41%|████      | 37/90 [00:08<00:11,  4.45it/s]
Loading dataset:  42%|████▏     | 38/90 [00:08<00:11,  4.41it/s]
Loading dataset:  43%|████▎     | 39/90 [00:09<00:11,  4.43it/s]
Loading dataset:  44%|████▍     | 40/90 [00:09<00:12,  4.12it/s]
Loading dataset:  46%|████▌     | 41/90 [00:09<00:11,  4.17it/s]
Loading dataset:  47%|████▋     | 42/90 [00:09<00:11,  4.04it/s]
Loading dataset:  48%|████▊     | 43/90 [00:10<00:11,  4.07it/s]
Loading dataset:  49%|████▉     | 44/90 [00:10<00:11,  3.83it/s]
Loading dataset:  50%|█████     | 45/90 [00:10<00:11,  4.08it/s]
Loading dataset:  51%|█████     | 46/90 [00:10<00:11,  3.94it/s]
Loading dataset:  52%|█████▏    | 47/90 [00:11<00:11,  3.70it/s]
Loading dataset:  53%|█████▎    | 48/90 [00:11<00:10,  3.96it/s]
Loading dataset:  54%|█████▍    | 49/90 [00:11<00:10,  3.95it/s]
Loading dataset:  56%|█████▌    | 50/90 [00:11<00:10,  3.98it/s]
== Status ==
Current time: 2022-09-21 17:08:26 (running for 00:01:07.81)
Memory usage on this node: 20.7/167.0 GiB
PopulationBasedTraining: 0 checkpoints, 0 perturbs
Resources requested: 3.0/24 CPUs, 2.0/2 GPUs, 0.0/107.2 GiB heap, 0.0/49.93 GiB objects (0.0/1.0 accelerator_type:A100)
Result logdir: /home/sliceruser/ray_results/picai-hyperparam-search-30
Number of trials: 1/1 (1 RUNNING)
+-----------------------+----------+----------------+
| Trial name            | status   | loc            |
|-----------------------+----------+----------------|
| mainTrain_d9a41_00000 | RUNNING  | 10.164.0.3:924 |
+-----------------------+----------+----------------+

Loading dataset:  57%|█████▋    | 51/90 [00:12<00:09,  4.03it/s]
Loading dataset:  58%|█████▊    | 52/90 [00:12<00:09,  4.20it/s]
Loading dataset:  59%|█████▉    | 53/90 [00:12<00:09,  4.00it/s]
Loading dataset:  60%|██████    | 54/90 [00:13<00:10,  3.60it/s]
Loading dataset:  61%|██████    | 55/90 [00:13<00:09,  3.59it/s]
Loading dataset:  62%|██████▏   | 56/90 [00:13<00:09,  3.77it/s]
Loading dataset:  63%|██████▎   | 57/90 [00:13<00:08,  3.82it/s]
Loading dataset:  64%|██████▍   | 58/90 [00:14<00:07,  4.06it/s]
Loading dataset:  66%|██████▌   | 59/90 [00:14<00:07,  4.15it/s]
Loading dataset:  67%|██████▋   | 60/90 [00:14<00:07,  3.86it/s]
Loading dataset:  68%|██████▊   | 61/90 [00:14<00:07,  3.84it/s]
Loading dataset:  69%|██████▉   | 62/90 [00:15<00:06,  4.01it/s]
Loading dataset:  70%|███████   | 63/90 [00:15<00:06,  3.93it/s]
Loading dataset:  71%|███████   | 64/90 [00:15<00:06,  4.21it/s]
Loading dataset:  72%|███████▏  | 65/90 [00:15<00:06,  3.78it/s]
Loading dataset:  73%|███████▎  | 66/90 [00:16<00:05,  4.00it/s]
Loading dataset:  74%|███████▍  | 67/90 [00:16<00:05,  3.86it/s]
Loading dataset:  76%|███████▌  | 68/90 [00:16<00:05,  3.99it/s]
Loading dataset:  77%|███████▋  | 69/90 [00:16<00:05,  4.10it/s]
Loading dataset:  78%|███████▊  | 70/90 [00:17<00:04,  4.08it/s]
== Status ==
Current time: 2022-09-21 17:08:31 (running for 00:01:12.81)
Memory usage on this node: 22.1/167.0 GiB
PopulationBasedTraining: 0 checkpoints, 0 perturbs
Resources requested: 3.0/24 CPUs, 2.0/2 GPUs, 0.0/107.2 GiB heap, 0.0/49.93 GiB objects (0.0/1.0 accelerator_type:A100)
Result logdir: /home/sliceruser/ray_results/picai-hyperparam-search-30
Number of trials: 1/1 (1 RUNNING)
+-----------------------+----------+----------------+
| Trial name            | status   | loc            |
|-----------------------+----------+----------------|
| mainTrain_d9a41_00000 | RUNNING  | 10.164.0.3:924 |
+-----------------------+----------+----------------+

Loading dataset:  79%|███████▉  | 71/90 [00:17<00:04,  4.14it/s]
Loading dataset:  80%|████████  | 72/90 [00:17<00:04,  4.35it/s]
Loading dataset:  81%|████████  | 73/90 [00:17<00:04,  4.14it/s]
Loading dataset:  82%|████████▏ | 74/90 [00:18<00:03,  4.12it/s]
Loading dataset:  83%|████████▎ | 75/90 [00:18<00:03,  4.38it/s]
Loading dataset:  84%|████████▍ | 76/90 [00:18<00:03,  4.26it/s]
Loading dataset:  86%|████████▌ | 77/90 [00:18<00:03,  4.15it/s]
Loading dataset:  87%|████████▋ | 78/90 [00:18<00:03,  3.88it/s]
Loading dataset:  88%|████████▊ | 79/90 [00:19<00:02,  3.89it/s]
Loading dataset:  89%|████████▉ | 80/90 [00:19<00:02,  4.03it/s]
Loading dataset:  90%|█████████ | 81/90 [00:19<00:02,  4.27it/s]
Loading dataset:  91%|█████████ | 82/90 [00:19<00:01,  4.12it/s]
Loading dataset:  92%|█████████▏| 83/90 [00:20<00:01,  4.27it/s]
Loading dataset:  93%|█████████▎| 84/90 [00:20<00:01,  4.33it/s]
Loading dataset:  94%|█████████▍| 85/90 [00:20<00:01,  4.14it/s]
Loading dataset:  96%|█████████▌| 86/90 [00:20<00:00,  4.23it/s]
Loading dataset:  97%|█████████▋| 87/90 [00:21<00:00,  4.46it/s]
Loading dataset:  98%|█████████▊| 88/90 [00:21<00:00,  4.28it/s]
Loading dataset:  99%|█████████▉| 89/90 [00:21<00:00,  4.17it/s]
Loading dataset: 100%|██████████| 90/90 [00:21<00:00,  4.12it/s]
Loading dataset:   0%|          | 0/10 [00:00<?, ?it/s]
== Status ==
Current time: 2022-09-21 17:08:36 (running for 00:01:17.81)
Memory usage on this node: 19.3/167.0 GiB
PopulationBasedTraining: 0 checkpoints, 0 perturbs
Resources requested: 3.0/24 CPUs, 2.0/2 GPUs, 0.0/107.2 GiB heap, 0.0/49.93 GiB objects (0.0/1.0 accelerator_type:A100)
Result logdir: /home/sliceruser/ray_results/picai-hyperparam-search-30
Number of trials: 1/1 (1 RUNNING)
+-----------------------+----------+----------------+
| Trial name            | status   | loc            |
|-----------------------+----------+----------------|
| mainTrain_d9a41_00000 | RUNNING  | 10.164.0.3:924 |
+-----------------------+----------+----------------+

Loading dataset:  10%|█         | 1/10 [00:00<00:02,  4.15it/s]
Loading dataset:  20%|██        | 2/10 [00:00<00:01,  4.34it/s]
Loading dataset:  30%|███       | 3/10 [00:00<00:01,  4.07it/s]
Loading dataset:  40%|████      | 4/10 [00:00<00:01,  4.44it/s]
Loading dataset:  50%|█████     | 5/10 [00:01<00:01,  4.28it/s]
Loading dataset:  60%|██████    | 6/10 [00:01<00:00,  4.59it/s]
Loading dataset:  70%|███████   | 7/10 [00:01<00:00,  4.46it/s]
Loading dataset:  80%|████████  | 8/10 [00:01<00:00,  4.62it/s]
Loading dataset:  90%|█████████ | 9/10 [00:02<00:00,  4.63it/s]
Loading dataset: 100%|██████████| 10/10 [00:02<00:00,  4.50it/s]
Loading dataset:   0%|          | 0/90 [00:00<?, ?it/s]
Loading dataset:   1%|          | 1/90 [00:00<00:27,  3.18it/s]
Loading dataset:   2%|▏         | 2/90 [00:00<00:24,  3.59it/s]
Loading dataset:   3%|▎         | 3/90 [00:00<00:21,  3.97it/s]
Loading dataset:   4%|▍         | 4/90 [00:01<00:20,  4.18it/s]
Loading dataset:   6%|▌         | 5/90 [00:01<00:19,  4.36it/s]
Loading dataset:   7%|▋         | 6/90 [00:01<00:19,  4.36it/s]
Loading dataset:   8%|▊         | 7/90 [00:01<00:19,  4.32it/s]
Loading dataset:   9%|▉         | 8/90 [00:01<00:17,  4.58it/s]
Loading dataset:  10%|█         | 9/90 [00:02<00:18,  4.45it/s]
Loading dataset:  11%|█         | 10/90 [00:02<00:17,  4.57it/s]
Loading dataset:  12%|█▏        | 11/90 [00:02<00:17,  4.41it/s]
Loading dataset:  13%|█▎        | 12/90 [00:02<00:16,  4.66it/s]
== Status ==
Current time: 2022-09-21 17:08:41 (running for 00:01:22.82)
Memory usage on this node: 20.0/167.0 GiB
PopulationBasedTraining: 0 checkpoints, 0 perturbs
Resources requested: 3.0/24 CPUs, 2.0/2 GPUs, 0.0/107.2 GiB heap, 0.0/49.93 GiB objects (0.0/1.0 accelerator_type:A100)
Result logdir: /home/sliceruser/ray_results/picai-hyperparam-search-30
Number of trials: 1/1 (1 RUNNING)
+-----------------------+----------+----------------+
| Trial name            | status   | loc            |
|-----------------------+----------+----------------|
| mainTrain_d9a41_00000 | RUNNING  | 10.164.0.3:924 |
+-----------------------+----------+----------------+

Loading dataset:  14%|█▍        | 13/90 [00:02<00:16,  4.58it/s]
Loading dataset:  16%|█▌        | 14/90 [00:03<00:18,  4.15it/s]
Loading dataset:  17%|█▋        | 15/90 [00:03<00:17,  4.26it/s]
Loading dataset:  18%|█▊        | 16/90 [00:03<00:16,  4.47it/s]
Loading dataset:  19%|█▉        | 17/90 [00:03<00:16,  4.46it/s]
Loading dataset:  20%|██        | 18/90 [00:04<00:18,  3.98it/s]
Loading dataset:  21%|██        | 19/90 [00:04<00:18,  3.84it/s]
Loading dataset:  22%|██▏       | 20/90 [00:04<00:18,  3.76it/s]
Loading dataset:  23%|██▎       | 21/90 [00:05<00:18,  3.64it/s]
Loading dataset:  24%|██▍       | 22/90 [00:05<00:19,  3.47it/s]
Loading dataset:  26%|██▌       | 23/90 [00:05<00:19,  3.51it/s]
Loading dataset:  27%|██▋       | 24/90 [00:06<00:19,  3.33it/s]
Loading dataset:  28%|██▊       | 25/90 [00:06<00:19,  3.28it/s]
Loading dataset:  29%|██▉       | 26/90 [00:06<00:19,  3.26it/s]
Loading dataset:  30%|███       | 27/90 [00:06<00:19,  3.31it/s]
Loading dataset:  31%|███       | 28/90 [00:07<00:17,  3.58it/s]
Loading dataset:  32%|███▏      | 29/90 [00:07<00:15,  3.87it/s]
Loading dataset:  33%|███▎      | 30/90 [00:07<00:15,  3.95it/s]
Loading dataset:  34%|███▍      | 31/90 [00:07<00:13,  4.29it/s]
== Status ==
Current time: 2022-09-21 17:08:46 (running for 00:01:27.82)
Memory usage on this node: 20.9/167.0 GiB
PopulationBasedTraining: 0 checkpoints, 0 perturbs
Resources requested: 3.0/24 CPUs, 2.0/2 GPUs, 0.0/107.2 GiB heap, 0.0/49.93 GiB objects (0.0/1.0 accelerator_type:A100)
Result logdir: /home/sliceruser/ray_results/picai-hyperparam-search-30
Number of trials: 1/1 (1 RUNNING)
+-----------------------+----------+----------------+
| Trial name            | status   | loc            |
|-----------------------+----------+----------------|
| mainTrain_d9a41_00000 | RUNNING  | 10.164.0.3:924 |
+-----------------------+----------+----------------+

Loading dataset:  36%|███▌      | 32/90 [00:08<00:13,  4.39it/s]
Loading dataset:  37%|███▋      | 33/90 [00:08<00:12,  4.44it/s]
Loading dataset:  38%|███▊      | 34/90 [00:08<00:12,  4.51it/s]
Loading dataset:  39%|███▉      | 35/90 [00:08<00:12,  4.53it/s]
Loading dataset:  40%|████      | 36/90 [00:08<00:13,  3.99it/s]
Loading dataset:  41%|████      | 37/90 [00:09<00:13,  4.06it/s]
Loading dataset:  42%|████▏     | 38/90 [00:09<00:11,  4.40it/s]
Loading dataset:  43%|████▎     | 39/90 [00:09<00:11,  4.33it/s]
Loading dataset:  44%|████▍     | 40/90 [00:09<00:11,  4.45it/s]
Loading dataset:  46%|████▌     | 41/90 [00:10<00:11,  4.36it/s]
Loading dataset:  47%|████▋     | 42/90 [00:10<00:10,  4.61it/s]
Loading dataset:  48%|████▊     | 43/90 [00:10<00:10,  4.62it/s]
Loading dataset:  49%|████▉     | 44/90 [00:10<00:10,  4.59it/s]
Loading dataset:  50%|█████     | 45/90 [00:10<00:10,  4.41it/s]
Loading dataset:  51%|█████     | 46/90 [00:11<00:09,  4.59it/s]
Loading dataset:  52%|█████▏    | 47/90 [00:11<00:09,  4.43it/s]
Loading dataset:  53%|█████▎    | 48/90 [00:11<00:09,  4.65it/s]
Loading dataset:  54%|█████▍    | 49/90 [00:11<00:09,  4.28it/s]
Loading dataset:  56%|█████▌    | 50/90 [00:12<00:09,  4.38it/s]
Loading dataset:  57%|█████▋    | 51/90 [00:12<00:08,  4.44it/s]
Loading dataset:  58%|█████▊    | 52/90 [00:12<00:08,  4.36it/s]
Loading dataset:  59%|█████▉    | 53/90 [00:12<00:08,  4.57it/s]
== Status ==
Current time: 2022-09-21 17:08:51 (running for 00:01:32.82)
Memory usage on this node: 21.9/167.0 GiB
PopulationBasedTraining: 0 checkpoints, 0 perturbs
Resources requested: 3.0/24 CPUs, 2.0/2 GPUs, 0.0/107.2 GiB heap, 0.0/49.93 GiB objects (0.0/1.0 accelerator_type:A100)
Result logdir: /home/sliceruser/ray_results/picai-hyperparam-search-30
Number of trials: 1/1 (1 RUNNING)
+-----------------------+----------+----------------+
| Trial name            | status   | loc            |
|-----------------------+----------+----------------|
| mainTrain_d9a41_00000 | RUNNING  | 10.164.0.3:924 |
+-----------------------+----------+----------------+

Loading dataset:  60%|██████    | 54/90 [00:12<00:08,  4.44it/s]
Loading dataset:  61%|██████    | 55/90 [00:13<00:07,  4.66it/s]
Loading dataset:  62%|██████▏   | 56/90 [00:13<00:07,  4.49it/s]
Loading dataset:  63%|██████▎   | 57/90 [00:13<00:06,  4.73it/s]
Loading dataset:  64%|██████▍   | 58/90 [00:13<00:07,  4.51it/s]
Loading dataset:  66%|██████▌   | 59/90 [00:14<00:06,  4.50it/s]
Loading dataset:  67%|██████▋   | 60/90 [00:14<00:06,  4.67it/s]
Loading dataset:  68%|██████▊   | 61/90 [00:14<00:06,  4.66it/s]
Loading dataset:  69%|██████▉   | 62/90 [00:14<00:06,  4.32it/s]
Loading dataset:  70%|███████   | 63/90 [00:14<00:06,  4.29it/s]
Loading dataset:  71%|███████   | 64/90 [00:15<00:05,  4.41it/s]
Loading dataset:  72%|███████▏  | 65/90 [00:15<00:05,  4.49it/s]
Loading dataset:  73%|███████▎  | 66/90 [00:15<00:05,  4.49it/s]
Loading dataset:  74%|███████▍  | 67/90 [00:15<00:05,  4.52it/s]
Loading dataset:  76%|███████▌  | 68/90 [00:16<00:04,  4.41it/s]
Loading dataset:  77%|███████▋  | 69/90 [00:16<00:04,  4.44it/s]
Loading dataset:  78%|███████▊  | 70/90 [00:16<00:04,  4.53it/s]
Loading dataset:  79%|███████▉  | 71/90 [00:16<00:04,  4.24it/s]
Loading dataset:  80%|████████  | 72/90 [00:16<00:03,  4.53it/s]
Loading dataset:  81%|████████  | 73/90 [00:17<00:03,  4.40it/s]
Loading dataset:  82%|████████▏ | 74/90 [00:17<00:03,  4.53it/s]
Loading dataset:  83%|████████▎ | 75/90 [00:17<00:03,  4.37it/s]
== Status ==
Current time: 2022-09-21 17:08:56 (running for 00:01:37.82)
Memory usage on this node: 22.8/167.0 GiB
PopulationBasedTraining: 0 checkpoints, 0 perturbs
Resources requested: 3.0/24 CPUs, 2.0/2 GPUs, 0.0/107.2 GiB heap, 0.0/49.93 GiB objects (0.0/1.0 accelerator_type:A100)
Result logdir: /home/sliceruser/ray_results/picai-hyperparam-search-30
Number of trials: 1/1 (1 RUNNING)
+-----------------------+----------+----------------+
| Trial name            | status   | loc            |
|-----------------------+----------+----------------|
| mainTrain_d9a41_00000 | RUNNING  | 10.164.0.3:924 |
+-----------------------+----------+----------------+

Loading dataset:  84%|████████▍ | 76/90 [00:17<00:03,  4.60it/s]
Loading dataset:  86%|████████▌ | 77/90 [00:18<00:02,  4.60it/s]
Loading dataset:  87%|████████▋ | 78/90 [00:18<00:02,  4.40it/s]
Loading dataset:  88%|████████▊ | 79/90 [00:18<00:02,  4.68it/s]
Loading dataset:  89%|████████▉ | 80/90 [00:18<00:02,  4.62it/s]
Loading dataset:  90%|█████████ | 81/90 [00:19<00:02,  4.34it/s]
Loading dataset:  91%|█████████ | 82/90 [00:19<00:01,  4.37it/s]
Loading dataset:  92%|█████████▏| 83/90 [00:19<00:01,  4.60it/s]
Loading dataset:  93%|█████████▎| 84/90 [00:19<00:01,  4.56it/s]
Loading dataset:  94%|█████████▍| 85/90 [00:19<00:01,  4.29it/s]
Loading dataset:  96%|█████████▌| 86/90 [00:20<00:00,  4.55it/s]
Loading dataset:  97%|█████████▋| 87/90 [00:20<00:00,  4.34it/s]
Loading dataset:  98%|█████████▊| 88/90 [00:20<00:00,  4.30it/s]
Loading dataset:  99%|█████████▉| 89/90 [00:20<00:00,  4.37it/s]
Loading dataset: 100%|██████████| 90/90 [00:21<00:00,  4.28it/s]
Sanity Checking: 0it [00:00, ?it/s]
(mainTrain pid=924) 
(mainTrain pid=924)   | Name            | Type            | Params
(mainTrain pid=924) ----------------------------------------------------
(mainTrain pid=924) 0 | net             | VNet            | 45.6 M
(mainTrain pid=924) 1 | modelRegression | UNetToRegresion | 14.4 K
(mainTrain pid=924) 2 | criterion       | FocalLoss       | 0     
(mainTrain pid=924) 3 | F1Score         | F1Score         | 0     
(mainTrain pid=924) ----------------------------------------------------
(mainTrain pid=924) 45.6 M    Trainable params
(mainTrain pid=924) 0         Non-trainable params
(mainTrain pid=924) 45.6 M    Total params
(mainTrain pid=924) 182.473   Total estimated model params size (MB)
Sanity Checking DataLoader 0:   0%|          | 0/2 [00:00<?, ?it/s]
== Status ==
Current time: 2022-09-21 17:09:01 (running for 00:01:42.83)
Memory usage on this node: 20.6/167.0 GiB
PopulationBasedTraining: 0 checkpoints, 0 perturbs
Resources requested: 3.0/24 CPUs, 2.0/2 GPUs, 0.0/107.2 GiB heap, 0.0/49.93 GiB objects (0.0/1.0 accelerator_type:A100)
Result logdir: /home/sliceruser/ray_results/picai-hyperparam-search-30
Number of trials: 1/1 (1 RUNNING)
+-----------------------+----------+----------------+
| Trial name            | status   | loc            |
|-----------------------+----------+----------------|
| mainTrain_d9a41_00000 | RUNNING  | 10.164.0.3:924 |
+-----------------------+----------+----------------+

== Status ==
Current time: 2022-09-21 17:09:06 (running for 00:01:47.83)
Memory usage on this node: 21.0/167.0 GiB
PopulationBasedTraining: 0 checkpoints, 0 perturbs
Resources requested: 3.0/24 CPUs, 2.0/2 GPUs, 0.0/107.2 GiB heap, 0.0/49.93 GiB objects (0.0/1.0 accelerator_type:A100)
Result logdir: /home/sliceruser/ray_results/picai-hyperparam-search-30
Number of trials: 1/1 (1 RUNNING)
+-----------------------+----------+----------------+
| Trial name            | status   | loc            |
|-----------------------+----------+----------------|
| mainTrain_d9a41_00000 | RUNNING  | 10.164.0.3:924 |
+-----------------------+----------+----------------+

== Status ==
Current time: 2022-09-21 17:09:11 (running for 00:01:52.83)
Memory usage on this node: 21.0/167.0 GiB
PopulationBasedTraining: 0 checkpoints, 0 perturbs
Resources requested: 3.0/24 CPUs, 2.0/2 GPUs, 0.0/107.2 GiB heap, 0.0/49.93 GiB objects (0.0/1.0 accelerator_type:A100)
Result logdir: /home/sliceruser/ray_results/picai-hyperparam-search-30
Number of trials: 1/1 (1 RUNNING)
+-----------------------+----------+----------------+
| Trial name            | status   | loc            |
|-----------------------+----------+----------------|
| mainTrain_d9a41_00000 | RUNNING  | 10.164.0.3:924 |
+-----------------------+----------+----------------+

== Status ==
Current time: 2022-09-21 17:09:16 (running for 00:01:57.83)
Memory usage on this node: 21.2/167.0 GiB
PopulationBasedTraining: 0 checkpoints, 0 perturbs
Resources requested: 3.0/24 CPUs, 2.0/2 GPUs, 0.0/107.2 GiB heap, 0.0/49.93 GiB objects (0.0/1.0 accelerator_type:A100)
Result logdir: /home/sliceruser/ray_results/picai-hyperparam-search-30
Number of trials: 1/1 (1 RUNNING)
+-----------------------+----------+----------------+
| Trial name            | status   | loc            |
|-----------------------+----------+----------------|
| mainTrain_d9a41_00000 | RUNNING  | 10.164.0.3:924 |
+-----------------------+----------+----------------+

== Status ==
Current time: 2022-09-21 17:09:21 (running for 00:02:02.84)
Memory usage on this node: 21.9/167.0 GiB
PopulationBasedTraining: 0 checkpoints, 0 perturbs
Resources requested: 3.0/24 CPUs, 2.0/2 GPUs, 0.0/107.2 GiB heap, 0.0/49.93 GiB objects (0.0/1.0 accelerator_type:A100)
Result logdir: /home/sliceruser/ray_results/picai-hyperparam-search-30
Number of trials: 1/1 (1 RUNNING)
+-----------------------+----------+----------------+
| Trial name            | status   | loc            |
|-----------------------+----------+----------------|
| mainTrain_d9a41_00000 | RUNNING  | 10.164.0.3:924 |
+-----------------------+----------+----------------+

== Status ==
Current time: 2022-09-21 17:09:26 (running for 00:02:07.84)
Memory usage on this node: 22.1/167.0 GiB
PopulationBasedTraining: 0 checkpoints, 0 perturbs
Resources requested: 3.0/24 CPUs, 2.0/2 GPUs, 0.0/107.2 GiB heap, 0.0/49.93 GiB objects (0.0/1.0 accelerator_type:A100)
Result logdir: /home/sliceruser/ray_results/picai-hyperparam-search-30
Number of trials: 1/1 (1 RUNNING)
+-----------------------+----------+----------------+
| Trial name            | status   | loc            |
|-----------------------+----------+----------------|
| mainTrain_d9a41_00000 | RUNNING  | 10.164.0.3:924 |
+-----------------------+----------+----------------+

== Status ==
Current time: 2022-09-21 17:09:31 (running for 00:02:12.84)
Memory usage on this node: 23.3/167.0 GiB
PopulationBasedTraining: 0 checkpoints, 0 perturbs
Resources requested: 3.0/24 CPUs, 2.0/2 GPUs, 0.0/107.2 GiB heap, 0.0/49.93 GiB objects (0.0/1.0 accelerator_type:A100)
Result logdir: /home/sliceruser/ray_results/picai-hyperparam-search-30
Number of trials: 1/1 (1 RUNNING)
+-----------------------+----------+----------------+
| Trial name            | status   | loc            |
|-----------------------+----------+----------------|
| mainTrain_d9a41_00000 | RUNNING  | 10.164.0.3:924 |
+-----------------------+----------+----------------+

== Status ==
Current time: 2022-09-21 17:09:36 (running for 00:02:17.85)
Memory usage on this node: 23.5/167.0 GiB
PopulationBasedTraining: 0 checkpoints, 0 perturbs
Resources requested: 3.0/24 CPUs, 2.0/2 GPUs, 0.0/107.2 GiB heap, 0.0/49.93 GiB objects (0.0/1.0 accelerator_type:A100)
Result logdir: /home/sliceruser/ray_results/picai-hyperparam-search-30
Number of trials: 1/1 (1 RUNNING)
+-----------------------+----------+----------------+
| Trial name            | status   | loc            |
|-----------------------+----------+----------------|
| mainTrain_d9a41_00000 | RUNNING  | 10.164.0.3:924 |
+-----------------------+----------+----------------+

== Status ==
Current time: 2022-09-21 17:09:41 (running for 00:02:22.85)
Memory usage on this node: 22.8/167.0 GiB
PopulationBasedTraining: 0 checkpoints, 0 perturbs
Resources requested: 3.0/24 CPUs, 2.0/2 GPUs, 0.0/107.2 GiB heap, 0.0/49.93 GiB objects (0.0/1.0 accelerator_type:A100)
Result logdir: /home/sliceruser/ray_results/picai-hyperparam-search-30
Number of trials: 1/1 (1 RUNNING)
+-----------------------+----------+----------------+
| Trial name            | status   | loc            |
|-----------------------+----------+----------------|
| mainTrain_d9a41_00000 | RUNNING  | 10.164.0.3:924 |
+-----------------------+----------+----------------+

== Status ==
Current time: 2022-09-21 17:09:46 (running for 00:02:27.85)
Memory usage on this node: 22.7/167.0 GiB
PopulationBasedTraining: 0 checkpoints, 0 perturbs
Resources requested: 3.0/24 CPUs, 2.0/2 GPUs, 0.0/107.2 GiB heap, 0.0/49.93 GiB objects (0.0/1.0 accelerator_type:A100)
Result logdir: /home/sliceruser/ray_results/picai-hyperparam-search-30
Number of trials: 1/1 (1 RUNNING)
+-----------------------+----------+----------------+
| Trial name            | status   | loc            |
|-----------------------+----------+----------------|
| mainTrain_d9a41_00000 | RUNNING  | 10.164.0.3:924 |
+-----------------------+----------+----------------+

(mainTrain pid=924)  total loss a 0.0 val_loss 9.226983070373535
(mainTrain pid=924)  total loss b tensor([0.0603])  total_loss,dice.aggregate() tensor([0.0603])
(mainTrain pid=924)  validation_acc 0.060256436467170715  
Sanity Checking DataLoader 0:  50%|█████     | 1/2 [00:49<00:49, 49.09s/it]
(mainTrain pid=924) Trying to infer the `batch_size` from an ambiguous collection. The batch size we found is 2. To avoid any miscalculations, use `self.log(..., batch_size=batch_size)`.
== Status ==
Current time: 2022-09-21 17:09:51 (running for 00:02:32.87)
Memory usage on this node: 21.2/167.0 GiB
PopulationBasedTraining: 0 checkpoints, 0 perturbs
Resources requested: 3.0/24 CPUs, 2.0/2 GPUs, 0.0/107.2 GiB heap, 0.0/49.93 GiB objects (0.0/1.0 accelerator_type:A100)
Result logdir: /home/sliceruser/ray_results/picai-hyperparam-search-30
Number of trials: 1/1 (1 RUNNING)
+-----------------------+----------+----------------+
| Trial name            | status   | loc            |
|-----------------------+----------+----------------|
| mainTrain_d9a41_00000 | RUNNING  | 10.164.0.3:924 |
+-----------------------+----------+----------------+

== Status ==
Current time: 2022-09-21 17:09:56 (running for 00:02:37.87)
Memory usage on this node: 21.1/167.0 GiB
PopulationBasedTraining: 0 checkpoints, 0 perturbs
Resources requested: 3.0/24 CPUs, 2.0/2 GPUs, 0.0/107.2 GiB heap, 0.0/49.93 GiB objects (0.0/1.0 accelerator_type:A100)
Result logdir: /home/sliceruser/ray_results/picai-hyperparam-search-30
Number of trials: 1/1 (1 RUNNING)
+-----------------------+----------+----------------+
| Trial name            | status   | loc            |
|-----------------------+----------+----------------|
| mainTrain_d9a41_00000 | RUNNING  | 10.164.0.3:924 |
+-----------------------+----------+----------------+

== Status ==
Current time: 2022-09-21 17:10:01 (running for 00:02:42.87)
Memory usage on this node: 21.2/167.0 GiB
PopulationBasedTraining: 0 checkpoints, 0 perturbs
Resources requested: 3.0/24 CPUs, 2.0/2 GPUs, 0.0/107.2 GiB heap, 0.0/49.93 GiB objects (0.0/1.0 accelerator_type:A100)
Result logdir: /home/sliceruser/ray_results/picai-hyperparam-search-30
Number of trials: 1/1 (1 RUNNING)
+-----------------------+----------+----------------+
| Trial name            | status   | loc            |
|-----------------------+----------+----------------|
| mainTrain_d9a41_00000 | RUNNING  | 10.164.0.3:924 |
+-----------------------+----------+----------------+

== Status ==
Current time: 2022-09-21 17:10:06 (running for 00:02:47.88)
Memory usage on this node: 21.2/167.0 GiB
PopulationBasedTraining: 0 checkpoints, 0 perturbs
Resources requested: 3.0/24 CPUs, 2.0/2 GPUs, 0.0/107.2 GiB heap, 0.0/49.93 GiB objects (0.0/1.0 accelerator_type:A100)
Result logdir: /home/sliceruser/ray_results/picai-hyperparam-search-30
Number of trials: 1/1 (1 RUNNING)
+-----------------------+----------+----------------+
| Trial name            | status   | loc            |
|-----------------------+----------+----------------|
| mainTrain_d9a41_00000 | RUNNING  | 10.164.0.3:924 |
+-----------------------+----------+----------------+

== Status ==
Current time: 2022-09-21 17:10:11 (running for 00:02:52.88)
Memory usage on this node: 21.7/167.0 GiB
PopulationBasedTraining: 0 checkpoints, 0 perturbs
Resources requested: 3.0/24 CPUs, 2.0/2 GPUs, 0.0/107.2 GiB heap, 0.0/49.93 GiB objects (0.0/1.0 accelerator_type:A100)
Result logdir: /home/sliceruser/ray_results/picai-hyperparam-search-30
Number of trials: 1/1 (1 RUNNING)
+-----------------------+----------+----------------+
| Trial name            | status   | loc            |
|-----------------------+----------+----------------|
| mainTrain_d9a41_00000 | RUNNING  | 10.164.0.3:924 |
+-----------------------+----------+----------------+

== Status ==
Current time: 2022-09-21 17:10:16 (running for 00:02:57.88)
Memory usage on this node: 23.1/167.0 GiB
PopulationBasedTraining: 0 checkpoints, 0 perturbs
Resources requested: 3.0/24 CPUs, 2.0/2 GPUs, 0.0/107.2 GiB heap, 0.0/49.93 GiB objects (0.0/1.0 accelerator_type:A100)
Result logdir: /home/sliceruser/ray_results/picai-hyperparam-search-30
Number of trials: 1/1 (1 RUNNING)
+-----------------------+----------+----------------+
| Trial name            | status   | loc            |
|-----------------------+----------+----------------|
| mainTrain_d9a41_00000 | RUNNING  | 10.164.0.3:924 |
+-----------------------+----------+----------------+

== Status ==
Current time: 2022-09-21 17:10:21 (running for 00:03:02.89)
Memory usage on this node: 23.3/167.0 GiB
PopulationBasedTraining: 0 checkpoints, 0 perturbs
Resources requested: 3.0/24 CPUs, 2.0/2 GPUs, 0.0/107.2 GiB heap, 0.0/49.93 GiB objects (0.0/1.0 accelerator_type:A100)
Result logdir: /home/sliceruser/ray_results/picai-hyperparam-search-30
Number of trials: 1/1 (1 RUNNING)
+-----------------------+----------+----------------+
| Trial name            | status   | loc            |
|-----------------------+----------+----------------|
| mainTrain_d9a41_00000 | RUNNING  | 10.164.0.3:924 |
+-----------------------+----------+----------------+

== Status ==
Current time: 2022-09-21 17:10:26 (running for 00:03:07.89)
Memory usage on this node: 23.6/167.0 GiB
PopulationBasedTraining: 0 checkpoints, 0 perturbs
Resources requested: 3.0/24 CPUs, 2.0/2 GPUs, 0.0/107.2 GiB heap, 0.0/49.93 GiB objects (0.0/1.0 accelerator_type:A100)
Result logdir: /home/sliceruser/ray_results/picai-hyperparam-search-30
Number of trials: 1/1 (1 RUNNING)
+-----------------------+----------+----------------+
| Trial name            | status   | loc            |
|-----------------------+----------+----------------|
| mainTrain_d9a41_00000 | RUNNING  | 10.164.0.3:924 |
+-----------------------+----------+----------------+

== Status ==
Current time: 2022-09-21 17:10:31 (running for 00:03:12.89)
Memory usage on this node: 22.5/167.0 GiB
PopulationBasedTraining: 0 checkpoints, 0 perturbs
Resources requested: 3.0/24 CPUs, 2.0/2 GPUs, 0.0/107.2 GiB heap, 0.0/49.93 GiB objects (0.0/1.0 accelerator_type:A100)
Result logdir: /home/sliceruser/ray_results/picai-hyperparam-search-30
Number of trials: 1/1 (1 RUNNING)
+-----------------------+----------+----------------+
| Trial name            | status   | loc            |
|-----------------------+----------+----------------|
| mainTrain_d9a41_00000 | RUNNING  | 10.164.0.3:924 |
+-----------------------+----------+----------------+

== Status ==
Current time: 2022-09-21 17:10:36 (running for 00:03:17.89)
Memory usage on this node: 22.8/167.0 GiB
PopulationBasedTraining: 0 checkpoints, 0 perturbs
Resources requested: 3.0/24 CPUs, 2.0/2 GPUs, 0.0/107.2 GiB heap, 0.0/49.93 GiB objects (0.0/1.0 accelerator_type:A100)
Result logdir: /home/sliceruser/ray_results/picai-hyperparam-search-30
Number of trials: 1/1 (1 RUNNING)
+-----------------------+----------+----------------+
| Trial name            | status   | loc            |
|-----------------------+----------+----------------|
| mainTrain_d9a41_00000 | RUNNING  | 10.164.0.3:924 |
+-----------------------+----------+----------------+

(mainTrain pid=924)  total loss a 0.0 val_loss 8.98957347869873
(mainTrain pid=924)  total loss b tensor([0.0575])  total_loss,dice.aggregate() tensor([0.0575])
(mainTrain pid=924)  validation_acc 0.057469725608825684  
Sanity Checking DataLoader 0: 100%|██████████| 2/2 [01:37<00:00, 48.88s/it]avg_val_loss 9.108278274536133

Epoch 0:   0%|          | 0/15 [00:00<?, ?it/s] 
== Status ==
Current time: 2022-09-21 17:10:41 (running for 00:03:22.90)
Memory usage on this node: 23.7/167.0 GiB
PopulationBasedTraining: 0 checkpoints, 0 perturbs
Resources requested: 3.0/24 CPUs, 2.0/2 GPUs, 0.0/107.2 GiB heap, 0.0/49.93 GiB objects (0.0/1.0 accelerator_type:A100)
Result logdir: /home/sliceruser/ray_results/picai-hyperparam-search-30
Number of trials: 1/1 (1 RUNNING)
+-----------------------+----------+----------------+
| Trial name            | status   | loc            |
|-----------------------+----------+----------------|
| mainTrain_d9a41_00000 | RUNNING  | 10.164.0.3:924 |
+-----------------------+----------+----------------+

== Status ==
Current time: 2022-09-21 17:10:46 (running for 00:03:27.90)
Memory usage on this node: 27.5/167.0 GiB
PopulationBasedTraining: 0 checkpoints, 0 perturbs
Resources requested: 3.0/24 CPUs, 2.0/2 GPUs, 0.0/107.2 GiB heap, 0.0/49.93 GiB objects (0.0/1.0 accelerator_type:A100)
Result logdir: /home/sliceruser/ray_results/picai-hyperparam-search-30
Number of trials: 1/1 (1 RUNNING)
+-----------------------+----------+----------------+
| Trial name            | status   | loc            |
|-----------------------+----------+----------------|
| mainTrain_d9a41_00000 | RUNNING  | 10.164.0.3:924 |
+-----------------------+----------+----------------+

== Status ==
Current time: 2022-09-21 17:10:51 (running for 00:03:32.90)
Memory usage on this node: 29.0/167.0 GiB
PopulationBasedTraining: 0 checkpoints, 0 perturbs
Resources requested: 3.0/24 CPUs, 2.0/2 GPUs, 0.0/107.2 GiB heap, 0.0/49.93 GiB objects (0.0/1.0 accelerator_type:A100)
Result logdir: /home/sliceruser/ray_results/picai-hyperparam-search-30
Number of trials: 1/1 (1 RUNNING)
+-----------------------+----------+----------------+
| Trial name            | status   | loc            |
|-----------------------+----------+----------------|
| mainTrain_d9a41_00000 | RUNNING  | 10.164.0.3:924 |
+-----------------------+----------+----------------+

== Status ==
Current time: 2022-09-21 17:10:56 (running for 00:03:37.91)
Memory usage on this node: 30.9/167.0 GiB
PopulationBasedTraining: 0 checkpoints, 0 perturbs
Resources requested: 3.0/24 CPUs, 2.0/2 GPUs, 0.0/107.2 GiB heap, 0.0/49.93 GiB objects (0.0/1.0 accelerator_type:A100)
Result logdir: /home/sliceruser/ray_results/picai-hyperparam-search-30
Number of trials: 1/1 (1 RUNNING)
+-----------------------+----------+----------------+
| Trial name            | status   | loc            |
|-----------------------+----------+----------------|
| mainTrain_d9a41_00000 | RUNNING  | 10.164.0.3:924 |
+-----------------------+----------+----------------+

== Status ==
Current time: 2022-09-21 17:11:01 (running for 00:03:42.91)
Memory usage on this node: 34.2/167.0 GiB
PopulationBasedTraining: 0 checkpoints, 0 perturbs
Resources requested: 3.0/24 CPUs, 2.0/2 GPUs, 0.0/107.2 GiB heap, 0.0/49.93 GiB objects (0.0/1.0 accelerator_type:A100)
Result logdir: /home/sliceruser/ray_results/picai-hyperparam-search-30
Number of trials: 1/1 (1 RUNNING)
+-----------------------+----------+----------------+
| Trial name            | status   | loc            |
|-----------------------+----------+----------------|
| mainTrain_d9a41_00000 | RUNNING  | 10.164.0.3:924 |
+-----------------------+----------+----------------+
amogkam commented 2 years ago

What PyTorch Lightning version do you have, @jakubMitura14?

amogkam commented 2 years ago

If this is with PyTorch Lightning 1.7, then this looks like the same problem as this issue: https://github.com/ray-project/ray/issues/28197.

This has been fixed in the nightly versions of Ray. Alternatively, you can use this workaround to resolve the issue on Ray 2.0 or earlier:

import ray
ray.init(runtime_env={"env_vars": {"PL_DISABLE_FORK": "1"}})

Add this to the beginning of your training script, before any Tune run is started.
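
For anyone who already passes other environment variables to their workers, here is a minimal sketch of how the workaround above could be wrapped so that `PL_DISABLE_FORK` is always set. The helper names `fork_disabled_runtime_env` and `init_ray_without_fork` are hypothetical and only for illustration; the only Ray calls used are `ray.init(runtime_env=...)` and `ray.is_initialized()`.

```python
def fork_disabled_runtime_env(extra_env_vars=None):
    """Build a runtime_env dict that sets PL_DISABLE_FORK=1 for Ray workers.

    Hypothetical helper: it only assembles the dict used in the workaround.
    """
    # Environment variable values must be strings, so convert anything passed in.
    env_vars = {str(k): str(v) for k, v in (extra_env_vars or {}).items()}
    # Set the flag last so a caller-supplied value cannot override it.
    env_vars["PL_DISABLE_FORK"] = "1"
    return {"env_vars": env_vars}


def init_ray_without_fork(extra_env_vars=None):
    """Start Ray with the workaround applied, unless Ray is already running."""
    import ray  # imported lazily so the helper above works without Ray installed

    if not ray.is_initialized():
        ray.init(runtime_env=fork_disabled_runtime_env(extra_env_vars))


# Intended usage at the very top of the training script (assumption: nothing
# else has called ray.init() yet):
#
#     init_ray_without_fork()
#     ... then build the Trainer and call tune.run(...) as before
```

The `is_initialized()` check is there because the environment variables are applied when Ray starts, so the helper should run before anything else initializes Ray.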

jakubMitura14 commented 2 years ago

Thanks!!! It resolved the issue.