utcsilab / deepinpy

Deep inverse problems in Python
MIT License

Error at torch.Tensor #19

Open vijithvarma opened 3 years ago

vijithvarma commented 3 years ago

Can you check out the following error? When I tried to run the code, I got the following output. Can you help me with this?

Error file:

```
/home/kottevv/deepin1/deepinpy/env/lib/python3.8/site-packages/pytorch_lightning/utilities/distributed.py:37: RuntimeWarning: The metric you returned 17 must be a torch.Tensor instance, checkpoint not saved HINT: what is the value of epoch in validation_epoch_end()?
  warnings.warn(*args, **kwargs)
/home/kottevv/deepin1/deepinpy/env/lib/python3.8/site-packages/pytorch_lightning/utilities/distributed.py:37: RuntimeWarning: The metric you returned 18 must be a torch.Tensor instance, checkpoint not saved HINT: what is the value of epoch in validation_epoch_end()?
  warnings.warn(*args, **kwargs)
/home/kottevv/deepin1/deepinpy/env/lib/python3.8/site-packages/pytorch_lightning/utilities/distributed.py:37: RuntimeWarning: The metric you returned 19 must be a torch.Tensor instance, checkpoint not saved HINT: what is the value of epoch in validation_epoch_end()?
  warnings.warn(*args, **kwargs)
Saving latest checkpoint..
```

Log file:

```
Epoch 0: 50%|█████ | 1/2 [00:01<00:01, 1.35s/it, loss=30.286, v_num=5, lam
Epoch 0: 100%|██████████| 2/2 [00:01<00:00, 1.08it/s, loss=30.286, v_num=5, lam
Epoch 0: 100%|██████████| 2/2 [00:01<00:00, 1.08it/s, loss=135961.016, v_num=5,
Epoch 0: 0%| | 0/2 [00:00<?, ?it/s, loss=135961.016, v_num=5, lambda=
Epoch 1: 0%| | 0/2 [00:00<?, ?it/s, loss=135961.016, v_num=5, lambda=
Epoch 1: 50%|█████ | 1/2 [00:00<00:00, 2.59it/s, loss=135961.016, v_num=5,
Epoch 1: 50%|█████ | 1/2 [00:00<00:00, 2.59it/s, loss=inf, v_num=5, lambda
Epoch 1: 100%|██████████| 2/2 [00:00<00:00, 3.75it/s, loss=inf, v_num=5, lambda
Epoch 1: 100%|██████████| 2/2 [00:00<00:00, 3.75it/s, loss=inf, v_num=5, lambda
Epoch 1: 0%| | 0/2 [00:00<?, ?it/s, loss=inf, v_num=5, lambda=0, trai
Epoch 2: 0%| | 0/2 [00:00<?, ?it/s, loss=inf, v_num=5, lambda=0, trai
Epoch 2: 50%|█████ | 1/2 [00:00<00:00, 2.80it/s, loss=inf, v_num=5, lambda
Epoch 2: 50%|█████ | 1/2 [00:00<00:00, 2.79it/s, loss=inf, v_num=5, lambda
Epoch 2: 100%|██████████| 2/2 [00:00<00:00, 4.13it/s, loss=inf, v_num=5, lambda
Epoch 2: 100%|██████████| 2/2 [00:00<00:00, 4.12it/s, loss=inf, v_num=5, lambda
Epoch 2: 0%| | 0/2 [00:00<?, ?it/s, loss=inf, v_num=5, lambda=0, trai
Epoch 3: 0%| | 0/2 [00:00<?, ?it/s, loss=inf, v_num=5, lambda=0, trai
Epoch 3: 50%|█████ | 1/2 [00:00<00:00, 2.83it/s, loss=inf, v_num=5, lambda
```

jtamir commented 3 years ago

Can you give more detail on how you installed it, which commit you are using, and what you ran? Can you also try running `python -m unittest`?
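To gather these details in one go, something like the following may help. It is a minimal sketch, not part of deepinpy: the helper names `environment_report` and `run_unit_tests` and the default package list are assumptions. It collects the Python and package versions plus the current git commit, and `run_unit_tests()` performs the same test discovery that `python -m unittest` runs when given no arguments.

```python
import importlib.metadata
import subprocess
import sys
import unittest


def environment_report(packages=("torch", "pytorch-lightning", "numpy")):
    """Collect Python/package versions and the current git commit."""
    report = {"python": sys.version.split()[0]}
    for name in packages:
        try:
            report[name] = importlib.metadata.version(name)
        except importlib.metadata.PackageNotFoundError:
            report[name] = "not installed"
    try:
        # Commit hash of the checkout the script is run from
        report["commit"] = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True, text=True, check=True,
        ).stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        # git is missing, or the current directory is not a git repository
        report["commit"] = "unknown"
    return report


def run_unit_tests(start_dir="."):
    """Programmatic equivalent of `python -m unittest` (test discovery)."""
    suite = unittest.defaultTestLoader.discover(start_dir)
    result = unittest.TextTestRunner(verbosity=2).run(suite)
    return result.wasSuccessful()
```

Run it from the root of the deepinpy checkout (for example `print(environment_report())` followed by `run_unit_tests()`) and paste the output into the issue.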

vijithvarma commented 3 years ago

Hi,

I was using the Ibex cluster to run the code.

You can see how I installed and ran the code here:

https://github.com/ibex-training/deepinpy

Please let me know if you require any further information.


-- K Vijith Varma PhD Student in Electrical Engineering (EE) King Abdullah University of Science and Technology (KAUST) Bldg 1, Lv 3, 3139-WS29 KAUST, Thuwal 23955-6900, Jeddah, Saudi Arabia Mobile: +966 (0)564624547

--

This message and its contents, including attachments are intended solely for the original recipient. If you are not the intended recipient or have received this message in error, please notify me immediately and delete this message from your computer system. Any unauthorized use or distribution is prohibited. Please consider the environment before printing this email.

jtamir commented 3 years ago

Let's try to get this working! Could you try rebasing your changes onto the master branch, reinstalling `requirements.txt`, running `python -m unittest`, and then running your test on the cluster? Thanks!
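The steps above could be scripted roughly as follows. This is a hedged sketch: it assumes your clone has a remote pointing at utcsilab/deepinpy (called `origin` here; in a fork such as ibex-training/deepinpy it is often named `upstream`), and both helper names are hypothetical. With `dry_run=True` it only prints the commands.

```python
import shlex
import subprocess
import sys


def update_and_test_commands(remote="origin", branch="master"):
    """Commands for: rebase onto master, reinstall requirements, run tests.

    `remote` is an assumption; point it at the utcsilab/deepinpy remote.
    """
    return [
        ["git", "fetch", remote],
        ["git", "rebase", f"{remote}/{branch}"],
        [sys.executable, "-m", "pip", "install", "-r", "requirements.txt"],
        [sys.executable, "-m", "unittest"],
    ]


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    printed = []
    for cmd in commands:
        line = shlex.join(cmd)
        print("$", line)
        printed.append(line)
        if not dry_run:
            # check=True stops at the first failing step, e.g. a rebase conflict
            subprocess.run(cmd, check=True)
    return printed
```

Calling `run_all(update_and_test_commands("upstream"), dry_run=False)` would then execute the sequence and stop at the first failure.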

vijithvarma commented 3 years ago

Hi, I tried reinstalling requirements.txt and running `python -m unittest`, but I still get the same error. Can you explain what the error is about?

jtamir commented 3 years ago

Well, you aren't actually getting an error; you are getting warnings that I have not seen before, and your loss is blowing up. Which example are you running? Can you provide the full terminal output?
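One way to spot this kind of divergence without reading the progress bar by eye is to scan the logged losses. The sketch below assumes the tqdm-style `loss=...` postfix shown in the log posted in this issue; the function name `first_divergence` and the `threshold` of 1e4 are arbitrary illustrative choices, not deepinpy or PyTorch Lightning settings.

```python
import math
import re

# Matches "Epoch <n>: ... loss=<value>" in a progress-bar dump, even when
# several bar updates were concatenated onto one line.
LOSS_RE = re.compile(r"Epoch (\d+):.*?loss=([^,\s\]]+)")


def first_divergence(log_text, threshold=1e4):
    """Return (epoch, loss) for the first logged loss that is non-finite
    or larger than `threshold`; return None if every loss looks sane."""
    for match in LOSS_RE.finditer(log_text):
        epoch, raw = int(match.group(1)), match.group(2)
        loss = float(raw)  # float() also parses "inf" and "nan"
        if not math.isfinite(loss) or loss > threshold:
            return epoch, loss
    return None
```

Applied to the log posted in this issue, it reports epoch 0, where the loss jumps from 30.286 to 135961.016 before turning into `inf` in epoch 1.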

vijithvarma commented 3 years ago

Yes, I am running the example.json (log-Deepinpy-slurm-12892792.err and log-Deepinpy-slurm-12892792.out) and example_hyperopt.json (log-Deepinpy-slurm-12892791.err and log-Deepinpy-slurm-12892791.out) examples. Please find the error and output files attached: files.zip

jtamir commented 3 years ago

Thank you, I understand most of what is happening now.

  1. The reason you get NaN values is that the default config is not set up for SGD. Can you add this line to the config file?

     `"solver": "adam"`

  2. I am not sure why your checkpoints report that error, but if you update your code to master, it should not happen. Updating to master will also fix the Adam vs. SGD solver issue.

  3. The third error, with hyperopt, occurs because you do not have GPUs. The hyperopt default config is designed for a machine with four GPUs. If you want to run hyperopt on CPUs, you can remove the GPU line from the config.
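If you prefer to apply fixes 1 and 3 programmatically, here is a small sketch. It assumes the config file is a JSON object with top-level keys, and fix 3 uses a heuristic: it drops every top-level key whose name contains "gpu", so check the actual key name in your config first. `patch_config` is a hypothetical helper, and it overwrites the file in place, so keep a copy.

```python
import json
from pathlib import Path


def patch_config(path, use_adam=True, cpu_only=False):
    """Apply the suggested config fixes to a JSON file and return the result."""
    path = Path(path)
    config = json.loads(path.read_text())
    if use_adam:
        # Fix 1: train with Adam instead of the SGD default
        config["solver"] = "adam"
    if cpu_only:
        # Fix 3 (heuristic): drop top-level keys that mention "gpu"
        for key in [k for k in config if "gpu" in k.lower()]:
            del config[key]
    path.write_text(json.dumps(config, indent=4) + "\n")
    return config
```

For example, `patch_config("example_hyperopt.json", cpu_only=True)` would switch the solver to Adam and strip the GPU entry for a CPU-only hyperopt run.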

vijithvarma commented 3 years ago

Thank you very much, I will look into it.

vijithvarma commented 3 years ago

Hi,

I'm trying to apply your code to my own datasets (which are not MRI data), and I am having some difficulty doing so. Is it possible for you to provide a generic version of the code and some documentation for it?

Thanks and Regards


vijithvarma commented 3 years ago

Hi,

I'm trying to apply your code to my own datasets (which are not MRI data), and I am having some difficulty doing so. Is it possible for you to provide a generic version of the code and some documentation for it?

Thanks and Regards


jtamir commented 3 years ago

Hi, I would be happy to help. Can you please email me to discuss what you are trying to do?



vijithvarma commented 3 years ago

Hi,

Can I get your email address?

Thanks and regards
