Open vijithvarma opened 3 years ago
Can you give more detail on how you installed, which commit you are using, and what you ran? Can you try running:
python -m unittest
Hi,
I was using the Ibex cluster to run the code.
You can see how I installed and ran the code here:
https://github.com/ibex-training/deepinpy
Please let me know if you require any further information.
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/utcsilab/deepinpy/issues/19#issuecomment-727003725, or unsubscribe https://github.com/notifications/unsubscribe-auth/ARXQBFMGQLHJN5XONRIW323SPWGE7ANCNFSM4TU7GPHQ .
-- K Vijith Varma PhD Student in Electrical Engineering (EE) King Abdullah University of Science and Technology (KAUST) Bldg 1, Lv 3, 3139-WS29 KAUST, Thuwal 23955-6900, Jeddah, Saudi Arabia Mobile: +966 (0)564624547
--
This message and its contents, including attachments are intended solely for the original recipient. If you are not the intended recipient or have received this message in error, please notify me immediately and delete this message from your computer system. Any unauthorized use or distribution is prohibited. Please consider the environment before printing this email.
Let's try to get this working!
Could you try rebasing your changes against the master branch, re-installing requirements.txt, running python -m unittest, and then running your test on the cluster?
Thanks!
Hi, I tried reinstalling requirements.txt and running python -m unittest, but it still gives the same error. Can you explain what the error is about?
Well, you aren't actually getting an error; you are getting warnings that I have not seen before, and your loss function is blowing up. What example are you running? Can you provide the full output of the terminal?
Yes, I am running the example.json (log-Deepinpy-slurm-12892792.err and log-Deepinpy-slurm-12892792.out) and example_hyperopt.json (log-Deepinpy-slurm-12892791.err and log-Deepinpy-slurm-12892791.out) examples. Please find attached the error and output files: files.zip
Thank you, I understand most of what is happening now.
The reason you get nan values is that the default config is not set up for SGD. Can you add this line to the config file?
"solver": "adam"
I am not sure why your checkpoints report that error. But if you update your code with master, it should not happen. Updating to master will also fix the Adam vs. SGD solver issue.
The third error with hyperopt is because you do not have GPUs. The Hyperopt default config is designed for a machine with four GPUs. If you want to run hyperopt across CPUs, you can remove the gpu line from the config.
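As a rough illustration of the two config changes suggested above (a sketch only, not code from deepinpy): it assumes the config is a flat JSON object, that the solver key is named "solver" as quoted above, and that the GPU setting lives under a key named "gpu", which is a guess based on the phrase "the gpu line". The helper name patch_config and the sample values are hypothetical.

```python
import json


def patch_config(config, use_gpu=False):
    """Return a copy of a config dict with the two suggested changes applied."""
    patched = dict(config)
    # Suggested fix for the nan loss: select Adam explicitly.
    patched["solver"] = "adam"
    # Suggested fix for CPU-only machines: drop the GPU entry.
    # The key name "gpu" is an assumption; check it against your config file.
    if not use_gpu:
        patched.pop("gpu", None)
    return patched


# Hypothetical starting config; a real example.json will contain more keys,
# and the value format of the GPU entry may differ.
original = {"gpu": "placeholder"}
patched = patch_config(original)
print(json.dumps(patched, indent=4))
```

To apply this to a real file, the dict would be read with json.load and written back with json.dump. The original dict is left unchanged, so the two versions can be compared before overwriting anything.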
Thank you very much, I will look into it.
Hi,
I'm trying to adapt your code to my own datasets (which are not MRI data), and I am running into some difficulties. Would it be possible for you to provide a generic version of the code and some documentation for it?
Thanks and Regards
Hi, I would be happy to help. Can you please email me to discuss what you are trying to do?
Hi,
Can I get your email address?
Thanks and regards
Can you check out the following error? I got it when I tried to run the code. Can you help me with this?
Error file:
/home/kottevv/deepin1/deepinpy/env/lib/python3.8/site-packages/pytorch_lightning/utilities/distributed.py:37: RuntimeWarning: The metric you returned 17 must be a torch.Tensor instance, checkpoint not saved HINT: what is the value of epoch in validation_epoch_end()?
  warnings.warn(*args, **kwargs)
/home/kottevv/deepin1/deepinpy/env/lib/python3.8/site-packages/pytorch_lightning/utilities/distributed.py:37: RuntimeWarning: The metric you returned 18 must be a torch.Tensor instance, checkpoint not saved HINT: what is the value of epoch in validation_epoch_end()?
  warnings.warn(*args, **kwargs)
/home/kottevv/deepin1/deepinpy/env/lib/python3.8/site-packages/pytorch_lightning/utilities/distributed.py:37: RuntimeWarning: The metric you returned 19 must be a torch.Tensor instance, checkpoint not saved HINT: what is the value of epoch in validation_epoch_end()?
  warnings.warn(*args, **kwargs)
Saving latest checkpoint..

Log file:
Epoch 0: 50%|█████ | 1/2 [00:01<00:01, 1.35s/it, loss=30.286, v_num=5, lam
Epoch 0: 100%|██████████| 2/2 [00:01<00:00, 1.08it/s, loss=30.286, v_num=5, lam
Epoch 0: 100%|██████████| 2/2 [00:01<00:00, 1.08it/s, loss=135961.016, v_num=5,
Epoch 0: 0%| | 0/2 [00:00<?, ?it/s, loss=135961.016, v_num=5, lambda=
Epoch 1: 0%| | 0/2 [00:00<?, ?it/s, loss=135961.016, v_num=5, lambda=
Epoch 1: 50%|█████ | 1/2 [00:00<00:00, 2.59it/s, loss=135961.016, v_num=5,
Epoch 1: 50%|█████ | 1/2 [00:00<00:00, 2.59it/s, loss=inf, v_num=5, lambda
Epoch 1: 100%|██████████| 2/2 [00:00<00:00, 3.75it/s, loss=inf, v_num=5, lambda
Epoch 1: 100%|██████████| 2/2 [00:00<00:00, 3.75it/s, loss=inf, v_num=5, lambda
Epoch 1: 0%| | 0/2 [00:00<?, ?it/s, loss=inf, v_num=5, lambda=0, trai
Epoch 2: 0%| | 0/2 [00:00<?, ?it/s, loss=inf, v_num=5, lambda=0, trai
Epoch 2: 50%|█████ | 1/2 [00:00<00:00, 2.80it/s, loss=inf, v_num=5, lambda
Epoch 2: 50%|█████ | 1/2 [00:00<00:00, 2.79it/s, loss=inf, v_num=5, lambda
Epoch 2: 100%|██████████| 2/2 [00:00<00:00, 4.13it/s, loss=inf, v_num=5, lambda
Epoch 2: 100%|██████████| 2/2 [00:00<00:00, 4.12it/s, loss=inf, v_num=5, lambda
Epoch 2: 0%| | 0/2 [00:00<?, ?it/s, loss=inf, v_num=5, lambda=0, trai
Epoch 3: 0%| | 0/2 [00:00<?, ?it/s, loss=inf, v_num=5, lambda=0, trai
Epoch 3: 50%|█████ | 1/2 [00:00<00:00, 2.83it/s, loss=inf, v_num=5, lambda
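For anyone trying to pin down where the loss diverges in a log like this one, here is a small sketch that pulls the loss values out of the progress-bar text and reports the first non-finite one. It is not part of deepinpy. The function names are hypothetical, the sample text is abbreviated from the log above (bar characters dropped), and the regex assumes losses are printed as plain decimals, inf, or nan.

```python
import math
import re

# Abbreviated sample in the shape of the progress output above.
LOG_TEXT = (
    "Epoch 0: 50%| 1/2 [00:01<00:01, 1.35s/it, loss=30.286, v_num=5, lam"
    "Epoch 0: 100%| 2/2 [00:01<00:00, 1.08it/s, loss=135961.016, v_num=5,"
    "Epoch 1: 50%| 1/2 [00:00<00:00, 2.59it/s, loss=inf, v_num=5, lambda"
)

# Each progress update carries "Epoch N: ... loss=VALUE".
PATTERN = re.compile(r"Epoch (\d+):.*?loss=([0-9.]+|inf|nan)")


def loss_trace(text):
    """Return (epoch, loss) pairs in the order they appear in the text."""
    return [(int(epoch), float(value)) for epoch, value in PATTERN.findall(text)]


def first_non_finite(trace):
    """Return the first (epoch, loss) pair whose loss is inf or nan, else None."""
    for epoch, loss in trace:
        if not math.isfinite(loss):
            return epoch, loss
    return None


trace = loss_trace(LOG_TEXT)
print(trace)
print(first_non_finite(trace))
```

On the sample, the loss jumps from about 30 to about 136,000 within epoch 0 and becomes inf during epoch 1, which matches what the full log shows.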