os-climate / aicoe-osc-demo

This repository is the central location for the demos the ET data science team is developing within the OS-Climate project. This demo shows how to use the tools provided by Open Data Hub (ODH) running on the Operate First cluster to perform ETL and to create training and inference pipelines.
Apache License 2.0

Unable to start server #211

Closed: JeremyGohBNP closed this issue 2 years ago

JeremyGohBNP commented 2 years ago

The spawn failed with a timeout when I tried to start the server. It only worked with the default configuration: 0 GPUs, Small memory.

I tried multiple configurations: it did not work with 1 GPU, so I tried CPU only, but that failed too. Could it be because of the Large memory size I requested?

[Screenshot attached]

Shreyanand commented 2 years ago

Hi @JeremyGohBNP, does the bug still persist?

redmikhail commented 2 years ago

@JeremyGohBNP, based on the log entries, your notebook is being scheduled to a regular node without a GPU. I think you need to select one of the CUDA images, but I hope @Shreyanand can suggest what needs to be done.

JeremyGohBNP commented 2 years ago

The bug persists. I switched to CL1 for a demo last week.

JeremyGohBNP commented 2 years ago

[Two screenshots attached] This is the configuration I set and the error message I get every time.

Shreyanand commented 2 years ago

Hi @JeremyGohBNP, @HumairAK confirmed that he was able to spawn both GPU and non-GPU instances. Could you please try again and see if it works for you? [Screenshot attached]

JeremyGohBNP commented 2 years ago

@Shreyanand @HumairAK Thanks very much both, I tried again and it's now working and back to normal!