neuroinformatics-unit / HowTo

NIU website on common software problems and their troubleshooting
http://howto.neuroinformatics.dev/
Creative Commons Attribution 4.0 International

Updates to ssh howto guide #61

Open sfmig opened 1 week ago

sfmig commented 1 week ago

Is your feature request related to a problem? Please describe.

@niksirbi advised on some improvements to the ssh guide; quotes below:

  1. I think we should advise people to skip the bastion node if they are within the SWC network already (e.g. they have managed machines).
  2. We should also urge people to always start an interactive job, even for "small" tasks, like creating conda environments.
  3. Lastly, maybe we should get rid of the section on VSCode remote access? I think that may cause more harm than good.
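
Points 1 and 2 could be illustrated in the guide along the lines of the sketch below. It rests on assumptions not confirmed in this issue: `bastion.example.org` is a placeholder for the bastion node's hostname, `on_swc_network` is a hypothetical flag the user sets themselves, and the interactive-job command assumes the cluster uses SLURM (`srun`). Only `hpc-gw1` is taken from this thread.

```python
def ssh_command(user, on_swc_network, gateway="hpc-gw1",
                bastion="bastion.example.org"):
    """Build the ssh command as a list of arguments.

    `bastion` is a placeholder hostname (assumption, not the real node).
    """
    if on_swc_network:
        # Already inside the network: connect to the gateway directly.
        return ["ssh", f"{user}@{gateway}"]
    # Outside the network: hop through the bastion with OpenSSH's -J (ProxyJump).
    return ["ssh", "-J", f"{user}@{bastion}", f"{user}@{gateway}"]


# Assumes SLURM: request an interactive shell on a compute node
# before doing any work, however small.
INTERACTIVE_JOB = ["srun", "--pty", "bash", "-i"]

if __name__ == "__main__":
    print(" ".join(ssh_command("jdoe", on_swc_network=True)))
    print(" ".join(ssh_command("jdoe", on_swc_network=False)))
    print(" ".join(INTERACTIVE_JOB))
```

Building the command as a list keeps the two cases easy to compare: the only difference when working from outside the network is the extra `-J` hop.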

Also potentially update the guide to take into account Richard's message:

We have found that the Remote SSH function of Visual Studio is causing these problems by creating huge server-side loads. This has been affecting many HPC sites across the world particularly in the last few months.

If you are using Visual Studio, please be sure to disable all extensions which are not required on the server side. In particular, you must disable TypeScript and Javascript Language Services in Visual Studio Code:

  • Extensions button in VS Code (left toolbar)
  • Search for ‘@builtin TypeScript’.
  • Disable the TypeScript and Javascript Language Features extension
  • Reload

Further documentation: https://code.visualstudio.com/docs/remote/ssh


lauraporta commented 1 week ago

Thanks to both Niko and Sofia for raising this issue. Commenting on the improvements proposed:

  1. Agree
  2. Makes total sense. Maybe we could have a simple template for an environment creation job.
  3. Seems reasonable, although reinforcing the requirement of disabling the extensions (as Richard suggested) could be enough. How much are we (and other people in SWC) ssh-ing with VSCode for development needs? It might be that this functionality solves common troubleshooting problems. I am using OOD virtual desktop on a dedicated node whenever I have similar needs.

niksirbi commented 1 day ago

2. Makes total sense. Maybe we could have a simple template for an environment creation job.

good idea, we could include the example commands for that.
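
A template for such a job might look roughly like the sketch below. It assumes a SLURM scheduler and that `conda` is on the `PATH` of the compute node, neither of which is confirmed in this thread; the function name, default resources, and environment name are all hypothetical. On some clusters a `module load` line would also be needed before the `conda` call.

```python
def env_creation_script(env_name, python_version="3.10",
                        time_limit="00:30:00", memory="4G"):
    """Render a batch script that creates a conda environment.

    Assumes a SLURM scheduler and an available `conda` command;
    the default resources are illustrative, not recommendations.
    """
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name=create-{env_name}",
        f"#SBATCH --time={time_limit}",
        f"#SBATCH --mem={memory}",
        "",
        # Non-interactive creation, so the job never waits for a prompt.
        f"conda create --yes --name {env_name} python={python_version}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(env_creation_script("my-env"))
```

The rendered text can be saved to a file and submitted with `sbatch`, so the environment is built on a compute node rather than on the gateway.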

3. Seems reasonable, although reinforcing the requirement of disabling the extensions (as Richard suggested) could be enough. How much are we (and other people in SWC) ssh-ing with VSCode for development needs? It might be that this functionality solves common troubleshooting problems. I am using OOD virtual desktop on a dedicated node whenever I have similar needs.

I think the problem is that our current instructions lead to running jobs on bastion/gateway nodes, as noted by Pierre here: #62. Even if people have the correct VSCode settings, it's not a practice we should encourage. I think OOD is probably a much better solution for remote development.

sfmig commented 1 day ago

Should we discard VSCode altogether?

My usual way of working lately is to ssh to the cluster (hpc-gw1) via VSCode and submit batch jobs or request interactive nodes from there. What is wrong with this approach?

Compared to a regular terminal, I find it very convenient for submitting jobs (interactive or batch) to have a terminal and a view of the file tree at the same time. Also many people are familiar with VSCode already, which lowers the entry barriers to the cluster.

Isn't it a bit excessive to actively discourage VSCode because of the risk of someone ssh-ing to a compute node? Shouldn't that be prevented in some other way? Or am I missing something?

niksirbi commented 1 day ago

I agree with you that developing exactly in the way you describe is very useful, and I've used the same workflow several times. I'd like to preserve that way of working for me and others, but we have to make sure that our guide reflects best practice and we don't inadvertently over-burden hpc-gw1.
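
One possible safeguard, sketched below, is a small check that scripts can call before starting heavy work. It is only an illustration: the function names are hypothetical, and it assumes the gateway can be recognised by its short hostname (`hpc-gw1`, as used in this thread).

```python
import socket

# Gateway nodes named in this thread; extend if there are others.
GATEWAY_NODES = {"hpc-gw1"}


def on_gateway(hostname=None):
    """Return True if the given (or current) host is a gateway node."""
    if hostname is None:
        hostname = socket.gethostname()
    # Compare the short name so a fully qualified name also matches.
    return hostname.split(".")[0] in GATEWAY_NODES


def require_compute_node(hostname=None):
    """Raise RuntimeError on a gateway node; do nothing otherwise."""
    if on_gateway(hostname):
        raise RuntimeError(
            "On a gateway node: start an interactive or batch job first."
        )
```

Calling `require_compute_node()` at the top of a script would make it fail early on the gateway instead of silently adding load there.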