
Resource configuration file for the Laplace HPC machine #3177

Closed: victor-malagon closed this issue 3 months ago

victor-malagon commented 4 months ago

Hello,

I am working on FACTS together with @AlexReedy, and I need a resource configuration file for the HPC system I am using (Laplace). We have a project deadline coming up soon by which we'll need FACTS up and running on our HPC, so we would be very grateful if you could consider our request as soon as possible.

Best regards,

Victor

andre-merzky commented 4 months ago

Thanks for opening this ticket. Can you please provide a link to the system documentation for Laplace?

AlexReedy commented 4 months ago

Hey @andre-merzky, thanks for making this a priority for us! We have an end-of-June deadline for this project. As far as the process goes, I can help @victor-malagon with the installation and setup of FACTS once the new config is in place.

shantenujha commented 4 months ago

Hi Alex, do you have a link or any details about Laplace?

victor-malagon commented 4 months ago

Hi @andre-merzky and @shantenujha,

Thanks for taking care of this. This is the only info I have right now; I can contact the HPC manager for more details or a relevant link if necessary.

The Laplace cluster is a High Performance Computing cluster consisting of 3716 Xeon processor cores. The operating system is Linux (Red Hat Enterprise Linux 7.9). The Laplace cluster is organized as follows:

Usually I run experiments on nodes 75-98.

Victor

andre-merzky commented 4 months ago

Thanks for the information, @victor-malagon. If there is no more information available, please allow me to ask a couple of questions:

The fact that the cluster is heterogeneous is difficult for RP to handle, so I suggest we create an individual resource description for each node type. How are the individual node types requested - via specific batch queues?

Thanks, Andre.
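
To make that suggestion concrete: RP resource configurations are JSON files with one entry per resource, and RP also reads user-level config files from `~/.radical/pilot/configs/`. The sketch below is illustrative only - the resource label, endpoints, queue names, and core counts are invented placeholders, the key layout varies between RP releases, and the authoritative version is whatever lands in the eventual pull request. Mirror one of the Slurm-based configs shipped with RP for the exact current schema.

```sh
# Rough sketch of a user-level RP resource config with one entry per
# homogeneous partition. ALL values below are hypothetical placeholders.
mkdir -p "$HOME/.radical/pilot/configs"
cat > "$HOME/.radical/pilot/configs/resource_nioz.json" <<'EOF'
{
    "laplace_short": {
        "description"     : "Laplace, short-job partition (hypothetical)",
        "schemas"         : ["ssh"],
        "ssh"             : {
            "job_manager_endpoint": "slurm+ssh://laplace.nioz.nl/",
            "filesystem_endpoint" : "sftp://laplace.nioz.nl/"
        },
        "resource_manager": "SLURM",
        "default_queue"   : "short",
        "cores_per_node"  : 24
    },
    "laplace_himem": {
        "description"     : "Laplace, high-memory partition (hypothetical)",
        "schemas"         : ["ssh"],
        "ssh"             : {
            "job_manager_endpoint": "slurm+ssh://laplace.nioz.nl/",
            "filesystem_endpoint" : "sftp://laplace.nioz.nl/"
        },
        "resource_manager": "SLURM",
        "default_queue"   : "himem",
        "cores_per_node"  : 48
    }
}
EOF
```

With a file like this, each partition would be addressable as its own resource label (e.g. nioz.laplace_short) in a pilot description.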

victor-malagon commented 4 months ago

Hi Andre. Let me check with the network administrator at my workplace; I have only basic knowledge of HPC systems and am not sure how to answer some of these questions. I'll get back to you as soon as possible.

Thanks, Victor

AlexReedy commented 4 months ago

Hi @shantenujha, unfortunately I do not, but I think @victor-malagon's conversation with the network admin can provide that.

Best, Alex

victor-malagon commented 4 months ago

Hi @andre-merzky,

This is the info I got from the Network Administrator:

> We have several MPI implementations. You choose one with the ‘module load’ command; ‘module list’ will give you a list of available modules.
>
> - The scheduler we use is Slurm 18.08.8.
> - We have several Python versions, selectable via Anaconda.
> - We use OpenMPI 3, but others are available.
> - Home directories are available on the compute nodes (same path).
> - We have several scratch/data directories available.
> - We have several partitions, depending on hardware specs like number of cores and memory size. All nodes in a partition have the same hardware specs.

Let me know if there are further questions.

Victor
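
As an aside for readers gathering the same information on their own cluster: most of the details a resource config needs can be read directly off the system with standard Slurm and environment-modules commands, run on a login node - nothing RP-specific is required.

```sh
module avail                # list available modules (MPI, Python, ...)
sinfo -o "%P %D %c %m %l"   # partition, node count, cores/node, memory (MB), time limit
scontrol show partition     # full partition definitions, incl. the default queue
```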

andre-merzky commented 4 months ago

Sorry for the late response; I was offline last week.

Thanks for the information - alas, the response is a bit too general. Is there a way to get in contact with the admins directly, or to find a cluster user guide online? If not, can you please try to inquire about the following details:

Thanks, Andre.

victor-malagon commented 4 months ago

Hi Andre,

You can contact the network administrator here: jan.derksen@nioz.nl

Let me know if they take a while to respond or if you need more info from me.

Cheers,

Victor

andre-merzky commented 4 months ago

Thanks @victor-malagon, will do.

andre-merzky commented 3 months ago

@victor-malagon : please have a look at this pull request: https://github.com/radical-cybertools/radical.pilot/pull/3194. Would you mind giving it a try? The only missing information is about scratch space - can you find out if $SCRATCH is set? Or is there another way to determine the location of the user's scratch space?
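
For the scratch question, a quick check on a Laplace login node should settle it; these are generic shell probes, nothing RP-specific:

```sh
echo "SCRATCH=${SCRATCH:-<not set>}"   # is $SCRATCH defined at all?
env | grep -i scratch                  # any other scratch-related variables?
df -h | grep -i scratch                # scratch file systems mounted somewhere?
```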

andre-merzky commented 3 months ago

@victor-malagon : Please let us know if we can help with testing.
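
For anyone picking up the testing: a minimal smoke test is to install RP from the PR branch and run one of the shipped examples against the new resource label. The label below (nioz.laplace) is a guess - substitute whatever label the pull request actually defines.

```sh
git clone https://github.com/radical-cybertools/radical.pilot.git
cd radical.pilot
git fetch origin pull/3194/head:laplace-config   # fetch the PR branch
git checkout laplace-config
pip install .
python examples/00_getting_started.py nioz.laplace   # hypothetical label
```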

andre-merzky commented 3 months ago

Closing until user feedback.

victor-malagon commented 2 months ago

Hi @andre-merzky, sorry for the silence - we were busy with deadlines here. I'll give it a try with @AlexReedy once we're all back in the office and let you know how it goes.

andre-merzky commented 2 months ago

Not a problem - we all know how that works :-) Please re-open the issue once you get a chance to look at it.