Open amsnyder opened 9 months ago
Working with Brendan Wakefield to build images with Image Factory that will be used to deploy AWS Parallel Cluster. There will be 4 images: parallel cluster, parallel cluster + WRF, parallel cluster + WRF-Hydro, and parallel cluster + NHM-PRMS.
AWS Parallel Cluster: the CloudFormation deployment is working; will start working on deploying it in the Service Catalog. Image creation is also pending.
AWS Parallel Cluster: image creation with Packer + GitLab (meeting this week, Friday Dec 1). Brendan is to start working on image creation this Friday, pending scheduling of this call and Nebari troubleshooting, which is taking priority.
Presentation of AWS Parallel Cluster for the HPC team: moved from last week to this week (topics: AWS Parallel Cluster, fine-tuning and pre-training LLMs). This is a casual presentation of what we are doing at HyTEST with AWS Parallel Cluster and a follow-up on "what AI can do at USGS for us."
Now working on consistent storage for the deployment: provide EFS as a starting point, with instructions on how to use the options (Lustre, local, etc.). Review new HPC recipes from AWS.
Added EFS as the storage solution, while keeping it optional so scientists can change the storage piece of the deployment.
Not making progress on Nebari. Will start working on the development and build completion of AWS Parallel Cluster in the CHS Service Catalog (75% completed).
Manual deployments (using CloudFormation), properly tagged, can be done in the meantime to enable and support scientists.
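As a rough illustration of a tagged manual deployment, here is a minimal Python sketch that assembles an `aws cloudformation deploy` command with `Key=Value` tags. The stack name, template file name, and tag keys are hypothetical placeholders, not the values used in our accounts, and the `--capabilities` flag is only an assumption that the template creates IAM resources. The sketch only builds and prints the command; it does not run it.

```python
import shlex


def build_deploy_command(stack_name, template_file, tags):
    """Return an `aws cloudformation deploy` argument list with Key=Value tags."""
    if not tags:
        # Refuse untagged deployments so every manual stack can be tracked.
        raise ValueError("at least one tag is required")
    cmd = [
        "aws", "cloudformation", "deploy",
        "--stack-name", stack_name,
        "--template-file", template_file,
        # Assumption: the template creates named IAM resources.
        "--capabilities", "CAPABILITY_NAMED_IAM",
        "--tags",
    ]
    # Sort the tags so the generated command is stable between runs.
    cmd += [f"{key}={value}" for key, value in sorted(tags.items())]
    return cmd


# Hypothetical values, for illustration only.
command = build_deploy_command(
    stack_name="pcluster-manual-demo",
    template_file="pcluster-template.yaml",
    tags={"Project": "HyTEST", "Owner": "scientist-name"},
)
print(shlex.join(command))
```

Keeping the tags in one dictionary makes it easier to apply the same set to every manual stack until the Service Catalog product is ready.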
Started testing the deployment of pcluster with members of HPC-ARC (Landsat workload with Lopaka Lee).
HPC-ARC, CHS, and the HTC Consulting group are now aware of the results of using pcluster through Todd Hawbacker's story. The story has moved from "we would like to do things in the cloud" to "now, how can we do this at scale at USGS and serve more people with this product?" I worked on a strategy/path to achieve this and provided the presentations to Janice and Al Pedraza for them to discuss.
Discussed our pcluster configuration with Lopaka Lee; he advised a new queue structure (CPU and GPU) that replicates our current on-premises queues. This will make the cluster similar to how queues are set up in USGS on-premises environments.
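To make the CPU/GPU queue idea concrete, here is a minimal Python sketch that builds only the `Scheduling` and `SharedStorage` parts of a ParallelCluster v3-style configuration, with one Slurm queue per hardware class and EFS as the shared storage. This is a fragment under stated assumptions, not our actual configuration: the instance types, node counts, subnet ID, storage name, and mount directory are all hypothetical placeholders, and a real config also needs sections such as `Region`, `Image`, and `HeadNode`.

```python
import json


def make_queue(name, instance_type, max_count, subnet_id):
    """Build one Slurm queue entry with a single compute resource."""
    return {
        "Name": name,
        "ComputeResources": [
            {
                # Resource name derived from the instance type, e.g. "c5-large".
                "Name": instance_type.replace(".", "-"),
                "InstanceType": instance_type,
                "MinCount": 0,  # scale to zero when the queue is idle
                "MaxCount": max_count,
            }
        ],
        "Networking": {"SubnetIds": [subnet_id]},
    }


SUBNET_ID = "subnet-EXAMPLE"  # hypothetical placeholder

config = {
    "Scheduling": {
        "Scheduler": "slurm",
        "SlurmQueues": [
            # Hypothetical instance types and limits, for illustration only.
            make_queue("cpu", "c5.large", 10, SUBNET_ID),
            make_queue("gpu", "g4dn.xlarge", 4, SUBNET_ID),
        ],
    },
    "SharedStorage": [
        # EFS as the default; scientists could swap this entry for another type.
        {"Name": "shared-efs", "StorageType": "Efs", "MountDir": "/shared"},
    ],
}

print(json.dumps(config, indent=2))
```

Mirroring the on-premises queue names means job scripts that select a queue by name should need fewer changes when moved to the cloud cluster.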
Deployed a basic AWS Parallel Cluster instance (no software pre-installed) in the WMA AWS account with 6 types of compute instances available: