xcat2 / xcat.org

Repository for managing the xcat.org website

Write 3 typical user scenarios or user stories #109

Closed bybai closed 5 years ago

bybai commented 5 years ago

Write 3 typical user scenarios or user stories, in text format.

robin2008 commented 5 years ago

Planned 3 cases to cover: HPC, AI, and Cloud.

robin2008 commented 5 years ago

Use case 1 - HPC

xCAT is used to manage the servers in Summit and Sierra, the two supercomputers that took the #1 and #2 spots on the November 2018 TOP500 list.

Extreme scalability with a hierarchical architecture

To manage and provision thousands of bare-metal servers in a supercomputing data center, a scalable architecture is mandatory. xCAT supports a hierarchical architecture with multiple Service Nodes, where Compute Nodes are partitioned and managed by those Service Nodes.

The Compute Node to Service Node (CN:SN) ratio (for example, one Service Node managing 288 Compute Nodes) needs to be carefully designed around the network, to maximize the utilization of network bandwidth while avoiding congestion.
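As a rough illustration of the CN:SN ratio discussed above, the sketch below splits a list of Compute Nodes into groups of at most 288, one group per Service Node. The node names (`cn0001`, `sn01`) and the helper `plan_service_nodes` are hypothetical and are not part of xCAT; a real design would also take the network topology into account.

```python
import math


def plan_service_nodes(compute_nodes, ratio=288):
    """Split compute nodes into groups of at most `ratio`, one group per Service Node."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    # Round up so that a partially filled group still gets its own Service Node.
    count = math.ceil(len(compute_nodes) / ratio)
    return {
        f"sn{i + 1:02d}": compute_nodes[i * ratio:(i + 1) * ratio]
        for i in range(count)
    }


# Hypothetical cluster of 1000 Compute Nodes.
nodes = [f"cn{n:04d}" for n in range(1, 1001)]
plan = plan_service_nodes(nodes)
print(len(plan))          # 4 Service Nodes
print(len(plan["sn04"]))  # 136 Compute Nodes on the last one
```

With 1000 Compute Nodes and a ratio of 288, three Service Nodes are fully loaded and the fourth manages the remaining 136 nodes.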

Simple and fast provisioning with diskless mode

In most HPC sites, Compute Nodes are expected to be stateless. xCAT supports diskless installation, which brings great simplicity and flexibility in managing both the software stack (including out-of-box NVIDIA CUDA and Mellanox OFED) on Compute Nodes and its deployment lifecycle.

End-to-end infrastructure discovery

The hierarchical topology is complicated and takes the administrator too much effort to deploy on Day-0.

xCAT supports a rich set of discovery capabilities and ONIE switch provisioning. It simplifies the process by iterating over the following steps: connect, describe, then discover. The same steps can be repeated as hardware arrives when a site is set up with partial deliveries.

Zero-touch failed server replacement

Inevitably, some failed servers will be detected. Replacing them is a simple task in xCAT: just replace the failed server with the new one and power it on. After that, xCAT takes care of the node provisioning and information refresh for you.

[MG: Would a user need to update the MAC address in the old node definition?]

robin2008 commented 5 years ago

Use case 2 - AI

Deep Learning requires lots of computational resources to run analytics on large amounts of data. xCAT can be used to manage and deploy the Deep Learning environment.

Manage Deep Learning Elements

The cognitive computing environment requires deploying servers with GPUs and installing deep learning frameworks and libraries. It is a burden for data scientists to manage such an environment, as there are lots of dependent pieces from different sources, such as RPM, Conda, and Python.

xCAT can help you mirror and configure those repositories for all of the dependent pieces, plus the NVIDIA hardware drivers, the CUDA Toolkit (parallel computing platform and API), and NCCL (NVIDIA Collective Communications Library). After that, you have an offline central repository serving the whole deep learning cluster.

In addition, integrated with PowerAI, xCAT can support the enterprise-grade deep learning solution based on IBM® Power Systems™ servers.

Simple deployment of the deep learning environment

With its powerful bare-metal provisioning and flexible diskful osimage definition, xCAT lets you deploy deep learning clusters in minutes. Everything is automated, so you can spend your time developing instead of deploying.

With xcat-inventory, you can put your environments under source control in a Git repository. This lets you take risks and try new things in testing and agile development without worrying about recovery.

Scalability

Deep learning environments are not so large today, so a single management node is enough. Even so, xCAT lets you scale beyond a single server and quickly grow to a whole cluster.

robin2008 commented 5 years ago

Use case 3 - Cloud

Besides the HPC cluster used for production, many HPC customers still require an on-premise development environment for testing and agile development. A virtualization environment is often used in such cases, as setting up a bare-metal cluster takes considerable time and effort.

Deployment of virtualization infrastructure

xCAT supports deployment of different kinds of virtualization infrastructure: Red Hat RHV (KVM), IBM PowerKVM, and VMware ESXi.

You can easily deploy those hypervisors on bare metal servers and create virtual machine instances with xCAT. In addition, you can provision those virtual machines in the same way as xCAT provisions physical machines.

On-demand scale in/out

xCAT supports re-purposing of unused HPC servers into a virtualization environment with fast re-provisioning, and moving them back again when scheduled HPC workloads require them. This improves the resource utilization and gives the underlying infrastructure a software-defined capability.
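The scale in/out idea above can be modelled with a toy sketch that only decides which servers should change role; the actual re-provisioning would be left to xCAT. The role labels (`"hpc"`, `"virt"`), the node names, and the helper `plan_reprovision` are assumptions made for illustration and do not correspond to any xCAT command or attribute.

```python
def plan_reprovision(roles, hpc_needed):
    """Return {node: new_role} for the servers that should be re-provisioned.

    roles      -- mapping of node name to its current role, "hpc" or "virt"
    hpc_needed -- number of servers the HPC workload currently requires
    """
    hpc = sorted(name for name, role in roles.items() if role == "hpc")
    virt = sorted(name for name, role in roles.items() if role == "virt")
    if len(hpc) > hpc_needed:
        # Scale out the virtualization side with the surplus HPC servers.
        return {name: "virt" for name in hpc[hpc_needed:]}
    # Scale the virtualization side back in to cover the HPC shortfall.
    shortfall = hpc_needed - len(hpc)
    return {name: "hpc" for name in virt[:shortfall]}


# Hypothetical pool of four servers.
roles = {"cn01": "hpc", "cn02": "hpc", "cn03": "hpc", "cn04": "virt"}
print(plan_reprovision(roles, 1))  # {'cn02': 'virt', 'cn03': 'virt'}
print(plan_reprovision(roles, 4))  # {'cn04': 'hpc'}
```

A scheduler could run a decision like this before each HPC job window and hand the resulting node list to whatever tooling drives the re-provisioning.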

RESTful API

xCAT also supports RESTful APIs to help with development of your own self-service portal.
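A minimal sketch of how a self-service portal might prepare a call to the xCAT REST API. It assumes the `/xcatws/nodes/<noderange>` resource and the `X-Auth-Token` header as described in the xCAT REST API documentation, which should be checked against the installed version. The host name `xcatmn.example.com`, the token value, and the helper `build_node_request` are hypothetical, and the request is only built here, not sent.

```python
from urllib.parse import quote
from urllib.request import Request


def build_node_request(mn_host, noderange, token):
    """Build (but do not send) a GET request for the attributes of some nodes."""
    # Keep commas unescaped so a list such as "cn0001,cn0002" stays readable.
    url = f"https://{mn_host}/xcatws/nodes/{quote(noderange, safe=',')}"
    headers = {"X-Auth-Token": token, "Accept": "application/json"}
    return Request(url, headers=headers, method="GET")


# Hypothetical management node and token, for illustration only.
req = build_node_request("xcatmn.example.com", "cn0001", "example-token")
print(req.get_method(), req.full_url)
```

A portal backend would pass the prepared request to `urllib.request.urlopen` (with suitable TLS settings for the management node's certificate) and decode the JSON response.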

robin2008 commented 5 years ago

The homepage only needs an outline, with a reference URL to the real case page.

gurevichmark commented 5 years ago

@robin2008 I added some comments for "Use case 1"

gurevichmark commented 5 years ago

@robin2008 I added some comments for "Use case 2" and "Use case 3"

daniceexi commented 5 years ago

@robin2008 The three scenarios look good; they cover HPC and the hot areas of AI and Cloud.

bybai commented 5 years ago

The real case page can use the same "navigator bar" and "background" parts as download.html.

robin2008 commented 5 years ago

A PR has been created based on the updated content: https://github.com/xcat2/xcat2.github.io/pull/9/files

And @bybai, I think the look should be refined further later.