vmware-tanzu-labs / tanzu-validated-solutions

Tanzu Validated Solutions is a publicly-available repository for authoring, updating, and publishing reference architecture and validated solution materials for docs.vmware.com. This repository provides the feedback mechanism for continuous improvement of these assets.

Feedback from the field: Ref Arch on AWS needs some discussion #67

Closed bcdurden closed 2 years ago

bcdurden commented 2 years ago

Referring to this document here: https://github.com/vmware-tanzu-labs/tanzu-validated-solutions/blob/main/src/reference-designs/tko-on-aws.md

A few points:

  1. I think it's well known at this point that this document does not capture reality for most of the Federal-side TKG deployments. TMC, Observability, and anything SaaS-based, such as service mesh, are usually a no-go. I think it's worth presenting recommendations around those realities.
  2. Another issue is quasi-competitor products being used to deliver many of the same services that TKG delivers. While we can't call out those services specifically, maybe we can point out that 'TKG can run those platforms too, and here's what it might look like when metrics/logging is managed by a different vendor'.
  3. Without Observability there is a large gap around log aggregation in the base TKG offerings. While we've got a few PoCs floating around using Thanos and Loki, they're obviously not supported from a licensing perspective. This may be more for the R&D team, but we really need a log aggregation solution that doesn't involve SaaS in order to better serve our customers. Otherwise they will resort to using unsupported non-VMware products (and the ecosystems around those tools).
  4. The VPC and foundation diagrams below don't seem to reflect any particular environment that I've seen in production yet. JSF is using a migration-friendly path where TAS-foundation and estate concepts are applied to the TKG environment. The classic control-plane, sandbox, dev, and prod foundations in TAS-land that formerly had their own VPCs are replicated for TKG with the same naming conventions. Unlike TAS, however, these foundations can deploy N clusters, so we define a kind of cluster template for each foundation that can be cloned when building a new single-tenant cluster in a specific foundation.
  5. GitOps seems to be missing here. While I realize it is a larger topic that may be out of scope here, I think it would be useful to tell the SE/SA doing this work what tools they have at their disposal out of the box and maybe link to some docs that explain those tools. For instance, TKG 1.4 operates under the hood using kapp-controller as the main GitOps orchestrator. It is a Carvel tool and works VERY well with existing Carvel tools being used elsewhere (ytt, kbld, imgpkg, and vendir).
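
To make item 4 a bit more concrete, here is a minimal Python sketch of the "one cluster template per foundation, cloned per tenant" idea. It is only an illustration under assumptions: the `ClusterTemplate` fields, the `<foundation>-vpc` naming, the worker count, and the `clone_cluster` helper are all hypothetical and do not correspond to any TKG API or to an actual JSF environment.

```python
from dataclasses import dataclass, replace

# Foundation names mirror the classic TAS-style estates mentioned in item 4.
FOUNDATIONS = ("control-plane", "sandbox", "dev", "prod")


@dataclass(frozen=True)
class ClusterTemplate:
    """Hypothetical per-foundation template; field names are assumptions."""
    foundation: str
    vpc: str            # each foundation keeps its own VPC
    worker_count: int   # placeholder sizing, not a recommendation
    name: str = ""      # empty on the template, set on each clone


def build_templates():
    """Create one template per foundation, keyed by foundation name."""
    return {
        f: ClusterTemplate(foundation=f, vpc=f"{f}-vpc", worker_count=3)
        for f in FOUNDATIONS
    }


def clone_cluster(templates, foundation, tenant):
    """Clone the foundation's template into a new single-tenant cluster spec."""
    template = templates[foundation]
    # dataclasses.replace returns a copy, so the template itself is unchanged.
    return replace(template, name=f"{foundation}-{tenant}")


if __name__ == "__main__":
    templates = build_templates()
    print(clone_cluster(templates, "dev", "team-a"))
```

The point of the sketch is that a foundation can produce N clusters from one shared definition, which is the difference from TAS called out above.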

sendjainabhi commented 2 years ago

See the following responses, by item:

  1. Please provide more specific details on item 1.
  2. The RA talks about log forwarding using vRealize, CloudWatch, etc. Check this section in the RA.
  3. The RA observability section covers both SaaS (TO) and on-premises (Grafana/Prometheus) options.
  4. TKO-AWS presents a design for production; it is not specific to dev/qa/prod.
  5. The deployment guide has a Tanzu CLI setup section that talks about the Carvel tools.

sendjainabhi commented 2 years ago

@bcdurden - for item 1 you can connect with Rahul. Air-gapped RA versions are being handled separately. All the other items have been answered.