litmuschaos / litmus

Litmus helps SREs and developers practice chaos engineering in a cloud-native way. Chaos experiments are published at the ChaosHub (https://hub.litmuschaos.io). Community notes are at https://hackmd.io/a4Zu_sH4TZGeih-xCimi3Q
https://litmuschaos.io
Apache License 2.0
Update ADOPTERS.md with Litmus usage details #2191

Open ksatchit opened 4 years ago

ksatchit commented 4 years ago

The LitmusChaos Community is working towards increasing adoption of chaos engineering practices within the Kubernetes world and is focused on collaboration with other cloud-native projects. One of the ways of tracking the project's reach is via an ADOPTERS list. The purpose of this issue is to get a list of organizations/individuals who are using Litmus to power their chaos engineering practice, and to broadly share their use cases and reasons for choosing Litmus.

Please comment on this issue with details like:

This information will be used to create a PR on the ADOPTERS.md file, which you can approve. Alternatively, feel free to create a PR and reference this issue!

divya-mohan0209 commented 4 years ago
barkardk commented 4 years ago

I am using LitmusChaos as part of our QA cycle at the moment to verify resiliency and catch bugs. For now it is only used on AWS EKS and EC2 instances; we hope to expand its use to Azure soon. Litmus looked solid, easy to implement and, most of all, easy to customise. GitHub ID: xkbarkar, NetApp Inc.

keerthisagar40 commented 4 years ago
ishantanu commented 4 years ago
xunholy commented 4 years ago

Applications/Workloads or Infra that are being subjected to chaos by Litmus:

Why was Litmus chosen & how it is helping you (a brief description on the usecase):

Are you using it as part of devtest, CI/CD, in staging/pre-prod/prod or other:

If you would like your name (as standalone user) or organization name to be added to the Adopters.md, please provide a preferred contact handle like github id, twitter id, linkedin id, website etc:

olegch commented 3 years ago

Applications/Workloads or Infra that are being subjected to chaos by Litmus

Why was Litmus chosen & how it is helping you (a brief description on the usecase)

Are you using it as part of devtest, CI/CD, in staging/pre-prod/prod or other

If you would like your name (as standalone user) or organization name to be added to the Adopters.md, please provide a preferred contact handle like github id, twitter id, linkedin id, website etc.

niebomin commented 3 years ago

Please add VMware as an adopter. Will add more description later. The use case is Chaos Engineering in CD.

asibece commented 3 years ago

Why do we use Litmus? To ensure resilience, detect bugs and test rollouts. We are still in the early stages.

How do we use Litmus? Litmus is being used as part of dev/test cycles to catch bugs and verify resiliency.

Benefits in using Litmus: Litmus is easy to use and to extend based on custom requirements, and it is a well-supported open source tool.

SomeshJoshi19 commented 3 years ago

Please consider the shared file here as the adopter entry for Pravega, acknowledging its usage of Litmus Chaos. Thanks. Pravega.md

shilpa7252 commented 3 years ago

Why do we use Litmus? To inject network-related faults in our Kubernetes environment.

How do we use Litmus? Litmus is being used as part of QE testing.

Benefits in using Litmus: Litmus is easy to use and makes it easy to inject faults into the environment.

nikhil-neu commented 3 years ago

We are using Litmus Chaos to inject faults in our AKS environments. Before arriving at Litmus we explored other tools, but found Litmus to be the most well-rounded one and the one most closely aligned with the principles of chaos engineering. We are using Litmus in our pre-prod environments, in the CI/CD stage, as a gate for releases.

The chaos-gated deployments make use of the built-in GitOps integration in Litmus.
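
A chaos-gated release step of the kind described above might be sketched as follows. This is only an illustration, not the setup used in this comment: it assumes the pipeline has already fetched the ChaosResult objects as parsed dictionaries (for example from `kubectl get chaosresult -o json`) and that each carries its verdict under `status.experimentStatus.verdict`. The function name `release_allowed` and the sample data are hypothetical.

```python
from typing import Any, Dict, List


def release_allowed(chaos_results: List[Dict[str, Any]]) -> bool:
    """Return True only if at least one result exists and every verdict is "Pass"."""
    if not chaos_results:
        # No evidence of a chaos run: keep the gate closed.
        return False
    for result in chaos_results:
        # Assumed field layout; missing fields are treated as a failed gate.
        verdict = (
            result.get("status", {})
            .get("experimentStatus", {})
            .get("verdict")
        )
        if verdict != "Pass":
            return False
    return True


# Hypothetical sample data shaped like parsed ChaosResult objects.
sample_results = [
    {"metadata": {"name": "checkout-pod-delete"},
     "status": {"experimentStatus": {"verdict": "Pass"}}},
    {"metadata": {"name": "checkout-network-latency"},
     "status": {"experimentStatus": {"verdict": "Fail"}}},
]

if __name__ == "__main__":
    print("release allowed:", release_allowed(sample_results))
```

Treating an empty result list or a missing verdict as a closed gate is a deliberately conservative choice in this sketch; a real pipeline may prefer to distinguish "not yet finished" from "failed".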

https://www.neudesic.com/

chris-cmsoft commented 3 years ago

We have used Litmus to build out Chaos Engineering platforms with some of our large E-Commerce customers to improve resilience for big sales periods such as Black Friday.

We looked into quite a few tools, and Litmus provided us with the flexibility we needed, whilst bootstrapping many of the components we would have to write ourselves.

We also used Litmus Chaos experiments when discussing some of our customers' architecture constraints, showing them real-world cases of how to make Kubernetes more resilient.

The Litmus community and product have been a great addition to our tool stack, and provided many benefits for us.

bbarin commented 2 years ago

We have been using Litmus 2.X at iFood for a couple of months, replacing chaostoolkit, as it provides a wider range of experiments out of the box. We started by using it to validate the fallback mechanisms of critical services monthly. Right now, we are expanding its usage to inject failures that drop access to databases, Redis, Kafka and AWS services, so that we can learn from them and take countermeasures to improve the critical services. I hope Litmus becomes the de facto tool for implementing Chaos Engineering in a simple manner. GitHub: bbarin website: ifood.com.br

vadheraju commented 2 years ago

We at FIS Global have embarked on a larger SRE program to transform platform teams from a purely operations focus to an SRE/automation culture and mindset. As part of that larger effort, Chaos/Resiliency Engineering has been identified as a key program to improve stability and availability, and thus the overall reliability of applications across the organization, and to provide a superior customer experience. We have chosen Litmus as our Chaos Engineering tool because it:

Where we are using Litmus

vraton commented 2 years ago

At adidas, we started a new initiative some months ago to implement chaos engineering practices, with the aim of giving our engineering teams a guide and tools for testing the resilience of their applications. With this goal in mind, we began defining best practices and processes to share with our engineering teams, and we started evaluating a few tools.

After evaluating different tools, we decided to go ahead with Litmus Chaos.

How are we using Litmus Chaos:

eran-levy commented 2 years ago

We are utilizing Chaos Engineering for something else at the moment :) We found it very useful for building our engineers' confidence when responding to production incidents and for training them on cloud native engineering practices. Check out this article, where I elaborate on our workshop - https://www.infoq.com/articles/chaos-engineering-cloud-native/

jonathasb-cit commented 2 years ago

After evaluating several Chaos Engineering tools, we chose Litmus because it is a more mature tool that meets most of our needs. We are in the implementation, configuration, and process definition phase. AB-Inbev's BEES is a huge project with hundreds of microservices, and it has been a great challenge to adapt Litmus to this process, making customizations and counting on the help of the Litmus community to evolve the tool and thus achieve our goal of making it available to the teams. Some points that made us choose Litmus:

rutu-k commented 2 years ago

At InfraCloud, we are using Litmus to develop Resiliency Frameworks.

Why do we use Litmus? To simulate various chaos scenarios using the fault injection templates provided by Litmus. Litmus also helps us incorporate custom fault templates developed using AWS SSM documents.

How do we use Litmus? So far, we have tested different kinds of scenarios, including faults like pod deletion, network latency, resource stressing, network partitioning in databases, and many more.

Benefits in using Litmus.

Company website: https://www.infracloud.io/ Company GitHub: https://github.com/infracloudio

tao12345666333 commented 1 year ago

We practice chaos engineering using Litmus in the Apache APISIX Ingress.

Litmus also helped us find hidden bugs.

Project website: https://apisix.apache.org/ This is the text version of my online talk: https://dev.to/apisix/building-a-more-robust-apache-apisix-ingress-controller-with-litmus-chaos-3ldn

abdiakhate commented 1 year ago

At Baobab Group, we use LitmusChaos to orchestrate chaos on Kubernetes to help developers and SREs find weaknesses in their application deployments.

We use it in the QA and Preprod stages to see how the workloads and AWS resources behave under failure injection.

How do we use Litmus? We run faults such as pod deletion or CPU hog against our Kubernetes workloads, and we plan to extend this to cloud services.

Benefits in using Litmus.

Company website: https://baobab.com/

prithvi1307 commented 9 months ago

User comment by IFS (posted as an image)

safeercm commented 9 months ago

Flipkart is an adopter of Litmus Chaos. In addition to using the core features, we have also built a VM chaos platform leveraging Litmus. The details are covered in this talk we gave at Chaos Carnival 2024 - Building a Chaos Platform for Virtual Machines with OpenSource Tools

MichaelMorrisEst commented 8 months ago

Why do we use Litmus? We are using Litmus at Ericsson to perform resilience testing of our applications and to gain an understanding of how they perform in failure scenarios.

How do we use Litmus? We are using Litmus in the pre-production CI testing phase.

Benefits in using Litmus: Litmus is easy to use and provides a good level of functionality with the included fault scenarios, whilst the architecture allows for easily deploying custom faults if required. It provides the means to easily test scenarios that would otherwise be difficult to test.

smitthakkar96 commented 8 months ago

Within Delivery Hero, two of our entities, Hungerstation and PedidosYa, have been leveraging Litmus to enhance the resilience of their services. We use various faults offered by Litmus, such as Network Latency and Network Corruption. Using Litmus, the teams have been able to test mechanisms such as circuit breaking, fallbacks, scaling behaviour and context timeouts. Building on this experience, we are currently developing an internal Chaos Engineering Platform, based on Litmus, as part of our Global Developer Platform initiative. This platform aims to standardize and elevate chaos engineering practices across all Delivery Hero verticals.

akria18 commented 8 months ago

At Talend, we are using Litmus 2.x and Litmus 3.x within our pipeline and for weekly checks. Litmus was the solution we chose to help us on our journey with chaos engineering.

How do we use Litmus?

Litmus is deployed in our environment to validate our observability/security stack and to help promote our builds before they go live into production. We use it within a weekly job that utilizes Litmus as a chaos controller, along with a custom-built tool that collects results after injected experiments and sends them to Slack in report form for better resilience improvements in our observability/security stack.
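
As a rough illustration of the reporting step described above (not the actual custom-built tool, whose internals are not described here), a collector could condense experiment verdicts into a message body for Slack. The sketch assumes the results are already available as hypothetical `(experiment name, verdict)` pairs and only builds the JSON payload with a `text` field, as accepted by Slack incoming webhooks; sending it is left out. The function name `build_slack_payload` is made up for this example.

```python
import json
from collections import Counter
from typing import List, Tuple


def build_slack_payload(results: List[Tuple[str, str]]) -> str:
    """Summarize (experiment, verdict) pairs as a JSON payload with a "text" field."""
    counts = Counter(verdict for _, verdict in results)
    lines = [
        f"Weekly chaos report: {counts.get('Pass', 0)}/{len(results)} experiments passed"
    ]
    # List only the experiments that need attention.
    for name, verdict in results:
        if verdict != "Pass":
            lines.append(f"- {name}: {verdict}")
    return json.dumps({"text": "\n".join(lines)})


if __name__ == "__main__":
    # Hypothetical results from one weekly run.
    weekly = [("pod-delete", "Pass"), ("pod-network-latency", "Fail")]
    print(build_slack_payload(weekly))
```

Keeping the summary builder separate from the code that posts to Slack makes it easy to test the report format without network access.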

We have also started using it to validate our SLIs/SLOs and their runbooks. Additionally, we use it in our Jenkins pipeline when we want to promote builds to production after QA tests and validate that the new version supports newly injected turbulences, etc.

Benefits of using Litmus?

Litmus is a straightforward framework that provides multiple experiments and is easy to use by developers. It allows for the creation of specific chaos workflows depending on their needs.

alininja commented 6 months ago

Thank you for creating such wonderful software 🙏

Here are the requested details:

@prithvi1307, I hope this is okay. Please let me know if there's anything else I can provide.

Thanks again for making such an awesome product 🙇

dhruv5176 commented 6 months ago

Leveraging Litmus Chaos Engineering in Kubernetes Infrastructure

We have a Kubernetes-based infrastructure pivotal to our operations, where reliability and resilience are paramount. Recognizing the need for robust testing methodologies, we turned to Litmus Chaos Engineering to fortify our systems against potential failures and to ensure seamless operations even under adverse conditions.

Why Litmus: Litmus emerged as our tool of choice due to its comprehensive suite of chaos engineering capabilities tailored specifically for Kubernetes environments. Its versatility in orchestrating controlled chaos experiments aligns perfectly with our commitment to enhancing system reliability while maintaining agility.

Use Case and Implementation: We have integrated Litmus Chaos Engineering into various stages of our development and deployment pipeline, spanning development, testing, staging and production environments. Using Litmus, we craft and execute chaos experiments, closely observing how our infrastructure behaves under stress and ensuring it meets our predefined Service Level Objectives (SLOs) and Service Level Indicators (SLIs).

Achievements: Our journey with Litmus Chaos Engineering has been marked by significant milestones:

  1. Successful deployment of Chaos Center and Litmus Delegate, empowering us with centralized chaos management capabilities.
  2. Establishment of secure access to Chaos Center through HTTPS, coupled with domain customization for enhanced usability.
  3. Implementation of WAF ACL to restrict access to Chaos Center, ensuring secure interactions.
  4. Integration of Azure SSO for streamlined user management and authentication.
  5. Seamless connectivity between Chaos Center and target nodes, facilitating efficient chaos experimentation.
  6. Execution of numerous successful experiments, validating the resilience and scalability of our infrastructure.

Next Steps: As we continue to harness the power of Litmus Chaos Engineering, we remain committed to expanding our chaos engineering initiatives, further refining our chaos experiments, and continually enhancing the resilience of our Kubernetes infrastructure.

Contact Information: Dhruv's LinkedIn profile.

We are excited about the possibilities that Litmus Chaos Engineering unlocks for us and look forward to sharing our insights and experiences with the community.

ledbruno commented 5 months ago

Nubank

Nubank is the world’s largest digital banking platform outside of Asia, serving over 100 million customers across Brazil, Mexico, and Colombia.

Applications/Workloads or Infra that are being subjected to chaos by Litmus

Why was Litmus chosen & how it is helping you (a brief description on the use case)

How do we use Litmus

Benefits of Litmus

bjoky commented 2 months ago

At Infor we have a resilience team for one of our products. In that team we are using Litmus as the main tool for chaos engineering. Some of our reasons for choosing it were that it is an open-source tool with an active community and that it runs in Kubernetes.

Litmus was our entry point into chaos engineering, and it provided us with a palette of possible experiments and types of failures to choose from. We use it to simulate failures on workloads in Kubernetes environments for development, testing and pre-production, but not yet in production environments.

So far, we have mainly used Litmus for “game day” style workshops. We gather the team working with a component and we run a few experiments together with them. But we have also started using it for running automated experiments in controlled environments and are also looking into integrating it in our CI/CD pipeline.

Our experience with running Litmus and chaos engineering workshops has in general been positive. Besides running the tool, we have also put emphasis on the preparation and follow-up phases of our chaos experiments. We have found that the discussions about resilience and chaos engineering are of value for the developers and help create a culture of resilience that improves the quality of our product.

alicicek1 commented 2 months ago

Wingie Enuygun Company

Wingie Enuygun Company is a leading travel and technology company providing seamless travel solutions across various platforms.

Why do we use Litmus

We use Litmus to identify bottlenecks in our systems, detect issues early, and foresee potential errors. This allows us to take proactive measures and maintain the resilience and performance of our infrastructure.

How do we use Litmus

Litmus is integrated into our QA cycles, where it plays a crucial role in catching bugs and verifying the overall resilience of our systems.

Benefits in using Litmus

Litmus chaos experiments are straightforward to implement and can be easily customized or extended to meet our specific requirements, enabling us to effectively manage and optimize our systems at Wingie Enuygun.

Devendranathashok commented 2 months ago

Resilience is a key aspect in creating fault-tolerant environments, and leveraging tools like Litmus has been instrumental in automating resilience testing. Litmus has enabled us to simulate real-time chaos scenarios, allowing us to thoroughly verify the robustness of both our infrastructure and applications.

We began with a proof of concept (POC) on a playground cluster. While we explored other tools during this process, Litmus stood out significantly, not only in its capabilities but also due to its excellent user interface. Although we faced a few challenges during the initial setup of Litmus on OpenShift, the team provided timely support, helping us overcome these obstacles and successfully complete the POC.

Now, we've successfully deployed Litmus in a non-production cluster environment, and our SRE team is in the process of transitioning from manual chaos testing to automated chaos tests. This shift will enable us to schedule, automate, and efficiently track the outcomes of these tests, enhancing the resilience of our systems.

sidvijay18 commented 6 days ago

At PokerBaazi, we leverage Litmus Chaos to subject critical components of our infrastructure to controlled chaos experiments. These include:

  1. Microservices Infrastructure: Our backend is designed as a microservices architecture, running on Kubernetes. We conduct experiments on inter-service communication, API latencies, and service resilience during node failures or resource constraints.
  2. Load Balancers and Networking: We simulate disruptions in networking, such as packet drops or DNS failures, to ensure our applications maintain connectivity and continue serving users.
  3. Application Workloads: High-demand applications like our gaming engine and payment/promotions APIs are put under stress to evaluate their fault tolerance and recovery mechanisms during peak loads or unexpected outages.

We chose Litmus Chaos for several compelling reasons:

  1. Kubernetes-Native Integration: Since our infrastructure is heavily Kubernetes-based, Litmus seamlessly integrates with our stack, making it a natural fit.
  2. Ease of Use and Open-Source: Litmus offers a user-friendly interface along with robust documentation, allowing our teams to adopt it quickly without steep learning curves.
  3. Custom Experiment Support: The ability to create tailored chaos experiments aligned with our specific workloads ensures we can target critical failure scenarios unique to our ecosystem.
  4. Community Support and Scalability: Being an open-source project with an active community, Litmus evolves rapidly, allowing us to leverage the latest chaos engineering methodologies and tools.

Litmus has been instrumental in identifying hidden weaknesses in our system, such as unexpected dependencies or cascading failures. This has enabled us to proactively address potential issues, enhance system resilience, and meet our uptime commitments.

We use Litmus Chaos in various environments to ensure robust testing at every stage of development:

  1. Development: Initial chaos experiments are conducted in isolated dev environments to identify basic resilience issues and ensure service fault tolerance during early-stage development.
  2. Staging/Pre-Production: In staging, we run more comprehensive chaos scenarios simulating real-world failures, such as pod crashes, resource exhaustion, or external API downtime, to ensure the production-like environment is resilient.
  3. Production: Selected, low-risk chaos experiments are conducted in production under strict supervision to verify real-time system robustness and validate SLAs in live conditions.

Litmus Chaos has transformed our approach to building and maintaining a highly resilient gaming platform, allowing us to deliver exceptional user experiences while preparing for the unexpected.