Improve publicly accessible checks to include targets of ELBs

Fennerr commented 9 months ago

New feature motivation

I was looking at some Prowler output, in particular the "RDS instance not publicly accessible" check, and I think there are some improvements we can make.

Whilst the RDS instance itself might not be directly accessible from the internet, it might sit behind an internet-facing ELB that has no authentication.

So for checks of the form "ensure X is not publicly accessible" on resources that live in a VPC, we can also make sure that the resource is not part of a target group for a public ELB.

Solution Proposed

Enumerate ELBs, check for public ones, and then get their target groups. When doing checks for public resources within the other services, have some util function that can be called to check if the resource is a target for the ELB. It would also need to handle IP addresses if the resource has an IP address associated with it (EKS node/EC2 instance/RDS instance come to mind).

Implement checks that see if the resource is a target for an ELB. This could be a new check that is separate from X_not_publicly_accessible, although based on that check name it could form part of it. Alternatively, make 2 checks: X_not_directly_publicly_accessible and X_not_publicly_accessible_via_elb, or something along those lines.

One of the things I noticed is that awslambda_function_not_publicly_accessible only checks the IAM permissions, not whether the function is a target of an ALB.

Here are some of the types of resources which this type of check would apply to (I asked ChatGPT what resources can be targets for ELBs. I'm not sure if all of them will be applicable to what Prowler is currently checking for, like the AWS Direct Connect, VPN, or peered VPCs)

EC2 Instances: Elastic Load Balancers can distribute traffic to EC2 instances, which are virtual servers in Amazon's Elastic Compute Cloud (EC2).
ECS Tasks: For applications running in Amazon Elastic Container Service (ECS), ELBs can route traffic to ECS tasks, which are Docker containers running on EC2 instances or AWS Fargate.
Lambda Functions: Application Load Balancers (ALBs) can invoke AWS Lambda functions based on HTTP(S) requests, enabling serverless architectures.
IP Addresses: ELBs can route traffic to specific IP addresses. This is useful for targeting resources that are hosted in on-premises data centers connected via AWS Direct Connect or a VPN, or in peered VPCs.
Containers: For containerized applications, ELBs can directly route traffic to Docker containers managed by ECS or Kubernetes on AWS EKS.
Auto Scaling Groups: ELBs work seamlessly with Auto Scaling Groups to adjust the amount of compute resources in response to incoming application traffic.
Amazon EC2 Spot Fleets: ELBs can distribute traffic across Spot Fleet instances, which are collections of EC2 spot instances.

Describe alternatives you've considered

None

Additional context

No response

abant07 commented 7 months ago

Hey, is this an issue that still needs someone to solve? If so, I can give it a shot.

jfagoagas commented 7 months ago

Hi @Fennerr @abant07, I like the idea of also checking ALBs, since this will complement each check to truly know whether the resource is publicly exposed or not. Our way to do this is to add more logic to the publicly exposed checks. We recently updated rds_instance_no_public_access to verify whether the attached Security Group has publicly exposed ports, i.e. ports accessible from the internet (0.0.0.0/0), as you can see here https://github.com/prowler-cloud/prowler/pull/3341.

I think for the improvement you want to do, we'll need to retrieve the following:

ALB/ELBs
Target Groups for each one
Security Groups for each one -- currently retrieved in the EC2 service and analysed to see if they are publicly accessible
Then, in each check, recover the IP address for the resource and see if it is attached to a Target Group
Once here, check the ALB/ELB Security Group

A note about development: it is important to store all of the above information in Python dictionaries, since we will do several lookups by IP or ARN and that will be much faster than looping over lists.
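As a rough sketch of that dictionary idea (not Prowler code): the function name index_public_targets is hypothetical, the inputs are assumed to be shaped like the boto3 elbv2 describe_load_balancers, describe_target_groups and describe_target_health responses, and the load balancer Scheme is used as a simplified stand-in for the Security Group analysis.

```python
from collections import defaultdict


def index_public_targets(load_balancers, target_groups, targets_by_group):
    """Map each target ID (IP, instance ID or ARN) to the public LBs routing to it.

    Hypothetical helper. Assumed inputs:
      load_balancers:   entries like describe_load_balancers()["LoadBalancers"]
      target_groups:    entries like describe_target_groups()["TargetGroups"]
      targets_by_group: {TargetGroupArn: TargetHealthDescriptions list}
    """
    # Simplification: "internet-facing" scheme only, no Security Group analysis.
    public_lbs = {
        lb["LoadBalancerArn"]
        for lb in load_balancers
        if lb.get("Scheme") == "internet-facing"
    }
    index = defaultdict(set)
    for tg in target_groups:
        exposed_by = public_lbs.intersection(tg.get("LoadBalancerArns", []))
        if not exposed_by:
            continue  # target group is not attached to any public load balancer
        for description in targets_by_group.get(tg["TargetGroupArn"], []):
            index[description["Target"]["Id"]].update(exposed_by)
    return dict(index)
```

Each check could then do a single lookup such as index.get(ip_address) instead of walking every target group.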

Does it make sense to you?

CC: @sergargar

abant07 commented 7 months ago

Cool, yeah, I think I understand. Is everything in the README good enough for me to reproduce any environments in AWS?

Thanks

Fennerr commented 7 months ago

I'm not sure about the docs coverage. I have just learnt the code base over time, but I believe it is rather comprehensive.

What you will want to do is look at what information the clients pull. Inside the service there will be a [servicename]_client.py which collects data from AWS when it is instantiated. The ec2 client will have all the security groups, and network interfaces.

Then look into handling each of the target types for an ELB. For example, one of the types is an IP address. To determine what that IP address is associated with you will need to check out the ENIs.

So you need to be able to match the network interfaces to their associated resources. For EC2 instances, the NetworkInterfaceId parameter is returned by the describe_instances call, but it is not returned for other resources like EFS file system mount points and ELBs. It also doesn't appear to be there for RDS (but I still need to confirm this).
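For the EC2 case, a minimal sketch of that matching could look like the following. The function name is hypothetical, and the input is assumed to be the "Reservations" list from a describe_instances response.

```python
def map_private_ips_to_instances(reservations):
    """Build {private IP: {"InstanceId": ..., "NetworkInterfaceId": ...}}.

    Hypothetical helper; `reservations` is assumed to be shaped like
    describe_instances()["Reservations"].
    """
    ip_to_instance = {}
    for reservation in reservations:
        for instance in reservation.get("Instances", []):
            for eni in instance.get("NetworkInterfaces", []):
                # An ENI can carry several private IPs (primary plus secondary).
                for address in eni.get("PrivateIpAddresses", []):
                    ip_to_instance[address["PrivateIpAddress"]] = {
                        "InstanceId": instance["InstanceId"],
                        "NetworkInterfaceId": eni["NetworkInterfaceId"],
                    }
    return ip_to_instance
```

An IP-type target from a target group can then be looked up in this dictionary to find the instance behind it.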

I recommend placing breakpoints at the last call in the init of the service's client to see what information is available.

To determine the ENIs associated with a resource, for resource types that don't tell you their ENI ID when describing them (such as ELBs), I have done this in a script by searching through the descriptions of the network interfaces. For example:

# ELBs: the describe calls do not return ENI IDs, so match on the ENI description.
# Defined elsewhere in my script: `s` (boto3 session), `ec2` (EC2 client) and
# `enis` (my dict-like ENI collection keyed by ENI ID).
elb_client = s.client("elb")
elbv2 = s.client("elbv2")

for elbs in [
    elb_client.describe_load_balancers()["LoadBalancerDescriptions"],
    elbv2.describe_load_balancers()["LoadBalancers"],
]:
    for elb in elbs:
        # Substring match: a name that is a prefix of another name can over-match.
        attached_enis = [
            v["NetworkInterfaceId"]
            for _eni_id, v in enis.items()
            if elb["LoadBalancerName"] in v["Description"]
        ]
        for attached_eni in attached_enis:
            enis.attach_resource(eni_id=attached_eni, resource_type="ELB", resource_data=elb)

# VPC endpoints: the describe call returns the ENI IDs directly.
vpces = ec2.describe_vpc_endpoints()["VpcEndpoints"]
for vpce in vpces:
    attached_enis = [x for x in enis if x in vpce["NetworkInterfaceIds"]]
    for attached_eni in attached_enis:
        enis.attach_resource(eni_id=attached_eni, resource_type="VPCE", resource_data=vpce)

(I have defined some ENI classes so that I can standardize how I "attach a resource" to an ENI, but I only started on the script and didn't finish it off.)

You will need to import the ec2 client into the other services when writing the checks. For example, this lambda check has to import the cloudtrail client.

Lastly, the logic should ideally also handle resolving ELBs/NLBs/ALBs that point to one or more ELBs/NLBs/ALBs before pointing to an actual resource (RDS, EC2, etc.).
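A rough sketch of how that resolution could work, assuming the targets have already been collected into a dictionary keyed by load balancer ARN. The function name and the targets_by_lb structure are hypothetical, and a hop is only followed when the target ID is itself a key in that dictionary (for example an NLB whose target is an ALB ARN):

```python
def resolve_final_targets(lb_arn, targets_by_lb, _seen=None):
    """Follow load balancer -> load balancer hops and return the final target IDs.

    Hypothetical helper. `targets_by_lb` is assumed to map each load balancer
    ARN to the list of target IDs registered behind it.
    """
    seen = set() if _seen is None else _seen
    if lb_arn in seen:
        return set()  # already visited: guards against circular chains
    seen.add(lb_arn)

    resolved = set()
    for target_id in targets_by_lb.get(lb_arn, []):
        if target_id in targets_by_lb:
            # The target is another load balancer, so keep following the chain.
            resolved |= resolve_final_targets(target_id, targets_by_lb, seen)
        else:
            resolved.add(target_id)
    return resolved
```

Chains that hop through a load balancer's IP address rather than its ARN are not covered here and would first need the ENI matching described above.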

Fennerr commented 7 months ago

I see that the VPC Endpoints in the code I provided are an example of a resource that provides the NetworkInterfaceIds. Not all resources provide this (Route53 resolvers, ELBs, EFS, and RDS are examples I have come across that do not, although I still need to double check RDS; I'm just basing this on the documentation for the describe db instances call).

Fennerr commented 7 months ago

RDS instances have the "Endpoint" key when describing them, which provides a DNS name. I'm thinking of using this code to resolve it:

import socket

# Extract the endpoint (DNS name) from the describe_db_instances response
# This would change to getting it from the rds client
endpoint = response['DBInstances'][0]['Endpoint']['Address']

# Use socket to resolve the endpoint's DNS name to an IPv4 address
ip_address = socket.gethostbyname(endpoint)

I took a look at the RDS ENIs in EC2, and they all have the same description, "RDSNetworkInterface", which can't be used to determine which RDS instance an ENI actually belongs to.

abant07 commented 7 months ago

Ok sounds good, I will take a shot at it this week.

Thanks

abant07 commented 7 months ago

Hey!

Sorry it has been a while, I hadn't realized I would be busy the last few weeks studying for finals. However, this week, I am on break, and I should be able to get this issue done.

In the meantime, I have followed the README to set up the environment and run all the current security checks for AWS, and it is working fine.

I will keep you guys posted throughout the week if I run into any troubles.

Thanks

abant07 commented 7 months ago

Hey @jfagoagas @Fennerr ,

I started looking deeply into how I can solve this, but I want to make sure I have the right idea before I go too far.

As I understand it, even though a resource may not be directly accessible from the internet, we want a security check that verifies that none of the active resources in a user's account sit behind a public ELB or ALB. So the proposed solution is:

I first check which ALBs and ELBs are in use in the user's account. If there are no ALBs/ELBs, the check passes. If there are load balancers, I need to see whether they are public, so I need to check their security groups. If there are no public ALBs/ELBs, the check passes. If there are any public ALBs/ELBs, I need to check their target groups. Then I need to compare some sort of ID or IP address across all the active resources in the user's account (since I am doing this for select resources like EC2, RDS, Lambda, etc., I would need to check all active EC2 instances, RDS databases, Lambda functions, etc.) to see whether any of them match a resource in one of those target groups. If a resource's ID/IP address matches a target group member, I flag the check as failed for that resource; if it doesn't show up, I flag it as passed.
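To make sure we mean the same thing, here is a rough sketch of just the final matching step. The function name and both input shapes are hypothetical, and it assumes the set of target IDs behind public load balancers has already been collected:

```python
def evaluate_exposure(resources, public_targets):
    """Return {resource_id: "PASS" or "FAIL"}.

    Hypothetical helper.
      resources:      {resource_id: identifiers it can be registered under
                       (instance ID, ARN, private IPs, ...)}
      public_targets: set of target IDs found behind public ALBs/ELBs
    """
    findings = {}
    for resource_id, identifiers in resources.items():
        # FAIL as soon as any identifier appears behind a public load balancer.
        matched = public_targets.intersection(identifiers)
        findings[resource_id] = "FAIL" if matched else "PASS"
    return findings
```

With no public load balancers the target set is empty, so every resource passes, which covers the first two cases above.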

Is this all correct?

Thanks

abant07 commented 7 months ago

Also, when I am checking public ELBs/ALBs and want to get their target groups, would those be returned as a list? And when I want to get the resources in each target group, would that be a list as well?

abant07 commented 5 months ago

Hello!

I have finished this issue, please let me know if anything is incorrectly written or if I need to add more test cases for the checks.

Thanks, Amogh
