tinkerbell / hegel

Instance Metadata Service
https://tinkerbell.org
Apache License 2.0

HTTP Handlers for EC2 metadata seem incorrect #61

Closed micahhausler closed 2 years ago

micahhausler commented 3 years ago

Expected Behaviour

I don't have a full list of the broken paths, but a number of the queries in http-server/http_handlers.go: ec2Filters are invalid.

For example: { "/meta-data/plan": ".metadata.instance.plan" } where github.com/tinkerbell/tink/protos/packet/packet.proto doesn't have a corresponding field.

Possible Solution

Steps to Reproduce (for bugs)

  1. Boot and SSH into a tink worker
  2. curl -s $HEGEL_URL:50061/2009-04-04/meta-data/plan

Context

It's hard to know, other than by trial and error, which metadata fields are supported and will work if I want to use something like cloud-init.

displague commented 3 years ago

I've found that cloud-init query -a is missing some keys. When running cloud-init query -f '{{ ds.meta_data.operating_system.version }}', I think we can see evidence of a problem.

WARNING: Could not render jinja template variables in file 'query commandline': 'version'
CI_MISSING_JINJA_VAR/version

As curl -D- https://.../2009-04-04/meta-data/ and cloud-init query -l ds.meta_data show, the dictionary and array paths (public-keys/, operating-system/) do not end with /, as they do in other cloud environments.

This / character seems to hint to cloud-init that directory traversal is possible. I'm not entirely convinced that this is the only hint that is needed (headers, mime-type?), or that this is the authoritative hint. I haven't been able to identify spec documentation or pinpoint the cloud-init code that defines this behavior.
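To illustrate the suspected hint, here is a minimal sketch of how a crawler could split a listing into directory-like and leaf entries based only on the trailing `/`. This is an assumption about the heuristic, not confirmed cloud-init behavior, and `splitListing` is a hypothetical name.

```go
package main

import (
	"fmt"
	"strings"
)

// splitListing separates a newline-delimited metadata listing into
// directory-like entries (trailing "/") and leaf entries.
func splitListing(body string) (dirs, leaves []string) {
	for _, line := range strings.Split(body, "\n") {
		entry := strings.TrimSpace(line)
		if entry == "" {
			continue
		}
		if strings.HasSuffix(entry, "/") {
			dirs = append(dirs, strings.TrimSuffix(entry, "/"))
		} else {
			leaves = append(leaves, entry)
		}
	}
	return dirs, leaves
}

func main() {
	// Listing without trailing slashes, as described for Hegel above.
	dirs, leaves := splitListing("hostname\noperating-system\npublic-keys\n")
	fmt.Println("without slashes:", len(dirs), "dirs,", len(leaves), "leaves")

	// Listing with trailing slashes, as described for other cloud environments.
	dirs, leaves = splitListing("hostname\noperating-system/\npublic-keys/\n")
	fmt.Println("with slashes:   ", len(dirs), "dirs,", len(leaves), "leaves")
}
```

Under this heuristic a client never descends into operating-system when the slash is missing, which would be consistent with the missing `version` key.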

displague commented 3 years ago

Per https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html#instance-metadata-ex-1 and https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-categories.html, some additional concerns:

/latest should be an alias for 2009-04-04 (until newer versions are supported)

/ should return ["2009-04-04", "metadata", "latest"]
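A minimal sketch of the aliasing suggested here, assuming a single supported version. `resolve` and `rootListing` are hypothetical helpers, not Hegel functions, and the root entries are copied from the comment above.

```go
package main

import (
	"fmt"
	"strings"
)

const supportedVersion = "2009-04-04"

// rootListing returns the entries proposed for the "/" path.
func rootListing() []string {
	return []string{supportedVersion, "metadata", "latest"}
}

// resolve maps a request path to the versioned path that would be served.
// ok is false when the first path segment is not a known version or alias.
func resolve(path string) (resolved string, ok bool) {
	version, rest, _ := strings.Cut(strings.TrimPrefix(path, "/"), "/")
	switch version {
	case "latest", supportedVersion:
		if rest == "" {
			return "/" + supportedVersion, true
		}
		return "/" + supportedVersion + "/" + rest, true
	default:
		return "", false
	}
}

func main() {
	fmt.Println(rootListing())
	for _, p := range []string{"/latest/meta-data/hostname", "/2009-04-04/meta-data/hostname", "/2021-01-01/meta-data"} {
		resolved, ok := resolve(p)
		fmt.Printf("%-34s -> %q (known=%v)\n", p, resolved, ok)
	}
}
```

Keeping the alias in one place means a newer version only requires changing `supportedVersion` and adding the old one as an extra case.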

Comparing the latest format to what is returned by Hegel (ignoring fields that Hegel presents that are not in the EC2 spec):

chrisdoherty4 commented 2 years ago

@displague

would this represent a Tinkerbell workflow?

Probably not. An AMI is quite Amazon specific. Trying to give it semantics beyond what Amazon defines could be misleading.

chrisdoherty4 commented 2 years ago

I looked into the existing implementation and compared it with Amazon EC2 and Equinix Metal metadata. TL;DR: (1) we aren't serving EC2 metadata; (2) we're missing an Equinix metadata path (/metadata).

EC2 metadata (2009-04-04)

/meta-data/ami-id
/meta-data/ami-launch-index
/meta-data/ami-manifest-path
/meta-data/ancestor-ami-ids
/meta-data/block-device-mapping/ami
/meta-data/block-device-mapping/ebs<N>
/meta-data/block-device-mapping/ephemeral<N>
/meta-data/block-device-mapping/root
/meta-data/block-device-mapping/swap
/meta-data/hostname
/meta-data/instance-action
/meta-data/instance-type
/meta-data/kernel-id
/meta-data/local-hostname
/meta-data/local-ipv4
/meta-data/mac
/meta-data/network/interfaces/macs/{mac}/local-hostname
/meta-data/placement/availability-zone
/meta-data/product-codes
/meta-data/public-hostname
/meta-data/public-ipv4
/meta-data/public-keys/0/openssh-key
/meta-data/ramdisk-id
/meta-data/reservation-id
/meta-data/security-groups

Equinix metadata

# Full metadata object
/metadata

# Individual metadata available from 2009-04-04 base path
/meta-data/instance-id
/meta-data/hostname
/meta-data/iqn
/meta-data/plan
/meta-data/facility
/meta-data/tags
/meta-data/operating-system
/meta-data/public-keys
/meta-data/public-ipv4
/meta-data/public-ipv6
/meta-data/local-ipv4

# /metadata JSON response
{
    "id": "2885032e-61a8-4786-bd26-7b2e2e6ba1ea",
    "hostname": "metadata",
    "iqn": "iqn.2017-11.net.equinix:device.2885032e",
    "operating_system": {
        "slug": "ubuntu_20_04",
        "distro": "ubuntu",
        "version": "20.04",
        "license_activation": {
            "state": "unlicensed"
        }
    },
    "plan": "c3.small.x86",
    "class": "c3.small.x86",
    "facility": "ewr1",
    "tags": [],
    "ssh_keys": [
        "ssh-rsa AAAAB3Nza............."
    ],
    "storage": {
        "disks": [],
        "raid": [],
        "filesystems": []
    },
    "network": {
        "bonding": {
            "mode": 4
        },
        "interfaces": [{
            "name": "p1p1",
            "mac": "0c:c4:7a:e1:3d:d0",
            "bond": "bond0"
        }],
        "addresses": [{
            "id": "63f24352-0997-4f65-babd-6f8c9d048568",
            "address_family": 4,
            "netmask": "255.255.255.254",
            "created_at": "2017-11-04T17:03:20Z",
            "public": true,
            "cidr": 31,
            "management": true,
            "enabled": true,
            "network": "147.75.104.32",
            "address": "147.75.104.33",
            "gateway": "147.75.104.32",
            "parent_block": {
                "network": "147.75.104.32",
                "netmask": "255.255.255.254",
                "cidr": 31,
                "href": "/ips/6c2d45a7-df8b-451e-9d6d-2a1b5476a9d0"
            }
        }],
        "spot": {},
        "volumes": [],
        "api_url": "https://metadata.platformequinix.com/metal",
        "phone_home_url": "http://tinkerbell.ewr1.console.equinix.com/metal/phone-home",
        "user_state_url": "http://tinkerbell.ewr1.console.equinix.com/metal/events"
    }
}

Note that the /metadata endpoint, which serves the full JSON object, doesn't actually feature in Hegel out of the box, so we don't seem to be providing an Equinix implementation. The keys in the /meta-data listing may also expand further; the Equinix docs don't clarify.
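For reference, a client consuming the full /metadata object could decode it along these lines. This is a sketch covering only a subset of the fields shown in the response above; the `Metadata` and `Address` type names are hypothetical, and the sample is a trimmed copy of that response.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Address mirrors one entry of network.addresses in the sample response.
type Address struct {
	AddressFamily int    `json:"address_family"`
	Public        bool   `json:"public"`
	Address       string `json:"address"`
	Gateway       string `json:"gateway"`
}

// Metadata covers a subset of the /metadata object; unknown keys are ignored.
type Metadata struct {
	ID              string `json:"id"`
	Hostname        string `json:"hostname"`
	Plan            string `json:"plan"`
	Facility        string `json:"facility"`
	OperatingSystem struct {
		Slug    string `json:"slug"`
		Distro  string `json:"distro"`
		Version string `json:"version"`
	} `json:"operating_system"`
	SSHKeys []string `json:"ssh_keys"`
	Network struct {
		Addresses []Address `json:"addresses"`
	} `json:"network"`
}

// sample is a trimmed copy of the response shown above.
const sample = `{
  "id": "2885032e-61a8-4786-bd26-7b2e2e6ba1ea",
  "hostname": "metadata",
  "operating_system": {"slug": "ubuntu_20_04", "distro": "ubuntu", "version": "20.04"},
  "plan": "c3.small.x86",
  "facility": "ewr1",
  "ssh_keys": ["ssh-rsa AAAAB3Nza............."],
  "network": {
    "addresses": [
      {"address_family": 4, "public": true, "address": "147.75.104.33", "gateway": "147.75.104.32"}
    ]
  }
}`

// parseMetadata decodes a /metadata response body.
func parseMetadata(data []byte) (Metadata, error) {
	var m Metadata
	err := json.Unmarshal(data, &m)
	return m, err
}

func main() {
	m, err := parseMetadata([]byte(sample))
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println(m.Hostname, m.OperatingSystem.Version, m.Network.Addresses[0].Address)
}
```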

Filters

var ec2Filters = map[string]string{
    "":                                    `"meta-data", "user-data"`, // base path
    "/user-data":                          ".metadata.userdata",
    "/meta-data":                          `["instance-id", "hostname", "local-hostname", "iqn", "plan", "facility", "tags", "operating-system", "public-keys", "public-ipv4", "public-ipv6", "local-ipv4"] + (if .metadata.instance.spot != null then ["spot"] else [] end) | sort | .[]`,
    "/meta-data/instance-id":              ".metadata.instance.id",
    "/meta-data/hostname":                 ".metadata.instance.hostname",
    "/meta-data/local-hostname":           ".metadata.instance.hostname",
    "/meta-data/iqn":                      ".metadata.instance.iqn",
    "/meta-data/plan":                     ".metadata.instance.plan",
    "/meta-data/facility":                 ".metadata.instance.facility",
    "/meta-data/tags":                     ".metadata.instance.tags[]?",
    "/meta-data/operating-system":         `["slug", "distro", "version", "license_activation", "image_tag"] | sort | .[]`,
    "/meta-data/operating-system/slug":    ".metadata.instance.operating_system.slug",
    "/meta-data/operating-system/distro":  ".metadata.instance.operating_system.distro",
    "/meta-data/operating-system/version": ".metadata.instance.operating_system.version",
    "/meta-data/operating-system/license_activation":       `"state"`,
    "/meta-data/operating-system/license_activation/state": ".metadata.instance.operating_system.license_activation.state",
    "/meta-data/operating-system/image_tag":                ".metadata.instance.operating_system.image_tag",
    "/meta-data/public-keys":                               ".metadata.instance.ssh_keys[]?",
    "/meta-data/spot":                                      `"termination-time"`,
    "/meta-data/spot/termination-time":                     ".metadata.instance.spot.termination_time",
    "/meta-data/public-ipv4":                               ".metadata.instance.network.addresses[]? | select(.address_family == 4 and .public == true) | .address",
    "/meta-data/public-ipv6":                               ".metadata.instance.network.addresses[]? | select(.address_family == 6 and .public == true) | .address",
    "/meta-data/local-ipv4":                                ".metadata.instance.network.addresses[]? | select(.address_family == 4 and .public == false) | .address",
}

Clearly, the paths and filters correlate to Equinix Metal, not EC2, and we don't expose a /metadata endpoint like the Equinix docs suggest should exist. I'm not sure what the purpose of the 2009-04-04 is in the Equinix metadata standard, but it might be a source of confusion given it's the same date as one of the EC2 versions.
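The filter values are jq expressions evaluated against the stored data model. To make the three address filters concrete, here is the same selection written in plain Go. It is only an illustration of what `select(.address_family == N and .public == B) | .address` picks out, not how Hegel evaluates filters; `selectAddresses` is a hypothetical name, and the second and third sample addresses are made up.

```go
package main

import "fmt"

// Address holds the fields the jq filters inspect.
type Address struct {
	AddressFamily int
	Public        bool
	Address       string
}

// selectAddresses returns the address of every entry matching family and public,
// mirroring `.addresses[]? | select(...) | .address`.
func selectAddresses(addrs []Address, family int, public bool) []string {
	var out []string
	for _, a := range addrs {
		if a.AddressFamily == family && a.Public == public {
			out = append(out, a.Address)
		}
	}
	return out
}

func main() {
	addrs := []Address{
		{AddressFamily: 4, Public: true, Address: "147.75.104.33"}, // from the sample response
		{AddressFamily: 4, Public: false, Address: "10.0.0.5"},     // hypothetical private address
		{AddressFamily: 6, Public: true, Address: "2001:db8::1"},   // hypothetical, documentation prefix
	}
	fmt.Println("public-ipv4:", selectAddresses(addrs, 4, true))
	fmt.Println("public-ipv6:", selectAddresses(addrs, 6, true))
	fmt.Println("local-ipv4: ", selectAddresses(addrs, 4, false))
}
```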

Only four items overlap between Equinix and EC2: hostname, public-keys, public-ipv4, and local-ipv4. So even though the code implies it's EC2 metadata, it's not. Given that cloud-init is widespread and supports EC2, it would be useful to implement this properly.
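The overlap can be checked mechanically from the two path lists above. This sketch compares only the first segment after /meta-data/, so /meta-data/public-keys/0/openssh-key counts as public-keys; the helper names are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// EC2 (2009-04-04) paths as listed earlier in this comment.
var ec2Paths = []string{
	"/meta-data/ami-id", "/meta-data/ami-launch-index", "/meta-data/ami-manifest-path",
	"/meta-data/ancestor-ami-ids", "/meta-data/block-device-mapping/ami",
	"/meta-data/block-device-mapping/ebs<N>", "/meta-data/block-device-mapping/ephemeral<N>",
	"/meta-data/block-device-mapping/root", "/meta-data/block-device-mapping/swap",
	"/meta-data/hostname", "/meta-data/instance-action", "/meta-data/instance-type",
	"/meta-data/kernel-id", "/meta-data/local-hostname", "/meta-data/local-ipv4",
	"/meta-data/mac", "/meta-data/network/interfaces/macs/{mac}/local-hostname",
	"/meta-data/placement/availability-zone", "/meta-data/product-codes",
	"/meta-data/public-hostname", "/meta-data/public-ipv4",
	"/meta-data/public-keys/0/openssh-key", "/meta-data/ramdisk-id",
	"/meta-data/reservation-id", "/meta-data/security-groups",
}

// Equinix paths as listed earlier in this comment.
var equinixPaths = []string{
	"/meta-data/instance-id", "/meta-data/hostname", "/meta-data/iqn", "/meta-data/plan",
	"/meta-data/facility", "/meta-data/tags", "/meta-data/operating-system",
	"/meta-data/public-keys", "/meta-data/public-ipv4", "/meta-data/public-ipv6",
	"/meta-data/local-ipv4",
}

// topLevelKeys reduces each path to its first segment below /meta-data/.
func topLevelKeys(paths []string) map[string]bool {
	keys := make(map[string]bool)
	for _, p := range paths {
		key, _, _ := strings.Cut(strings.TrimPrefix(p, "/meta-data/"), "/")
		keys[key] = true
	}
	return keys
}

// overlap returns the sorted top-level keys present in both path lists.
func overlap(a, b []string) []string {
	inA := topLevelKeys(a)
	var shared []string
	for key := range topLevelKeys(b) {
		if inA[key] {
			shared = append(shared, key)
		}
	}
	sort.Strings(shared)
	return shared
}

func main() {
	fmt.Println(overlap(ec2Paths, equinixPaths))
	// prints: [hostname local-ipv4 public-ipv4 public-keys]
}
```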

I think this needs redesigning. Assuming we want to support multiple APIs, EC2 and Equinix for example, it would be nice if we could configure which API and data source is desired on launch (I've talked about this in a separate issue). The 'data model' does provide the data source configuration albeit in a somewhat convoluted way today.

displague commented 2 years ago

Clearly, the paths and filters correlate to Equinix metal, not EC2, and we don't expose a /metadata endpoint like the Equinix docs suggest should exist. I'm not sure what the purpose of the 2009-04-04 is in the Equinix metadata standard but it might be a source of confusion given it's the same date as one of the EC2 versions.

The 2009-04-04 compatibility, specifically the fields you called out, offers a path well supported by historical and current versions of cloud-init and similar tools. The pattern is utilized in Equinix Metal OS provisioning. I'd leave it to @nshalman to say whether or not this is facilitated by Hegel. I believe a predecessor is in use and the intention was to adopt Hegel in the future (with backward compatibility for the current service). This may have changed.

Go templating provides a way to offer bespoke metadata URL mapping, and I wonder whether this should be a matter of user configuration (with standardized default templates).
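A rough sketch of what template-driven mapping could look like with the standard text/template package. The `Instance` type, the `routes` map, and `render` are hypothetical and exist only to illustrate the idea; a real design would load the templates from user configuration.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// Instance is a hypothetical, simplified view of the stored instance data.
type Instance struct {
	Hostname string
	SSHKeys  []string
}

// routes maps URL paths to Go templates. These defaults are illustrative only.
var routes = map[string]string{
	"/meta-data/hostname":    "{{ .Hostname }}",
	"/meta-data/public-keys": "{{ range .SSHKeys }}{{ . }}\n{{ end }}",
}

// render executes the template registered for path against inst.
func render(routes map[string]string, path string, inst Instance) (string, error) {
	text, ok := routes[path]
	if !ok {
		return "", fmt.Errorf("no template registered for %q", path)
	}
	tmpl, err := template.New(path).Parse(text)
	if err != nil {
		return "", err
	}
	var out strings.Builder
	if err := tmpl.Execute(&out, inst); err != nil {
		return "", err
	}
	return out.String(), nil
}

func main() {
	inst := Instance{Hostname: "metadata", SSHKeys: []string{"ssh-rsa AAAA", "ssh-ed25519 BBBB"}}
	for _, p := range []string{"/meta-data/hostname", "/meta-data/public-keys"} {
		body, err := render(routes, p, inst)
		fmt.Printf("%s -> %q (err=%v)\n", p, body, err)
	}
}
```

With this shape, supporting another API layout would mean shipping a different `routes` map rather than changing handler code.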

chrisdoherty4 commented 2 years ago

We discussed this during a community meeting and have opted to replace existing APIs with a Hegel metadata API that will be integrated with cloud-init.