protomaps / go-pmtiles

Single-file executable tool for working with PMTiles archives
https://docs.protomaps.com/pmtiles/cli
BSD 3-Clause "New" or "Revised" License
373 stars 51 forks source link

Caddy plugin #85

Closed bdon closed 1 year ago

bdon commented 1 year ago

Initial implementation here: https://github.com/protomaps/go-pmtiles/blob/main/caddy/pmtiles_proxy.go

Outstanding tasks:

here is an example Caddyfile:

localhost:2019 {
  route /* {
    pmtiles_proxy {
      bucket https://example.com
      cache_size 64
    }
  }
}

A request for https://localhost:2019/foo/0/0/0.mvt will fetch https://example.com/foo.pmtiles. Great!

mholt commented 1 year ago

I assume this is because my plugin is in the package 'pmtiles_proxy', a subdirectory of this repository; the go.mod is at the root of the repo, and is the module also for the CLI program.

Almost correct, the error is:

import "github.com/protomaps/go-pmtiles" is a program, not an importable package

because the module's main package was registered, instead of the package actually containing a Caddy plugin. Instead of registering github.com/protomaps/go-pmtiles, register github.com/protomaps/go-pmtiles/caddy.

how should I re-organize this repo?

I think you can leave it as-is, just claim the package within your module instead :+1:

This only works because we have route /. If we did route /prefix/, we need to pass this information into the plugin so it can strip that off when it makes a fetch to the pmtiles file. Is this configuration state something I can access in Provision via caddy.Context? This overlaps with https://github.com/protomaps/go-pmtiles/pull/61

You can use handle_path /prefix/* to match a prefix and strip it at the same time. Is that what you're asking to do?

If you use handle or handle_path you'll need to specify the order of your handlers; you can either use route { ... } inside, or you can specify this in your global options:

{
   order pmtiles_proxy before reverse_proxy
}

(for example)

We also need the localhost:2019 information in order to generate correct TileJSON. I hardcoded this to https://example.com for now. Is this something we can access in Provision or do we need to trust the host of the request struct?

I guess it depends what it's used for -- why is the host needed in the first place? Most URIs can be root-relative (i.e. relative to the host, by specifying only the path) -- letting the client piece things together is often more accurate / less tedious.

bdon commented 1 year ago

why is the host needed in the first place?

We are at the mercy of existing specifications and client behavior:

https://github.com/mapbox/tilejson-spec/tree/master/3.0.0#32-tiles

All endpoint urls MUST be absolute.

My concern with trusting the client request hostname: is it possible for a cache in front of Caddy to be poisoned by some spoofed value for the header? Since that gets into the JSON body of the response.

A fallback is to require the hostname to be duplicated in the configuration, which doesn't seem that horrible

localhost:2019 {
  route /* {
    pmtiles_proxy {
      bucket https://example.com
      cache_size 64
      public_hostname https://localhost:2019
    }
  }
}
mholt commented 1 year ago

My concern with trusting the client request hostname: is it possible for a cache in front of Caddy to be poisoned by some spoofed value for the header? Since that gets into the JSON body of the response.

That's hard for me to answer off the top of my head, but I will say that indeed the Host header can be anything the client wants it to be (it depends on how the request was formed and how it was dispatched; some libraries/scripts use the Host header to craft a URL, or vice-versa, and they also do their own DNS resolution -- but if you have the ability to simply direct an HTTP request through a TCP connection to an arbitrary host you connected to, then the Host header can be whatever).

If that only affects the one client then they'd just be shooting themselves in the foot and I don't see that as your problem.

If you do implement caching that serves many clients, then that's something to be aware of.

But yeah, if you just specify the hostname twice in your config that could work.

bdon commented 1 year ago

I'm not having any luck getting the caddy site's Register Package to pick up the subdirectory:

unable to scan modules in package github.com/protomaps/go-pmtiles/caddy

Please include this error ID if reporting:
39e14f43-7722-4c8f-b08b-473f236a8125

It seems based on the wording that it is looking for a go module, which only exists one level up.

mholt commented 1 year ago

Thanks for the error ID. The detail is:

go: module github.com/protomaps/go-pmtiles@upgrade found (v1.10.0), but does not contain package github.com/protomaps/go-pmtiles/caddy

I'm guessing it is because the package name is caddy but the package is pmtiles_proxy.

bdon commented 1 year ago

OK, it seems to be live! thanks! Will be testing it out today. I had to pass a specific tag v1.10.1 as latest wasn't working for me.

https://caddyserver.com/download?package=github.com%2Fprotomaps%2Fgo-pmtiles%2Fcaddy

Now to figure out how to make the documentation strings on the Caddy download page show up...

mholt commented 1 year ago

Awesome!! Congrats.

The docs come from godoc comments. So make sure to write a comment over the package declaration. Also make sure to document exported types and fields with comments.

I'm super excited about this!

bdon commented 1 year ago

I have this running in staging! Working through some issues but very confident they're problems in go-pmtiles.

I noticed I had trouble picking up a new tagged version on https://caddyserver.com/download - I tried to download a new build, and assumed that it was cached by module/OS/architecture since it generated instantly. I even un-registered my plugin and added it again, specifying the new tag, but it still returned the same artifact for the same version. Possibly being aggressively cached by the build process? Makes sense but would be nice to get an idea of when those fall out of the cache. Anyways, closing this for now and will move to separate issues as I add docs, etc.

mholt commented 1 year ago

Ah yeah, we do aggressively cache builds. In theory it should trigger a cache miss if you specify a version that hasn't been downloaded before. We are aiming to improve this but haven't gotten around to it yet. (So much to do!)

Let me know if there's any other questions :)

bdon commented 1 year ago

I started writing up some docs at https://docs.protomaps.com/deploy/server (this is the new open source docs site, I'm still in the process of porting over the other pages!)

mholt commented 1 year ago

Looks good!!