proxy-wasm / proxy-wasm-rust-sdk

WebAssembly for Proxies (Rust SDK)
Apache License 2.0

Query: Perform HTTP Request in _start / RootContext on_vm_start #137

Open acarlson0000 opened 2 years ago

acarlson0000 commented 2 years ago

Hello - I hope it's okay to raise a query; I'm looking for some pointers on whether this is possible. I'm an Envoy/Rust novice, so apologies if I have an incorrect grasp of this stuff!

For context, we're trying to write a WASM filter to perform CRL verification against an incoming client certificate subject.

Currently, I have the filter working such that the first HTTP request intercepted by the filter fetches a list of CRLs and adds it to shared data (i.e. working as a singleton). After that, further requests can use it. However, I realise this isn't ideal.

Ideally, I'd like to fetch this file when the VM starts and ensure it is added to shared data for later use, but I'm unsure exactly how to put this together. As I understand it, the RootContext trait won't get access to dispatch_http_call (or on_http_call_response / get_http_call_response_body), as these are part of the Context trait. Nor am I able to use an additional request library (e.g. reqwest), due to the obvious compilation issues.

I've provided a small example filter below.

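use proxy_wasm::traits::{Context, HttpContext, RootContext};
use proxy_wasm::types::{Action, LogLevel};
use std::time::Duration;

struct WasmRustFilter {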
  context_id: u32,
}

#[no_mangle]
pub fn _start() {
    proxy_wasm::set_log_level(LogLevel::Info);
    proxy_wasm::set_root_context(|context_id| -> Box<dyn RootContext> {
        Box::new(WasmRustFilter { context_id })
    });

    proxy_wasm::set_http_context(|context_id, _| -> Box<dyn HttpContext> {
        Box::new(WasmRustFilter { context_id })
    });

    // <<EXAMPLE>> Fetch file data here, and store it in shared data.

}

impl HttpContext for WasmRustFilter {
    fn on_http_request_headers(&mut self, _: usize) -> Action {
        // on request headers
        Action::Continue
    }

    fn on_http_response_headers(&mut self, _: usize) -> Action {
        // on response headers
        Action::Continue
    }
}

impl Context for WasmRustFilter {
    fn on_http_call_response(&mut self, _token_id: u32, _num_headers: usize, body_size: usize, _num_trailers: usize) {
        if let Some(body) = self.get_http_call_response_body(0, body_size) {
            if !body.is_empty() {
                match self.set_shared_data(":data", Some(&body), None) {
                    Ok(_) => {
                        proxy_wasm::hostcalls::log(LogLevel::Info, "added shared data");
                        // ... stuff
                    }
                    Err(cause) => panic!("unexpected status: {:?}", cause),
                }
            }
        }
    }
}

impl RootContext for WasmRustFilter {
    fn on_vm_start(&mut self, _: usize) -> bool {
        proxy_wasm::hostcalls::log(LogLevel::Info, "on_vm_start ran");
        self.set_tick_period(Duration::from_secs(5));

        // <<EXAMPLE>> Fetch file data here, and store it in shared data, as per below.

        self.dispatch_http_call(
            "cluster",
            vec![
                (":method", "GET"),
                (":path", "requested-file"),
                (":scheme", "https"),
            ],
            None,
            vec![],
            Duration::from_secs(5),
        )
        .unwrap();
        true
    }
}

Any help would be appreciated, thanks! I'm aware there are implications for how to refresh the cache / check expiry etc, but just wondering if this is even possible.

antonengelhardt commented 1 year ago

@acarlson0000 I am doing something similar: fetching info from OpenID and JWKS endpoints.

This happens during startup and is refreshed at a specified interval. I also wondered how this works, and the solution is to dispatch the calls in the on_tick function. This function is called on every tick, which you initially "turn on" with self.set_tick_period(). I think you should set the period inside on_vm_start.
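A minimal sketch of that flow, written against the standard library only so the logic can be checked in isolation. The `Host` trait, `CrlFetcher`, `FakeHost`, and the one-second initial period are hypothetical names and values, not part of the SDK; in a real filter the three `Host` methods would correspond to `set_tick_period`, `dispatch_http_call`, and `set_shared_data`, and the three `CrlFetcher` methods to the `on_vm_start`, `on_tick`, and `on_http_call_response` callbacks.

```rust
use std::time::Duration;

/// Hypothetical stand-in for the host calls a root context would make.
trait Host {
    fn set_tick_period(&mut self, period: Duration);
    /// Starts an asynchronous HTTP call and returns its token.
    fn dispatch_fetch(&mut self) -> Result<u32, String>;
    fn set_shared_data(&mut self, key: &str, value: &[u8]);
}

struct CrlFetcher {
    refresh: Duration,
    in_flight: Option<u32>,
    loaded: bool,
}

impl CrlFetcher {
    fn new(refresh: Duration) -> Self {
        CrlFetcher { refresh, in_flight: None, loaded: false }
    }

    /// Only arms a short first tick; no HTTP call is made here.
    fn on_vm_start(&mut self, host: &mut dyn Host) -> bool {
        host.set_tick_period(Duration::from_secs(1));
        true
    }

    /// Dispatches a fetch unless one is already outstanding.
    fn on_tick(&mut self, host: &mut dyn Host) {
        if self.in_flight.is_some() {
            return;
        }
        // On a dispatch error, nothing is recorded and the next tick retries.
        if let Ok(token) = host.dispatch_fetch() {
            self.in_flight = Some(token);
        }
    }

    /// Stores the body and, after the first success, slows the tick down.
    fn on_http_call_response(&mut self, host: &mut dyn Host, token: u32, body: &[u8]) {
        if self.in_flight != Some(token) {
            return; // not the call we are waiting for
        }
        self.in_flight = None;
        if body.is_empty() {
            return; // keep the current period; the next tick retries
        }
        host.set_shared_data(":data", body);
        if !self.loaded {
            self.loaded = true;
            host.set_tick_period(self.refresh);
        }
    }
}

/// Test double that records what the fetcher asked the host to do.
#[derive(Default)]
struct FakeHost {
    periods: Vec<Duration>,
    dispatched: u32,
    stored: Vec<(String, Vec<u8>)>,
}

impl Host for FakeHost {
    fn set_tick_period(&mut self, period: Duration) {
        self.periods.push(period);
    }
    fn dispatch_fetch(&mut self) -> Result<u32, String> {
        self.dispatched += 1;
        Ok(self.dispatched)
    }
    fn set_shared_data(&mut self, key: &str, value: &[u8]) {
        self.stored.push((key.to_string(), value.to_vec()));
    }
}
```

The short first period means the initial fetch happens soon after startup, and the period is only lengthened once data has actually been stored, so a failed first fetch is retried quickly.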

Hope it helps and this is not already resolved :)

GraemeMitchell84 commented 7 months ago

I hope it's okay to extend this question somewhat, as I have a related one. I've been creating similar functionality which loads some information from an API on tick to periodically update shared data. The problem I find with using on_tick is that it is triggered multiple times per tick period, I guess because multiple workers/threads each have their own tick. Those are triggered at the same time, but I only want to call the API once per tick period so I don't spam it with unnecessary requests. Does anybody have any idea how this can be achieved?

PiotrSikora commented 7 months ago

You could have a separate singleton background service that fetches the data and updates it in a key-value store; workers could then read the shared data from that store.
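The split of responsibilities could look roughly like the following std-only model. Everything here is a hypothetical stand-in: `SharedStore` plays the role of the host's shared key-value data, `Api` is a fake upstream that counts calls, and the fetch is synchronous for brevity, whereas a real service would dispatch an asynchronous HTTP call on tick and write the store from the response callback. How the singleton itself is deployed depends on the proxy and is not shown.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory stand-in for the host's shared key-value store.
#[derive(Default)]
struct SharedStore {
    entries: HashMap<String, Vec<u8>>,
}

/// Hypothetical upstream API that counts how often it is hit.
#[derive(Default)]
struct Api {
    calls: u32,
}

impl Api {
    fn fetch(&mut self) -> Vec<u8> {
        self.calls += 1;
        format!("payload-{}", self.calls).into_bytes()
    }
}

/// The single background service: the only component that calls the API.
struct SingletonService;

impl SingletonService {
    fn on_tick(&self, api: &mut Api, store: &mut SharedStore) {
        let body = api.fetch();
        store.entries.insert(":data".to_string(), body);
    }
}

/// Per-worker filter: never calls the API, only reads the shared store.
struct Worker;

impl Worker {
    fn on_request(&self, store: &SharedStore) -> Option<Vec<u8>> {
        store.entries.get(":data").cloned()
    }
}
```

Because only the singleton ticks against the API, the number of upstream requests per period stays at one regardless of how many workers are reading.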

GraemeMitchell84 commented 7 months ago

Thanks for the idea! Let me research that option, and I'll post an example here if I get something working.