oxen-io / oxen-core

Oxen core repository, containing oxend and oxen cli wallets
https://oxen.io
Other
317 stars 120 forks source link

Truncate the L2 provider URLs that are logged #1706

Closed Doy-lee closed 2 months ago

Doy-lee commented 2 months ago

Truncate the URL in logs so that user specific RPC nodes aren't logged.

jagerman commented 2 months ago

I'm a bit worried that this ellipsizing of URLs is going to take out too much information. For example, I am using --l2-provider http://10.24.0.1:9547,http://10.25.0.2:9547 and with this log truncation all I'm going to see is http...9547 for both of my providers which isn't enough to identify them.

Perhaps we could do something more URL aware like this that removes HTTP auth and the path but leaves protocol/host/port intact:

constexpr std::array PROTOCOLS = {"http://"sv, "https://"sv, "ws://"sv, "wss://"sv};

std::string trim_url(std::string_view url) {
    std::string result;
    for (const auto& p : PROTOCOLS) {
        if (url.starts_with(p)) {
            url.remove_prefix(p.size());
            result += p;
            break;
        }
    }

    // If the URL ends with /STUFF replace with /...
    bool append_ellipsis = false;
    if (auto slash = url.find('/'); slash < url.size() - 1) {
        append_ellipsis = true;
        url.remove_suffix(url.size() - 1 - slash);
    }
    if (auto at = url.find('@'); at < url.size()) {
        result += "…";
        url.remove_prefix(at);
    }

    result += url;
    if (append_ellipsis)
        result += "…";

    return result;
}

With some test output:

int main() {

    for (const auto& url : {
        "https://10.24.0.1:9547",
        "https://10.25.0.2:9547",
        "http://10.24.0.1:9547",
        "https://10.24.0.1:9547/",
        "https://10.24.0.1:9547/a",
        "https://10.24.0.1:9547/secret-stuff",
        "https://user:pass@10.24.0.1:9547",
        "https://user:pass@10.24.0.1:9547/stuff",
        "ws://user:pass@10.24.0.1:9547/stuff",
    }) {

        std::cout << url << " --> " << trim_url(url) << "\n";
    }
}
https://10.24.0.1:9547 --> https://10.24.0.1:9547
https://10.25.0.2:9547 --> https://10.25.0.2:9547
http://10.24.0.1:9547 --> http://10.24.0.1:9547
10.24.0.1:9547 --> 10.24.0.1:9547
10.25.0.1 --> 10.25.0.1
https://10.25.0.1/abcdef --> https://10.25.0.1/…
http://10.24.0.1:9547 --> http://10.24.0.1:9547
https://10.24.0.1/ --> https://10.24.0.1/
https://10.24.0.1:9547/a --> https://10.24.0.1:9547/…
https://10.24.0.1:9547/secret-stuff --> https://10.24.0.1:9547/…
https://user:pass@10.24.0.1:9547 --> https://…@10.24.0.1:9547
https://user:pass@10.24.0.1/stuff --> https://…@10.24.0.1/…
ws://user:pass@10.24.0.1:9547/stuff --> ws://…@10.24.0.1:9547/…
jagerman commented 2 months ago

Actually it's probably useful to keep the last few characters in a path as well (because, if you're using the same provider but with different auth codes for primary/backup, that might be the only difference in the URLs). Here's v2 that leaves the last 3 path chars:

constexpr std::array PROTOCOLS = {"http://"sv, "https://"sv, "ws://"sv, "wss://"sv};
constexpr size_t PATH_TAIL_SIZE = 3;

std::string trim_url(std::string_view url) {
    std::string result;
    for (const auto& p : PROTOCOLS) {
        if (url.starts_with(p)) {
            url.remove_prefix(p.size());
            result += p;
            break;
        }
    }

    // If the URL ends with /lotsofstuff replace with /…uff
    std::string_view path_tail;
    if (auto slash = url.find('/');
        slash != std::string_view::npos && slash + PATH_TAIL_SIZE + 1 < url.size()) {
        path_tail = url.substr(url.size() - PATH_TAIL_SIZE);
        url.remove_suffix(url.size() - 1 - slash);
    }

    // Strip out a potential HTTP user:pass@ auth
    if (auto at = url.find('@'); at < url.size()) {
        result += "…";
        url.remove_prefix(at);
    }

    result += url;
    if (!path_tail.empty()) {
        result += "…";
        result += path_tail;
    }

    return result;
}
int main() {

    for (const auto& url : {
        "https://10.24.0.1:9547",
        "https://10.25.0.2:9547",
        "http://10.24.0.1:9547",
        "10.24.0.1:9547",
        "10.25.0.1",
        "https://10.25.0.1/abcdef",
        "http://10.24.0.1:9547",
        "https://10.24.0.1/",
        "https://10.24.0.1:9547/a",
        "https://10.24.0.1:9547/ab",
        "https://10.24.0.1:9547/abc",
        "https://10.24.0.1:9547/abcd",
        "https://10.24.0.1:9547/abcde",
        "https://10.24.0.1:9547/secret-stuff",
        "https://user:pass@10.24.0.1:9547",
        "https://user:pass@10.24.0.1/stuff",
        "ws://user:pass@10.24.0.1:9547/stuff",
    }) {

        std::cout << url << " --> " << trim_url(url) << "\n";
    }
}
https://10.24.0.1:9547 --> https://10.24.0.1:9547
https://10.25.0.2:9547 --> https://10.25.0.2:9547
http://10.24.0.1:9547 --> http://10.24.0.1:9547
10.24.0.1:9547 --> 10.24.0.1:9547
10.25.0.1 --> 10.25.0.1
https://10.25.0.1/abcdef --> https://10.25.0.1/…def
http://10.24.0.1:9547 --> http://10.24.0.1:9547
https://10.24.0.1/ --> https://10.24.0.1/
https://10.24.0.1:9547/a --> https://10.24.0.1:9547/a
https://10.24.0.1:9547/ab --> https://10.24.0.1:9547/ab
https://10.24.0.1:9547/abc --> https://10.24.0.1:9547/abc
https://10.24.0.1:9547/abcd --> https://10.24.0.1:9547/…bcd
https://10.24.0.1:9547/abcde --> https://10.24.0.1:9547/…cde
https://10.24.0.1:9547/secret-stuff --> https://10.24.0.1:9547/…uff
https://user:pass@10.24.0.1:9547 --> https://…@10.24.0.1:9547
https://user:pass@10.24.0.1/stuff --> https://…@10.24.0.1/…uff
ws://user:pass@10.24.0.1:9547/stuff --> ws://…@10.24.0.1:9547/…uff
Doy-lee commented 2 months ago

Its these kind of things

    // Strip out a potential HTTP user:pass@ auth
    if (auto at = url.find('@'); at < url.size()) {
        result += "…";
        url.remove_prefix(at);
    }

that dissuaded me from trying to make this more intelligent. Mainly because the URL spec is crazy-town bonkers and not being sure if we'd handle all the cases correctly. Though this is user configured so I mean the impact of that is fairly localised. I'll take the change but that's my 2c.

Doy-lee commented 2 months ago

Has been updated.