p4lang / p4c

P4_16 reference compiler
https://p4.org/
Apache License 2.0
682 stars 446 forks source link

Expensive stream flushes #4922

Open asl opened 2 months ago

asl commented 2 months ago

We are having the following code in lib/indent.h:

namespace IndentCtl {
inline std::ostream &endl(std::ostream &out) {
    return out << std::endl << indent_t::getindent(out);
}
inline std::ostream &indent(std::ostream &out) {
    ++indent_t::getindent(out);
    return out;
}
inline std::ostream &unindent(std::ostream &out) {
    --indent_t::getindent(out);
    return out;
}

So, doing out << IndentCtl::endl essentially flushes the output stream. This is more or less fine for cout / cerr, however IndentCtl is used in Utils::Json. As a result, serialization of json objects become terrible expensive due to constant flushes of output file stream, e.g.:

void JsonArray::serialize(std::ostream &out) const {
    bool isSmall = true;
    for (auto v : *this) {
        if (!v->is<JsonValue>()) isSmall = false;
    }
    out << "[";
    if (!isSmall) out << IndentCtl::indent;
    bool first = true;
    for (auto v : *this) {
        if (!first) {
            out << ",";
            if (isSmall) out << " ";
        }
        if (!isSmall) out << IndentCtl::endl;
        if (v == nullptr)
            out << "null";
        else
            v->serialize(out);
        first = false;
    }
    if (!isSmall) out << IndentCtl::unindent << IndentCtl::endl;
    out << "]";
}

I'm seeing possible ways of fixing this:

Opinions? Tagging @ChrisDodd @grg @fruffy @vlstill

fruffy commented 2 months ago

I have no problems with switching out endl, considering that this seem to be the recommended way: https://clang.llvm.org/extra/clang-tidy/checks/performance/avoid-endl.html

The only question is whether we can guarantee that we do end up flushing correctly?

asl commented 2 months ago

The only question is whether we can guarantee that we do end up flushing correctly?

Streams force flush on closure, so this should not be a problem. The only potential outcome is that we might see incomplete writes in case of e.g. crashes.

ChrisDodd commented 2 months ago

I'm seeing possible ways of fixing this:

  • Make IndentCtl::endl use \n instead of endl
  • Switch Util::JSon not to use endl (though we'd need to expose current indent level then)

Opinions? Tagging @ChrisDodd @grg @fruffy @vlstill

Since endl is supposed to flush (by analogy with std::endl, making it not flush would be problematic. Perhaps add IndentCtl::nl to output a newline without flushing?

One of my thought for IndentCtl was to redo it by makeing it a streambuf and intercepting the overflow and inserting indentation there. That way it could do it automatically (and transparently) based on what was output, and could also wrap long lines properly. But that is a much bigger challenge

asl commented 2 months ago

Since endl is supposed to flush (by analogy with std::endl, making it not flush would be problematic. Perhaps add IndentCtl::nl to output a newline without flushing?

Glad you suggested this :) As this is exactly my existing solution downstream.

One of my thought for IndentCtl was to redo it by makeing it a streambuf and intercepting the overflow and inserting indentation there. That way it could do it automatically (and transparently) based on what was output, and could also wrap long lines properly. But that is a much bigger challenge

This does not seem it would worth the complexity, right.

asl commented 1 month ago

And it turned out to be a bit tricky. The reason is that GC does not execute destructors for objects. As a result, the streams are never closed. And therefore what left in buffers is left there w/o explicit flush...