p4lang / p4c

P4_16 reference compiler
https://p4.org/
Apache License 2.0
679 stars 444 forks source link

Expensive stream flushed #4922

Open asl opened 1 month ago

asl commented 1 month ago

We are having the following code in lib/indent.h:

namespace IndentCtl {
inline std::ostream &endl(std::ostream &out) {
    return out << std::endl << indent_t::getindent(out);
}
inline std::ostream &indent(std::ostream &out) {
    ++indent_t::getindent(out);
    return out;
}
inline std::ostream &unindent(std::ostream &out) {
    --indent_t::getindent(out);
    return out;
}

So, doing out << IndentCtl::endl essentially flushes the output stream. This is more or less fine for cout / cerr, however IndentCtl is used in Utils::Json. As a result, serialization of json objects become terrible expensive due to constant flushes of output file stream, e.g.:

void JsonArray::serialize(std::ostream &out) const {
    bool isSmall = true;
    for (auto v : *this) {
        if (!v->is<JsonValue>()) isSmall = false;
    }
    out << "[";
    if (!isSmall) out << IndentCtl::indent;
    bool first = true;
    for (auto v : *this) {
        if (!first) {
            out << ",";
            if (isSmall) out << " ";
        }
        if (!isSmall) out << IndentCtl::endl;
        if (v == nullptr)
            out << "null";
        else
            v->serialize(out);
        first = false;
    }
    if (!isSmall) out << IndentCtl::unindent << IndentCtl::endl;
    out << "]";
}

I'm seeing possible ways of fixing this:

Opinions? Tagging @ChrisDodd @grg @fruffy @vlstill

fruffy commented 1 month ago

I have no problems with switching out endl, considering that this seem to be the recommended way: https://clang.llvm.org/extra/clang-tidy/checks/performance/avoid-endl.html

The only question is whether we can guarantee that we do end up flushing correctly?

asl commented 1 month ago

The only question is whether we can guarantee that we do end up flushing correctly?

Streams force flush on closure, so this should not be a problem. The only potential outcome is that we might see incomplete writes in case of e.g. crashes.

ChrisDodd commented 1 month ago

I'm seeing possible ways of fixing this:

  • Make IndentCtl::endl use \n instead of endl
  • Switch Util::JSon not to use endl (though we'd need to expose current indent level then)

Opinions? Tagging @ChrisDodd @grg @fruffy @vlstill

Since endl is supposed to flush (by analogy with std::endl, making it not flush would be problematic. Perhaps add IndentCtl::nl to output a newline without flushing?

One of my thought for IndentCtl was to redo it by makeing it a streambuf and intercepting the overflow and inserting indentation there. That way it could do it automatically (and transparently) based on what was output, and could also wrap long lines properly. But that is a much bigger challenge

asl commented 1 month ago

Since endl is supposed to flush (by analogy with std::endl, making it not flush would be problematic. Perhaps add IndentCtl::nl to output a newline without flushing?

Glad you suggested this :) As this is exactly my existing solution downstream.

One of my thought for IndentCtl was to redo it by makeing it a streambuf and intercepting the overflow and inserting indentation there. That way it could do it automatically (and transparently) based on what was output, and could also wrap long lines properly. But that is a much bigger challenge

This does not seem it would worth the complexity, right.

asl commented 3 weeks ago

And it turned out to be a bit tricky. The reason is that GC does not execute destructors for objects. As a result, the streams are never closed. And therefore what left in buffers is left there w/o explicit flush...