microsoft / STL

MSVC's implementation of the C++ Standard Library.

<iostream>: std::cout with very low speed compared with printf #4853

Closed jiannanya closed 1 day ago

jiannanya commented 1 month ago

Describe the bug

The default std::cout writes to the default console much more slowly than printf, even with sync_with_stdio(false).

Command-line test case

C:\Temp>type testcout.cc
#include <chrono>
#include <cstdio>   // printf (used by the commented-out variant)
#include <ctime>    // time
#include <iostream>

using std::cin;
using std::cout;
using namespace std::chrono;

constexpr int bench_times{20000};
constexpr float pi{3.1415926f};

int main(int argc, char** argv) {
  cout.sync_with_stdio(false);
  cout.tie(nullptr);
  cin.tie(nullptr);
  auto tm = time(nullptr);

  auto start = high_resolution_clock::now();
  for (int i{}; i != bench_times; ++i) {
    // printf("integer: %d, fraction: %lf, tp:%lld\n", i, pi, tm);
    cout << "integer: " << i << ",fraction: " << pi << " tp: " << tm << '\n';
  }
  auto end = high_resolution_clock::now();
  auto durat = duration_cast<milliseconds>(end - start).count();

  cout << "cost: " << durat << "ms";
  return 0;
}

C:\Temp>cl.exe /O2 /GL /Gy /MD /EHsc /utf-8 /std:c++17 /Fe: .\testcout.exe .\testcout.cc

C:\Temp>.\testcout.exe

Expected behavior

The speed of std::cout should be similar to that of printf.

STL version

Microsoft Visual Studio Community 2019
Version 16.11.30

My local test result

cpu: 12th Gen Intel(R) Core(TM) i5-12600KF   3.70 GHz
system: Windows10 19044.4529
cost: 1012ms // printf
cost: 7916ms // cout

frederick-vs-ja commented 1 month ago

Perhaps related to #3669.

fsb4000 commented 1 month ago

Yeah, I think it's a duplicate.

@jiannanya Consider using buffered output: setvbuf(stdout, nullptr, _IOLBF, 16384);
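
A minimal sketch of that suggestion, wrapped in a helper so the return value gets checked. The name enable_full_buffering is hypothetical and not part of any library. It passes _IOFBF rather than _IOLBF, since the Microsoft CRT documents _IOLBF as behaving the same as full buffering on Win32. How much this helps in the benchmark above is not measured here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Hypothetical helper: ask the C runtime to fully buffer `stream` using an
// internally allocated buffer of `size` bytes. setvbuf should be called
// before any other I/O is performed on the stream.
bool enable_full_buffering(std::FILE* stream, std::size_t size) {
    assert(stream != nullptr);
    // setvbuf returns 0 on success and a nonzero value on failure.
    return std::setvbuf(stream, nullptr, _IOFBF, size) == 0;
}
```

In the test case this would be called as enable_full_buffering(stdout, 16384) at the top of main, before the first write.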

StephanTLavavej commented 1 month ago

This indeed seems related but we're not quite willing to resolve this as a duplicate yet, as this issue is talking about the absolute performance difference (which could have multiple causes), and sync_with_stdio is just one cause.

heckerpowered commented 3 days ago

I think it might have something to do with operator<<.

operator<< is an overloaded operator here, so each use is actually a function call. Every operator<< call locks the buffer, whereas printf locks it only once (I don't know exactly how it works, but I don't think it has to lock multiple times).

I've tested the example above: if we call operator<< only once, the speed of cout is similar to that of printf.
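
One way to try this, as a rough sketch rather than the exact code used in the test above: format the whole line first, then hand it to the stream with a single insertion. The helpers format_line and write_lines are hypothetical names introduced for illustration. The %g conversion is chosen because it matches the default six-significant-digit float formatting of the stream.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <ctime>
#include <ostream>
#include <string>

// Hypothetical helper: build one benchmark line up front so the stream
// receives a single operator<< call per line.
std::string format_line(int i, float fraction, std::time_t tp) {
    char buf[128];
    const int n = std::snprintf(buf, sizeof buf,
                                "integer: %d,fraction: %g tp: %lld\n",
                                i, static_cast<double>(fraction),
                                static_cast<long long>(tp));
    // The fixed-size buffer is ample for this format; check it anyway.
    assert(n > 0 && static_cast<std::size_t>(n) < sizeof buf);
    return std::string(buf, static_cast<std::size_t>(n));
}

// Writes `count` lines to `out`, one insertion per line.
void write_lines(std::ostream& out, int count, float fraction, std::time_t tp) {
    for (int i = 0; i != count; ++i) {
        out << format_line(i, fraction, tp);
    }
}
```

In the benchmark loop, write_lines(cout, bench_times, pi, tm) would replace the chained insertions. This trades several stream calls for one snprintf and one string construction per line, so any speedup should be measured rather than assumed.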

StephanTLavavej commented 1 day ago

Thanks, that sounds correct to us. We're going to close this as half by design (repeatedly calling operator<< is of course more expensive) and half duplicate of #3669.