rust-lang / libs-team


Attempt to help users avoid unbuffered small reads #445

Open joshtriplett opened 1 week ago

joshtriplett commented 1 week ago

Users regularly come to the /r/rust subreddit looking for help when their Rust code doesn't perform as well as they expect. The second most common culprit (after debug vs release build) is doing unbuffered small reads from a file.

I'm wondering if there's anything we could do to help users catch or avert this more easily.
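To make the cost concrete, here is a minimal sketch that counts how often the underlying source is asked for data when reading one byte at a time, with and without a `BufReader`. `CountingReader` and the two helper functions are hypothetical names invented for this illustration; each counted call stands in for a read syscall that a real `File` would issue.

```rust
use std::io::{self, BufReader, Read};

/// Hypothetical helper: wraps a reader and counts calls to `read`,
/// standing in for the read syscalls a `File` would issue.
pub struct CountingReader<R> {
    inner: R,
    pub calls: usize,
}

impl<R: Read> CountingReader<R> {
    pub fn new(inner: R) -> Self {
        Self { inner, calls: 0 }
    }
}

impl<R: Read> Read for CountingReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        self.calls += 1;
        self.inner.read(buf)
    }
}

/// Reads `source` one byte at a time with no buffering.
/// Returns (bytes read, calls made to the source).
pub fn count_byte_reads_unbuffered<R: Read>(source: R) -> io::Result<(usize, usize)> {
    let mut counting = CountingReader::new(source);
    let mut byte = [0u8; 1];
    let mut total = 0;
    while counting.read(&mut byte)? == 1 {
        total += 1;
    }
    Ok((total, counting.calls))
}

/// Same loop, but with a `BufReader` between the loop and the source,
/// so the source only sees a few large reads.
pub fn count_byte_reads_buffered<R: Read>(source: R) -> io::Result<(usize, usize)> {
    let mut buffered = BufReader::new(CountingReader::new(source));
    let mut byte = [0u8; 1];
    let mut total = 0;
    while buffered.read(&mut byte)? == 1 {
        total += 1;
    }
    Ok((total, buffered.get_ref().calls))
}
```

With an in-memory source of 10,000 bytes, the unbuffered loop hits the source once per byte, while the buffered loop only needs a handful of calls. On a real file each of those calls is a syscall, which is where the slowdown comes from.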

the8472 commented 1 week ago

The File docs already mention it. All I can think of is a clippy lint; the lint list doesn't show anything relevant when I search for "buff".

Don't even have to scroll to see it in the RA hover:

[Image: screenshot of the RA hover]

Edit: there already is an open issue https://github.com/rust-lang/rust-clippy/issues/1805

Mark-Simulacrum commented 1 week ago

Perhaps a debug-assertion and/or eprintln that warns if we see repeated reads of under (say) 4kb on a File? If we had some notion of metrics or other "debugging opt-in" I could imagine gating it on that -- similar to e.g. go's support for profiling mutex contention when opting in.
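As a rough illustration of that idea outside of std, the check could be sketched as a wrapper around any reader. Everything here is an assumption made for the example: the `SmallReadWarner` name, the 4096-byte threshold, and the streak limit of 64 are not from any existing API.

```rust
use std::io::{self, Read};

/// Assumed threshold: requests below this size count as "small".
const SMALL_READ_BYTES: usize = 4096;
/// Assumed limit: warn after this many small reads in a row.
const STREAK_LIMIT: u32 = 64;

/// Hypothetical wrapper that prints one warning to stderr when it sees
/// a long run of small read requests.
pub struct SmallReadWarner<R> {
    inner: R,
    small_streak: u32,
    warned: bool,
}

impl<R: Read> SmallReadWarner<R> {
    pub fn new(inner: R) -> Self {
        Self { inner, small_streak: 0, warned: false }
    }

    /// Whether the warning has been emitted (exposed for testing).
    pub fn has_warned(&self) -> bool {
        self.warned
    }
}

impl<R: Read> Read for SmallReadWarner<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        if buf.len() < SMALL_READ_BYTES {
            self.small_streak = self.small_streak.saturating_add(1);
            if self.small_streak >= STREAK_LIMIT && !self.warned {
                self.warned = true;
                eprintln!(
                    "warning: {} consecutive reads smaller than {} bytes; \
                     consider wrapping the reader in BufReader",
                    self.small_streak, SMALL_READ_BYTES
                );
            }
        } else {
            // A large request resets the streak.
            self.small_streak = 0;
        }
        self.inner.read(buf)
    }
}
```

The warning fires only once per reader, so a program that really does want small reads is not flooded with output.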

the8472 commented 1 week ago

We'd need either build-std or yet another mechanism to enable debug asserts in std. The existing one is only meant for UB checks. And it'd probably be bad for compile times to make all of them toggleable.

joshtriplett commented 1 week ago

@Mark-Simulacrum I was thinking about exactly that: some way to easily support a cargo perf-smoketest or similar that helps people figure out why their Rust performance is a problem.

Honestly, perhaps a script that ptraces their process and looks at read/write syscall sizes would suffice. The same script could also look at whether the binary was built in debug or release mode.
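One way such a script might work, swapping direct ptrace use for post-processing of `strace` text output: collect the requested size of each `read` call and report what fraction is small. This is only a sketch. It assumes the plain `read(fd, "...", size) = ret` line layout that `strace` prints by default (no pid prefixes or timestamps), and the function names are hypothetical.

```rust
/// Extracts the requested byte count from one strace line of the form
/// `read(3, "...", 4096) = 4096`. Returns None for any other line.
pub fn parse_read_request(line: &str) -> Option<usize> {
    let rest = line.trim().strip_prefix("read(")?;
    // Split at the last ") = " so quoted data containing that text is ignored.
    let (args, _ret) = rest.rsplit_once(") = ")?;
    // The requested size is the last comma-separated argument.
    let (_, size) = args.rsplit_once(", ")?;
    size.trim().parse().ok()
}

/// Fraction of read calls in `trace` that requested fewer than `threshold`
/// bytes, or None if the trace contains no read calls.
pub fn small_read_fraction(trace: &str, threshold: usize) -> Option<f64> {
    let sizes: Vec<usize> = trace.lines().filter_map(parse_read_request).collect();
    if sizes.is_empty() {
        return None;
    }
    let small = sizes.iter().filter(|&&s| s < threshold).count();
    Some(small as f64 / sizes.len() as f64)
}
```

A trace dominated by 1-byte requests would yield a fraction close to 1.0, which is the signal a smoke test could flag.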

cuviper commented 1 week ago

Another way to help users is to make the "right" thing easier, like:

impl File {
    pub fn open_buffered<P: AsRef<Path>>(path: P) -> Result<BufReader<File>>;
    pub fn create_buffered<P: AsRef<Path>>(path: P) -> Result<BufWriter<File>>;
}

This has an advantage over docs because auto-complete will suggest it.
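For illustration, the same convenience can be approximated today with free functions that compose existing std APIs. This is only a sketch of the behaviour the signatures above suggest, not the proposed implementation; the free-function form is an assumption made so the example compiles on stable Rust.

```rust
use std::fs::File;
use std::io::{self, BufReader, BufWriter};
use std::path::Path;

/// Opens an existing file for reading, wrapped in a `BufReader`.
pub fn open_buffered<P: AsRef<Path>>(path: P) -> io::Result<BufReader<File>> {
    File::open(path).map(BufReader::new)
}

/// Creates (or truncates) a file for writing, wrapped in a `BufWriter`.
pub fn create_buffered<P: AsRef<Path>>(path: P) -> io::Result<BufWriter<File>> {
    File::create(path).map(BufWriter::new)
}
```

A caller would then write `let reader = open_buffered("data.txt")?;` and get buffering without having to know about `BufReader` up front.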

joshtriplett commented 1 week ago

Another way to help users is to make the "right" thing easier, like:

impl File {
    pub fn open_buffered<P: AsRef<Path>>(path: P) -> Result<BufReader<File>>;
    pub fn create_buffered<P: AsRef<Path>>(path: P) -> Result<BufWriter<File>>;
}

This has an advantage over docs because auto-complete will suggest it.

That sounds like a great idea. Want to write an ACP?

programmerjake commented 1 week ago

Perhaps a debug-assertion and/or eprintln that warns if we see repeated reads of under (say) 4kb on a File?

you'd probably also want to check if it's a tty or some other file where you want real time responses to small amounts of data instead of buffering everything up -- though now that I'm thinking about it a bit more, you'd probably just use non-blocking >=4kb reads instead of small reads.
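The tty check could look roughly like this, using the `std::io::IsTerminal` trait that std implements for `File`. The function name and the 4096-byte threshold are assumptions for the example, and it only covers terminals, not the other kinds of interactive files mentioned above.

```rust
use std::fs::File;
use std::io::IsTerminal;

/// Hypothetical predicate: a small read request is only treated as
/// suspicious when the file is not a terminal.
pub fn small_read_is_suspicious(file: &File, requested: usize) -> bool {
    // Assumed threshold, matching the "under (say) 4kb" suggestion.
    const SMALL_READ_BYTES: usize = 4096;
    requested < SMALL_READ_BYTES && !file.is_terminal()
}
```

Pipes and sockets would need a separate decision, since they are not terminals but may still want prompt small reads.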

cuviper commented 1 week ago

That sounds like a great idea. Want to write an ACP?

See #446.