rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org
Other
96.63k stars 12.49k forks source link

Add back io::copy file to pipe optimization #128300

Open NobodyXu opened 1 month ago

NobodyXu commented 1 month ago

So to sum up, I think we can add back optimization via:

Otherwise, we'd fallback to:

Also, if the file is a named fifo, then just a splice without F_MOVE would work fine.

Originally posted by @NobodyXu in https://github.com/rust-lang/rust/issues/108283#issuecomment-2254439139

the8472 commented 1 month ago

splice to an intermediate pipe with F_MOVE and then splice to the pipe without F_MOVE if

We never used F_MOVE and it was still a problem. So I'm skeptical it'll work at all but maybe it depends on the direction (from/to pipe).

then it's immutable (detectable via statfs detectable via fcntl(fd, F_GET_SEALS)

Those cost extra syscalls, which could be unnecessary overhead for small files. So it probably only makes sense to even try these things for large files.

And many of those cases seem rare that it's not worth it. Like who serves e.g. a webserver from a btrfs snapshot?

NobodyXu commented 1 month ago

Yeah I think we'd only want tohandle sealed file here, I was probably listing too many cases during my brain storming 😂

However, I don't think sealed file is used very often.

We never used F_MOVE and it was still a problem. So I'm skeptical it'll work at all but maybe it depends on the direction (from/to pipe).

Hmm it seems that SPLICE_F_MOVE is a no-op according to man page, so I think using an intermediate pipe to workaround this will not work.

So the only possible workarond for a really large file and fs supporting reflink would be to use copy_file_range to a O_TMPFILE and then splice.

So it probably only makes sense to even try these things for large files.

Agree, if it is small enough, read + write is OK.

NobodyXu commented 1 month ago

I think we might want someway for the user to tell std::io::copy that they know that the file won't change before they sent out the data, the application itself could make some reasonable assumption about this.

Checking for sealed file, checking for read-only fs, etc is all about niche cases, I think we need application to convey some information about this?

the8472 commented 1 month ago

Currently, file seals can be applied only to a file descriptor returned by memfd_create(2)

Also doesn't seem like a common thing.

I think we need application to convey some information about this?

Yeah. The most common thing where sendfile makes sense is a webserver. And the files might not even be marked read-only there.

NobodyXu commented 1 month ago

Currently, file seals can be applied only to a file descriptor returned by memfd_create(2)

Also doesn't seem like a common thing.

Thanks I missed that one, if so then file seals seems very uncommon

NobodyXu commented 1 month ago

And the files might not even be marked read-only there.

Agree, the more I look at it, the more I think we need another API like rust-lang/libs-team#202 propose, which will always assume using splice is ok:

pub struct ImmutableFile(File);

That's the simplest way I can think of, no need for another copy function as we just want to promise that the file is indeed immutable.

the8472 commented 1 month ago

I think we don't want to promise anything that relies on specialization until there's a plan to stabilize specialization or at least parts thereof.

NobodyXu commented 1 month ago

I think we don't want to promise anything that relies on specialization until there's a plan to stabilize specialization or at least parts thereof.

That makes sense, in that case maybe a dedicated API on File?

impl File {
    /// This is similar to [`std::io::copy`], except it will attempt to use `splice` on Linux or `sendfile` on Unix
    /// to avoid copying data if possible.
    ///
    /// NOTE that if you change the content of the file after this call, it might also change the data in `writer`.
    pub fn copy(&self, writer: &mut W) -> io::Result<u64>
    where
        W: Write + ?Sized;
}
the8472 commented 1 month ago

That would still require specialization for the writer side. The only way to do this with stable rust is to require both sides to implement all the necessary traits which expose the internal buffers and so on. What CopyRead and CopyWrite currently do. And that's some gnarly machinery that might not be the right fit for std, considering that only useful on some platforms. Nobody has even generalized the current optimizations to other platforms.

NobodyXu commented 1 month ago

That would still require specialization for the writer side.

Yeah but isn't that same as std::io::copy where we specialise internally and not exposing it?

the8472 commented 1 month ago

We don't guarantee that that happens though, so we can remove it if specialization becomes an issue.

NobodyXu commented 1 month ago

We don't guarantee that that happens though, so we can remove it if specialization becomes an issue.

So...maybe we just add it, saying that we assume the File is immutable/append-only for the sake of possible optimization in File::copy, and leave out all the details?

If specialisation is removed, it'd be just a convenient method of std::io::copy

the8472 commented 1 month ago

Except that we'd have to make that distinction very clear to avoid people getting corrupt data. And adding warnings and choosing a different name for something we can't promise probably isn't worth it. Now that I have written that I should probably close that ACP.... wait no, that one doesn't strictly require specialization...