supitsdu / clipper

Seamlessly copy file contents to clipboard from command line. Lightweight, cross-platform tool for instant text transfers.
MIT License

Improve Performance with Buffered I/O (bufio) for Large File Handling #22

Closed: supitsdu closed this issue 2 months ago

supitsdu commented 2 months ago

Currently, clipper reads and writes file content directly using standard I/O operations. When dealing with large files, this can lead to suboptimal performance due to the overhead of frequent system calls.

We could introduce buffered I/O (e.g., bufio) for file operations. Reading and writing data in larger chunks through a buffer reduces the number of system calls and could noticeably improve performance, especially with large files.

Tasks:

  1. Replace direct reads (os.ReadFile, io.ReadAll) and any direct writes with buffered I/O operations, e.g., using bufio.NewReader and bufio.NewWriter.
  2. Benchmark the performance improvement when copying content from large files to the clipboard.
  3. Update documentation to reflect the use of buffered I/O.
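
For reference, here is a minimal sketch of the two read paths that task 1 contrasts. It is not clipper's actual code: readDirect, readBuffered, and roundTrip are hypothetical names introduced only for illustration, and the sketch assumes the file is read in full, as it would be before copying to the clipboard.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"os"
)

// readDirect (hypothetical) loads the whole file with one os.ReadFile call.
func readDirect(path string) ([]byte, error) {
	return os.ReadFile(path)
}

// readBuffered (hypothetical) wraps the file in a bufio.Reader and drains it
// with io.ReadAll.
func readBuffered(path string) ([]byte, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()
	return io.ReadAll(bufio.NewReader(f))
}

// roundTrip (hypothetical) writes content to a temp file and reads it back
// through both paths so the results can be compared.
func roundTrip(content string) (string, string, error) {
	tmp, err := os.CreateTemp("", "clipper-sketch-*")
	if err != nil {
		return "", "", err
	}
	defer os.Remove(tmp.Name())

	if _, err := tmp.WriteString(content); err != nil {
		tmp.Close()
		return "", "", err
	}
	if err := tmp.Close(); err != nil {
		return "", "", err
	}

	direct, err := readDirect(tmp.Name())
	if err != nil {
		return "", "", err
	}
	buffered, err := readBuffered(tmp.Name())
	if err != nil {
		return "", "", err
	}
	return string(direct), string(buffered), nil
}

func main() {
	direct, buffered, err := roundTrip("hello clipboard")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(direct == buffered) // prints true: both paths return the same content
}
```

Both functions return identical bytes; they differ only in how the data reaches memory. os.ReadFile sizes its buffer from the file size, while bufio.NewReader adds an intermediate buffer (4096 bytes by default) between the file and the caller.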
supitsdu commented 2 months ago

Performance Evaluation of Buffered I/O Implementation

Summary

I evaluated the performance impact of using buffered I/O (bufio.Reader) for reading file content in Clipper, as proposed in issue #22. The goal was to improve performance, particularly with large files. However, the benchmarks show that the non-buffered implementation outperforms the buffered one.

Benchmark Results

With Buffered I/O: [benchmark output not preserved]

Without Buffered I/O: [benchmark output not preserved]

Observations

  1. Huge Files (500 MB):

    • Buffered I/O: 995280525 ns/op
    • Non-Buffered I/O: 270109218 ns/op
    • Observation: Non-buffered I/O is ~3.7 times faster.
  2. Large Files (50 MB):

    • Buffered I/O: 112469869 ns/op
    • Non-Buffered I/O: 24682830 ns/op
    • Observation: Non-buffered I/O is ~4.6 times faster.
  3. Medium Files (1 MB):

    • Buffered I/O: 3473087 ns/op
    • Non-Buffered I/O: 566447 ns/op
    • Observation: Non-buffered I/O is ~6.1 times faster.
  4. Small Files (24 KB):

    • Buffered I/O: 63227 ns/op
    • Non-Buffered I/O: 20388 ns/op
    • Observation: Non-buffered I/O is ~3.1 times faster.
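
The ratios above are simply the buffered ns/op divided by the non-buffered ns/op. A small sketch that recomputes them from the quoted figures (the result type and speedup function are hypothetical helpers, not part of clipper):

```go
package main

import "fmt"

// result holds one row of the ns/op figures quoted in the observations.
type result struct {
	name       string
	bufferedNs float64
	directNs   float64
}

// speedup returns how many times faster the non-buffered run is,
// i.e. buffered time divided by non-buffered time.
func speedup(bufferedNs, directNs float64) float64 {
	return bufferedNs / directNs
}

func main() {
	results := []result{
		{"Huge (500 MB)", 995280525, 270109218},
		{"Large (50 MB)", 112469869, 24682830},
		{"Medium (1 MB)", 3473087, 566447},
		{"Small (24 KB)", 63227, 20388},
	}
	for _, r := range results {
		fmt.Printf("%-14s ~%.1fx\n", r.name, speedup(r.bufferedNs, r.directNs))
	}
	// Ratios printed, in order: ~3.7x, ~4.6x, ~6.1x, ~3.1x
}
```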

Conclusion

The anticipated performance improvement from buffered I/O did not materialize. The non-buffered implementation was faster at every tested file size, by a factor of roughly 3 to 6.

Decision

We will close PR #33 and continue using the non-buffered I/O implementation. This decision is based on the benchmark results showing superior performance of the non-buffered approach.

If future developments suggest a more efficient use of buffered I/O, we will re-evaluate this decision. All findings and benchmark results will be documented for future reference.

Benchmark Code

// BenchmarkFileContentReader benchmarks the Read method of FileContentReader for different file sizes.
func BenchmarkFileContentReader(b *testing.B) {
    // A slice (rather than a map) keeps the sub-benchmarks in a fixed order across runs.
    sizes := []struct {
        name string
        size int
    }{
        {"Small", 24 * 1024},
        {"Medium", 1 * 1024 * 1024},
        {"Large", 50 * 1024 * 1024},
        {"Huge", 500 * 1024 * 1024},
    }

    for _, tc := range sizes {
        b.Run(tc.name, func(b *testing.B) {
            // Setup (not timed): generate random content and write it to a temp file.
            input := randomstring.String(tc.size)
            tempFile, err := createBenchmarkTempFile(b, input)
            if err != nil {
                b.Fatalf("Failed to create temp file: %v", err)
            }

            reader := clipper.FileContentReader{FilePath: tempFile.Name()}

            // Report throughput (MB/s) alongside ns/op.
            b.SetBytes(int64(tc.size))
            b.ResetTimer()
            for i := 0; i < b.N; i++ {
                _, err := reader.Read()
                if err != nil {
                    b.Fatalf("FileContentReader.Read failed: %v", err)
                }
            }
        })
    }
}

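// createBenchmarkTempFile creates a temporary file under b.TempDir(), writes the
// given content to it, and closes it. The returned *os.File is closed; callers
// only need its path, which file.Name() still reports after Close.
func createBenchmarkTempFile(b *testing.B, content string) (*os.File, error) {
    b.Helper()
    file, err := os.CreateTemp(b.TempDir(), "testfile")
    if err != nil {
        return nil, err
    }

    if _, err := file.WriteString(content); err != nil {
        file.Close()
        return nil, err
    }

    // Close the handle so the descriptor is not leaked across sub-benchmarks.
    if err := file.Close(); err != nil {
        return nil, err
    }

    return file, nil
}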