How can I use the linter to read from stdin instead of the file?

yoheimuta / protolint

A pluggable linter and fixer to enforce Protocol Buffer style and conventions.

MIT License

576 stars 52 forks source link

How can I use the linter to read from stdin instead of the file? #327

Open AlexCannonball opened 1 year ago

AlexCannonball commented 1 year ago

Hello,

I'm trying to run the linter and give it the text for linting via the shell stdin instead of passing the filename with the text.

Unfortunately I haven't find the way to do it on Windows. For example:

> 'foo' | .\protolint.exe lint
protolint lint requires at least one argument. See Usage.

Does the linter support reading from stdin instead of the file mode? If yes, could you show me the proper command + args example?

Thank you.

yoheimuta commented 1 year ago

Hi @AlexCannonball

No, unfortunately the linter does not support reading from stdin at the moment.

Thanks.

AlexCannonball commented 1 year ago

Hello @yoheimuta

Thank you for the reply. The reason I'm asking is because I'm trying to improve UX in VS Code extension (https://github.com/plexsystems/vscode-protolint).

At the moment the extension only shows up-to-date linting warnings when you open or save the protobuf document. If the document has some unsaved changes, obviously they are not yet in the file we pass to protolint.

It doesn't seem to be a big deal for Unix/macOS, I guess even something like this may work:

protolint lint /dev/stdin

However, Windows doesn't seem to provide similar option: https://stackoverflow.com/questions/7395157/windows-equivalent-to-dev-stdin

At the moment I don't see any cross-platform workaround. Even if I success to pass the "document" via Unix domain socket and Windows named pipe, it still needs to consider the platform when generating the socket/pipe name. And it still looks like an overengineering.

Unlike that reading the document text from stdin seems to be easier to implement in the extension. Maybe you consider adding this feature to the linter.

If you have any thoughts, let me know.

yoheimuta commented 1 year ago

@AlexCannonball Thank you for providing more details about your use case - it makes perfect sense now. Your explanation actually reminded me that I implemented a workaround to create a temporary file and give it to protolint when I was working on intellij-protolint.

To support this feature with minimal effort, I think protolint should create a temporary file in the func doLint before it starts processing:

checking if there's any input from stdin
if so, create a temporary file, write the content to it
append the temporary file path to the args slice

# This is a rough sketch:
diff --git a/internal/cmd/cmd.go b/internal/cmd/cmd.go
index 6e172b2..5048ccc 100644
--- a/internal/cmd/cmd.go
+++ b/internal/cmd/cmd.go
@@ -3,6 +3,8 @@ package cmd
 import (
    "fmt"
    "io"
+   "io/ioutil"
+   "os"
    "strings"

    "github.com/yoheimuta/protolint/internal/cmd/subcmds/lint"
@@ -73,9 +75,39 @@ func doSub(

 func doLint(
    args []string,
+   stdin io.Reader,
    stdout io.Writer,
    stderr io.Writer,
 ) osutil.ExitCode {
+   // Read from stdin
+   input, err := ioutil.ReadAll(stdin)
+   if err != nil {
+       _, _ = fmt.Fprintln(stderr, "Error reading from stdin:", err)
+       return osutil.ExitInternalFailure
+   }
+
+   // If there's input, create a temporary file and write the content to it
+   if len(input) > 0 {
+       tmpfile, err := ioutil.TempFile("", "protolint-stdin-*.proto")
+       if err != nil {
+           _, _ = fmt.Fprintln(stderr, "Error creating temporary file:", err)
+           return osutil.ExitInternalFailure
+       }
+       defer os.Remove(tmpfile.Name()) // Clean up
+
+       if _, err := tmpfile.Write(input); err != nil {
+           _, _ = fmt.Fprintln(stderr, "Error writing to temporary file:", err)
+           return osutil.ExitInternalFailure
+       }
+       if err := tmpfile.Close(); err != nil {
+           _, _ = fmt.Fprintln(stderr, "Error closing temporary file:", err)
+           return osutil.ExitInternalFailure
+       }
+
+       // Append temporary file path to args
+       args = append(args, tmpfile.Name())
+   }
+
    if len(args) < 1 {
        _, _ = fmt.Fprintln(stderr, "protolint lint requires at least one argument. See Usage.")
        _, _ = fmt.Fprint(stderr, help)

What do you think? Would this approach work for you?

AlexCannonball commented 1 year ago

@yoheimuta thank you for sharing the sketch. I don't know the project and Golang well, so let me comment from protolint API client prospect.

"File mode" is awesome for some tasks, e.g. CI/CD or as protoc plugin. When your files are ready and done.
When you work with protobuf in IDE, ideally you'd like to lint the text on any change event. And there are tons of these events. protolint seems to be fast enough, as Golang is fast. However, the "file mode" API seems to become a bottleneck, as you need to manage files before linting.

As I understand from the sketch, it creates a real file on the disk. ioutil.TempFile seems to be deprecated, BTW: https://pkg.go.dev/io/ioutil#TempFile

My thoughts about disk files:

Disk I/O operations are slower than working with RAM. Maybe it would be too slow for real-time linting.
Intensive protobuf text changes in IDE causes a lot of disk operations, i.e. faster disk device degradation 😞
I guess, the linter still needs to load the whole text into the memory. Protobuf files are usually relatively small. Reading a text chunk-by-chunk to lint chunk-by-chunk doesn't make sense, as you need to see the whole document to detect all errors.
Temp file cleanup and possible filename conflicts - additional points of failure.
- ioutil.TempFile seems to prevent filename clashes with randomization, however, we start depending on this function.
Creating a temp file looks redundant, just a workaround to fit "file mode" API.

What do you think about these?

Let me describe how I see an ideal approach (if we change protolint API):

No changes in "File mode". It's very good for its tasks.
Add a new command, like protolint lintstdin [args] for IDE real-time linting.
The new command tells the linter to read directly from stdin.

I was trying to find where loading from file to memory happens, and ended up in go-protoparser: https://github.com/yoheimuta/go-protoparser/blob/c45546ae3e434eb92eb482faa3873e733c30af8d/lexer/scanner/scanner.go#L46

But I don't understand how many functions need to be added or changed for the "ideal" approach. Maybe just adding something like reader, err := bufio.NewReader(os.Stdin) when lintstdin command is set here? https://github.com/yoheimuta/protolint/blob/d19308920340bcaf80064c65d24a818d12b3fb71/internal/linter/file/protoFile.go#L37

AlexCannonball commented 1 year ago

@yoheimuta hello, a quick update on my research regarding passing the text via named pipes. It works on Windows like this: PS D:\protolint> .\protolint.exe lint \\?\pipe\mypipe.proto

However, it works with problems:

It's significantly slower than linting a usual file, takes seconds to see the linting results.
In the extension I run the server to fill the pipe with the text. When debugging nodejs I always see 22 consequent connections to the server to read \\?\pipe\mypipe.proto for 1 linting task. The first connection always fails with write EPIPE error after I write the text to the socket. Maybe it's an OS call to make sure that the pipe is ready 🤷 However, after that I always get 21 connections regardless the proto-file size. I wonder, maybe it's protoparser scanner/reader algo makes 21 consequent file passages which turns to 21 different connections to my nodejs server providing the text to the pipe?

AlexCannonball commented 1 year ago

Hello @yoheimuta Do you need any details regarding this issue? I guess if protolint really reads the file 22 times for 1 linting command, it may cause heisenbugs when the file changes between these reading operations. Should I create a separate issue for this? I guess reading the file 1 time and caching its contents in protolint memory could fix it.

As for linting from stdin, I think it's not necessary for VS Code extension, if the slow reading from the pipe is fixed. However, even if you decide to implement linting from stdin, the first read operation will empty stdin buffer, so the subsequent reading attempts may fail.

yoheimuta commented 1 year ago

Hi @AlexCannonball

I guess if protolint really reads the file 22 times for 1 linting command

Yes, 375f265 intentionally has changed to read the most recent content, even if the fixer modifies the file along the way. This decision was based on my assumption that local access to a small file is fast enough to justify the extra read.

My thoughts about disk files:

While I understand your concern about potential performance issues with excessive disk access, I have not seen or heard of any problems with intellij-protolint, which creates a temporary file each time a source file is updated.

Just did some experimenting on my MacBook Pro and wanted to share my results with you!

❯ go run main.go -files=1000 -chars=1000
Time taken to create 1000 temp files with 10000 random characters each: 220.808876ms

❯ go run main.go -files=1000 -chars=100
Time taken to create 1000 temp files with 10000 random characters each: 308.157407ms

❯ go run main.go -files=1000 -chars=10000
Time taken to create 1000 temp files with 10000 random characters each: 356.361094ms

❯ go run main.go -files=1000 -chars=100000
Time taken to create 1000 temp files with 10000 random characters each: 1.173995759s

❯ go run main.go -files=10 -chars=1000000
Time taken to create 1000 temp files with 10000 random characters each: 87.575473ms

main.go

```go package main import ( "crypto/rand" "flag" "fmt" "os" "time" ) func main() { var numFiles int var numChars int flag.IntVar(&numFiles, "files", 1000, "Number of temporary files to create") flag.IntVar(&numChars, "chars", 1000, "Number of random characters in each file") flag.Parse() start := time.Now() for i := 0; i < numFiles; i++ { content := generateRandomContent(numChars) err := createTempFile(content) if err != nil { fmt.Printf("Error creating temp file: %v\n", err) os.Exit(1) } } elapsed := time.Since(start) fmt.Printf("Time taken to create 1000 temp files with 10000 random characters each: %v\n", elapsed) } func generateRandomContent(length int) []byte { content := make([]byte, length) _, err := rand.Read(content) if err != nil { fmt.Printf("Error generating random content: %v\n", err) os.Exit(1) } return content } func createTempFile(content []byte) error { tempFile, err := os.CreateTemp("", "tempfile_") if err != nil { return err } defer tempFile.Close() _, err = tempFile.Write(content) if err != nil { return err } return nil } ```

But I don't understand how many functions need to be added or changed for the "ideal" approach.

Implementing this feature would require a lot of code to write and maintain. If your idea relies on this feature, I recommend trying the workaround I suggested first, rather than spending extra effort on implementing the ideal approach. This is especially true if your concern isn't certain

AlexCannonball commented 1 year ago

Hello @yoheimuta and thank you for the reply.

Implementing this feature would require a lot of code to write and maintain

As I understand, this estimation is related to the feature with reading from stdin + making no temporary disk files. This totally makes sense, for VS Code extension I see at least one workaround which doesn't need stdin at all.

Yes, https://github.com/yoheimuta/protolint/commit/375f265022b806640807dcbb9fe32a8f089e5ad2 intentionally has changed to read the most recent content, even if the fixer modifies the file along the way

Is adding a memory cache there also hard to maintain? By the memory cache I mean something like this:

Read the file A from disk to protolint memory once.
Make all necessary read and write operations with the text in memory.
Save the fixed text into A on disk.

If this change is acceptable, it removes extra reads from the target file URI. This hopefully fixes the slow reading the text via Nodejs IPC which I'm trying to use as a workaround for the VS Code Extension.

yoheimuta commented 1 year ago

@AlexCannonball Thank you for sharing your thought.

Is adding a memory cache there also hard to maintain?

Unfortunately, I'm afraid that's the case.