Alternative to read-line

ruricolist / serapeum

Utilities beyond Alexandria

MIT License

Alternative to read-line #49

Open ruricolist opened 4 years ago

ruricolist commented 4 years ago

It would be nice to have an alternative to read-line, along the lines of read-line-into.

If it turns out to be too clumsy to use directly, it could still be wrapped in a nice Iterate driver.

phoe commented 2 years ago

What's the difference between read-sequence and read-line-into? Just that the latter halts on a newline?

ruricolist commented 2 years ago

I think it's a little more complicated, with an extra return value for when the buffer is too short to hold a full line. And there's also simple-stream-read-line, which uses the buffer except when the line is too long, in which case it allocates a new string.

phoe commented 2 years ago

OK - I've found the documentation of it at https://franz.com/support/documentation/current/doc/operators/excl/read-line-into.htm

How can this be implemented without implementation support in an optimized way/without reading char by char? If we try to read a whole buffer of characters at once, it's possible that we'll overshoot the newline, at which point we will need to unread multiple characters, which is impossible in portable CL when working with arbitrary streams.
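
The overshoot problem can be seen with a small sketch. The function name demo-overshoot and the sample input are made up for illustration; only standard CL operators are used.

```lisp
;; Illustrative only: a block read does not stop at a line boundary.
(defun demo-overshoot ()
  (with-input-from-string (in (format nil "ab~%cdef~%"))
    (let* ((buffer (make-string 5))
           ;; READ-SEQUENCE fills the whole buffer: "ab", a newline, then "cd".
           (end (read-sequence buffer in))
           (newline (position #\Newline buffer :end end)))
      ;; Return the first line, the characters consumed past the newline,
      ;; and what a later READ-LINE on the same stream sees.
      (values (subseq buffer 0 newline)
              (subseq buffer (1+ newline) end)
              (read-line in)))))
```

The second value holds the characters that were consumed past the newline; a later read-line on the same stream only sees what follows them, so the second line is split unless those characters can somehow be given back.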

ruricolist commented 2 years ago

That's a good question, and why this is an open issue.

On the other hand, reading character by character may not be that inefficient, assuming the implementation buffers reads internally. And there's always the possibility of a Gray stream (although in that case the performance benefits might be lost).
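
A minimal sketch of the character-by-character approach, assuming a hypothetical name and return convention. This is not Allegro's read-line-into contract, just one possible shape: it returns the index after the last character stored, plus a keyword saying why it stopped.

```lisp
(defun read-line-into-buffer (buffer stream
                              &key (start 0) (end (length buffer)))
  "Hypothetical helper, not Allegro's READ-LINE-INTO.
Fill BUFFER from STREAM one character at a time. Return two values:
the index after the last character stored, and one of :NEWLINE, :EOF,
or :FULL (the buffer ran out before the line ended)."
  (loop with i = start
        do (when (>= i end)
             (return (values i :full)))
           (let ((char (read-char stream nil nil)))
             (cond ((null char)
                    (return (values i :eof)))
                   ((char= char #\Newline)
                    ;; The newline is consumed but not stored.
                    (return (values i :newline)))
                   (t
                    (setf (char buffer i) char)
                    (incf i))))))
```

Because it never reads past the newline, nothing has to be unread. One quirk of this particular sketch: if a line exactly fills the buffer, it reports :FULL, and the next call returns zero characters with :NEWLINE.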

phoe commented 2 years ago

> (although in that case the performance benefits might be lost)

If we can both provide a Gray stream that's capable of buffering reads and unreading multiple chars at once, and define a specialized read-line-into method for it, we can probably regain all the performance. Maybe that would be the way forward, if we accept the compromise of other streams (and especially other Gray streams) being slow.

But yes, this sounds like a question for the implementations themselves: is there any API for their default streams to unread whole chunks of data, which would make an efficient read-line-into possible?

ruricolist commented 2 years ago

You can always "unread" multiple characters using make-concatenated-stream:

(setf stream (make-concatenated-stream (make-string-input-stream extra-chars) original-stream))
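
A small self-contained sketch of that trick. The function name demo-pushback and the sample strings are made up for illustration.

```lisp
(defun demo-pushback ()
  (with-input-from-string (original (format nil "ef~%gh~%"))
    ;; Pretend "cd" was read past a newline and has to go back on the input.
    (let ((stream (make-concatenated-stream
                   (make-string-input-stream "cd")
                   original)))
      ;; Reads cross the boundary between the two underlying streams.
      (values (read-line stream) (read-line stream)))))
```

Here the first read-line returns "cdef": the pushed-back characters are seen first, followed by the rest of the original line.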

phoe commented 2 years ago

The issue is that this doesn't work across a function call. If you write (read-line-into string stream) in your code, the function read-line-into cannot force the caller's stream to become a concatenated stream. On top of that, you usually don't want to be setf-ing dynamic variable bindings like that, lest you clobber the global standard input or something.
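
The scoping problem can be sketched as follows. Both function names are hypothetical and exist only for this illustration.

```lisp
(defun push-back-locally (stream extra-chars)
  ;; This SETF rebinds only the local variable STREAM inside this function.
  (setf stream (make-concatenated-stream
                (make-string-input-stream extra-chars)
                stream))
  ;; Inside the function, the pushed-back characters are visible.
  (peek-char nil stream))

(defun demo-lost-pushback ()
  (with-input-from-string (in "ef")
    (let ((seen-inside (push-back-locally in "cd")))
      ;; The caller still holds the original stream object,
      ;; so the pushed-back "cd" is lost to it.
      (values seen-inside (read-line in)))))
```

Inside the helper the next character is #\c, but the caller's read-line only ever sees "ef". The function would have to return the new stream and rely on the caller to use it from then on, which makes for a clumsier calling convention.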