gwsw / less

Less - text pager
http://greenwoodsoftware.com/less

follow mode not working #291

Open cipitaua opened 2 years ago

cipitaua commented 2 years ago

I'm on Debian sid, and I redirect the output of a program into a file. When I open this file in less and press "shift+f" to follow it, less jumps to the end of the file but does not keep following when new data is appended (it behaves the same as "shift+g"). I don't see this problem on another system (an old CentOS 7).

As a side note, piping into less seems to work, e.g. the following command

for i in {0..10000}; do echo "$i"; sleep 0.1; done | less +F

works as expected. But passing through a file does not.

gwsw commented 2 years ago

I can't reproduce this with either less-590 or less-608. What version are you using? Also to clarify, are you saying that when you press the F key, less jumps to end of file but does not display the "Waiting for data" message?

cipitaua commented 2 years ago

I can't reproduce this with either less-590 or less-608. What version are you using?

it's less-590

Also to clarify, are you saying that when you press the F key, less jumps to end of file but does not display the "Waiting for data" message?

yes it does not display the message

I have three debian machines, almost identical setup, and two of them have this problem.

cipitaua commented 2 years ago

as a test, I run the following command in one shell:

for i in {0..10000}; do echo "$i"; sleep 0.1; done > /dev/shm/tmp

and in another shell

less /dev/shm/tmp

and then I press "shift+f"

gwsw commented 2 years ago

This test works fine for me with less-590. Can you try building less-590 or less-608 from the released source on http://greenwoodsoftware.com/less/download.html? I want to confirm whether this is broken by something Debian changed.

cipitaua commented 2 years ago

Note that on my work desktop (Debian sid) it works fine, but on my home desktop and on a server (both also Debian) it does not. I think that if something were wrong in the repositories, it would fail in all three cases.

However, I've compiled from source:

compiled less-590 behaves the same as the one from debian repositories

compiled less-608 does actually show the "Waiting for data" message, but nothing is updated

compiled less-530 also shows the "Waiting for data" message, but nothing is updated

I've also noticed that the 'r' and 'R' keys don't work either. I have to exit less and relaunch it to see the updated content. Curiously, pressing 'h' (help) and then 'q' (to exit the help screen) also updates the content "at the bottom", so that I can scroll down to the newest lines.

Please also note that tail -f works fine, always.

cipitaua commented 2 years ago

maybe I could test with a statically-compiled binary, if you have one

cipitaua commented 2 years ago

please note that I have updated my previous post. I'm available to debug if you have any idea. Thank you

gwsw commented 2 years ago

Well this is pretty strange. Would you be able to use gdb to do some debugging?

cipitaua commented 2 years ago

yes sure, what do you suggest?

gwsw commented 2 years ago

The first thing I'd try is set a breakpoint on forw_loop() and make sure you get there when you press F. Then step through it and see whether it's getting into the while loop, calling make_display (which should display the "Waiting for data" message) and calling forward (which should read new data from the file). You could try this on one of your systems that works and compare what happens to what happens on a system that fails.

When using gdb on less, it's best to recompile setting CFLAGS=-g; that is, remove the -O2 from the compile options. gdb sometimes gets confused when debugging a program optimized with -O2.

cipitaua commented 2 years ago

The first thing I'd try is set a breakpoint on forw_loop() and make sure you get there when you press F.

yes it does:

Breakpoint 1, forw_loop (until_hilite=0) at command.c:1178
1178            if (ch_getflags() & CH_HELPFILE)

Then step through it and see whether it's getting into the while loop, calling make_display (which should display the "Waiting for data" message)

yes it displays the message

and calling forward (which should read new data from the file). You could try this on one of your systems that works and compare what happens to what happens on a system that fails.

it seems so:

Breakpoint 1, forward (n=1, force=0, only_last=0) at forwback.c:485
485             if (get_quit_at_eof() && eof_displayed() && !(ch_getflags() & CH_HELPFILE))

When using gdb on less, it's best to recompile setting CFLAGS=-g; that is, remove the -O2 from the compile options. gdb sometimes gets confused when debugging a program optimized with -O2.

yes indeed

gwsw commented 2 years ago

Ok, thanks. So it calls forward(). That should eventually pause waiting for new data to be readable from the file. The stack trace looks like this

#0  iread (fd=4, buf=0x450295 "", len=8183) at ../os.c:122
#1  0x0000000000405091 in ch_get () at ../ch.c:273
#2  0x00000000004059f6 in ch_forw_get () at ../ch.c:641
#3  0x0000000000410d63 in forw_line_seg (curr_pos=9, skipeol=1, rscroll=1, nochop=0) at ../input.c:138
#4  0x0000000000411015 in forw_line (curr_pos=9) at ../input.c:266
#5  0x000000000041011e in forw (n=0, pos=9, force=0, only_last=0, nblank=0) at ../forwback.c:313
#6  0x00000000004104ba in forward (n=1, force=0, only_last=0) at ../forwback.c:522
#7  0x000000000040b0ec in forw_loop (until_hilite=0) at ../command.c:1194
#8  0x000000000040b5ba in commands () at ../command.c:1475
#9  0x00000000004029aa in main (argc=-1, argv=0x7fffffffdb48) at ../main.c:305

So in your case one of those functions must be returning before calling the next one. See if you can find where that's happening.

cipitaua commented 2 years ago

ok, here's the backtrace:

Waiting for data... (interrupt to abort)
Program received signal SIGINT, Interrupt.
0x00007ffff7e452c6 in clock_nanosleep () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) backtrace
#0  0x00007ffff7e452c6 in clock_nanosleep () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007ffff7e49d83 in nanosleep () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x0000555555570791 in sleep_ms (ms=2) at os.c:431
#3  0x000055555555c5b5 in ch_get () at ch.c:318
#4  0x000055555555ce70 in ch_forw_get () at ch.c:638
#5  0x00005555555688ca in forw_line_seg (curr_pos=3210, skipeol=1, rscroll=1, nochop=0) at input.c:138
#6  0x0000555555568b84 in forw_line (curr_pos=3210) at input.c:266
#7  0x0000555555567b69 in forw (n=0, pos=3210, force=0, only_last=0, nblank=0) at forwback.c:313
#8  0x0000555555567f11 in forward (n=1, force=0, only_last=0) at forwback.c:522
#9  0x000055555556271a in forw_loop (until_hilite=0) at command.c:1194
#10 0x0000555555562c09 in commands () at command.c:1475
#11 0x0000555555559a04 in main (argc=-1, argv=0x7fffffffdcf8) at main.c:305
(gdb)

gwsw commented 2 years ago

Hm, I'm a bit confused now. This stack trace is showing that less is indeed paused waiting for new data from the file. But you're saying that when new data is added to the file, less does not display it and remains showing the "Waiting for data" message? If that's the case, perhaps there's a problem with poll() on your system. A quick test would be to manually edit defines.h, remove the line

#define HAVE_POLL 1

and rebuild. Let me know if that changes the behavior.

cipitaua commented 2 years ago

I've tried but unfortunately the behavior is the same

cipitaua commented 2 years ago

here's the backtrace with #define HAVE_POLL 1 commented out:

Waiting for data... (interrupt to abort)
Program received signal SIGINT, Interrupt.
0x00007ffff7e452c6 in clock_nanosleep () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) backtrace
#0  0x00007ffff7e452c6 in clock_nanosleep () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007ffff7e49d83 in nanosleep () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x000055555557066b in sleep_ms (ms=2) at os.c:431
#3  0x000055555555c5a5 in ch_get () at ch.c:318
#4  0x000055555555ce60 in ch_forw_get () at ch.c:638
#5  0x00005555555688ba in forw_line_seg (curr_pos=374, skipeol=1, rscroll=1, nochop=0) at input.c:138
#6  0x0000555555568b74 in forw_line (curr_pos=374) at input.c:266
#7  0x0000555555567b59 in forw (n=0, pos=374, force=0, only_last=0, nblank=0) at forwback.c:313
#8  0x0000555555567f01 in forward (n=1, force=0, only_last=0) at forwback.c:522
#9  0x000055555556270a in forw_loop (until_hilite=0) at command.c:1194
#10 0x0000555555562bf9 in commands () at command.c:1475
#11 0x00005555555599f4 in main (argc=-1, argv=0x7fffffffdcf8) at main.c:305
(gdb) 

it seems identical to the previous one

gwsw commented 2 years ago

Well this all looks pretty normal. Can you try this -- put a breakpoint on the call to read() in iread line 195 and see if it gets there.

EDIT: To expand a little, read() should be called repeatedly. When new data appears in the file, that call to read should return a nonzero value indicating that data has been read. Since you're not seeing new data, either it's not reaching that read call, or the read is not returning nonzero.
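The expectation described above can be checked outside less with a minimal sketch. It assumes a POSIX shell and a hypothetical scratch file from mktemp; fd 3 merely stands in for the descriptor less holds on its input file. A descriptor that has reached end-of-file should still return data that another writer appends later:

```shell
#!/bin/sh
# Sketch only: fd 3 plays the role of less's open input file.
f=$(mktemp)                  # hypothetical scratch file
printf 'line 1\n' > "$f"

exec 3< "$f"                 # open the file once and keep the descriptor
first=$(cat <&3)             # read up to EOF; the offset now sits at the end
printf 'line 2\n' >> "$f"    # a separate writer appends to the same file
second=$(cat <&3)            # reading again from fd 3 returns only the new data
exec 3<&-                    # close the descriptor

printf 'first read:  %s\n' "$first"
printf 'second read: %s\n' "$second"
rm -f "$f"
```

If the second read comes back empty on an affected machine, the problem would be below less; if it returns the appended line, less is probably not reading the file directly.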

cipitaua commented 1 year ago

yes it iteratively gets there:

Breakpoint 1, iread (fd=4, buf=0x5555555a1dd4 "", len=7944) at os.c:195
195             n = read(fd, buf, len);
(gdb) continue
Continuing.
Waiting for data... (interrupt to abort)
Breakpoint 1, iread (fd=4, buf=0x5555555a1dd4 "", len=7944) at os.c:195
195             n = read(fd, buf, len);
(gdb) continue
Continuing.

Breakpoint 1, iread (fd=4, buf=0x5555555a1dd4 "", len=7944) at os.c:195
195             n = read(fd, buf, len);
(gdb) continue
Continuing.

Breakpoint 1, iread (fd=4, buf=0x5555555a1dd4 "", len=7944) at os.c:195
195             n = read(fd, buf, len);
(gdb) continue
Continuing.

Breakpoint 1, iread (fd=4, buf=0x5555555a1dd4 "", len=7944) at os.c:195
195             n = read(fd, buf, len);
(gdb) continue
Continuing.

Breakpoint 1, iread (fd=4, buf=0x5555555a1dd4 "", len=7944) at os.c:195
195             n = read(fd, buf, len);
(gdb) continue
Continuing.

Breakpoint 1, iread (fd=4, buf=0x5555555a1dd4 "", len=7944) at os.c:195
195             n = read(fd, buf, len);
(gdb) 

gwsw commented 1 year ago

Ok, can you do this?

  1. Under gdb, run less on a file and press F.
  2. Press ctrl-C, and set a breakpoint just after the read() call in iread (line 196).
  3. Continue. At the breakpoint, check n, the return value from read. It should be zero.
  4. In a separate window, append more data to the input file.
  5. Continue, and at the next breakpoint, check n again. It should be nonzero.
  6. Delete the breakpoint and continue.
  7. You should see the new data on the screen just above the "Waiting for data" message.

Let me know at which step you see something different than described above.

cipitaua commented 1 year ago

I've tried, but n is always zero:

Waiting for data... (interrupt to abort)
Program received signal SIGINT, Interrupt.
0x00007ffff7e452c6 in clock_nanosleep () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) break os.c:196
Breakpoint 1 at 0x5555555704cf: file os.c, line 196.
(gdb) continue
Continuing.

Breakpoint 1, iread (fd=4, buf=0x5555555a1d32 "", len=8106) at os.c:196
196             reading = 0;
(gdb) print n
$1 = 0
(gdb) continue
Continuing.

Breakpoint 1, iread (fd=4, buf=0x5555555a1d32 "", len=8106) at os.c:196
196             reading = 0;
(gdb) print n
$2 = 0
(gdb) disable 1
(gdb) continue
Continuing.

gwsw commented 1 year ago

Well, that's pretty baffling. I don't know why read() would not be returning the data that you added to the file. Are you still using a /dev/shm file for testing? If so, I guess you could try using a regular file on the disk in case there's a bug in shm on your system. Can you run uname -a and let me know what version of Linux you're running? Also check this on the two Debian systems that don't have the bug and see if the versions are different.

cipitaua commented 1 year ago

Well, that's pretty baffling. I don't know why read() would not be returning the data that you added to the file. Are you still using a /dev/shm file for testing?

yes, I also sync after each write

If so, I guess you could try using a regular file on the disk in case there's a bug in shm on your system.

I first noticed this problem on regular files. I'm now using shm only to avoid disk writes, but the problem does not depend on the filesystem.

Can you run uname -a and let me know what version of Linux you're running? Also check this on the two Debian systems that don't have the bug and see if the versions are different.

I've been seeing this problem on two machines for some time, regardless of the kernel version, from 5.16.0 to 6.0.2. The same kernels have run on the machine that doesn't show the bug (presently kernel 6.0.2).

But since tail -f always works, I don't think it should be treated as an OS issue. However, I now notice that the two machines showing the bug have AMD Ryzen CPUs, while my work desktop is Intel-based. Could it be a library compilation issue related to the CPU brand?

cipitaua commented 1 year ago

i've updated the previous post

cipitaua commented 1 year ago

However, I now notice that the two machines showing the bug are with AMD Ryzen, while my work desktop is Intel based. Could it be a library compiling issue related to the cpu brand?

any suggestion?

cipitaua commented 1 year ago

closing due to no response

cipitaua commented 5 months ago

I finally discovered that the problem was in the LESSOPEN env variable:

echo $LESSOPEN 
| /usr/share/source-highlight/src-hilite-lesspipe.sh %s

When less is launched with the -L flag (which makes it ignore LESSOPEN), it can follow files; otherwise it cannot.
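For anyone checking their own setup, here is a minimal sketch of that test. The helper name uses_input_pipe is hypothetical and not part of less. It relies only on the fact that a LESSOPEN value starting with "|" tells less to read the preprocessor's output through a pipe instead of opening the file itself:

```shell
#!/bin/sh
# Hypothetical helper: succeeds when a LESSOPEN value names an input pipe.
uses_input_pipe() {
    case "$1" in
        '|'*) return 0 ;;    # leading "|": less reads the preprocessor's stdout
        *)    return 1 ;;    # unset, empty, or another preprocessor form
    esac
}

if uses_input_pipe "${LESSOPEN:-}"; then
    echo "LESSOPEN is an input pipe; to follow a file, try: less -L +F FILE"
else
    echo "LESSOPEN does not start with '|'"
fi
```

This only covers the pipe form seen here. Other LESSOPEN forms may need the same -L workaround.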

gwsw commented 3 months ago

I'm not sure there's anything less can do about this. The scenario is: the LESSOPEN script reads the file and writes the possibly modified data to a pipe that less reads. When the script reaches the end of the file, it closes the pipe and exits. Any new data written to the file after that is not seen by the script (which has exited) or by less (which has only seen the pipe close). I think the only way this could work is if the LESSOPEN script is written so that when it reads EOF on the file, it continues to poll the file for any new data and leaves the pipe open to write that new data to. Most scripts aren't written that way, but in any case, any such change would need to be done to the script, not to less.
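That scenario can be reproduced without less, as a rough sketch. Here a plain cat stands in for the LESSOPEN script and the brace group stands in for less reading the pipe; the scratch file is hypothetical:

```shell
#!/bin/sh
# Sketch only: "cat $f" imitates a preprocessor that copies the file to a
# pipe and exits at EOF; the group on the right imitates the pipe's reader.
f=$(mktemp)                  # hypothetical scratch file
printf 'line 1\n' > "$f"

seen=$(cat "$f" | {
    cat                          # drain the pipe until the filter closes it
    printf 'line 2\n' >> "$f"    # the file grows after the filter has exited
    cat                          # reading the pipe again yields EOF, nothing new
})
on_disk=$(cat "$f")

printf 'reader saw: %s\n' "$seen"
printf 'file holds: %s\n' "$on_disk"
rm -f "$f"
```

The reader only ever sees the first line, although the file holds two. That matches the behavior reported in this issue: the reader is attached to a closed pipe, not to the growing file.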

cipitaua commented 3 months ago

Alternatively, when follow mode is enabled, less could reload the file and bypass the LESSOPEN script (i.e. force the -L flag).