mk-fg / infinite-image-scroller

Python/GTK desktop app to scroll images across the window carousel-style
Do What The F*ck You Want To Public License

Memory leak with pixbuf_proc.so #4

Closed Flurrywinde closed 3 years ago

Flurrywinde commented 3 years ago

If I leave it running for a while, RAM usage steadily climbs. It doesn't happen when not using pixbuf_proc.so. (Figure_1 attached.)

As the figure shows, memory usage was steadily increasing until around 100 seconds. That might be the point when I increased the scroll speed a lot. The point where it tops out at around 120 seconds is probably where it started to write to disk and the system slowed to a halt.

Again, I hope this kind of feedback is welcome. Just trying to be helpful.

mk-fg commented 3 years ago

Thanks for reporting, but same as with earlier report, it's unfortunately not very useful.

Please start with the assumption that the tool works for me (which it does), as otherwise I'd have either fixed the issue or written something like "Known Issues" in the README. It's also possible that I might not notice the issue, but that's probably not the case with 4G being eaten and some kind of crash, and you've probably seen a video in the other issue where the thing ran for like 2min without problems.

So, starting with that (hopefully now obvious) assumption, the question is - what's different about your case? And here you didn't provide the command and options, the images you run it on, platform info, or how you built the .so file - i.e. there seems to be absolutely no info on how to reproduce the issue for me here. (I'm honestly puzzled as to what your thought process was behind reporting it like that)

mk-fg commented 3 years ago

Not really sure how to profile that .so for memory leaks either, actually, as gtk should do its memory management and GC stuff here. But I guess I can leave it running overnight to check if the issue is maybe reproducible with more-or-less default options here, just less noticeable maybe, or only happening on a longer timeframe.

mk-fg commented 3 years ago

Just ran it as gcc -O2 -fpic --shared `python3-config --includes` `pkg-config --cflags gtk+-3.0` pixbuf_proc.c -o pixbuf_proc.so && ./infinite-image-scroller.py --pos=M2 -s0 --queue=8:0.6 -a 10:0.001 -- /mnt/volatiles/images from the repo on ~5G of about 1080p-ish wallpaper-like images, zipping at a crazy speed on 1280x1024 display.

Occasionally checking atop, I saw that VSIZE was around 1G the whole time (allocated memory), while RSIZE (actually used memory) quickly jumped to 300-400M, presumably after stumbling upon some of the largest images, and at some point within the first 3m climbed to 450M (maybe stumbling upon an even larger image) and stayed there for the next 10m until the directory ran out of images.

This seems to be the expected behavior, and if there was a memleak triggering on every run of that helper, presumably ~5k images would've triggered some kind of steady growth, which I didn't observe.

Guess I'll close the issue, as I've no idea what else to do about it. Feel free to reopen, but please provide information on how to reproduce it or something.

Flurrywinde commented 3 years ago

Oops, yeah, I forgot some info. Here it is: I ran it with just the -l option (and still the default queue-size of 3) and just three images. Then I put it on a fairly high speed, which lets me watch the RAM usage climb. Under normal conditions (more images, slower speed), it takes a few hours before I see RAM usage getting high. A few more hours and it bogs down the system to the point of unusability.

If you wish to look into it, maybe use just three images with the -l loop option and a high speed, and watch the RAM.

I wish I knew more about finding memory leaks in C, but until I do, not much else I can do. I looked at the .c code a bit yesterday, but couldn't find anything.

mk-fg commented 3 years ago

If the issue is in that code, I guess it should be easy to trigger/isolate by running it in a loop without any kind of UI.

Added a pixbuf_proc_loop.py script that does that; you can run it from the repo like this:

% gcc -O2 -fpic --shared `python3-config --includes` `pkg-config --cflags gtk+-3.0` -lgtk-3 pixbuf_proc.c -o pixbuf_proc.so
% nice ./pixbuf_proc_loop.py -r20 -t500 images/*

Running such a loop here on 3 random images ( https://e.var.nz/2020-11-30.image-scroller.test-images.zip ) via the commands above for a while, I'm getting this output here:

Started image-processing loop: images=3 threads=4 report-interval=20s stop-after=500s

Processing report: n=0 images=337 time=[0:00:20.102620]
  cpu=0.0s [user=0.0 sys=0.0] mem-rss=197.2M [+0.0M] mem-vss=331.4M [+0.0M]

Processing report: n=1 images=688 time=[0:00:40.203622]
  cpu=75.3s [user=74.5 sys=0.8] mem-rss=197.2M [+0.0M] mem-vss=331.4M [+0.0M]

...

Processing report: n=15 images=5,438 time=[0:05:21.615633]
  cpu=1,093.2s [user=1,079.7 sys=13.5] mem-rss=197.2M [+0.0M] mem-vss=331.4M [+0.0M]

^CProcessing report: n=16 images=5,456 time=[0:08:20]
  cpu=1,096.6s [user=1,083.1 sys=13.5] mem-rss=197.8M [+0.5M] mem-vss=331.4M [+0.0M]

There doesn't seem to be any memory usage increase at all, but maybe try this on your machine and see if you get something different.

If it doesn't eat memory for you either, then it's not this C code at all, but something else in GTK when it's being used like this, which might be some kind of race-condition bug with threads (seems unlikely), or maybe the returned image buffers not being properly freed after use with GTK somewhere, as that'd be the other difference between using the pure-python version and this helper lib.

Flurrywinde commented 3 years ago

I followed the above instructions to recompile the helper, then run the looper. The helper compiles fine, but running the looper gives:

Traceback (most recent call last):
  File "./pixbuf_proc_loop.py", line 4, in <module>
    import pixbuf_proc as pp
ImportError: /home/kanon/util/iscr-myfork/pixbuf_proc.so: undefined symbol: g_object_unref

Strangely, infinite-image-scroller.py, which also imports pixbuf_proc, still works fine. Ah, I think I figured it out. I had to add this to the top of pixbuf_proc_loop.py:

import gi
gi.require_version('Gtk', '3.0')
gi.require_version('Gdk', '3.0')
gi.require_version('GLib', '2.0')
gi.require_version('GdkPixbuf', '2.0')
from gi.repository import Gtk, Gdk, GdkPixbuf, GLib

Now it ran, and I got basically the same result as you, no memory usage increase:

Started image-processing loop: images=3 threads=4 report-interval=20s stop-after=500s

Processing report: n=0 images=2,173 time=[0:00:20.100863]
  cpu=0.0s [user=0.0 sys=0.0] mem-rss=97.3M [+0.0M] mem-vss=499.9M [+0.0M]

Processing report: n=1 images=4,300 time=[0:00:40.200931]
  cpu=54.9s [user=50.8 sys=4.1] mem-rss=72.5M [-24.7M] mem-vss=499.9M [+0.0M]
...
Processing report: n=23 images=50,552 time=[0:08:02.449124]
  cpu=1,252.9s [user=1,166.1 sys=86.8] mem-rss=104.0M [+6.7M] mem-vss=499.9M [+0.0M]

Processing report: n=24 images=52,809 time=[0:08:22.566632]
  cpu=1,311.4s [user=1,220.6 sys=90.8] mem-rss=99.6M [+2.3M] mem-vss=499.9M [+0.0M]

Maybe my GTK is an older version than yours and has a memory leak bug? Running dpkg -l 'libgtk*' | grep -e '^i' | grep -e 'libgtk-*[0-9]' gives:

ii  libgtk-3-0:amd64             3.22.30-1ubuntu4  amd64        GTK+ graphical user interface library
ii  libgtk-3-bin                 3.22.30-1ubuntu4  amd64        programs for the GTK+ graphical user interface library
ii  libgtk-3-common              3.22.30-1ubuntu4  all          common files for the GTK+ graphical user interface library
ii  libgtk-3-dev:amd64           3.22.30-1ubuntu4  amd64        development files for the GTK+ library
ii  libgtk2-perl                 2:1.24992-1build1 amd64        Perl interface to the 2.x series of the Gimp Toolkit library
ii  libgtk2.0-0:amd64            2.24.32-1ubuntu1  amd64        GTK+ graphical user interface library
ii  libgtk2.0-bin                2.24.32-1ubuntu1  amd64        programs for the GTK+ graphical user interface library
ii  libgtk2.0-cil                2.12.40-2         amd64        CLI binding for the GTK+ toolkit 2.12
ii  libgtk2.0-cil-dev            2.12.40-2         amd64        CLI binding for the GTK+ toolkit 2.12
ii  libgtk2.0-common             2.24.32-1ubuntu1  all          common files for the GTK+ graphical user interface library
ii  libgtk2.0-dev                2.24.32-1ubuntu1  amd64        development files for the GTK+ library
ii  libgtk3-nocsd0:amd64         3-1ubuntu1        amd64        Library to disable Gtk+ 3 client side decorations (CSD)
ii  libgtk3-perl                 0.032-1           all          Perl bindings for the GTK+ graphical user interface library
mk-fg commented 3 years ago

I followed the above instructions to recompile the helper, then run the looper. The helper compiles fine, but running the looper gives: ImportError: /home/kanon/util/iscr-myfork/pixbuf_proc.so: undefined symbol: g_object_unref

You skipped the first gcc ... -lgtk-3 pixbuf_proc.c -o pixbuf_proc.so line that I suggested under "you can run it from the repo like this:" above. It has -lgtk-3 to link the .so against libgtk-3.so so that it'd load it. Don't think loading it from python first would change much though, and maybe the minor couple-meg jitter comes from that and its (somewhat complex) GI bindings.

Interesting that script uses 2x less memory for you, maybe due to differences in gtk build between Arch that I have here and Ubuntu.

Maybe my GTK is an older version than yours and has a memory leak bug?

Mine is 3.24.23 atm, but don't think I've seen it a year ago. It's quite unlikely for something that major to have that kind of surface bug, as this script isn't really doing anything fancy.

Flurrywinde commented 3 years ago

Even when I did compile it with -lgtk-3, I got the same ImportError.

So anyway, assuming it's not the GTK version, what could it be? It sounds like you're not experiencing it, so it shouldn't be in infinite-image-scroller.py, should it? Well, I'll look into it...

All right, I followed the guide here: https://stackoverflow.com/questions/61288749/finding-memory-leak-in-python-by-tracemalloc-module to use gcore and chap.

Here's what I got: Screenshot 2020-11-30 15:27:52

I hope the warnings don't invalidate the results.

I'm new to this kind of thing, but from what I can gather, we can't tell if the high heap and memory leaks are from python or the helper. Running without the helper, there are still leaks, though, just a lot fewer, so I guess this is evidence there could be even more leaks in the helper part of the python code. Running chap on the looper showed no leaks, so this shows what we already knew--the helper is not the culprit.

Prior to today, I experimented with tracemalloc also, but it doesn't show anything (to my inexperienced eye, anyway). I let infinite-image-scroller run for a while, collecting snapshots on each loop_iter() loop, and triggered the following output (with SIGUSR2), which shows the top 10 memory uses at the time, plus a comparison to the first snapshot taken:

Top 10 lines
#1: <frozen importlib._bootstrap_external>:580: 252.5 KiB
#2: /home/kanon/.local/lib/python3.8/site-packages/gi/module.py:207: 23.9 KiB
    wrapper = metaclass(name, bases, dict_)
#3: /home/kanon/.local/lib/python3.8/site-packages/gi/types.py:52: 23.7 KiB
    setattr(cls, method_info.__name__, method_info)
#4: /home/kanon/.local/lib/python3.8/site-packages/gi/types.py:51: 15.1 KiB
    for method_info in cls.__info__.get_methods():
#5: /home/kanon/.local/lib/python3.8/site-packages/gi/overrides/Gio.py:43: 10.4 KiB
    return Gio.Application.run(self, *args, **kwargs)
#6: /home/kanon/.local/lib/python3.8/site-packages/gi/module.py:139: 8.5 KiB
    wrapper = enum_add(g_type)
#7: infinite-image-scroller.py:374: 7.7 KiB
    adj.set_value(self.dim_scroll_translate(pos, pos_max))
#8: /usr/local/lib/python3.8/abc.py:85: 7.1 KiB
    cls = super().__new__(mcls, name, bases, namespace, **kwargs)
#9: /home/kanon/.local/lib/python3.8/site-packages/gi/module.py:155: 7.0 KiB
    setattr(wrapper, value_name, wrapper(value_info.get_value()))
#10: /usr/local/lib/python3.8/tracemalloc.py:532: 6.9 KiB
    traces = _get_traces()
795 other: 386.7 KiB
Total allocated size: 749.5 KiB
[ Top 10 - compared to very first snapshot ]
infinite-image-scroller.py:374: size=8099 B (+8099 B), count=49 (+49), average=165 B
/home/kanon/.local/lib/python3.8/site-packages/gi/overrides/Gio.py:43: size=10.2 KiB (+8076 B), count=62 (+47), average=169 B
/usr/local/lib/python3.8/tracemalloc.py:532: size=7024 B (+6968 B), count=80 (+79), average=88 B
/home/kanon/.local/lib/python3.8/site-packages/gi/module.py:133: size=5945 B (+5945 B), count=67 (+67), average=89 B
/home/kanon/.local/lib/python3.8/site-packages/gi/module.py:155: size=7142 B (+4910 B), count=91 (+63), average=78 B
/home/kanon/.local/lib/python3.8/site-packages/gi/overrides/GLib.py:612: size=4648 B (+3800 B), count=83 (+78), average=56 B
<frozen importlib._bootstrap_external>:580: size=253 KiB (-2936 B), count=2614 (-26), average=99 B
/home/kanon/.local/lib/python3.8/site-packages/gi/module.py:154: size=3061 B (+2108 B), count=49 (+32), average=62 B
infinite-image-scroller.py:316: size=1991 B (+1991 B), count=17 (+17), average=117 B
infinite-image-scroller.py:365: size=1779 B (+1779 B), count=14 (+14), average=127 B

Finally, I tried calling gc.collect() at the end of image_cycle(), but it would return 0 each time. (And RAM usage would still steadily climb.)

mk-fg commented 3 years ago

comparison to the first snapshot taken:

I think a comparison to the previous snapshot would be way more useful, as I think the first snapshot is taken before even importing anything, hence it shows all the GI initialization as "memory leaks", even though obviously that's done only once on app start.
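
For example, something like this (rough sketch, names made up - assuming the script already takes a tracemalloc snapshot on each loop_iter() as described above):

import tracemalloc

tracemalloc.start()
prev_snapshot = None

def report_snapshot():
    global prev_snapshot
    snap = tracemalloc.take_snapshot()
    if prev_snapshot is not None:
        # diff against the previous snapshot instead of the very first one,
        # so one-time import/GI-init allocations don't show up as "leaks" every time
        for stat in snap.compare_to(prev_snapshot, 'lineno')[:10]:
            print(stat)
    prev_snapshot = snap

Calling that from the existing SIGUSR2 handler should then only print allocations that actually grew since the last report.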

EDIT: also yeah, numbers there seem to be too tiny to be of any relevance anyway.

mk-fg commented 3 years ago

Also, what python-side tracing modules do is trace GC for python objects, e.g. if you have a list and keep appending objects to it but never remove any, so it grows infinitely. GI bindings might have some issue in that regard, if it's an unlikely bug there, but I don't think anything happening in actual libgtk calls (and its gobject allocations) would show up there.

mk-fg commented 3 years ago

Even when I did compile it with -lgtk-3, I got the same ImportError.

I think it should either be different error, failed build, or .so file is imported from some unrelated place, i.e. you're not updating it. Running something like python3 -c 'import pixbuf_proc; print(pixbuf_proc)' should show the path it's being imported from.

mk-fg commented 3 years ago

So anyway, assuming it's not the GTK version, what could it be? It sounds like you're not experiencing it, so it shouldn't be in infinite-image-scroller.py, should it?

Yeah, I'd suspect the small bit of code that sits between that .so library and GI/GTK - the one that gets memory buffers from C and assigns them to Image widgets - but that doesn't explain why it doesn't seem to leak here.

mk-fg commented 3 years ago

There are probably GTK-specific tools to debug leaks there, but I'd probably opt for a simpler and faster approach - take that known-leakless loop-script, add a GTK window there and cycle Images in it using the same code as in the main script. If that doesn't leak, also add creating/removing Image widgets, and if that doesn't leak, add the scrolling code and you get full leak-proof script somehow :) (obviously if the full script is leaking, then something along that path will too, and given that it's a couple lines of code, it should be easy to tweak them until the leak is gone, then use that in the original version)
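
Rough sketch of step one of that, just to illustrate (hypothetical code - plain GdkPixbuf file loading in a GLib timeout instead of the actual helper/scroller bits, so adjust to match the main script):

import sys, itertools
import gi
gi.require_version('Gtk', '3.0')
gi.require_version('GdkPixbuf', '2.0')
from gi.repository import Gtk, GdkPixbuf, GLib

paths = itertools.cycle(sys.argv[1:])
win, image = Gtk.Window(), Gtk.Image()
win.add(image)
win.connect('destroy', Gtk.main_quit)
win.show_all()

def cycle_image():
    # step 1 - only swap pixbufs in a single Image widget
    # step 2, if that doesn't leak - create/destroy Image widgets instead
    # step 3, if that still doesn't leak - add the scrolling-window code from the main script
    image.set_from_pixbuf(GdkPixbuf.Pixbuf.new_from_file(next(paths)))
    return True  # keep the timeout firing

GLib.timeout_add(50, cycle_image)
Gtk.main()

Whichever step starts eating memory should point at the couple of lines worth tweaking.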

mk-fg commented 3 years ago

I'm new to this kind of thing, but from what I can gather, we can't tell if the high heap and memory leaks are from python or the helper.

Not sure, but maybe. I think valgrind could print tracebacks for allocations, so that even if they ended up in same gobject calling libc somewhere, it might be possible to tell these apart by whether call originated in python binary code or that .so file.

Running without the helper, there are still leaks, though, just a lot fewer, so I guess this is evidence there could be even more leaks in the helper part of the python code.

Should probably try to run something like this myself, see if maybe leaks are just less noticeable here due to apparently different memory usage somewhere (hence that 2x diff when running same loop script), but visible in such tools.

Flurrywinde commented 3 years ago

Even when I did compile it with -lgtk-3, I got the same ImportError.

I think it should either be different error, failed build, or .so file is imported from some unrelated place, i.e. you're not updating it. Running something like python3 -c 'import pixbuf_proc; print(pixbuf_proc)' should show the path it's being imported from.

[ ~/util/iscr-myfork ] $ gcc -O2 -fpic --shared `python3-config --includes` `pkg-config --cflags gtk+-3.0` -lgtk-3 pixbuf_proc.c -o pixbuf_proc.so
[ ~/util/iscr-myfork ] $  python3.8 -c 'import gi; from gi.repository import Gtk, Gdk, GdkPixbuf, GLib; import pixbuf_proc; print(pixbuf_proc)'
<string>:1: PyGIWarning: Gtk was imported without specifying a version first. Use gi.require_version('Gtk', '3.0') before import to ensure that the right version gets loaded.
<module 'pixbuf_proc' from '/home/kanon/util/iscr-myfork/pixbuf_proc.so'>
[ ~/util/iscr-myfork ] $ nice ./pixbuf_proc_loop.py -r20 -t500 *.jpg *.png
Traceback (most recent call last):
  File "./pixbuf_proc_loop.py", line 4, in <module>
    import pixbuf_proc as pp
ImportError: /home/kanon/util/iscr-myfork/pixbuf_proc.so: undefined symbol: g_object_unref
Flurrywinde commented 3 years ago

Interesting that script uses 2x less memory for you, maybe due to differences in gtk build between Arch that I have here and Ubuntu.

Actually, I think it's pretty much the same. I didn't know you'd look at that, so when I ran it with my own three images and verified no memory leak, I didn't bother trying it with yours. Sorry for the confusion! Here it is on my system with your 3 example images:

Started image-processing loop: images=3 threads=4 report-interval=20s stop-after=500s

Processing report: n=0 images=344 time=[0:00:20.098937]
  cpu=0.0s [user=0.0 sys=0.0] mem-rss=209.7M [+0.0M] mem-vss=501.9M [+0.0M]

Processing report: n=1 images=706 time=[0:00:40.204545]
  cpu=59.4s [user=58.9 sys=0.5] mem-rss=209.7M [+0.0M] mem-vss=501.9M [+0.0M]

...

Processing report: n=23 images=8,428 time=[0:08:02.426769]
  cpu=1,327.9s [user=1,315.7 sys=12.2] mem-rss=209.7M [+0.0M] mem-vss=501.9M [+0.0M]

Processing report: n=24 images=8,805 time=[0:08:22.527980]
  cpu=1,389.7s [user=1,377.0 sys=12.8] mem-rss=209.7M [+0.0M] mem-vss=501.9M [+0.0M]
Flurrywinde commented 3 years ago

If that doesn't leak, also add creating/removing Image widgets, and if that doesn't leak, add the scrolling code and you get full leak-proof script somehow :)

Sounds like a good plan. I'll get on it.

I'm new to this kind of thing, but from what I can gather, we can't tell if the high heap and memory leaks are from python or the helper.

Not sure, but maybe. I think valgrind could print tracebacks for allocations, so that even if they ended up in same gobject calling libc somewhere, it might be possible to tell these apart by whether call originated in python binary code or that .so file.

I tried to use valgrind, but doesn't it need the program to be compiled differently? Valgrind didn't output anything, and I don't know how to fix a python script to give it what it needs. Do you?

Running without the helper, there are still leaks, though, just a lot fewer, so I guess this is evidence there could be even more leaks in the helper part of the python code.

Should probably try to run something like this myself, see if maybe leaks are just less noticeable here due to apparently different memory usage somewhere (hence that 2x diff when running same loop script), but visible in such tools.

That's a good idea. It might also help rule out a GTK version difference.

mk-fg commented 3 years ago

gcc -O2 -fpic --shared `python3-config --includes` `pkg-config --cflags gtk+-3.0` -lgtk-3 pixbuf_proc.c -o pixbuf_proc.so

Hm, thought that libgtk should always be linked against libgobject and load it and all other stuff with it, but guess either I misunderstand how such linking works or it's just not the case on ubuntu. Guess proper way should be to use --libs for pkg-config so that it'd generate these -l flags as necessary on the system.

Updated README and such to suggest that in 4cd81ef, i.e. something like this to build .so file:

gcc -O2 -fpic --shared `python3-config --includes` `pkg-config --libs --cflags gtk+-3.0` pixbuf_proc.c -o pixbuf_proc.so

Hopefully that should work, but as mentioned, don't think it really matters for testing and such.

mk-fg commented 3 years ago

Actually, I think it's pretty much the same. I didn't know you'd look at that,

Ah, I just didn't think it'd be that much different with other images. But it looks like one of these turned out to be a 3300x2000 RGBA PNG, and I guess that's a 3300*2000*4 / 2**20 = 25M pixbuf, replicated 4x or 8x or whatever-the-number-of-cores-is across the threads, so it shouldn't be surprising.

mk-fg commented 3 years ago

I tried to use valgrind, but doesn't it need the program to be compiled differently? Valgrind didn't output anything, and I don't know how to fix a python script to give it what it needs. Do you?

No, not really.

I've used it a couple of times in the past, but barely remember what it needs or how it works by now. Thought it'd replace libc's malloc()/free() to at least inspect the current stack and check which libs the frames there point to, so even without debug symbols it should at least print those, but maybe not. Or iirc it has like 1000 flags, maybe some of those are needed.

I generally try to avoid touching C/C++ exactly because of how insane it gets very fast when you get to those lower abstraction levels, where all you get is "F U" type errors and random segfaults, with the tools to debug these looking more menacing than the bugs themselves :)

mk-fg commented 3 years ago

Looks like chap straight-up says that its results will be junk for me here:

% ./chap core.12292
Fast bin corruption was found for the arena at 0x7fc664295a00
  Leak analysis will not be accurate.
  Used/free analysis will not be accurate for the arena.
  The fast bin list headed at 0x7fc664295a18 has a node
  0x55eb280bf738 not matching an allocation.
  The fast bin list headed at 0x7fc664295a38 has a node
  0x55ee77081 not matching an allocation.
Fast bin corruption was found for the arena at 0x7fc63c000020
  Leak analysis will not be accurate.
  Used/free analysis will not be accurate for the arena.
  The fast bin list headed at 0x7fc63c000048 has a node
  0x7fc1c0637c7a not matching an allocation.
Fast bin corruption was found for the arena at 0x7fc64c000020
  Leak analysis will not be accurate.
  Used/free analysis will not be accurate for the arena.
  The fast bin list headed at 0x7fc64c000038 has a node
  0x7fc64c004 not matching an allocation.
Fast bin corruption was found for the arena at 0x7fc650000020
  Leak analysis will not be accurate.
  Used/free analysis will not be accurate for the arena.
  The fast bin list headed at 0x7fc650000040 has a node
  0x7fc650002 not matching an allocation.
Fast bin corruption was found for the arena at 0x7fc654000020
  Leak analysis will not be accurate.
  Used/free analysis will not be accurate for the arena.
  The fast bin list headed at 0x7fc654000030 has a node
  0x7fc1a8651447 not matching an allocation.
  The fast bin list headed at 0x7fc654000038 has a node
  0x7fc1a8653ce7 not matching an allocation.
  The fast bin list headed at 0x7fc654000040 has a node
  0x7fc1a8653af7 not matching an allocation.
  The fast bin list headed at 0x7fc654000058 has a node
  0x7fc1a865c037 not matching an allocation.
Warning: At least one readable stack guard has been found.
 This generally means that the gdb code that created the core has a bug
 and that the permissions were marked wrong in the core.
chap>

Should probably try on an Ubuntu VM, and guess it's the other side of the abstraction bloat - all these pythons and GTK OS-in-OS layers are nice, until there's a memleak and you get the worst of both worlds, trying to debug this whole pile of junk using low-level tools :)

GObject should probably have its own refcount-debug options though, gotta check these first, see if maybe it's the layer where some unused objects might be piling up...

mk-fg commented 3 years ago

Hm, looks like GTK/GNOME docs suggest using valgrind for debugging memory issues with it too: https://developer.gnome.org/programming-guidelines/stable/memory-management.html.en#:~:text=Runtime%20leak%20checking https://developer.gnome.org/programming-guidelines/stable/tooling.html.en#valgrind

Running it here using this command, as suggested in the last link: valgrind --tool=memcheck --leak-check=full -- python infinite-image-scroller.py --debug --pos=M1 -a 10:0.001 -l -- images/

It gives me a ton of tracebacks, but for random potential/minor issues, which the links above specifically suggest ignoring, and only on app start - after the threads start logging image loads, there seem to be no issues.

These backtraces look like this here:

==12560== Thread 1:
==12560== Invalid read of size 4
==12560==    at 0x4984B32: PyMem_Realloc (in /usr/lib/libpython3.8.so.1.0)
==12560==    by 0x4A19646: ??? (in /usr/lib/libpython3.8.so.1.0)
==12560==    by 0x49AE2C4: ??? (in /usr/lib/libpython3.8.so.1.0)
==12560==    by 0x498DAC0: _PyEval_EvalFrameDefault (in /usr/lib/libpython3.8.so.1.0)
==12560==    by 0x499E137: _PyFunction_Vectorcall (in /usr/lib/libpython3.8.so.1.0)
==12560==    by 0x49ADB3B: ??? (in /usr/lib/libpython3.8.so.1.0)
==12560==    by 0x49B12C1: PyObject_Call (in /usr/lib/libpython3.8.so.1.0)
==12560==    by 0x5A4BE99: ??? (in /usr/lib/python3.8/site-packages/gi/_gi.cpython-38-x86_64-linux-gnu.so)
==12560==    by 0x5A7B8C1: ??? (in /usr/lib/libffi.so.7.1.0)
==12560==    by 0x5A7BC1F: ??? (in /usr/lib/libffi.so.7.1.0)
==12560==    by 0x638E923: g_main_context_dispatch (in /usr/lib/libglib-2.0.so.0.6600.1)
==12560==    by 0x63E2620: ??? (in /usr/lib/libglib-2.0.so.0.6600.1)
==12560==  Address 0x5302020 is 16 bytes before a block of size 32 alloc'd
==12560==    at 0x483CD7B: realloc (vg_replace_malloc.c:834)
==12560==    by 0x49A9D2C: ??? (in /usr/lib/libpython3.8.so.1.0)
==12560==    by 0x498DAC0: _PyEval_EvalFrameDefault (in /usr/lib/libpython3.8.so.1.0)
==12560==    by 0x499E137: _PyFunction_Vectorcall (in /usr/lib/libpython3.8.so.1.0)
==12560==    by 0x49ADB3B: ??? (in /usr/lib/libpython3.8.so.1.0)
==12560==    by 0x49B12C1: PyObject_Call (in /usr/lib/libpython3.8.so.1.0)
==12560==    by 0x498F837: _PyEval_EvalFrameDefault (in /usr/lib/libpython3.8.so.1.0)
==12560==    by 0x499E137: _PyFunction_Vectorcall (in /usr/lib/libpython3.8.so.1.0)
==12560==    by 0x498DAC0: _PyEval_EvalFrameDefault (in /usr/lib/libpython3.8.so.1.0)
==12560==    by 0x499E137: _PyFunction_Vectorcall (in /usr/lib/libpython3.8.so.1.0)
==12560==    by 0x498DAC0: _PyEval_EvalFrameDefault (in /usr/lib/libpython3.8.so.1.0)
==12560==    by 0x499E137: _PyFunction_Vectorcall (in /usr/lib/libpython3.8.so.1.0)
==12560==

And it's mostly just python frames, with ??? likely being stuff that's been optimized-out by the compiler.

After stopping it with ^C, "definitely lost" looks pretty tiny - something like 320B or 3kB after a couple minutes. The window itself gets all artifacted if the scrolling speed is too fast, as I think attached valgrind slows GTK down too much to keep up with updating the window in-between scroll events, though the "lost" values seem to be tiny either way; not sure if it matters whether stuff gets rendered.

Actually have Ubuntu 20.04 box nearby setup for running EVE-NG VMs, gotta try leaving something like that running there, as it'd be closer to your setup and not bother me on the main desktop machine otherwise.

mk-fg commented 3 years ago

"Tooling" link above also suggests helgrind and drd, which seem to be for threaded stuff, but don't really know if threads are the issue here - on one hand they are only used for running image loading via that .so file that's confirmed to be fine, but on the other hand they call g-stuff concurrently with the main thread, and maybe that interaction causes the problem. (though if it really turns out to be some multithreading gtk bug, workaround other than just "don't use threads" or "use different gtk/build/distro/platform" might be difficult, so maybe it's not worth exploring such possibility because of that)

mk-fg commented 3 years ago

have Ubuntu 20.04 box nearby setup for running EVE-NG VMs, gotta try leaving something like that running there

Left it running the loop on the same three images with ./infinite-image-scroller.py --debug --pos=M1 -a 10:0.001 -l -- images/ for ~8 hours here; it started at ~303-311M (with 905M vsize) within the first minutes (with 8 threads there), and stayed there after 8h, with ~250% cpu load for this kind of scroll/processing speed. (also with 1024x768 vesa screen, but stuff did scroll there, according to x11vnc)

So don't think I can get anything from valgrind if nothing seems to leak there either, but guess you can try in your setup. If you use that weird mismatching Ubuntu 18.04 + newer python setup, I'd suspect some ABI mismatch between libs there, or maybe indeed a bug of some kind in old versions.

Which I think something like AppImage can also fix btw, as it should have whole set of libs bundled in there, frozen at some working point, and properly linked together, at least... gotta check how to make one. (shipping any kind of binary stuff is a bit icky, especially without at least some central clearing authority like flathub or that snap central, but maybe can add commands or script to make one to README)

mk-fg commented 3 years ago

(also with 1024x768 vesa screen, but stuff did scroll there, according to x11vnc)

Correction: 1280x1024 libfbdevhw.so screen.

mk-fg commented 3 years ago

Which I think something like AppImage can also fix btw, as it should have whole set of libs bundled in there, frozen at some working point, and properly linked together, at least... gotta check how to make one.

It's indeed quite trivial to put some minimal one together:

Which is great, but here comes the bad part:

% ./infinite-image-scroller-x86_64.AppImage -h
/tmp/.mount_infiniiug24C
/bin/sh: Relink `/tmp/.mount_infiniiug24C/usr/lib/libncursesw.so.6' with `/usr/lib/libc.so.6' for IFUNC symbol `strcpy'
zsh: segmentation fault (core dumped)  ./infinite-image-scroller-x86_64.AppImage -h

If I understand this error correctly, and given that the produced .AppImage is an ELF binary presumably itself loading libc, this approach still runs a mix of junk from the host OS and the AppImage, which is exactly what I was trying to avoid by using it. And it immediately shows why it is a terrible idea to do so, in spectacular fashion - a segfault, not even able to run "sh" there. Exactly the same problem as you'd get by making a patchwork OS from e.g. ubuntu 18.04 and other third-party sources.

Looks like it's not a solution for something portable, unfortunately. I believe flatpak and snaps do install their core libs properly and start from them, same as e.g. docker and other server containers/runtimes, but developing and packaging stuff with those looks like even more of an ordeal, and I'm not too interested in the results there, as other than the one-self-contained-binary approach, these linux desktop packages don't look at all interesting to me.

So don't think I'll come up with any solution for that seemingly platform-specific issue, but if you'll find out what the problem was, maybe let me know - might be possible to check and add a workaround, if it's not too unique or only for some old system (and probably not useful to keep around in the repo, as such). Can answer any questions too, of course, but as mentioned, don't think I know much about debugging memleaks in such fat stacks of libs and scripts myself, I'm afraid.

mk-fg commented 3 years ago

Looks like it's not a solution for something portable, unfortunately. I believe flatpak and snaps do install their core libs properly and start from them, same as e.g. docker and other server containers/runtimes

Actually the main difficulty with AppImage (that is absent in container solutions) seems to be the lack of built-in filesystem isolation, which leads to hardcoded interpreter paths going to /lib on the outside rootfs, and complex apps like python and gtk loading stuff from /usr and going all over the place. It can be much nicer to use with more monolithic compiled apps though.

Flurrywinde commented 3 years ago

Guess proper way should be to use --libs for pkg-config so that it'd generate these -l flags as necessary on the system.

I know it probably doesn't matter to testing, but it's still happening with the new gcc options:

[ ~/util/iscr-myfork ] $ gcc -O2 -fpic --shared `python3-config --includes` `pkg-config --libs --cflags gtk+-3.0` pixbuf_proc.c -o pixbuf_proc.so
[ ~/util/iscr-myfork ] $ python3.8 -c 'import gi; from gi.repository import Gtk, Gdk, GdkPixbuf, GLib; import pixbuf_proc; print(pixbuf_proc)'
<string>:1: PyGIWarning: Gtk was imported without specifying a version first. Use gi.require_version('Gtk', '3.0') before import to ensure that the right version gets loaded.
<module 'pixbuf_proc' from '/home/kanon/util/iscr-myfork/pixbuf_proc.so'>
[ ~/util/iscr-myfork ] $ nice ./pixbuf_proc_loop.py -r20 -t500 *.jpg *.png
Traceback (most recent call last):
  File "./pixbuf_proc_loop.py", line 4, in <module>
    import pixbuf_proc as pp
ImportError: /home/kanon/util/iscr-myfork/pixbuf_proc.so: undefined symbol: g_object_unref
Flurrywinde commented 3 years ago

Looks like chap straight-up says that its results will be junk for me here

That's too bad. I wonder what's wrong? And I see what you mean about debugging tools seeming more menacing than the bugs themselves. Luckily, I got valgrind to work. For some reason, it doesn't produce the output with this: G_SLICE=debug-blocks valgrind --tool=memcheck --leak-check=full ./infinite-image-scroller.py -l -- images/. It needs me to put the python command on the command line explicitly like this: G_SLICE=debug-blocks valgrind --tool=memcheck --leak-check=full python3.8 ./infinite-image-scroller.py -l -- images/

Then, I got the output, and it shows memory leaks too:

==22759== 8,192 bytes in 1 blocks are definitely lost in loss record 10,031 of 10,105
==22759==    at 0x4C33B25: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==22759==    by 0x7E1FB10: g_malloc0 (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.5600.4)
==22759==    by 0x7E07CD7: ??? (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.5600.4)
==22759==    by 0x7E07F3E: ??? (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.5600.4)
==22759==    by 0x7E082FA: ??? (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.5600.4)
==22759==    by 0x81128AF: ??? (in /usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5600.4)
==22759==    by 0x8118A71: g_type_register_static (in /usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5600.4)
==22759==    by 0x8118D44: g_type_register_static_simple (in /usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5600.4)
==22759==    by 0x10118035: gtk_image_accessible_get_type (in /usr/lib/x86_64-linux-gnu/libgtk-3.so.0.2200.30)
==22759==    by 0x1028BBCD: ??? (in /usr/lib/x86_64-linux-gnu/libgtk-3.so.0.2200.30)
==22759==    by 0x8116418: g_type_class_ref (in /usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5600.4)
==22759==    by 0x7BB0BEC: pygobject_init (pygobject-object.c:1334)
==22759==
==22759== 8,192 bytes in 1 blocks are definitely lost in loss record 10,032 of 10,105
==22759==    at 0x4C33B25: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==22759==    by 0x7E1FB10: g_malloc0 (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.5600.4)
==22759==    by 0x7E07CF4: ??? (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.5600.4)
==22759==    by 0x7E07F3E: ??? (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.5600.4)
==22759==    by 0x7E082FA: ??? (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.5600.4)
==22759==    by 0x81128AF: ??? (in /usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5600.4)
==22759==    by 0x8118A71: g_type_register_static (in /usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5600.4)
==22759==    by 0x8118D44: g_type_register_static_simple (in /usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5600.4)
==22759==    by 0x10118035: gtk_image_accessible_get_type (in /usr/lib/x86_64-linux-gnu/libgtk-3.so.0.2200.30)
==22759==    by 0x1028BBCD: ??? (in /usr/lib/x86_64-linux-gnu/libgtk-3.so.0.2200.30)
==22759==    by 0x8116418: g_type_class_ref (in /usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5600.4)
==22759==    by 0x7BB0BEC: pygobject_init (pygobject-object.c:1334)
==22759==
==22759== 9,104 (6,912 direct, 2,192 indirect) bytes in 27 blocks are definitely lost in loss record 10,037 of 10,105
==22759==    at 0x4C31B0F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==22759==    by 0xA8CA8ED: ??? (in /usr/lib/x86_64-linux-gnu/libfontconfig.so.1.10.1)
==22759==    by 0xA8CB096: ??? (in /usr/lib/x86_64-linux-gnu/libfontconfig.so.1.10.1)
==22759==    by 0xA8CC377: ??? (in /usr/lib/x86_64-linux-gnu/libfontconfig.so.1.10.1)
==22759==    by 0xA8D19C3: ??? (in /usr/lib/x86_64-linux-gnu/libfontconfig.so.1.10.1)
==22759==    by 0xBD6EE27: ??? (in /lib/x86_64-linux-gnu/libexpat.so.1.6.7)
==22759==    by 0xBD6FBFB: ??? (in /lib/x86_64-linux-gnu/libexpat.so.1.6.7)
==22759==    by 0xBD6D822: ??? (in /lib/x86_64-linux-gnu/libexpat.so.1.6.7)
==22759==    by 0xBD6E50A: ??? (in /lib/x86_64-linux-gnu/libexpat.so.1.6.7)
==22759==    by 0xBD720EC: XML_ParseBuffer (in /lib/x86_64-linux-gnu/libexpat.so.1.6.7)
==22759==    by 0xA8D0B42: ??? (in /usr/lib/x86_64-linux-gnu/libfontconfig.so.1.10.1)
==22759==    by 0xA8D0F75: FcConfigParseAndLoad (in /usr/lib/x86_64-linux-gnu/libfontconfig.so.1.10.1)
==22759==
==22759== 10,572 (72 direct, 10,500 indirect) bytes in 1 blocks are definitely lost in loss record 10,046 of 10,105
==22759==    at 0x4C33B25: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==22759==    by 0xB8A6C00: XkbGetMap (in /usr/lib/x86_64-linux-gnu/libX11.so.6.3.0)
==22759==    by 0xD7385A1: ??? (in /usr/lib/x86_64-linux-gnu/libgdk-3.so.0.2200.30)
==22759==    by 0xD739907: gdk_x11_keymap_key_is_modifier (in /usr/lib/x86_64-linux-gnu/libgdk-3.so.0.2200.30)
==22759==    by 0xD729D73: ??? (in /usr/lib/x86_64-linux-gnu/libgdk-3.so.0.2200.30)
==22759==    by 0xD73480F: ??? (in /usr/lib/x86_64-linux-gnu/libgdk-3.so.0.2200.30)
==22759==    by 0xD7343E0: ??? (in /usr/lib/x86_64-linux-gnu/libgdk-3.so.0.2200.30)
==22759==    by 0xD6FED6F: gdk_display_get_event (in /usr/lib/x86_64-linux-gnu/libgdk-3.so.0.2200.30)
==22759==    by 0xD733F81: ??? (in /usr/lib/x86_64-linux-gnu/libgdk-3.so.0.2200.30)
==22759==    by 0x7E1A416: g_main_context_dispatch (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.5600.4)
==22759==    by 0x7E1A64F: ??? (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.5600.4)
==22759==    by 0x7E1A6DB: g_main_context_iteration (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.5600.4)
==22759==
==22759== 11,408 bytes in 23 blocks are possibly lost in loss record 10,047 of 10,105
==22759==    at 0x4C33E76: memalign (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==22759==    by 0x4C33F91: posix_memalign (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==22759==    by 0x7E370E6: ??? (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.5600.4)
==22759==    by 0x7E37DC2: g_slice_alloc (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.5600.4)
==22759==    by 0x7E37E08: g_slice_alloc0 (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.5600.4)
==22759==    by 0x1021CEDB: ??? (in /usr/lib/x86_64-linux-gnu/libgtk-3.so.0.2200.30)
==22759==    by 0x101ECB08: ??? (in /usr/lib/x86_64-linux-gnu/libgtk-3.so.0.2200.30)
==22759==    by 0x10212497: ??? (in /usr/lib/x86_64-linux-gnu/libgtk-3.so.0.2200.30)
==22759==    by 0x10208FFD: ??? (in /usr/lib/x86_64-linux-gnu/libgtk-3.so.0.2200.30)
==22759==    by 0x102093D4: ??? (in /usr/lib/x86_64-linux-gnu/libgtk-3.so.0.2200.30)
==22759==    by 0x10209F03: gtk_css_provider_load_from_file (in /usr/lib/x86_64-linux-gnu/libgtk-3.so.0.2200.30)
==22759==    by 0x1020A0BE: gtk_css_provider_load_from_resource (in /usr/lib/x86_64-linux-gnu/libgtk-3.so.0.2200.30)
==22759==
==22759== 13,888 bytes in 7 blocks are possibly lost in loss record 10,052 of 10,105
==22759==    at 0x4C33B25: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==22759==    by 0x7E1FB10: g_malloc0 (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.5600.4)
==22759==    by 0xD6FCCD2: ??? (in /usr/lib/x86_64-linux-gnu/libgdk-3.so.0.2200.30)
==22759==    by 0xD72821E: ??? (in /usr/lib/x86_64-linux-gnu/libgdk-3.so.0.2200.30)
==22759==    by 0xD728C79: ??? (in /usr/lib/x86_64-linux-gnu/libgdk-3.so.0.2200.30)
==22759==    by 0xD729243: ??? (in /usr/lib/x86_64-linux-gnu/libgdk-3.so.0.2200.30)
==22759==    by 0x80FA8FF: ??? (in /usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5600.4)
==22759==    by 0x80FC5BF: g_object_new_valist (in /usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5600.4)
==22759==    by 0x80FC938: g_object_new (in /usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5600.4)
==22759==    by 0xD727BBB: ??? (in /usr/lib/x86_64-linux-gnu/libgdk-3.so.0.2200.30)
==22759==    by 0xD72CE53: ??? (in /usr/lib/x86_64-linux-gnu/libgdk-3.so.0.2200.30)
==22759==    by 0xD701D5C: gdk_display_manager_open_display (in /usr/lib/x86_64-linux-gnu/libgdk-3.so.0.2200.30)
==22759==
==22759== 16,384 bytes in 1 blocks are definitely lost in loss record 10,062 of 10,105
==22759==    at 0x4C33B25: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==22759==    by 0x7E1FB10: g_malloc0 (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.5600.4)
==22759==    by 0x7E07CD7: ??? (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.5600.4)
==22759==    by 0x7E07F3E: ??? (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.5600.4)
==22759==    by 0x7E082FA: ??? (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.5600.4)
==22759==    by 0x7E2AA5E: g_intern_static_string (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.5600.4)
==22759==    by 0x81007DD: g_param_spec_internal (in /usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5600.4)
==22759==    by 0x81043C9: g_param_spec_int (in /usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5600.4)
==22759==    by 0x103471A2: ??? (in /usr/lib/x86_64-linux-gnu/libgtk-3.so.0.2200.30)
==22759==    by 0x8116418: g_type_class_ref (in /usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5600.4)
==22759==    by 0x80FBEF7: g_object_new_with_properties (in /usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5600.4)
==22759==    by 0x80FC960: g_object_new (in /usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5600.4)
==22759==
==22759== 16,384 bytes in 1 blocks are definitely lost in loss record 10,063 of 10,105
==22759==    at 0x4C33B25: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==22759==    by 0x7E1FB10: g_malloc0 (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.5600.4)
==22759==    by 0x7E07CF4: ??? (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.5600.4)
==22759==    by 0x7E07F3E: ??? (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.5600.4)
==22759==    by 0x7E082FA: ??? (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.5600.4)
==22759==    by 0x7E2AA5E: g_intern_static_string (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.5600.4)
==22759==    by 0x81007DD: g_param_spec_internal (in /usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5600.4)
==22759==    by 0x81043C9: g_param_spec_int (in /usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5600.4)
==22759==    by 0x103471A2: ??? (in /usr/lib/x86_64-linux-gnu/libgtk-3.so.0.2200.30)
==22759==    by 0x8116418: g_type_class_ref (in /usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5600.4)
==22759==    by 0x80FBEF7: g_object_new_with_properties (in /usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5600.4)
==22759==    by 0x80FC960: g_object_new (in /usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5600.4)
==22759==
==22759== 29,463 (4,096 direct, 25,367 indirect) bytes in 1 blocks are definitely lost in loss record 10,074 of 10,105
==22759==    at 0x4C33B25: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==22759==    by 0x7E1FB10: g_malloc0 (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.5600.4)
==22759==    by 0x7E07CD7: ??? (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.5600.4)
==22759==    by 0x7E07F3E: ??? (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.5600.4)
==22759==    by 0x7E082FA: ??? (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.5600.4)
==22759==    by 0x81013EF: g_param_spec_pool_insert (in /usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5600.4)
==22759==    by 0x80FA11D: ??? (in /usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5600.4)
==22759==    by 0x10346693: ??? (in /usr/lib/x86_64-linux-gnu/libgtk-3.so.0.2200.30)
==22759==    by 0x10346CF4: ??? (in /usr/lib/x86_64-linux-gnu/libgtk-3.so.0.2200.30)
==22759==    by 0x8116418: g_type_class_ref (in /usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5600.4)
==22759==    by 0x80FBEF7: g_object_new_with_properties (in /usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5600.4)
==22759==    by 0x80FC960: g_object_new (in /usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5600.4)
==22759==
==22759== 32,704 bytes in 4 blocks are possibly lost in loss record 10,078 of 10,105
==22759==    at 0x4C33E76: memalign (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==22759==    by 0x4C33F91: posix_memalign (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==22759==    by 0x7E370E6: ??? (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.5600.4)
==22759==    by 0x7E37DC2: g_slice_alloc (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.5600.4)
==22759==    by 0x7E37E08: g_slice_alloc0 (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.5600.4)
==22759==    by 0x8119924: g_type_create_instance (in /usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5600.4)
==22759==    by 0x80FA747: ??? (in /usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5600.4)
==22759==    by 0x80FBEE4: g_object_new_with_properties (in /usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5600.4)
==22759==    by 0x80FC960: g_object_new (in /usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5600.4)
==22759==    by 0x1021527C: ??? (in /usr/lib/x86_64-linux-gnu/libgtk-3.so.0.2200.30)
==22759==    by 0x10203123: ??? (in /usr/lib/x86_64-linux-gnu/libgtk-3.so.0.2200.30)
==22759==    by 0x10201F33: ??? (in /usr/lib/x86_64-linux-gnu/libgtk-3.so.0.2200.30)
==22759==
==22759== 4,007,080 (4,096 direct, 4,002,984 indirect) bytes in 1 blocks are definitely lost in loss record 10,098 of 10,105
==22759==    at 0x4C33B25: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==22759==    by 0x7E1FB10: g_malloc0 (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.5600.4)
==22759==    by 0x7E07D04: ??? (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.5600.4)
==22759==    by 0x7E07F3E: ??? (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.5600.4)
==22759==    by 0x7E082FA: ??? (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.5600.4)
==22759==    by 0x81128AF: ??? (in /usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5600.4)
==22759==    by 0x8118A71: g_type_register_static (in /usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5600.4)
==22759==    by 0x8118D44: g_type_register_static_simple (in /usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5600.4)
==22759==    by 0x10118035: gtk_image_accessible_get_type (in /usr/lib/x86_64-linux-gnu/libgtk-3.so.0.2200.30)
==22759==    by 0x1028BBCD: ??? (in /usr/lib/x86_64-linux-gnu/libgtk-3.so.0.2200.30)
==22759==    by 0x8116418: g_type_class_ref (in /usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5600.4)
==22759==    by 0x7BB0BEC: pygobject_init (pygobject-object.c:1334)
==22759==
==22759== 6,489,032 (8,192 direct, 6,480,840 indirect) bytes in 1 blocks are definitely lost in loss record 10,099 of 10,105
==22759==    at 0x4C33B25: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==22759==    by 0x7E1FB10: g_malloc0 (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.5600.4)
==22759==    by 0x7E07D04: ??? (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.5600.4)
==22759==    by 0x7E07F3E: ??? (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.5600.4)
==22759==    by 0x7E082FA: ??? (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.5600.4)
==22759==    by 0x7E2AA5E: g_intern_static_string (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.5600.4)
==22759==    by 0x81007DD: g_param_spec_internal (in /usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5600.4)
==22759==    by 0x81043C9: g_param_spec_int (in /usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5600.4)
==22759==    by 0x103471A2: ??? (in /usr/lib/x86_64-linux-gnu/libgtk-3.so.0.2200.30)
==22759==    by 0x8116418: g_type_class_ref (in /usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5600.4)
==22759==    by 0x80FBEF7: g_object_new_with_properties (in /usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5600.4)
==22759==    by 0x80FC960: g_object_new (in /usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5600.4)
==22759==
==22759== 61,834,464 bytes in 50 blocks are possibly lost in loss record 10,103 of 10,105
==22759==    at 0x4C31B0F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==22759==    by 0x7E1FAB8: g_malloc (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.5600.4)
==22759==    by 0x7E3945B: g_memdup (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.5600.4)
==22759==    by 0x7DF3C66: g_bytes_new (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.5600.4)
==22759==    by 0x8572DAD: ffi_call_unix64 (in /usr/lib/x86_64-linux-gnu/libffi.so.6.0.4)
==22759==    by 0x857271E: ffi_call (in /usr/lib/x86_64-linux-gnu/libffi.so.6.0.4)
==22759==    by 0x7BA490E: pygi_invoke_c_callable (pygi-invoke.c:684)
==22759==    by 0x7B9C2F2: _function_cache_invoke_real (pygi-cache.c:783)
==22759==    by 0x7B9C2F2: _constructor_cache_invoke_real (pygi-cache.c:929)
==22759==    by 0x7B9C457: pygi_function_cache_invoke (pygi-cache.c:862)
==22759==    by 0x7BA024A: _callable_info_call (pygi-info.c:548)
==22759==    by 0x17ED00: _PyObject_MakeTpCall (call.c:159)
==22759==    by 0x16F365: _PyObject_Vectorcall (abstract.h:125)
==22759==    by 0x16F365: call_function (ceval.c:4963)
==22759==    by 0x16F365: _PyEval_EvalFrameDefault (ceval.c:3469)
==22759==
==22759== 142,970,256 bytes in 133 blocks are definitely lost in loss record 10,105 of 10,105
==22759==    at 0x4C31B0F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==22759==    by 0x7E1FAB8: g_malloc (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.5600.4)
==22759==    by 0x7E3945B: g_memdup (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.5600.4)
==22759==    by 0x7DF3C66: g_bytes_new (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.5600.4)
==22759==    by 0x8572DAD: ffi_call_unix64 (in /usr/lib/x86_64-linux-gnu/libffi.so.6.0.4)
==22759==    by 0x857271E: ffi_call (in /usr/lib/x86_64-linux-gnu/libffi.so.6.0.4)
==22759==    by 0x7BA490E: pygi_invoke_c_callable (pygi-invoke.c:684)
==22759==    by 0x7B9C2F2: _function_cache_invoke_real (pygi-cache.c:783)
==22759==    by 0x7B9C2F2: _constructor_cache_invoke_real (pygi-cache.c:929)
==22759==    by 0x7B9C457: pygi_function_cache_invoke (pygi-cache.c:862)
==22759==    by 0x7BA024A: _callable_info_call (pygi-info.c:548)
==22759==    by 0x17ED00: _PyObject_MakeTpCall (call.c:159)
==22759==    by 0x16F365: _PyObject_Vectorcall (abstract.h:125)
==22759==    by 0x16F365: call_function (ceval.c:4963)
==22759==    by 0x16F365: _PyEval_EvalFrameDefault (ceval.c:3469)
==22759==
==22759== LEAK SUMMARY:
==22759==    definitely lost: 143,065,959 bytes in 293 blocks
==22759==    indirectly lost: 10,543,827 bytes in 1,304 blocks
==22759==      possibly lost: 62,097,953 bytes in 660 blocks
==22759==    still reachable: 115,857,887 bytes in 26,562 blocks
==22759==                       of which reachable via heuristic:
==22759==                         newarray           : 1,536 bytes in 16 blocks
==22759==                         multipleinheritance: 328 bytes in 1 blocks
==22759==         suppressed: 0 bytes in 0 blocks
==22759== Reachable blocks (those to which a pointer was found) are not shown.
==22759== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==22759==
==22759== For counts of detected and suppressed errors, rerun with: -v
==22759== Use --track-origins=yes to see where uninitialised values come from
==22759== ERROR SUMMARY: 37372 errors from 778 contexts (suppressed: 0 from 0)

This is just the tail end of it. I hit 'q' to quit only after RAM use increased a few percent too (program running maybe 5 minutes or so). So, well, if valgrind shows no leakage on your side, then the problem is definitely in my system somewhere. What do you think? Maybe GTK?

I tried to rule out some of these results with suppression files, but I don't think it changed much. The command I used: G_SLICE=debug-blocks valgrind --suppressions=/usr/lib/valgrind/python3.supp --suppressions=/usr/lib/valgrind/debian.supp --tool=memcheck --leak-check=full python3.8 ./infinite-image-scroller.py -l -- images/

The complete results:

valgrind.txt

If you use that weird mismatching Ubuntu 18.04 + newer python setup, I'd suspect some ABI mismatch between libs there, or maybe indeed a bug of some kind in old versions.

I haven't heard of ABI mismatch. (ABI = Application Binary Interface?) Could that cause memory leaks?

Flurrywinde commented 3 years ago

So don't think I'll come up with any solution for that seemingly platform-specific issue, but if you'll find out what the problem was, maybe let me know

I understand and will do. I think an AppImage could still be made, just that it'd have to add more parts of the system. That was the point I got up to when I was looking into it, something about starting with a whole virtual machine and then taking parts back out.

Instead, I'll try upgrading to Ubuntu 20.04. We'll see if that fixes it.

mk-fg commented 3 years ago

I know it probably doesn't matter to testing, but it's still happening with the new gcc options:

Might be an indicator that something is very broken on that system :) Worked for me on 20.04 as well, so pretty sure it's not ubuntu having something weird, like broken ld.so or .pc files.

mk-fg commented 3 years ago

I haven't heard of ABI mismatch. (ABI = Application Binary Interface?) Could that cause memory leaks?

If call semantics change, and you call something in a new lib but with old args, pretty sure anything can happen - garbage in / garbage out principle. Though segfaults or something like that "undefined symbol" error should probably be more common.

EDIT: also yes, ABI as in application binary interface. As in one .so calling a function in another by stashing all call arguments somewhere in some pre-defined format (which the compiler knows and generates), pushing cpu state onto the stack memory segment (also in some pre-defined format), stashing the current instruction pointer into some register and jumping to the address where that library call/code is loaded... and god help you if anything in there is not precisely what it's expected to be :)

mk-fg commented 3 years ago

These look like the allocations responsible for pretty much all the leak to me:

==22759== 61,834,464 bytes in 50 blocks are possibly lost in loss record 10,103 of 10,105
==22759==    at 0x4C31B0F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==22759==    by 0x7E1FAB8: g_malloc (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.5600.4)
==22759==    by 0x7E3945B: g_memdup (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.5600.4)
==22759==    by 0x7DF3C66: g_bytes_new (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.5600.4)
==22759==    by 0x8572DAD: ffi_call_unix64 (in /usr/lib/x86_64-linux-gnu/libffi.so.6.0.4)
==22759==    by 0x857271E: ffi_call (in /usr/lib/x86_64-linux-gnu/libffi.so.6.0.4)
==22759==    by 0x7BA490E: pygi_invoke_c_callable (pygi-invoke.c:684)
==22759==    by 0x7B9C2F2: _function_cache_invoke_real (pygi-cache.c:783)
==22759==    by 0x7B9C2F2: _constructor_cache_invoke_real (pygi-cache.c:929)
==22759==    by 0x7B9C457: pygi_function_cache_invoke (pygi-cache.c:862)
==22759==    by 0x7BA024A: _callable_info_call (pygi-info.c:548)
==22759==    by 0x17ED00: _PyObject_MakeTpCall (call.c:159)
==22759==    by 0x16F365: _PyObject_Vectorcall (abstract.h:125)
==22759==    by 0x16F365: call_function (ceval.c:4963)
==22759==    by 0x16F365: _PyEval_EvalFrameDefault (ceval.c:3469)
==22759==
==22759== 142,970,256 bytes in 133 blocks are definitely lost in loss record 10,105 of 10,105
==22759==    at 0x4C31B0F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==22759==    by 0x7E1FAB8: g_malloc (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.5600.4)
==22759==    by 0x7E3945B: g_memdup (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.5600.4)
==22759==    by 0x7DF3C66: g_bytes_new (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.5600.4)
==22759==    by 0x8572DAD: ffi_call_unix64 (in /usr/lib/x86_64-linux-gnu/libffi.so.6.0.4)
==22759==    by 0x857271E: ffi_call (in /usr/lib/x86_64-linux-gnu/libffi.so.6.0.4)
==22759==    by 0x7BA490E: pygi_invoke_c_callable (pygi-invoke.c:684)
==22759==    by 0x7B9C2F2: _function_cache_invoke_real (pygi-cache.c:783)
==22759==    by 0x7B9C2F2: _constructor_cache_invoke_real (pygi-cache.c:929)
==22759==    by 0x7B9C457: pygi_function_cache_invoke (pygi-cache.c:862)
==22759==    by 0x7BA024A: _callable_info_call (pygi-info.c:548)
==22759==    by 0x17ED00: _PyObject_MakeTpCall (call.c:159)
==22759==    by 0x16F365: _PyObject_Vectorcall (abstract.h:125)
==22759==    by 0x16F365: call_function (ceval.c:4963)
==22759==    by 0x16F365: _PyEval_EvalFrameDefault (ceval.c:3469)

And it does indeed look like some interaction between python and GI at the ABI level. If it's the same python3.8-with-GI-built-for-3.6 setup, I'd suspect semantics changing between versions - maybe some expected unref() doesn't get called, a length isn't passed correctly somewhere, or anything like that. Given that there's also some "pygi-cache.c" involved (afaik these are complex bindings, as mentioned), and that cache invalidation is famously one of the hardest problems in CS, I'd also suspect that :)

Might be just confirmation bias talking though - I'd been thinking it was this issue for a while already, but even if not, the fix seems to be the same: get a properly matching, built-together and up-to-date set of libs (via a container or a 20.04 update); it probably doesn't matter which exact code line and/or mismatch in that stack is causing the problem.

mk-fg commented 3 years ago

Oh, actually, another fix might be to tweak a couple of lines (if any) to make the thing compatible with python3.6 on 18.04, so that you won't need to use that mismatching stuff. It's probably just installing the py3.6 backport of the "dataclasses" stdlib module (might even be in the repos, otherwise pip should work), and that might be it even without any code changes.

Pretty sure I've even used it back in the 3.6 or even 3.5 days, and probably added dataclasses sometime during 3.7. There are only tiny differences between 3.x versions past 3.6, pretty sure, so it should be easy to "downgrade" the script without updating the whole OS, even if there might be a couple of other minor changes. Do make sure to use "python3.6" on the first line explicitly though, so that it won't run the same 3.8 via the "python" or "python3" symlink.
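
To sketch what that would look like (assuming the usual PyPI name for the backport - worth double-checking):

#!/usr/bin/python3.6
# ^ pin the interpreter explicitly so the "python"/"python3" -> 3.8 symlink isn't picked up.
# On the stock 18.04 python3.6 the import below resolves to the PyPI backport,
# installed with e.g. "python3.6 -m pip install --user dataclasses";
# on 3.7+ it resolves to the stdlib module, so the same line works either way.
import dataclasses

@dataclasses.dataclass
class Example:  # just to confirm the decorator works under 3.6; field names are made up
    path: str
    sz: int = 0

print(Example('test.jpg'))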

Flurrywinde commented 3 years ago

I know it probably doesn't matter to testing, but it's still happening with the new gcc options:

Might be an indicator that something is very broken on that system :) Worked for me on 20.04 as well, so pretty sure it's not ubuntu having something weird, like broken ld.so or .pc files.

Darn. Well, this is all the more reason for me to upgrade to Ubuntu 20.04 then. I hope it fixes it! Ok, back... and it's still happening:

# Run it under 20.04 for the first time... same error occurs
[ ~/util/iscr-myfork ] $ nice ./pixbuf_proc_loop.py -r20 -t500 *.jpg *.png
Traceback (most recent call last):
  File "./pixbuf_proc_loop.py", line 4, in <module>
    import pixbuf_proc as pp
ImportError: /home/kanon/util/iscr-myfork/pixbuf_proc.so: undefined symbol: g_object_unref
# Compile it one way. Same error.
[ ~/util/iscr-myfork ] $ gcc -O2 -fpic --shared `python3-config --includes` `pkg-config --libs --cflags gtk+-3.0` pixbuf_proc.c -o pixbuf_proc.so
[ ~/util/iscr-myfork ] $ nice ./pixbuf_proc_loop.py -r20 -t500 *.jpg *.png
Traceback (most recent call last):
  File "./pixbuf_proc_loop.py", line 4, in <module>
    import pixbuf_proc as pp
ImportError: /home/kanon/util/iscr-myfork/pixbuf_proc.so: undefined symbol: g_object_unref
# Compile it the previous way. Still same error occurs.
[ ~/util/iscr-myfork ] $ gcc -O2 -fpic --shared `python3-config --includes` `pkg-config --cflags gtk+-3.0` -lgtk-3 pixbuf_proc.c -o pixbuf_proc.so
[ ~/util/iscr-myfork ] $ nice ./pixbuf_proc_loop.py -r20 -t500 *.jpg *.png
Traceback (most recent call last):
  File "./pixbuf_proc_loop.py", line 4, in <module>
    import pixbuf_proc as pp
ImportError: /home/kanon/util/iscr-myfork/pixbuf_proc.so: undefined symbol: g_object_unref

Could my system still be broken? I did have a heck of a time upgrading, and the computer even froze up once. It all seems to be working now, though, and good news! The runaway memory leak no longer occurs:

==783163== 551,552 bytes in 1,112 blocks are possibly lost in loss record 10,316 of 10,321
==783163==    at 0x483E0F0: memalign (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==783163==    by 0x483E212: posix_memalign (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==783163==    by 0x6670776: ??? (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.6400.3)
==783163==    by 0x66718A2: g_slice_alloc (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.6400.3)
==783163==    by 0x664E129: g_list_prepend (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.6400.3)
==783163==    by 0x8699073: ??? (in /usr/lib/x86_64-linux-gnu/libgtk-3.so.0.2404.16)
==783163==    by 0x8699109: ??? (in /usr/lib/x86_64-linux-gnu/libgtk-3.so.0.2404.16)
==783163==    by 0x8699109: ??? (in /usr/lib/x86_64-linux-gnu/libgtk-3.so.0.2404.16)
==783163==    by 0x8699109: ??? (in /usr/lib/x86_64-linux-gnu/libgtk-3.so.0.2404.16)
==783163==    by 0x8699109: ??? (in /usr/lib/x86_64-linux-gnu/libgtk-3.so.0.2404.16)
==783163==    by 0x8699109: ??? (in /usr/lib/x86_64-linux-gnu/libgtk-3.so.0.2404.16)
==783163==    by 0x8699109: ??? (in /usr/lib/x86_64-linux-gnu/libgtk-3.so.0.2404.16)
==783163==
==783163== LEAK SUMMARY:
==783163==    definitely lost: 15,343 bytes in 117 blocks
==783163==    indirectly lost: 30,212 bytes in 854 blocks
==783163==      possibly lost: 1,076,572 bytes in 2,183 blocks
==783163==    still reachable: 104,252,374 bytes in 30,157 blocks
==783163==                       of which reachable via heuristic:
==783163==                         newarray           : 1,536 bytes in 16 blocks
==783163==         suppressed: 32 bytes in 1 blocks
==783163== Reachable blocks (those to which a pointer was found) are not shown.
==783163== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==783163==
==783163== Use --track-origins=yes to see where uninitialised values come from
==783163== For lists of detected and suppressed errors, rerun with: -s
==783163== ERROR SUMMARY: 65526 errors from 784 contexts (suppressed: 0 from 0)

Thanks also for all the pointers about ABI and python3.6 compatibility. I'd kinda be curious to look into this memory leak, actually, even though it's fixed for me now. Maybe some day. For now, though, just thanks - my two big issues are resolved. (I've been working on other new features already, like animated gifs, reverse scrolling, calling an external utility on the current image list, and adding new images as they appear, not at the end. Are these anything you'd be interested in?)

mk-fg commented 3 years ago

Darn. Well, this is all the more reason for me to upgrade to Ubuntu 20.04 then. I hope it fixes it! Ok, back... and it's still happening: Could my system still be broken?

Given that it worked fine for me on a blank 20.04, and should work like that, yeah, pretty sure something is different there, and not in a good way. Maybe you have a double set of gtk libs and the .so loads the wrong one, while the "import" in python loads something else?

You can probably check it easily by running either case with strace -f -e %file -- ... - it should show open() for each, and I'd look at where the gtk libs are loaded from. Can also check ldd pixbuf_proc.so, which should show what it loads, but I'm not entirely sure if it's accurate - strace definitely should be.
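
A quick python-side check along the same lines (just a diagnostic sketch, assuming a Linux /proc layout) is to import Gtk and list which gtk/gobject libraries actually got mapped, then compare against what ldd reports for the compiled .so:

import gi
gi.require_version('Gtk', '3.0')
from gi.repository import Gtk  # pulls in whatever libgtk/libgobject the python side ends up using

# print the gtk/gobject shared objects mapped into this process
with open('/proc/self/maps') as maps:
    libs = sorted({line.split()[-1] for line in maps
        if 'libgtk' in line or 'libgobject' in line})
print('\n'.join(libs))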

mk-fg commented 3 years ago

I've been working on other new features already, like animated gifs, reverse scrolling, calling an external utility on the current image list, and adding new images as they appear, not at the end. Are these anything you'd be interested in?

Not sure, probably depends on the feature. And at some point it might just be too much effort for a script that I barely use myself.

In general, if you are interested in adding a lot of stuff, it would probably be much easier for you to maintain your own fork, which I guess I should be able to check and cherry-pick stuff I like from anytime, without slowing you down with PRs and such unnecessarily. Not to mention that you might want to change how stuff works there in fundamental ways, or have features which I completely don't understand or care about.

Github's UI is designed around some corporate/centralized workflow with an "original repository" and second-class "forks" though, so even if/when I completely abandon this repo, it might still end up being more discoverable to other people than any fork. There are probably easy ways to work around this, but I'm not up-to-date on what the best ones are these days - maybe there's just an "I'm no longer just a fork!" button somewhere.

Flurrywinde commented 3 years ago

Darn. Well, this is all the more reason for me to upgrade to Ubuntu 20.04 then. I hope it fixes it! Ok, back... and it's still happening: Could my system still be broken?

Given that it worked fine for me on a blank 20.04, and should work like that, yeah, pretty sure something is different there, and not in a good way.

Darn. And also, there does still seem to be a memory leak, just smaller. You don't have any memory leak at all? Anyway, I'll investigate with strace and let you know what I find out.

Flurrywinde commented 3 years ago

I've been working on other new features already, like animated gifs, reverse scrolling, calling an external utility on the current image list, and adding new images as they appear, not at the end. Are these anything you'd be interested in?

Not sure, probably depends on the feature. And at some point it might just be too much effort for a script that I barely use myself.

  • Animated gifs sounds... strange, given the scrolling, but I'd suspect it should just be another image type, if gtk supports them in Image widget.

I got it working, but it seems they can't be scaled. Right now, I have tiny gifs scrolling by with the properly sized jpg's. LOL

  • Reverse scrolling - guess you might mean changing direction at runtime, might be nice to have, with loading images from either end.

Yes, that's what I'd like to have. In researching this, I saw that the way you do it, iter() and next(), is more efficient but limited, so I'd have to change the data structure. Should be fine, though, I think, because efficiency isn't a factor for this part of the code?

  • Calling an external utility on the current image list - sounds unnecessary, as you can already have this list read line-by-line from stdin, i.e. whatever-list-generator-script | image-scroller -f -, but I might've misunderstood the idea.

I meant call the external utility on the currently shown images only. I like to have images scrolling by while I work, but if I see one I want to do something to, like add keywords, copy somewhere, delete, etc, it'd be nice to just hit a key and pipe that image's filename to a script that does it. So far, I'm only able to get the whole current queue, but I'd like it to just be the image that's currently most prominent on the screen.

In general, if you are interested in adding a lot of stuff, it would probably be much easier for you to maintain your own fork

Cool. I'll do it this way.

Github's UI is designed around some corporate/centralized workflow with an "original repository" and second-class "forks" though, so even if/when I completely abandon this repo, it might still end up being more discoverable to other people than any fork. There are probably easy ways to work around this, but I'm not up-to-date on what the best ones are these days - maybe there's just an "I'm no longer just a fork!" button somewhere.

I haven't heard of this, but I'll keep my eye out.

mk-fg commented 3 years ago

Darn. And also, there does still seem to be a memory leak, just smaller. You don't have any memory leak at all?

Well, not within 8 hours that I ran it, as mentioned.

Iirc it started at 300-something megs, went up to 311M within a minute or few, and after I woke up and checked on it, it still was 311-something, reloading/scrolling same 3 images from that archive at a somewhat ridiculous speed.

Pretty sure everything in that code ran crazy amount of iterations during that, so if there is a leak, it gotta be either dependent on something else running (maybe a race condition or display server), or specific to images it is loading.

Though given how relatively simple both the C and python parts are wrt what they're doing (regardless of all the scroll-calculation magic, that's just python), at this point I'm certain that it's gotta be related to something on your end - can still be buggy/weird libs (idk how these survived the full update, but I've never really used non-rolling distros myself), something in your system that doesn't work with gtk correctly, maybe a quirk of how you measure things, or some unknown unknown I guess.

If it's important enough, being in your place, I'd probably boot up 20.04 from a usb stick and leave the app running overnight there, to check whether the issue also happens on a clean system on that machine; failing that, try the same exact images and command as me (and make sure to use the same script from the repo, I guess), to see if maybe I just failed to measure it somehow, or at least confirm that way that it's not an issue with your current OS.

Anyway, I'll investigate with strace and let you know what I find out.

Assuming that the newer python was installed from some PPA, I'd probably first check whether it's maybe still installed instead of the one in the repos. I don't really know how that might cause that linking error though, which is really weird - pretty sure even an older gtk should still load its own gobject - but I haven't dealt with linking issues a lot, so maybe not.

mk-fg commented 3 years ago

Yes, that's what I'd like to have. In researching this, I saw that the way you do it, iter() and next(), is more efficient but limited, so I'd have to change the data structure. Should be fine, though, I think, because efficiency isn't a factor for this part of the code?

First - yes, don't think efficiency is a factor in any part of the python code there. Or at least to a reasonable degree, where "unreasonable" would only be something like "python loop iterating over every image pixel".

Second - I can't imagine anything that you can do in these calculations that'd make them take more than an irrelevant couple dozen microseconds. I.e. no matter which variables, classes or algorithms you use there, pretty sure calculating an offset and choosing whether to load next image can't possibly be "inefficient" on a scale of displayed frames.

So aside from aesthetic preferences, readability and bugginess, I don't think it matters how anything in that python script is structured, as long as it's doing roughly the same thing.
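
For a rough sense of the scale involved (the expression below is just a hypothetical stand-in for that per-frame math, not the actual scroller code):

import timeit

# hypothetical per-frame work: update an offset and decide whether to queue the next image
setup = 'offset, step, win_len = 0.0, 3.7, 1024.0'
stmt = 'offset = (offset + step) % win_len; load_next = offset + step >= win_len'
per_call = timeit.timeit(stmt, setup=setup, number=100_000) / 100_000
print(f'{per_call * 1e6:.3f} us per iteration')  # typically a small fraction of a microsecond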

mk-fg commented 3 years ago

I got it working, but it seems they can't be scaled. Right now, I have tiny gifs scrolling by with the properly sized jpg's. LOL

Can probably be solved by either spawning subprocesses to convert them with ffmpeg or embedding video-player-like gstreamer widgets instead of images, which should definitely be scalable. A bunch of extra complexity of course, and probably falls under "adding a lot of stuff" which I can't meaningfully review or maintain without needing or using it myself, but it can definitely be done. (and absolutely anything can be done if you're willing to go down to the C level and offload any kind of efficient/external processing to threads there, as there's just nothing more capable on a current linux system than that)
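
A lighter-weight alternative (not something suggested above - just a sketch of another possible approach, with a placeholder file name and size) would be to step through the gif's frames via GdkPixbuf.PixbufAnimation and scale each frame before handing it to the Gtk.Image:

import gi
gi.require_version('Gtk', '3.0')
from gi.repository import Gtk, GLib, GdkPixbuf

def show_scaled_gif(image_widget, path, w, h):
    anim = GdkPixbuf.PixbufAnimation.new_from_file(path)
    frames = anim.get_iter(None)  # frame iterator for the animation
    def advance():
        frames.advance(None)
        pb = frames.get_pixbuf().scale_simple(w, h, GdkPixbuf.InterpType.BILINEAR)
        image_widget.set_from_pixbuf(pb)
        # one-shot timeout, rescheduled here with the next frame's delay (ms, -1 for static gifs)
        GLib.timeout_add(max(frames.get_delay_time(), 20), advance)
        return False
    advance()

win = Gtk.Window()
win.connect('destroy', Gtk.main_quit)
img = Gtk.Image()
win.add(img)
win.show_all()
show_scaled_gif(img, 'some.gif', 400, 300)  # placeholders
Gtk.main()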

mk-fg commented 3 years ago

I meant call the external utility on the currently shown images only. So far, I'm only able to get the whole current queue, but I'd like it to just be the image that's currently most prominent on the screen.

All "Image" dataclass-objects in that queue from the already-scrolled side should have "sz" attribute with their size, so knowing e.g. that current normalized scroll position is X, you can iterate them from the queue drop-end and once sum of these sizes goes past X, you've found first image that has some displayed part.

You can then check how much of it peeks out (the difference between the sz sum and that X), compare that against the window size, and add further loaded images until you find the "most prominent" one, which is probably either the one closest to the center or the one with the largest "sz" portion that still fits within that window size.
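
A rough sketch of that walk (everything except the "sz" attribute is an assumption here - the actual queue and scroll-position names in the script will differ):

def most_prominent(queue, scroll_pos, win_size):
    # queue: Image objects ordered from the drop-end (already-scrolled side), each with .sz;
    # scroll_pos: how far past the drop-end the view has scrolled;
    # win_size: visible extent along the scroll axis.
    offset, best, best_visible = 0, None, 0
    for img in queue:
        start, end = offset - scroll_pos, offset - scroll_pos + img.sz
        offset += img.sz
        if end <= 0: continue        # already scrolled fully past this one
        if start >= win_size: break  # past the far edge of the window
        visible = min(end, win_size) - max(start, 0)
        if visible > best_visible: best, best_visible = img, visible
    return best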

Another approach might be to either query GTK for which widgets are currently visible, or maybe subscribe to its widget signals which indicate that (like "expose" or something like that for a widget). Either way should work, and since any of these queries or other actions are only needed on a keypress, there's probably no need to even bother looking for the more efficient one - anything will work fine :)

Flurrywinde commented 3 years ago

If it's important enough, being in your place, I'd probably boot up 20.04 from a usb stick and leave app running overnight

Yeah, seems like a good idea for me to do this. After upgrading to 20.04, I had to compile mpv, and now it seems to have a memory leak.

Flurrywinde commented 3 years ago

All "Image" dataclass-objects in that queue from the already-scrolled side should have "sz" attribute with their size

Thanks. This little writeup helped me a lot, and I got a rough draft working. However, sz seems to be the width (in vertical scroll mode), so it's always the same, i.e. the width of the window. I had to add another property to Image to store the height.

Let me know if I should make separate issues for these suggested features. (I could even put 'em in my fork.)

mk-fg commented 3 years ago

Let me know if I should make separate issues for these suggested features. (I could even put 'em in my fork.)

Would make sense to create separate PRs for each separate feature, but again, dunno if worth the effort, as you might as well just continue development in your own fork and not bother merging new stuff back here.

mk-fg commented 3 years ago

sz seems to be the width (in vertical scroll mode)

Oh, right, my bad. Indeed, height was probably just discarded as an unnecessary value after the scroll-offset calculations (described somewhere else earlier) when setting pixbuf for new image widget or adding one.

mk-fg commented 3 years ago

Somewhat undermined my earlier statements about how I don't really plan to update anything here - got to thinking about passing a bunch of system-specific command-line arguments to this script via config file(s) instead, and added parsing for these in 707923e.

Flurrywinde commented 3 years ago

All right, I'm back from switching my system over to Arch. Took a while, but it was worth it. I remember now that the reason I upgraded from Ubuntu 16.04 to 18.04 was that 16.04 got almost unusably slow, so I guess whatever was messed up was still there in 18.04 and then 20.04, just not as bad. Now, wow, the computer's nice and snappy. I'd been wanting to change to Arch for a long time, so it was good for this to finally push me over the edge.

So, yup, the memory leak is gone. I ran valgrind --tool=memcheck --leak-check=full -- python infinite-image-scroller.py --debug -a 10:0.001 -l -- images/ and...

==1629082== LEAK SUMMARY:
==1629082==    definitely lost: 2,880 bytes in 7 blocks
==1629082==    indirectly lost: 15,570 bytes in 657 blocks
==1629082==      possibly lost: 101,436 bytes in 222 blocks
==1629082==    still reachable: 106,499,770 bytes in 35,120 blocks
==1629082==                       of which reachable via heuristic:
==1629082==                         length64           : 9,288 bytes in 141 blocks
==1629082==                         newarray           : 2,336 bytes in 66 blocks
==1629082==         suppressed: 0 bytes in 0 blocks

Memory usage graph using mprof run --include-children --multiprocess ./infinite-image-scroller.py -l -a 1:.01 images/: Figure_3

Finally, the undefined symbol: g_object_unref error didn't occur either when compiling pixbuf_proc.so.

Before changing to Arch, I ran Ubuntu 20.04 from a LiveUSB, and though things were better, they also seemed a little different from your experience. Compiling pixbuf_proc.so with both gcc -O2 -fpic --shared `python3-config --includes` `pkg-config --libs --cflags gtk+-3.0` pixbuf_proc.c -o pixbuf_proc.so and gcc -O2 -fpic --shared `python3-config --includes` `pkg-config --cflags gtk+-3.0` -lgtk-3 pixbuf_proc.c -o pixbuf_proc.so still gave the undefined symbol: g_object_unref error, so that's kinda a mystery.

However, no memory leak, so pretty much all is well.

I tried to figure out just what was wrong on my Ubuntu system, but it got to be way too much work. Arch is better anyway. :) Oh, and awesome about the config file.

mk-fg commented 3 years ago

Now, wow, the computer's nice and snappy

Dunno if Arch really should make things run faster, but it might be a side-effect of not installing/running as much stuff as ubuntu comes with, I guess. Oh, and pacman is probably second-fastest package manager behind apk from alpine.

Glad to hear that all's well, but dunno about g_object_unref either - don't think I did anything on that 20.04 that might've affected it, though it was an installed version, not a LiveUSB, so maybe for some reason the LiveUSB has that same issue as older releases.