X-Plane crashes in `TEX_obj::load_texture_data()` with `fuse.threading=True` (Arch/Manjaro)

jonaseberle commented 1 year ago

First of all thank you for that project.

Describe the bug

As soon as I use fuse.threading=True (even with maxwait=1.0), I get texture artifacts like this even for areas that have been fully cached, for example when I just loaded into a flight there.

When flying, AutoOrtho loads tiles, but after a relatively short time (usually < 20NM) X-Plane 12 chokes on some texture data and crashes.

The X-plane backtrace shows:

#0  __memcpy_avx_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:486
#1  0x0000000000f45685 in TEX_obj::load_texture_data(TEX_obj_load_data const&, std::__1::function<void (TEX_obj_load_result const&)>&&) ()
#2  0x0000000000f42d90 in TEX_obj::do_load(unsigned int, bool, UTL_continuable*) ()
#3  0x0000000000f5604e in lambda_wrapper<TEX_obj::queue_load_async(bool, UTL_continuable*)::$_1>::resume(resume_base*) ()
#4  0x0000000000cf08a0 in UTL_threadpool::thread_func(void*) ()
#5  0x00000000005398b7 in THREAD_launch(void*) ()

(Actually these crashes were my problem since months, I just now discovered that they only happen with fuse.threading=True).

To Reproduce

Steps to reproduce the behavior:

Setup 0.5.0-2 on Manjaro Linux
Leave settings at default (fuse.threading=True)
Run via the released bin, start X-Plane 12.05r1
Move plane. The faster you go, the quicker it crashes for me. I have an X-Plane crash after 1-4 min when I go in a straight line at 3000ft with 300kts.

Expected behavior

I expect some tiles to not load, some blurriness or even green fallback tiles. But I receive a crash.

(please complete the following information)

OS: Linux (Arch/Manjaro, Kernel 6.3/6.1/5.15)
XPlane Version: 12.05r1 (no Zink)
AutoOrtho Version: main, 0.5.0-2, 0.4.3 (run via released bin, via self-built bin, via Python)
Filesystem type: ext4

Additional context

The .jpg files in cache are all valid and all are 256 x 256px (jpeginfo --check).

I suspected an architecture problem with the pre-built STB/ISPC texture utils so as a test I built them myself from source https://github.com/bensnell/stb_dxt https://github.com/GameTechDev/ISPCTextureCompressor (https://github.com/kubilus1/autoortho/compare/main...jonaseberle:autoortho:dev/self-compiled-shared-objects)

no change

I tried pydds.compressor=ISPC with pydds.format=BC1 and BC3 and pydds.compressor=STB.

no change for the crashes (but STB has a general problem? See below)

I changed the cache code so that always the same cache file was delivered for a given ZL in trying to narrow it down. (https://github.com/kubilus1/autoortho/compare/main...jonaseberle:autoortho:dev/dds-tests)

no change

I suspected unflushed write() due to non-closed files (https://github.com/kubilus1/autoortho/compare/main...jonaseberle:autoortho:dev/close-files)

no change

I've run tests (cd autoortho && pytest .) and all 45 complete successfully after a little change (https://github.com/kubilus1/autoortho/compare/main...jonaseberle:autoortho:dev/use-refuse)

configuration: .autoortho Please tell me if logs of a certain combination of settings would be interesting.

Side note: STB always produces these kind of textures for me:

Actually I am happy to have it stable again with threading=False, but I am posting my findings now anyways. With maxwait=1.0 X-Plane hangs (around 20-30s) each 5-15min @400kts@FL410 while AutoOrtho is heavily downloading slippy tiles and that's not fit for VATSIM so I drastically lowered maxwait to 0.1 - that's still not perfect but it leads to less and shorter hangs, although it still happened during approach where even 10s stand-still are too much. It would be great if we could prioritize performance over any beauty on demand. Let me know if I can help to make multithreading more stable.

kubilus1 commented 1 year ago

Thanks for the detailed report. That's interesting phenomenon, regarding threading and incorrect tiles. I'll try to replicate when I can.

As far as STB, I'll likely drop that support soon.

I suspected unflushed write() due to non-closed files (https://github.com/kubilus1/autoortho/compare/main...jonaseberle:autoortho:dev/close-files)

Not sure. The 'with' statement indicates that the file handle is in a context manager. This implies a 'finally' closing things, even in the case of error.

From the first picture, it's somewhat difficult for me to tell if this is a case where old data is being returned, or possibly, portions of things are shifted a tile or two over.

kubilus1 commented 1 year ago

Some trickier areas of code in the project involve the 'partial' compression, and the hand off to the aoimage module. That along with areas where reads happen at boundaries between mipmaps.

For instance, when reads occur, first the header is read. The header is 128bytes, however the block that is read will be larger. Modern OSs don't simply consistently read 32k or whatever, this can vary. We don't want to just assume that a header+extra read means we want to pull in a full image (that would be the largest mipmap to boot!) but also we can't assume that this is just a header read.

Afterall, that extra info after the 128byte header will likely be cached deep in the OS somewhere. So now we really have to go ahead and pull in the data for that random extra amount. And also we need to go ahead and convert that to DDS files. Oh and DDS conversion has to be on certain boundaries as well.

It would be much much simpler to just pull down the entire mipmap0 and convert the whole thing as soon as we see and access, but from the log lines, the ratio of very far low res tiles to mipmap0 full res tiles is around 1000/1.

Anyhow, so if there is some kind of uncleared data, or maybe two threads contending here, I'd suspect that area to be a prime target.

jonaseberle commented 1 year ago

Thank you.

The 'with' statement indicates that the file handle is in a context manager.

Face palm :) I am not very literate with Python.

Thanks for giving more insight into what you found out about how these DDSs are accessed. What you do there with dynamic DDSs is amazing and complicated by nature. Since it relies on how X-Plane does things I would so wish that the X-Plane developers would approach you to try making it easier for AutoOrtho.

Modern OSs don't simply consistently read 32k or whatever

We should be able to see that in the log. The first access to a .dds looks like this here:

DEBUG:autoortho_fuse:READ: /textures/24976_16400_BI16.dds 0 131072 0
DEBUG:autoortho_fuse:READ: DDS file /textures/24976_16400_BI16.dds, offset 0, length 131072 (('24976', '16400', 'BI', '16'))

So that's 128kB with my system.

I guess you found out that for X-Plane only the 128 bytes of header are interesting on the first read. I am not sure: Is AO returning 128 bytes or is it adding padding to the requested size?

Just to verify (also against potential X-Plane changes - I believe in 12.03 X-Plane's texture loader code changed because some scenery authors were reporting problems) maybe a debug setting in AO if a read beyond 128 bytes should return the final DDS could be helpful.

Regarding file system caching: fuse's FOPEN_DIRECT_IO https://docs.kernel.org/filesystems/fuse-io.html should be able to disable that. I can't find how that could be used in refuse. But does that look like refuse uses that mode by default already?

kubilus1 commented 1 year ago

Yeah if a read starts from 0, you can assume a header read is occurring (that's always bytes), but the read request will be longer. So effectively at that point I have to assume the rest of the data will be needed, and I process that much data and return that with the read.

At the point of that read request, it's entirely possible there will be another read request for the next block. Or not. Impossible to tell.

As far as how much data is read, it's pretty consistent on Linux but can be system dependent, but not so much on Windows. Some of the read patterns there are just .... odd. I did try directio at one point but had some issues. Possibly worth a retry.

You can play around with fuse options here if you want: https://github.com/kubilus1/autoortho/blob/main/autoortho/autoortho_fuse.py#L452

kubilus1 commented 1 year ago

I'm trying to replicate this, but can't seem to. I'm on Ubuntu 22.04, running the latest X-Plane 12. i7-8550U CPU, Radeon 5600XT GPU.

Flying at 16X ground speed, which I would assume would be fast enough to show an issue.

So FUSE threading, what that allows is multiple simultaneous reads to be processed. So leaving that disabled seems reasonable. Maybe your CPU has many more cores than mine?

kubilus1 commented 1 year ago

Face palm :) I am not very literate with Python.

lol. No worries. It's a somewhat vague construct.

So another thing to perhaps experiment with would be to play around with thread locking. For the 'getortho.py' module I have a decorator setup. (another weird python feature!). This is basically a syntax for a function wrapper.

Try adding @locked right above some key methods. https://github.com/kubilus1/autoortho/blob/main/autoortho/getortho.py#L632 would be a good candidate to start.

    @locked
    def read_dds_bytes(self, offset, length):
        log.debug(f"READ DDS BYTES: {offset} {length}")

        if offset > 0 and offset < self.lowest_offset:

That method is the ingress point for reads from the FUSE side of things.

Another one that could be at play is this: https://github.com/kubilus1/autoortho/blob/main/autoortho/getortho.py#L803

Which basically handles the decision of finding the best possible tile in the case of a chunk fetch taking too long. Since you report issues during high load times, that's suspicious.

Thanks again for the detailed feedback.

kubilus1 commented 1 year ago

Also, when you run from python directly, what version are you using?

jonaseberle commented 1 year ago

Also, when you run from python directly, what version are you using?

Usually main, but have been trying the tagged commits as well.

kubilus1 commented 1 year ago

Sorry, I meant what version of python are you using? python3 --version. I'm using 3.10.6 typically.

jonaseberle commented 1 year ago

Ah :) I use Python 3.11.3

jonaseberle commented 1 year ago

In first tests yesterday the FUSE(direct_io=True) and your suggested @locked decorators did not make any difference and kept producing the SEGV in TEX_obj::load_texture_data(), albeit noticeably less frequently.

I've read up on threading.RLock() and I think I get the gist so my less frequent crashes might well come from less concurrency.

Naively sprinkling @locked everywhere locks up AutoOrtho, though :)

Some ideas: Maybe this lock https://github.com/kubilus1/autoortho/blob/main/autoortho/getortho.py#L1069 should cover the whole loop.

That del() in TileCacher.clean() https://github.com/kubilus1/autoortho/blob/main/autoortho/getortho.py#L1078 could still mean that the Tile continues to live, couldn't it? If so it could happen that we get a duplicate Tile instance created in https://github.com/kubilus1/autoortho/blob/main/autoortho/getortho.py#L1101C15-L1101C15. Maybe it could reuse TileCacher._close_tile() which seems safer? https://github.com/kubilus1/autoortho/blob/main/autoortho/getortho.py#L1128

I use an AMD 5800X3D (16 virtual cores) and an Nvidia RTX 3060 (12GB).

Here's another example of these duplicated chunks in the DDS: 20230705_000120 (these squares are rather unusual, the stripes from the first screenshot are more common) (ISCP, BC1, threading=True)

github-actions[bot] commented 1 year ago

This issue is stale because it has been open for 30 days with no activity.

github-actions[bot] commented 1 year ago

This issue was closed because it has been inactive for 14 days since being marked as stale.

kubilus1 commented 11 months ago

This should be resolved by #409 Well the duplicate tile stuff, anyhow

jonaseberle commented 11 months ago

Indeed! Thank you, @kubilus1!

kubilus1 / autoortho