Potentially slow object loading due to DirectX Object Parser

Kenny-Hui commented 1 month ago

Description

When a object file is loaded, OpenBVE calls CanLoadObject on all object plugin to see which object loader accepts the file. In commit https://github.com/leezer3/OpenBVE/commit/f9172d7bacd6cf8e98e76b20fbad9ac43981db9a, the DirectX Object Loader now also tries to parse files even if the file header does not match the DirectX format, to see if it references the actual X file.

However the OBJ format uses slash (/) as a delimiter. This means in this line: https://github.com/leezer3/OpenBVE/blob/1b91f84aef1e7b120b7745e6aa3dd85bf7440428/source/Plugins/Object.DirectX/Plugin.cs#L79 potentialPath would contain the entirety of the OBJ file content, and including it is over 19000 slashes (/) (In the example file attached here).

Path.CombineFile will then try to split the file content with the path delimiter (Which contains the slash), and attempt to combine ~19000 path before giving up.

Because the DirectX Loader gives up after 2 path recursion, it means OpenBVE has to try close to ~40000 paths before DirectX Object Loader finishes/fails parsing, and for the OBJ loader to load the file, which dramatically increases the loading time.

Reproduction

If the issue occurs in multiple routes/ trains, please provide one or two samples. In order to reproduce the issue and debug it, it's helpful to have the following:

Object

base.zip

Related information

Operating system: Arch Linux
Method of control: Keyboard

leezer3 commented 1 month ago

It's not actually anywhere near that bad.

One of the items checked in Path.ContainsInvalidChars is the newline character, e.g. \n This means your object fails the check at character 19, and gets nowhere near the X parser or that Path.CombineFile call.

FWIW, I did consider a length check on the loaded file, but potentially UTF / SHIFT_JIS encoded filenames means this figure could be wildly different. (Consider a UTF-16 encoded file with a 256+ character non ASCII filename)

Kenny-Hui commented 1 month ago

Seems to have slipped through

I don't think I can find anywhere that checks for the newline character, does this object works under Windows?

Kenny-Hui commented 1 month ago

Yup it's a platform-specific bug https://github.com/mono/mono/blob/0f53e9e151d92944cacab3e24ac359410c606df6/mcs/class/corlib/System.IO/Path.cs#L578 The newline character is only considered "invalid" if running on Windows

leezer3 commented 1 month ago

OK, that's an 'interesting' little quirk of Mono.

I think the sensible thing to do here is to just use our own copy of the Windows path characters. I can't think of any sensible legitimate reasons for anyone to be using any of those in filenames.

Kenny-Hui commented 1 month ago

From what I can see, it uses nested loop to match against the filter, which essentially only returns true if a && b. Combining with mono's System.IO.Path.GetInvalidFileNameChars, it still won't match against the windows character sets. https://github.com/leezer3/OpenBVE/blob/c91527f2641a536cb65f3675ed50b847956163cb/source/OpenBveApi/System/Path.cs#L274-L286

leezer3 commented 1 month ago

Yeah, you're right.....

Looking further, I'm not at all convinced of the value of that function (It's an original.....), as it's first checking our own list of invalid characters at the top (I'm presuming it's intended as an early-exit), then duplicating it again with the loops over the system ones.

For the minute, I've also converted them to HashSets and simplilfied the loop a bit, but I might also get rid of the early-exit code above, as I'd strongly suspect this is probably nearly as costly as the second check. With that said, we're into micro-optimisations here, and there are far bigger things to worry about in the codebase.

Kenny-Hui commented 1 month ago

Works fine with the latest nightly build now, thanks!

leezer3 / OpenBVE