massivebird / arcsearch

Digital game archive querying tool
GNU General Public License v3.0
0 stars 1 forks source link

System paths with a slash don't work in config file #11

Closed stelcodes closed 9 months ago

stelcodes commented 10 months ago

A path like path: "foo/bar" for a system doesn't return any results for any query.

massivebird commented 9 months ago

Note to self: arcstat works as intended; this issue is specific to arcsearch

massivebird commented 9 months ago

I wrote a solution that is extremely slow with large archives. I guess the solution itself isn't slow, but it necessitates a change that affects performance massively. Feel free to improve this code and make a PR :3

Resolving this issue

Changing the following lines in lib.rs technically resolves this issue:

// "snes"
let base_dir = relative_pathname[..relative_pathname.rfind('/').unwrap_or(0)].to_string();

// ...

if system.games_are_directories && entry.path().is_file() ||
!system.games_are_directories && entry.path().is_dir()
{
    continue;
}

base_dir: change find to rfind conditional: add new complementary condition

The new problem (from Hell)

However, since all files in a directory are navigated lexicographically regardless of type, navigating through a system directory could be momentarily interrupted when finding a valid system subdirectory. So output could look like this:

[ GBA ] Coolest Game
[ GBA ] Coolest Games Plural
[ GBA_NESTED ] Another Game
[ GBA_NESTED ] Another Game Rising: Revengeance
[ GBA ] Coolest Game Singular
[ GBA ] Worst Game Ever

Personally, I hate this ;_;

Solving this new issue is where performance shits itself. Introducing this sort_by call inside the iterator builder has two side effects, one of which solves the above:

let walk_through_archive = || {
    WalkDir::new(&config.archive_root)
        // 1) iterates through regular files before subdirectories, and
        // 2) iterates through both types lexicographically
        .sort_by(|a, b| a.file_name().cmp(b.file_name()))
        .into_iter()
        // silently skip errorful entries
        .filter_map(Result::ok)
        .filter(|e| is_not_bios_dir(e) && is_valid_system_dir(e))
};

However, this sort operation is extremely expensive and slows down the entire for loop.

Solutions

The new strat must be either:

  1. Solve the primary issue in a different manner
  2. Solve the subsystem interruptions in a performant way

I think that option 1 is a great idea! I noticed that arcstat walks each system path independently rather than walking the entire archive. This refactor would be beneficial beyond the scope of this issue.

TLDR

Solving this issue is simple, but the strategy spawned a new problem that I can't solve without sacrificing performance 😭 but hope lies in a refactor!

massivebird commented 9 months ago

Resolved as of 0.2.2 ( :3 ) :tada:

I did have to include a warning in the README:

While it is possible to place system directories multiple levels below the archive root (such as in root/systems/consoles/ps2), I do not recommend nesting system directories. This may generate undesirable results.