Rework strict path matching

natecraddock commented 1 year ago

Strict path matching was implemented from a request (https://github.com/natecraddock/zf/issues/11) and I misunderstood the initial request.

As implemented in version 0.7.0 I do think it is currently useful. But there is room for improvement as suggested in later replies in that issue. Current implementation: when a / is found the path matching is made strict to the right and requires contiguous paths in the query.

There are two good points raised in the issue

Examples that currently fail

Given these files:
app/models/foo/bar/baz.rb
app/models/foo/bar-baz.rb
app/models/foo-bar-baz.rb
app/monsters/dungeon/foo/bar/baz.rb
I would expect that typing model/barbaz would filter down to:
app/models/foo/bar-baz.rb
app/models/foo-bar-baz.rb
Because, for a search input of model/barbaz, I'm expecting that barbaz should match only within one level of depth in the dir tree. foo/bar/baz.rb has "bar/baz" split across two levels.

a model of "locking" and fuzziness

It's like each slash should "lock" a portion of the path to the left, but still leave the portion to the right for fuzzier matching until the next slash and so on...
Search:
> dung/bar/
  | 1 | 2 |  3 (fuzzy)...

Match:
app/monsters/dungeon/foo/bar/baz.rb
|        1          |   2   |  3 (fuzzy)...
                    ^       ^
                 "lock"  "lock"

I've thought about this for a while and I believe I have a good solution

Solution

Enable strict path matching when a query token contains a / anywhere.
Strict path matching is defined as: each segment of a query must match within a single path segment, but the segments are not required to be contigous
the already existing multi-token nature of zf enables the suggested model of "locking" and fuzziness

Here are some examples, all using our lovely test strings

app/models/foo/bar/baz.rb
app/models/foo/bar-baz.rb
app/models/foo-bar-baz.rb
app/monsters/dungeon/foo/bar/baz.rb

Example 1

query: app/mod/fo/ba/baz
       |1 |2  |3 |4 |5  |

app/models/foo/bar/baz.rb
|1 |2     |3  |4  |5     | (pass)

app/monsters/dungeon/foo/bar/baz.rb
|1 |        (fail: 'mod' not found in single segment)

Example 2

query: model/barbaz
       | 1  | 2   |

app/models/foo/bar-baz.rb
   | 1    |   | 2       | (match)

app/models/foo-bar-baz.rb
   | 1    |    |2       | (match)

app/models/foo/bar/baz.rb
   | 1    |       (fail: 'barbaz' not found in single segment)

Example 3

query: mod/ foo
       |1 | |1'|

(matches the first 3 test strings)

app/monsters/dungeon/foo/bar/baz.rb)
            (fail: 'mod' not found in single segment)

With those examples in mind, I think this covers all the possible use cases, and makes things quite useful!

To recap: A query token containing any / will be used for strict path matching. The path segments are not required to be contiguous.

Currently a query of mod/ would match both mo/d/ and /mod/, but with these changes only /mod/ would be selected.

Open questions

It's like each slash should "lock" a portion of the path to the left, but still leave the portion to the right for fuzzier matching until the next slash and so on...

I like this idea, but zf already natively supports this fuzziness by inserting a space and adding a new query token. I'm not against adding this, but I think adding a space keeps things more consistent.

Pistos commented 1 year ago

Based on all your examples, I think that is exactly what I want.

Thank you for giving this feature request so much thought and attention. I look forward to using it once it's implemented. I already have a use in mind, in another open source project. I'll show you when integration is done.

natecraddock commented 1 year ago

Thanks for reviewing! I have set the deadline for version 0.8.0 for April 1, but I think I can get this feature done well before then. I'll share here once I have it working. I already think I know the changes needed and it shouldn't be too complex actually.

natecraddock commented 1 year ago

I think I have everything working! See the rework-strict-path-match branch for the latest changes. Please don't feel rushed or obligated, but I would appreciate some feedback :)

The behavior is now

if a query token contains a / that token will be used for strict path matching
the path segments of the token (separated by / characters) all must be found in path segments of the candidate string
the segments are no longer required to be contiguous

In other words, exactly as described above.

For @ratfactor's idea to leave the right part of the token "fuzzy", I didn't implement this for a few reasons.

it would have complicated the code to special-case the last path segment
adding a new token (with a space) easily allows for the fuzzy behavior
Example 1 in the description of this issue would match an additional undesired path

I'm still open to this "fuzzy" right path if we can find a way to do it. But I'm not sure if it's possible. And I think adding a space is a good enough solution.

ratfactor commented 1 year ago

I did my best to break it (I even enlisted my wife, who is really good at breaking software). Here's my attempt at an increasingly pathological test:

$ cat paths.txt
oatmeal/sugar.txt
oatmeal/sugar_txt
oatmeal/sug/ar
oat/meal
oat/meal/sug/ar
oat/meal/sug/ar/sugar
oat/meal/sug/ar/sugar.txt
oat/meal/sug/ar/sugar.js
oatmeal/sugar/sugar.txt
oatmeal/sugar/snakes.txt
oatmeal/sugar/skeletons.txt
oatmeal/brown_sugar.txt
oatmeal/brown_sugar/brown.js
oatmeal/brown_sugar/sugar.js
oatmeal/brown_sugar/brown_sugar.js
oatmeal/brown_sugar/sugar_brown.js
oatmeal/granulated_sugar.txt
oatmeal/raisins/sugar.js
oatmeal/raisins/helicopter/sugar.js
oatmeal/raised/sugar.js
raised sugar helicopter
sugar raised helicopter
helicopter sugar raised
helicopter raised sugar
raised helicopter sugar
sugar helicopter raised
oat/rise/js
oat/rise/sug.js
Helicopter Sugar.js
oats raised suggestively jam slowly
oats raised heliocentric slowly grape
oats resist helicopter signs
oatmeal BrOwn SUGar
JavaScript text .js .txt
JavaScript text .js .txt sugar
oat sugar.js
oat.txt sugar.js
oat.txt /sugar.js
oat.txt /sugar.js /sugar.js
//////oat.txt /sugar.js /sugar.js
//////oat.txt
///../..//oat.txt
///../..//oat.txt/sugar.js
///../..//oat.txt/sug ar.js
///../..//oat.txt/ sug ar.js
/ //// /oat.txt/ sug ar.js
/ //// /oat.txt/ sug ar .js
oat/
oat/ meal
oat /meal /sug/
oat /meal/ sug/ar//
1234567890/oat

:+1: I agree that zf's existing support for separate fuzzy tokens does what I need. There's no actual need for a "right/left" matching like I was picturing. This already does what I would expect.

As long as all portions of a path are true, the match is good regardless of the order I enter them. The results of this weird search are fine:

> oat/sugar sugar/sugar meal/sugar
oatmeal/sugar/sugar.txt
oatmeal/brown_sugar/sugar.js
oatmeal/brown_sugar/brown_sugar.js

And if I want to enforce the right/left order of the path portions, I just need to keep the path together.

With a space:

> oat/sugar brown
oatmeal/brown_sugar/brown.js
oatmeal/brown_sugar/brown_sugar.js
oatmeal/brown_sugar.txt
oatmeal/brown_sugar/sugar.js
oatmeal/brown_sugar/sugar_brown.js

No space:

> oat/sugar/brown
oatmeal/brown_sugar/brown.js
oatmeal/brown_sugar/brown_sugar.js
oatmeal/brown_sugar/sugar_brown.js

The matching of the last part, "brown," is still fuzzy, but the order is enforced. That's all I need!

A cool bonus feature that is also possibly unrelated from this path matching specifically, but something I noticed while testing: A search token consisting of some number of slashes is a way to request a minimum path depth.

Example: The three slashes here filtered out any paths that weren't at least three directories deep:

> oat/sugar ///
//////oat.txt /sugar.js /sugar.js
///../..//oat.txt/sugar.js
///../..//oat.txt/sug ar.js
/ //// /oat.txt/ sug ar.js
/ //// /oat.txt/ sug ar .js
///../..//oat.txt/ sug ar.js

I thought that was neat.

In all, it works great and I wasn't able to break it. As before, I'm really excited about this feature and I've already switched to the new version for personal use. :tada:

Pistos commented 1 year ago

This is good! But, and I apologize :sweat_smile:, it's not quite where I'd like it to be.

Given that this path exists:

app/models/foo/bar/baz.rb

It doesn't match with this search string:

> mod/baz.rb
./app/models/foo-bar-baz.rb
./app/models/foo/bar-baz.rb

My thinking is: I know I have a model named baz, so I type mod for the models tree, and baz.rb to get at the baz model written in Ruby, which is at some unknown depth within the models/ tree.

natecraddock commented 1 year ago

@Pistos this really should work. I haven't looked at the code yet, but I think I know where this bug is coming from. I'll try to fix it soon

ratfactor commented 1 year ago

How did I not run into this? I am so bad at testing software. :rofl:

Pistos commented 1 year ago

@natecraddock Yes, I had a feeling this is a bug rather than intentional design. As a clue, my first guess would be that there's an edge case having to do with a full segment match (bar.rb).

Some more data:

> mod/ba
./app/models/foo-bar-baz.rb
./app/models/foo/bar-baz.rb
./app/models/foo/bar/baz.rb

which is highlighting the ba of /bar. So maybe you have some code taking baz character by character, and, when it processes ba, it finds /bar, and then when a z is added, it thinks ba+z doesn't match ba+r, and so it filters out that whole subtree.

natecraddock commented 1 year ago

I was right, it was a simple bug to fix! I have fixed it (and pushed to the branch) and now ./app/models/foo/bar/baz.rb matches mod/baz.rb

Something I noticed that is related, but a separate issue I'll address later

> mod/baz.rb
./app/models/foo-bar-baz.rb
./app/models/foo/bar-baz.rb
./app/models/foo/bar/baz.rb

In this case I think we should expect mod/baz.rb to rank ./app/models/foo/bar/baz.rb the hightest because baz.rb has the greatest path segment coverage on that string. I'll adjust the ranking for this later, but at least now the line does match.

natecraddock commented 1 year ago

@ratfactor thanks for your thorough testing. I missed this bug myself, so no worries!

Pistos commented 1 year ago

I'll test some more later, but it seems all good so far! Good work. :+1:

I have some more feature requests, but I'll make them in separate github issues.

natecraddock commented 1 year ago

In this case I think we should expect mod/baz.rb to rank ./app/models/foo/bar/baz.rb the hightest because baz.rb has the greatest path segment coverage on that string. I'll adjust the ranking for this later, but at least now the line does match.

Issue to track this: https://github.com/natecraddock/zf/issues/26

I'll test some more later, but it seems all good so far!

I'll go ahead and merge the brach, but I'll leave the issue open for the next couple days

natecraddock commented 1 year ago

I added a bunch of tests inspired by the test cases in this issue. It should (hopefully) prevent any regressions in the future.

natecraddock / zf