nishihatapalmer / byteseek

A Java library for byte pattern matching and searching
BSD 3-Clause "New" or "Revised" License
40 stars 11 forks source link

Silent replacement of algorithms or throw an error? #18

Open nishihatapalmer opened 5 years ago

nishihatapalmer commented 5 years ago

Some search algorithms have minimum length constraints on what they can search for (e.g. algorithms that work on q-grams (multiple bytes) in a pattern).

Currently, byteseek 2.1 uses the idea of a fallback searcher - if you instantiate an algorithm that doesn't support a pattern of length 2 say, it reverts back to a simpler algorithm that is usually faster for short patterns.

But this means that the algorithm you asked for is silently replaced (even if by something better).

The alternative is to throw an IllegalArgumentException if a pattern is passed in which doesn't meet the minimum pattern length requirements. This at least guarantees that you always get what you asked for - but sometimes what you ask for isn't viable and you are told that's the case.

In reality, most programmers shouldn't generally directly select the algorithm to use (as there are multiple reasons why particular algorithms should be preferred in different circumstances, including the length of the pattern and the complexity of it and where the complexity exists in the pattern). So searcher factories should be used to obtain a search algorithm, and the factories can embody this logic.

nishihatapalmer commented 5 years ago

Another factor is that using fallback searching makes the code more complicated (even if encapsulated by a single abstract base class).

nishihatapalmer commented 5 years ago

Most of the rest of byteseek won't allow the construction of something that can't work. It always throws illegalArgumentExceptions. The idea is just that if you can construct it, it works.

The fallback searcher breaks that notion in hard to diagnose ways.

nishihatapalmer commented 5 years ago

I think I'm coming round to IllegalArgumentExceptions and ditching fallback searchers. If you can construct something, you have something that works is a good principle

Making something seem to work when it doesn't certainly makes using Byteseek "easier", as in, not having to think about using it. But it is disguising that what you asked for won't work - and silently getting something else is not necessarily a good idea.

Programmers are not, in general, experts in search algorithms. Selecting the right search algorithm should be left to experts, or code that embodies that expertise. This shields non-experts from making bad decisions, and educates people who have more sophisticated requirements what they need to consider.

nishihatapalmer commented 5 years ago

If we ditch fallback searchers, there needs to be good documentation on the limitations of the search algorithms - even if just to point people towards using the factories rather than directly instantiating them themselves.