Guidance for algorithm implementation.

nishihatapalmer commented 1 year ago

We should define what the expected implementation of a smart algorithm looks like. Here are a few things it would be good to standardize and document. There are probably others.

Memory allocation

inside or outside pre-processing and search measurements?
note that some pre-processing requires dynamic memory allocation, e.g. linked lists and other dynamic structures.
most current algorithms allocate memory outside of pre-processing and search measurement.

Memory de-allocation

inside or outside pre-processing and search measurements?
most current algorithms perform memory de-allocation as part of the search measurement.
must ensure all memory allocated is freed (a few currently don't).

Setting initial values

most algorithms do all of this inside pre-processing.
there are a few tiny variations where some small values are set outside pre-processing for a few algorithms. I doubt this makes much practical difference to any measurement though.

No need for null terminators

patterns and text are actually not strings and can contain null byte values.
no need to set one at the end of a pattern.
no need to set one at the end of the text.
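A minimal sketch of what this means in practice, assuming a hypothetical helper `count_occurrences` (not a function in smart): both lengths are passed explicitly, so a zero byte is just another symbol and neither buffer is terminated.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of smart: counts occurrences of the m bytes
   of x in the n bytes of y. Both lengths are explicit, so a zero byte is
   ordinary data and neither buffer needs a terminator. */
int count_occurrences(const unsigned char *x, int m,
                      const unsigned char *y, int n)
{
    int count = 0;
    if (m < 1 || m > n)
        return 0;
    for (int pos = 0; pos <= n - m; pos++) {
        /* memcmp compares exactly m bytes and does not stop at a zero byte */
        if (memcmp(x, &y[pos], (size_t)m) == 0)
            count++;
    }
    return count;
}
```

Searching for the two bytes `00 01` in the five bytes `00 01 00 01 00` finds two occurrences, which a `strlen`-based implementation would miss entirely.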

Must be able to handle arbitrary length patterns

if there are pattern lengths an algorithm cannot handle, it must test for this and return -1 (cannot search).
must otherwise allocate and de-allocate sufficient memory for the length of pattern it is searching.
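As an illustration of the length test, here is a sketch of a bit-parallel Shift-And search that keeps its state in one 32-bit word and so cannot handle patterns longer than 32 bytes. The function name and the local `CANNOT_SEARCH` definition are assumptions made for this example; only the -1 return value comes from the text above.

```c
#include <stdint.h>

#define SIGMA 256
#define CANNOT_SEARCH (-1)   /* same value as the -1 mentioned above */

/* Hypothetical example: Shift-And with a single 32-bit state word.
   Returns the number of occurrences, or CANNOT_SEARCH when the pattern
   does not fit in the word. */
int shift_and_search(const unsigned char *x, int m,
                     const unsigned char *y, int n)
{
    /* Test the limit up front instead of silently giving wrong answers. */
    if (m < 1 || m > 32)
        return CANNOT_SEARCH;

    /* Pre-processing: bit i of B[c] is set when x[i] == c. */
    uint32_t B[SIGMA] = {0};
    for (int i = 0; i < m; i++)
        B[x[i]] |= (uint32_t)1 << i;

    const uint32_t accept = (uint32_t)1 << (m - 1);
    uint32_t D = 0;
    int count = 0;

    /* Search: bit i of D is set when x[0..i] matches the text ending at j. */
    for (int j = 0; j < n; j++) {
        D = ((D << 1) | 1u) & B[y[j]];
        if (D & accept)
            count++;
    }
    return count;
}
```

The check happens before any pre-processing, so an unsupported length costs nothing and leaves no memory to clean up.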

Pattern validation

some algorithms use memcmp(), some have a loop, and a few don't need an explicit pattern validation step at all as it falls out of the algorithm directly.
is it worth standardizing the base case of pattern validation? e.g. have a #define that provides a completely standard validation function?
this is an attempt to make bench-marking of algorithms less dependent on arbitrary coding choices, so that results better reflect the relative performance of the algorithms themselves.
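One possible shape for such a standard step, as a sketch only: the names `VERIFY_PATTERN`, `verify_loop` and `brute_force_search` are hypothetical and do not exist in smart. Every algorithm would call the same macro, so switching the base case between a loop and `memcmp()` becomes a one-line change that applies to all of them.

```c
/* Hypothetical standard verification step: returns 1 when the m bytes of x
   equal the m bytes of y starting at pos, and 0 otherwise. */
static int verify_loop(const unsigned char *x, int m,
                       const unsigned char *y, int pos)
{
    int i = 0;
    while (i < m && x[i] == y[pos + i])
        i++;
    return i == m;
}

/* All algorithms would use this macro rather than their own comparison. */
#define VERIFY_PATTERN(x, m, y, pos) verify_loop((x), (m), (y), (pos))

/* Example caller: a brute-force search that only uses the shared macro. */
int brute_force_search(const unsigned char *x, int m,
                       const unsigned char *y, int n)
{
    if (m < 1)
        return -1;
    int count = 0;
    for (int pos = 0; pos <= n - m; pos++) {
        if (VERIFY_PATTERN(x, m, y, pos))
            count++;
    }
    return count;
}
```

Algorithms where verification falls out of the search itself would simply not use the macro.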

Error return codes

#define CANNOT_SEARCH -1 // already exists, but use the explicit name for clarity.
#define ERROR_SEARCHING -2 // use this instead of exit(1) when an unrecoverable error is detected.
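A sketch of how the two codes might be used together, based on a Morris-Pratt search whose failure table is sized from the pattern length. Treating an allocation failure as the "unrecoverable error" case is an assumption for this example, as is the function name `mp_search`.

```c
#include <stdlib.h>

#define CANNOT_SEARCH   (-1)
#define ERROR_SEARCHING (-2)

/* Hypothetical example: Morris-Pratt search returning the match count. */
int mp_search(const unsigned char *x, int m,
              const unsigned char *y, int n)
{
    if (m < 1)
        return CANNOT_SEARCH;

    /* The table has m + 1 entries, so it is allocated for this pattern. */
    int *next = malloc(((size_t)m + 1) * sizeof *next);
    if (next == NULL)
        return ERROR_SEARCHING;   /* instead of exit(1) */

    /* Pre-processing: next[i] is the longest border of x[0..i-1]. */
    next[0] = -1;
    int k = -1;
    for (int i = 0; i < m; i++) {
        while (k >= 0 && x[i] != x[k])
            k = next[k];
        k++;
        next[i + 1] = k;
    }

    /* Search: i is the number of pattern bytes currently matched. */
    int count = 0;
    int i = 0;
    for (int j = 0; j < n; j++) {
        while (i >= 0 && x[i] != y[j])
            i = next[i];
        i++;
        if (i == m) {
            count++;
            i = next[m];
        }
    }

    free(next);   /* everything allocated above is released before returning */
    return count;
}
```

A caller can then distinguish "this algorithm does not apply" from "something went wrong" without the whole benchmark process terminating.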

nishihatapalmer commented 1 year ago

Modification of text buffer

some algorithms modify the end of the text to put a sentinel guard / fast loop in, one algorithm adds (2 * m) copies.
currently we have this additional space as #define NUM_PATTERNS_AT_END_OF_TEXT 2.
we detect buffer overflows in the test command (it allocates a bigger buffer and checks if the algorithm modifies anything past the limit).
make any limits clear to algorithm authors.
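The overflow check described above could look roughly like the following. This is a sketch under stated assumptions, not the code in smart's test command: `GUARD_BYTES`, `GUARD_VALUE`, `search_fn`, `writes_past_limit` and the two toy search functions are all hypothetical, and the permitted slack is taken to be `NUM_PATTERNS_AT_END_OF_TEXT * m` bytes.

```c
#include <stdlib.h>
#include <string.h>

#define NUM_PATTERNS_AT_END_OF_TEXT 2   /* value quoted in the discussion */
#define GUARD_BYTES 64                  /* hypothetical size of the checked zone */
#define GUARD_VALUE 0xA5                /* hypothetical fill byte */

typedef int (*search_fn)(const unsigned char *x, int m, unsigned char *y, int n);

/* Returns 1 if the search wrote into the guard zone, 0 if it did not,
   and -1 if the test buffer could not be allocated. */
int writes_past_limit(search_fn search, const unsigned char *x, int m,
                      const unsigned char *text, int n)
{
    size_t slack = (size_t)NUM_PATTERNS_AT_END_OF_TEXT * (size_t)m;
    size_t guard_start = (size_t)n + slack;
    unsigned char *buf = malloc(guard_start + GUARD_BYTES);
    if (buf == NULL)
        return -1;

    memcpy(buf, text, (size_t)n);                          /* the text itself   */
    memset(&buf[n], 0, slack);                             /* permitted slack   */
    memset(&buf[guard_start], GUARD_VALUE, GUARD_BYTES);   /* must stay intact  */

    (void)search(x, m, buf, n);

    int overflowed = 0;
    for (size_t i = 0; i < GUARD_BYTES; i++) {
        if (buf[guard_start + i] != GUARD_VALUE)
            overflowed = 1;
    }
    free(buf);
    return overflowed;
}

/* Toy search that copies the pattern once after the text: within the limit. */
int polite_search(const unsigned char *x, int m, unsigned char *y, int n)
{
    for (int i = 0; i < m; i++)
        y[n + i] = x[i];
    return 0;
}

/* Toy search that writes one byte beyond the permitted slack. */
int overrunning_search(const unsigned char *x, int m, unsigned char *y, int n)
{
    (void)x;
    y[n + NUM_PATTERNS_AT_END_OF_TEXT * m] = 0;
    return 0;
}
```

A check like this only catches writes that land inside the guard zone and change its value, so it is a detector rather than a proof of correctness.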

nishihatapalmer commented 1 year ago

Detection of text modification

I think I will put a text buffer modification test into the benchmark code as well as the test code, and mark algorithms that do it.

Using a sentinel guard is a general technique that almost always provides a speed up if you can avoid a position check in a fast loop.
Only some algorithm implementations use it; others don't, even if they could. This makes comparing real-world algorithm performance harder than it should be.
It is a technique that is extremely hard to actually use in the real world, unless you are in full control of the allocation of memory for search data. This is almost never the case if you are just being given data to search in.

Knowing which algorithms use this technique would be a useful data point to bear in mind when comparing performance and evaluating suitability for an application.
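To make the technique concrete, here is a minimal sketch of a sentinel guard, assuming the caller guarantees at least `m` writable bytes after the text. The function name `sentinel_search` is hypothetical. Copying the pattern to `y[n]` lets the inner skip loop run with no position check, at the price of modifying the buffer.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical example. Requires y to have room for n + m bytes.
   Returns the number of occurrences of x[0..m-1] in y[0..n-1]. */
int sentinel_search(const unsigned char *x, int m, unsigned char *y, int n)
{
    if (m < 1)
        return -1;

    /* Sentinel: a copy of the pattern just past the end of the text. */
    memcpy(&y[n], x, (size_t)m);

    int count = 0;
    int pos = 0;
    for (;;) {
        /* Fast loop with no bounds check: y[n] == x[0] always stops it. */
        while (y[pos] != x[0])
            pos++;
        if (pos > n - m)
            break;          /* stopped on or near the sentinel, not in the text */
        if (memcmp(x, &y[pos], (size_t)m) == 0)
            count++;
        pos++;
    }
    return count;
}
```

After the call the bytes at `y[n..n+m-1]` hold a copy of the pattern, which is exactly the kind of modification a benchmark-side check would flag.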

nishihatapalmer commented 1 year ago

Large array allocations

Algorithms that allocate a multi-dimensional array based on pattern length and alphabet size, e.g. A[m][SIGMA], need arrays that become too big to allocate on the stack once the pattern length passes about 4096.
Use malloc for these arrays and free any memory allocated at the end, or return -1 if the pattern length is greater than 4096.
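A sketch of the heap-based option, using a simple deterministic automaton search as the example. The function name `automaton_search` and the choice of `calloc` are assumptions for illustration; a pointer-to-array type keeps the familiar `A[j][c]` indexing while the storage lives on the heap.

```c
#include <stdlib.h>

#define SIGMA 256
#define CANNOT_SEARCH   (-1)
#define ERROR_SEARCHING (-2)

/* Hypothetical example: counts occurrences using a transition table
   A[state][byte] with m + 1 states, allocated on the heap. */
int automaton_search(const unsigned char *x, int m,
                     const unsigned char *y, int n)
{
    if (m < 1)
        return CANNOT_SEARCH;

    /* (m + 1) rows of SIGMA ints, zero-filled. */
    int (*A)[SIGMA] = calloc((size_t)m + 1, sizeof *A);
    if (A == NULL)
        return ERROR_SEARCHING;

    /* Pre-processing: each row starts as a copy of its restart state. */
    A[0][x[0]] = 1;
    int restart = 0;
    for (int j = 1; j <= m; j++) {
        for (int c = 0; c < SIGMA; c++)
            A[j][c] = A[restart][c];
        if (j < m) {
            A[j][x[j]] = j + 1;
            restart = A[restart][x[j]];
        }
    }

    /* Search: one table lookup per text byte. */
    int count = 0;
    int state = 0;
    for (int i = 0; i < n; i++) {
        state = A[state][y[i]];
        if (state == m)
            count++;
    }

    free(A);
    return count;
}
```

With 4-byte ints, a 5000-byte pattern needs roughly 5 MB here, which is fine on the heap but would typically overflow a default-sized stack.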

nishihatapalmer commented 1 year ago

Obtaining good benchmark results

Identify known sources of bench-marking variation external to the algorithm - e.g. process variation, cache priming, other OS tasks, performance profile / power management settings in OS or BIOS.

Quantify them where possible and give guidance on reducing the variation.

Define measures of good performance

We currently just say the algorithm with the fastest mean time is the fastest, which is defensible but not always very useful. It's the one that smart marks out from the rest.

The variation in performance is also an important measure. If we have one algorithm that is a tiny bit faster than another, but has a much wider variation than the other, it is arguably not the best choice for many purposes.

We also have median statistics now, and maybe confidence intervals at some point.
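For reference, the three measures discussed above can be computed from a set of timings as follows. This is an illustrative sketch with hypothetical function names, not smart's statistics code; it reports the sample variance, whose square root is the standard deviation.

```c
#include <stdlib.h>

static int compare_doubles(const void *a, const void *b)
{
    double da = *(const double *)a;
    double db = *(const double *)b;
    return (da > db) - (da < db);
}

/* Arithmetic mean of n timings (n >= 1). */
double timing_mean(const double *t, int n)
{
    double sum = 0.0;
    for (int i = 0; i < n; i++)
        sum += t[i];
    return sum / n;
}

/* Median of n timings (n >= 1). Sorts t in place. */
double timing_median(double *t, int n)
{
    qsort(t, (size_t)n, sizeof t[0], compare_doubles);
    return (n % 2) ? t[n / 2] : 0.5 * (t[n / 2 - 1] + t[n / 2]);
}

/* Sample variance of n timings (n >= 2). */
double timing_variance(const double *t, int n)
{
    double mean = timing_mean(t, n);
    double sum_sq = 0.0;
    for (int i = 0; i < n; i++) {
        double d = t[i] - mean;
        sum_sq += d * d;
    }
    return sum_sq / (n - 1);
}
```

For the timings 1, 2, 3, 4 and 100, the mean is 22 while the median is 3: a single slow run dominates the mean, and the large variance is what reveals it.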

nishihatapalmer commented 1 year ago

Use clear, structured programming

Avoid the use of jump instructions (i.e. goto) in the algorithm code and any other coding optimizations that make the implementation unclear.

Note that certain algorithms use specific control flow structures that are not provided in the C language, for example, the while..else construct. If an algorithm requires goto in order to be implemented as designed, then this is acceptable in the absence of a structured programming construct. Otherwise additional values would have to be set and tested for, and this would bias the performance of the algorithm negatively.
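The while..else case can be sketched as follows: the "else" part runs only when the inner loop finishes without leaving early. Both functions are hypothetical examples; the first uses goto, the second shows the extra flag that would otherwise have to be set and tested.

```c
/* while..else emulated with goto: count++ is the "else" part. */
int count_with_goto(const unsigned char *x, int m,
                    const unsigned char *y, int n)
{
    int count = 0;
    for (int pos = 0; pos <= n - m; pos++) {
        for (int i = 0; i < m; i++) {
            if (x[i] != y[pos + i])
                goto next_position;   /* early exit skips the "else" part */
        }
        count++;                      /* reached only if the loop completed */
next_position: ;
    }
    return count;
}

/* Same logic without goto: an extra variable is written on a mismatch
   and tested after every inner loop. */
int count_with_flag(const unsigned char *x, int m,
                    const unsigned char *y, int n)
{
    int count = 0;
    for (int pos = 0; pos <= n - m; pos++) {
        int matched = 1;
        for (int i = 0; i < m; i++) {
            if (x[i] != y[pos + i]) {
                matched = 0;
                break;
            }
        }
        if (matched)
            count++;
    }
    return count;
}
```

Both return the same counts; whether the flag costs anything measurable depends on the compiler, which is why the goto form is tolerated only where the algorithm's design calls for it.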

nishihatapalmer commented 1 year ago

Keep variants to a minimum

Keep the number of variant implementations of an algorithm to a minimum. It's OK to have multiple implementations of an algorithm where necessary. For example, one for each length of q-gram it processes, as that strongly affects the performance of the hash algorithm.

Tuning

For algorithms with a large parameter space, avoid publishing lots of different implementations each with slightly different parameter variants, with no guidance as to which ones should be used in which circumstances. If you know that, for example, a particular set of parameters works well on low alphabet searches, it's fine to provide those implementations separately as being tuned for that purpose.

In general

Pick a good general set of default values for algorithm parameters and provide written guidance on tuning them further. The algorithm could also auto-tune itself, which would not create new implementations. Only provide variant implementations where you can clearly explain their purpose.

nishihatapalmer commented 1 year ago

Avoid use of pointer arithmetic

It is possible to make search algorithms faster by using pointer arithmetic rather than indexing into the text. For example, instead of keeping an int search position pos and reading y[pos], an implementation can keep an unsigned char *pos = y, increment it, and read the text through *pos directly. This generally leads to faster execution.

However, almost all algorithms in smart use indexing, with the exception of certain algorithms like EPSM that use SIMD intrinsics.

In order to achieve a fair comparison of the algorithms, it is recommended that pointer arithmetic is not used in the algorithms unless there is an unavoidable reason to do so.
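The two styles side by side, as a sketch with hypothetical function names. Both count occurrences of a single byte and return the same result; only the way the text is addressed differs.

```c
/* Indexed form, the style used by almost all algorithms in smart. */
int count_byte_indexed(const unsigned char *y, int n, unsigned char c)
{
    int count = 0;
    for (int pos = 0; pos < n; pos++) {
        if (y[pos] == c)
            count++;
    }
    return count;
}

/* Pointer form: the search position is itself a pointer into the text. */
int count_byte_pointer(const unsigned char *y, int n, unsigned char c)
{
    int count = 0;
    const unsigned char *pos = y;
    const unsigned char *end = y + n;
    while (pos < end) {
        if (*pos == c)
            count++;
        pos++;
    }
    return count;
}
```

Mixing the two styles across algorithms would fold a coding choice into the benchmark numbers, which is the comparison problem the recommendation is meant to avoid.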

ostafen / smart

Guidance for algorithm implementation. #54