Benchmarks comparing with other DFA-Regex-Engines?

hyperpape / needle

Compiling string matching algorithms and regular expressions to java bytecode

MIT License


Benchmarks comparing with other DFA-Regex-Engines? #10

Open: almondtools opened this issue 2 years ago

almondtools commented 2 years ago

I have written a regex benchmark that compares different regex engines for Java. Recently I came across your approach, and I am curious how it performs compared to the other alternatives:

You can run the benchmark on your own.
If your project were available as an artifact in a Maven repository, I would offer to extend regexbenchmark with your project and start a new benchmark run.

hyperpape commented 2 years ago

Thanks for the note. I'll have a look at your benchmarks, and keep them in mind.

Right now, there are a few things I think need to be addressed before I push this to Maven and cut a 0.1 release.

hyperpape commented 1 year ago

@almondtools I was looking at the benchmarks--are there any scripts for handling the output?

almondtools commented 1 year ago

I am not certain I understand your question... I would suggest that you implement a triple:

A benchmark extends MatcherBenchmark
An automaton implements Automaton, which is referenced in the benchmark (and which is a wrapper of your algorithm)
A test extends MatcherBenchmarkTest

The tests search for a pattern in a sample and compare the number of matches found with a reference implementation. They do not check whether each match is found at the correct location. I think the large test corpus (of the scaling benchmarks) makes it unlikely that a benchmark passes by pure luck.

Does it help you?
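To make the count-only check concrete, here is a minimal, self-contained Java sketch. It does not use regexbenchmark's real classes: `SearchEngine`, `ReferenceEngine`, `LiteralEngine`, and `countsAgree` are hypothetical stand-ins for the Automaton wrapper and for the test's comparison against a reference implementation, and `java.util.regex` is assumed to be the reference.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Sketch of a count-based correctness check. All names here are hypothetical;
 * they are NOT regexbenchmark's MatcherBenchmark, Automaton or MatcherBenchmarkTest API.
 */
public class CountCheckSketch {

    /** Hypothetical engine wrapper, playing the role of an Automaton. */
    interface SearchEngine {
        int countMatches(String pattern, String text);
    }

    /** Reference engine: counts non-overlapping matches with java.util.regex. */
    static final class ReferenceEngine implements SearchEngine {
        @Override
        public int countMatches(String pattern, String text) {
            Matcher m = Pattern.compile(pattern).matcher(text);
            int count = 0;
            while (m.find()) {
                count++;
            }
            return count;
        }
    }

    /** Toy candidate engine: treats the pattern as a plain literal string. */
    static final class LiteralEngine implements SearchEngine {
        @Override
        public int countMatches(String pattern, String text) {
            if (pattern.isEmpty()) {
                throw new IllegalArgumentException("empty pattern");
            }
            int count = 0;
            int from = 0;
            while (true) {
                int idx = text.indexOf(pattern, from);
                if (idx < 0) {
                    return count;
                }
                count++;
                from = idx + pattern.length(); // skip past the match: non-overlapping
            }
        }
    }

    /** Only the number of matches is compared, not their positions. */
    static boolean countsAgree(SearchEngine candidate, SearchEngine reference,
                               String pattern, String text) {
        return candidate.countMatches(pattern, text) == reference.countMatches(pattern, text);
    }

    public static void main(String[] args) {
        SearchEngine reference = new ReferenceEngine();
        SearchEngine candidate = new LiteralEngine();
        System.out.println(countsAgree(candidate, reference, "abc", "abcabcab abc")); // prints true
    }
}
```

For a pattern with metacharacters such as `a.c` on the text `abc a.c`, the toy literal engine reports 1 match while the regex reference reports 2, which is the kind of discrepancy a count comparison surfaces. A match reported at the wrong position but with the right total would still slip through, as noted above.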

hyperpape commented 1 year ago

Sorry, my earlier question was a bit vague.

Yes, I was able to implement those in a branch I have locally, and doing so helped me find two bugs in needle.

However, when I run the tests, it seems to give mostly unstructured output to the console. Is there a good technique for turning that data into a table or other format that's good for analysis so I can easily compare my library to others? I didn't know if I missed something in your repo that does that, or if there's a nicer way than reading the results and extracting data by hand.

almondtools commented 1 year ago

Perhaps you have already found the files *bench*.cmd. They write the benchmark data to CSV and text output (examples are attached). Unfortunately, I did not develop tools to analyze or visualize the benchmark results. I did this for stringbench, but it took a lot of effort and is probably not easy to reuse.

I also noticed that the benchmarks will need to be adjusted for other versions of Java/JMH; hopefully you have already solved this.

result.csv result.txt
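Since there is no analysis tooling, a small program can turn the CSV output into a comparison table. The following Java sketch assumes the CSV follows JMH's usual result layout, with quoted `Benchmark`, `Score`, and `Unit` header columns; the class name `JmhCsvTable`, the decimal-comma handling, and the placeholder rows in `main` are illustrative assumptions, not taken from the attached result.csv.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper: turns JMH-style CSV results into a sorted text table. */
public class JmhCsvTable {

    /** One parsed result row. */
    record Row(String benchmark, double score, String unit) {}

    /** Splits one CSV line, honouring double-quoted fields and "" escapes. */
    static List<String> splitCsvLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"' && i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    current.append('"'); // escaped quote inside a quoted field
                    i++;
                } else if (c == '"') {
                    inQuotes = false;
                } else {
                    current.append(c);
                }
            } else if (c == '"') {
                inQuotes = true;
            } else if (c == ',') {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }

    /** Parses CSV text whose header contains "Benchmark", "Score" and "Unit" columns. */
    static List<Row> parse(String csv) {
        String[] lines = csv.strip().split("\\R");
        List<String> header = splitCsvLine(lines[0]);
        int nameCol = header.indexOf("Benchmark");
        int scoreCol = header.indexOf("Score");
        int unitCol = header.indexOf("Unit");
        if (nameCol < 0 || scoreCol < 0 || unitCol < 0) {
            throw new IllegalArgumentException("missing Benchmark/Score/Unit column");
        }
        List<Row> rows = new ArrayList<>();
        for (int i = 1; i < lines.length; i++) {
            if (lines[i].isBlank()) continue;
            List<String> f = splitCsvLine(lines[i]);
            // Assumption: some locales write a decimal comma inside the quoted score.
            double score = Double.parseDouble(f.get(scoreCol).replace(',', '.'));
            rows.add(new Row(f.get(nameCol), score, f.get(unitCol)));
        }
        return rows;
    }

    /** Renders rows as a fixed-width table, highest score first. */
    static String toTable(List<Row> rows) {
        List<Row> sorted = new ArrayList<>(rows);
        sorted.sort(Comparator.comparingDouble(Row::score).reversed());
        int width = "Benchmark".length();
        for (Row r : sorted) width = Math.max(width, r.benchmark().length());
        StringBuilder sb = new StringBuilder();
        sb.append(String.format(Locale.ROOT, "%-" + width + "s  %12s  %s%n", "Benchmark", "Score", "Unit"));
        for (Row r : sorted) {
            sb.append(String.format(Locale.ROOT, "%-" + width + "s  %12.3f  %s%n",
                    r.benchmark(), r.score(), r.unit()));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Placeholder input only; these are not real benchmark results.
        String csv = """
            "Benchmark","Mode","Threads","Samples","Score","Score Error (99.9%)","Unit"
            "example.EngineA.find","thrpt",1,5,"100,5","1,0","ops/s"
            "example.EngineB.find","thrpt",1,5,"250.0","2.0","ops/s"
            """;
        System.out.print(toTable(parse(csv)));
    }
}
```

Looking columns up by header name rather than by position keeps the sketch working if extra `Param: ...` columns appear, and formatting with `Locale.ROOT` keeps the table's decimal separator stable regardless of the machine's locale.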

hyperpape commented 1 year ago

Whoops, my apologies. I overlooked the command files.