
lodo1995 / experimental.xml

A replacement of Phobos std.xml

https://lodo1995.github.io/experimental.xml

Boost Software License 1.0


More lexers #12

Closed lodo1995 closed 8 years ago

lodo1995 commented 8 years ago

While the SliceLexer is quite fast, it requires the entire input to be loaded in memory beforehand. On the other hand, the current RangeLexer is painfully slow.

It may be useful to add a ForwardLexer, which requires its input to be at least a ForwardRange, using this information to speed up the reading process, in particular the allocation of memory.
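To make the idea concrete, here is a minimal sketch of how a ForwardRange's `save` could help with allocation: scan ahead on a saved copy to measure the token, then allocate exactly once. The function name `readUntil` and its signature are hypothetical and only illustrate the technique; this is not the ForwardLexer's actual code.

```d
import std.range.primitives : ElementType, empty, front, isForwardRange,
    popFront, save;
import std.traits : Unqual;

/// Hypothetical helper (not part of experimental.xml): reads elements up to,
/// but not including, `delim`, allocating the result only once.
auto readUntil(R, E)(ref R input, E delim)
    if (isForwardRange!R)
{
    alias T = Unqual!(ElementType!R);

    // First pass: measure the token on a saved copy, leaving `input` untouched.
    auto lookahead = input.save;
    size_t count = 0;
    while (!lookahead.empty && lookahead.front != delim)
    {
        lookahead.popFront();
        ++count;
    }

    // Second pass: a single allocation of the exact size, then copy.
    auto token = new T[](count);
    foreach (ref slot; token)
    {
        slot = input.front;
        input.popFront();
    }
    return token;
}

void main()
{
    import std.stdio : writeln;
    import std.utf : byCodeUnit;

    auto input = "name=value".byCodeUnit; // a forward range of code units
    writeln(readUntil(input, '='));       // the token before the delimiter
    writeln(input.front);                 // input now rests on the delimiter
}
```

A plain InputRange cannot do the first pass, so it has to grow its buffer while reading. The price here is that every element is visited twice.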

It's also necessary to add a BufferedLexer, which takes as input an InputRange of slices. It will be very useful for buffered reads from files, having a speed comparable to the SliceLexer whenever the token is not on a buffer boundary, but not needing a huge amount of memory.
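A rough sketch of the buffering idea, assuming tokens simply end at a delimiter character: when the delimiter is inside the current chunk the token is returned as a slice with no allocation, and only tokens that straddle a chunk boundary are copied. `ChunkReader`, `readUntil` and `atEnd` are hypothetical names for illustration, not the BufferedLexer's API.

```d
import std.array : appender;
import std.range.primitives : ElementType, empty, front, isInputRange, popFront;
import std.string : indexOf;

/// Hypothetical sketch (not part of experimental.xml).
struct ChunkReader(R)
    if (isInputRange!R && is(ElementType!R : const(char)[]))
{
    private R chunks;
    private const(char)[] current; // unread tail of chunks.front

    this(R chunks)
    {
        this.chunks = chunks;
        if (!this.chunks.empty)
            current = this.chunks.front;
        skipEmptyChunks();
    }

    // Advances to the next non-empty chunk once `current` is used up.
    private void skipEmptyChunks()
    {
        while (current.length == 0 && !chunks.empty)
        {
            chunks.popFront();
            current = chunks.empty ? null : chunks.front;
        }
    }

    bool atEnd()
    {
        skipEmptyChunks();
        return current.length == 0;
    }

    /// Reads up to and including `delim`. The result may alias the current
    /// chunk, so treat it as valid only until the next call on this reader.
    const(char)[] readUntil(char delim)
    {
        skipEmptyChunks();
        auto idx = current.indexOf(delim);
        if (idx >= 0)
        {
            // Fast path: the token lies inside one chunk, so just slice it.
            auto token = current[0 .. idx + 1];
            current = current[idx + 1 .. $];
            return token;
        }

        // Slow path: the token crosses a chunk boundary and must be copied.
        auto buf = appender!(char[])();
        do
        {
            buf.put(current);
            current = null;
            skipEmptyChunks();
            idx = current.indexOf(delim);
        }
        while (idx < 0 && current.length != 0);

        if (idx >= 0)
        {
            buf.put(current[0 .. idx + 1]);
            current = current[idx + 1 .. $];
        }
        return buf.data;
    }
}

void main()
{
    import std.stdio : writeln;

    // Three small chunks; the second and third tokens cross a boundary.
    auto reader = ChunkReader!(string[])(["<a><b", "c></", "a>"]);
    while (!reader.atEnd)
        writeln(reader.readUntil('>'));
}
```

The reader moves to the next chunk lazily, at the start of the following call, so a slice handed out on the fast path is not invalidated by a source that reuses its buffer when advanced.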

burner commented 8 years ago

Sounds like a plan. As usual with (sort of) manual memory management, you have to make sure you do not escape references to freed or overwritten buffers. But before you write the "BufferedRangeLexer", have your continuous benchmarking in place (unless you have done that already and I missed it) so you can track the difference.

lodo1995 commented 8 years ago

Although not 100% complete, the benchmarking code is there. Just tweak some parameters at the beginning of random_benchmark.d, run make clean-random-benchmark, then make random-benchmark > results.txt. This way you can save the results to a file for comparison with future benchmarks. The results file also contains the configuration parameters, so that you know what you tweaked in random_benchmark.d.

burner commented 8 years ago

The thing is, you have to do that by hand. And then you have to do the comparison by hand. I think it would be really helpful if you had graphs showing you all that. Have a look at https://github.com/dlang/phobos/pull/2995 and http://code.dlang.org/packages/std_benchmark. I think you already have all the data; you just need to dump it in a way that something like gnuplot can handle.

lodo1995 commented 8 years ago

Understood. Code to produce a CSV should be almost a one-liner. I'll also make a script to feed gnuplot. This discussion belongs in issue #10.

Hackerpilot commented 8 years ago

Ping me if you want any extra help/advice on speeding up lexers.

burner commented 8 years ago

@Hackerpilot thanks

lodo1995 commented 8 years ago

@Hackerpilot thank you very much. I'll upload some work on this as soon as possible.

lodo1995 commented 8 years ago

@Hackerpilot @burner I implemented ForwardLexer and BufferedLexer. In terms of performance, BufferedLexer is asymptotically equal to RangeLexer for very small buffers and asymptotically equal to SliceLexer for very big ones (as expected). ForwardLexer didn't bring the expected performance gain with respect to RangeLexer, being only slightly faster.

burner commented 8 years ago

Well, I would say that is a good result. If you made a mistake somewhere, you at least made it three times ;-)

burner commented 8 years ago

When[1] you have some graphs please share them.

[1] when the CSV export and gnuplot import are done

lodo1995 commented 8 years ago

Here is the first meaningful graph, comparing the performance of the various lexers, with files of increasing sizes. I'll upload the code to generate the graph soon.

[Figure: plot comparing the performance of the various lexers on files of increasing size]

lodo1995 commented 8 years ago

For the time being, I'm quite happy with the lexers, so I'm closing this issue.