Closed dmalkr closed 4 years ago
@dmalkr @mgajda thank you, this is very impressive! For the record, which GHC versions have you tested this with?
Mainly with GHC 8.6.5; with other versions (8.*) we only checked that the tests and benchmarks still work.
Hi @dmalkr @mgajda, hope you're doing great! Shall we finish this PR and publish a new release before the handover?
OK, merging and cutting a new release.
Here are almost all of our changes.

- `src/Xeno/SAX.hs`. I found that the most heavily used function is `s_index`: https://github.com/ocramz/xeno/blob/a5f942d823ca6a16adbb319c43f3b131c4c1d19d/src/Xeno/SAX.hs#L252-L257 This function checks whether `n` is out of bounds. Code analysis shows that `n` is always non-negative, so the first check can be removed. `s_index` is used to read the current character or the one after it, and its result is always compared with some expected character, i.e. in the form `s_index inputString i == someExpectableChar`. So if we place a meaningless character (for example, `\NUL`) at the end of the string, we can also remove the second check (`n >= S.length ps`). Our benchmarks show a speedup of 40-150% from removing this check.
- The input therefore needs a buffer ending in `\NUL`, with the file read into this buffer. For this I introduced the `StringLike` class, a `StringLike ByteString` instance, a special data type for zero-terminated strings, `ByteStringZeroTerminated`, and a `StringLike ByteStringZeroTerminated` instance. The `process` function now works on `StringLike` instances.
- `benchmark.md`: https://github.com/ocramz/xeno/blob/dc02b6fc5320df6618e7d01987b0243760f64abb/benchmark.md
- For the `process` function, a new data type `Process` was introduced: https://github.com/ocramz/xeno/blob/dc02b6fc5320df6618e7d01987b0243760f64abb/src/Xeno/SAX.hs#L69-L77 . According to the benchmarks, this change does not affect speed or memory use.
- `src/Xeno/DOM.hs`. The main improvement here is adaptive growth of the internal array: https://github.com/ocramz/xeno/blob/dc02b6fc5320df6618e7d01987b0243760f64abb/src/Xeno/DOM.hs#L140-L150 . We try to predict the final size so that the internal vector grows only once. The other improvement is the use of unsafe vector operations, which is safe here because we know the vector boundaries.
- `Xeno/DOM.hs` moved to `Xeno/DOM/Internal.hs`.
- `src/Xeno/DOM/Robust.hs`: a variant of DOM for more robust DOM processing.
- `ByteStringZeroTerminated`: the benchmarks run both with the original `ByteString` (`input`) and with strings to which `\NUL` has been appended (`inputz`).
- `bench/SpeedBigFiles.hs`. We use some files from open sources (dumps from the MediaWiki project). They must be downloaded before running the benchmarks, using the `data/download-ex-data-for-benchmarks.sh` script. It downloads the data to the `data/ex` directory.
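To illustrate the zero-terminator idea from the `src/Xeno/SAX.hs` item, here is a minimal sketch. It is not the code from this PR: `checkedIndex`, `ZeroTerminated`, `zeroTerminate`, `sentinelIndex` and `skipSpaces` are hypothetical names chosen for the example, and the checked variant simply calls `error` where the real function may do something else. Only `S.snoc`, `S.length` and `SU.unsafeIndex` are actual `bytestring` functions.

```haskell
import           Data.ByteString        (ByteString)
import qualified Data.ByteString        as S
import qualified Data.ByteString.Unsafe as SU
import           Data.Word              (Word8)

-- | Bounds-checked lookup with the two checks discussed above
-- (hypothetical stand-in; the error handling is an assumption).
checkedIndex :: ByteString -> Int -> Word8
checkedIndex ps n
  | n < 0            = error "checkedIndex: negative index"
  | n >= S.length ps = error "checkedIndex: index past end"
  | otherwise        = ps `SU.unsafeIndex` n

-- | Hypothetical wrapper. Invariant: the wrapped string ends with a 0 byte.
newtype ZeroTerminated = ZeroTerminated ByteString

-- | Append the sentinel byte (this copies the input once).
zeroTerminate :: ByteString -> ZeroTerminated
zeroTerminate bs = ZeroTerminated (bs `S.snoc` 0)

-- | Unchecked lookup. Safe only while the caller never steps past the
-- sentinel, i.e. the index stays within 0 .. length of the original input.
sentinelIndex :: ZeroTerminated -> Int -> Word8
sentinelIndex (ZeroTerminated ps) n = ps `SU.unsafeIndex` n

-- | Skip ASCII spaces starting at index i. The scan always stops at the
-- sentinel at the latest, because 0 is not equal to 32.
skipSpaces :: ZeroTerminated -> Int -> Int
skipSpaces s i
  | sentinelIndex s i == 32 = skipSpaces s (i + 1)
  | otherwise               = i
```

Because every lookup is immediately compared with an expected byte, the comparison fails at the terminator and the scan ends there, which is what makes the upper-bound check redundant in this pattern. The cost is one extra byte and, in this sketch, one copy of the input when the terminator is appended.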
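The size prediction mentioned in the `src/Xeno/DOM.hs` item could look roughly like the sketch below. This is an assumed heuristic, not the formula used in the linked code: `predictCapacity` is a hypothetical helper that extrapolates linearly from the fraction of the input parsed so far.

```haskell
-- | Estimate how many slots the output vector will need in total.
--
--   inputLen  total size of the input in bytes
--   consumed  bytes parsed so far
--   used      slots already written
--
-- Assumption: the rest of the input needs about as many slots per byte
-- as the part parsed so far.
predictCapacity :: Int -> Int -> Int -> Int
predictCapacity inputLen consumed used
  | consumed <= 0 = max 1 (2 * used)          -- nothing parsed yet: fall back to doubling
  | otherwise     = max (used + 1) estimate   -- always leave room for one more slot
  where
    ratio    = fromIntegral inputLen / fromIntegral consumed :: Double
    estimate = ceiling (fromIntegral used * ratio)
```

When the vector fills up, the difference between this estimate and the current capacity could be passed to a growing operation such as `grow` from `Data.Vector.Unboxed.Mutable`, so that in the common case one reallocation is enough. If the estimate turns out too low, the same call can simply be repeated.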