Open felipelouza opened 1 month ago
$ g++ --version
g++ (GCC) 14.1.1 20240701 (Red Hat 14.1.1-7)
Also, when I try `make benchmark`:
make[3]: *** No rule to make target 'benchmark/src/external/tlx/lib/libtlx.a', needed by 'benchmark/src/benchmark'. Stop.
make[2]: *** [CMakeFiles/Makefile2:388: benchmark/src/CMakeFiles/benchmark.dir/all] Error 2
make[1]: *** [CMakeFiles/Makefile2:395: benchmark/src/CMakeFiles/benchmark.dir/rule] Error 2
make: *** [Makefile:254: benchmark] Error 2
Can you send me a complete log (everything after cloning the repo)? The missing `xss::get` seems to be an artifact from a previous version of the project; I'll look into it on Friday. For tlx, does `make fetch_tlx` succeed?
Cheers!
Hi,
The first attempt produced this output: https://gist.github.com/felipelouza/ccd128e3b2ad59b211590f27434e1920
Then I changed the lines

#include <tree/...>

to

#include <xss/tree/...>

in the files /benchmark/test/test_stack.cpp, /benchmark/test/util/check_array.hpp, and /nearest-smaller-suffixes/benchmark/test/util/check_tree.hpp, and I got this error: https://gist.github.com/felipelouza/4e24d48524b641de9fe1e1a652058bf8

`make fetch_tlx` worked.
What I would like to do is run the Lyndon array construction algorithm.
Thank you!
`make check` should be fine now; for the tlx issue, can you send me the log of `make fetch_tlx` from a clean build directory (i.e., only `cmake ..` and `make fetch_tlx`)?
Looks like this for me: https://gist.github.com/jonas-ellert/674860a2be1040a7e841abb9513c31e9
In the meantime, I have created a branch 'single-header' with a self-contained implementation of the array construction algorithm, which should be easy to integrate into other projects. I can produce a similar header for the tree construction if needed. Hope this helps!
Thank you Jonas,
`make check` was successful, but `make fetch_tlx` failed. You can see the details here: https://gist.github.com/felipelouza/4d0fa4a1fe306de08313349a0994a03d.
If I want to build the Lyndon Array, should I write my own code like the example here: https://github.com/jonas-ellert/nearest-smaller-suffixes-mwe/blob/master/main_plain.cpp, or is there another method available?
Cheers!
The tlx issue should be resolved on the master branch; please check and let me know if that fixes it (a clean build is required).
The easiest way to build the LA is to include this header from the single-header branch:
https://github.com/jonas-ellert/nearest-smaller-suffixes/blob/single-header/include/lyndon_array.hpp
and then use it as shown here:
https://github.com/jonas-ellert/nearest-smaller-suffixes/blob/single-header/example.cpp
Sentinel characters at the beginning and end of the string are needed!
It worked perfectly:
[100%] Built target fetch_tlx
Thank you, @jonas-ellert!
Hi Jonas,
I have one more question. I executed the LA-plain (https://github.com/felipelouza/nearest-smaller-suffixes/blob/master/main.cpp#L45), but it didn't achieve O(1) workspace as expected. Is that correct?
Thanks!
You don't need two vectors if you only want the Lyndon array (rather than PSS + Lyndon), see here:
https://gist.github.com/jonas-ellert/441d61b8bd6c7e7e4316f3ff0339fb3d
The implementation uses O(1) space ;)
Do you have any idea why it didn't work for https://pizzachili.dcc.uchile.cl/repcorpus/real/coreutils.gz (unzipped)?
$ ./la-plain ~/dataset/pizza/coreutils
Segmentation fault (core dumped)
The same happened with: einstein.de.txt, einstein.en.txt, english.001.2, kernel, world_leaders :(
This is likely because of missing sentinels. They need to be smaller than all characters present in the string; in particular, coreutils contains both std::numeric_limits
https://github.com/jonas-ellert/nearest-smaller-suffixes/blob/master/benchmark/include/file_util.hpp
The algorithm can work without sentinels, but then a slightly more complicated implementation is needed (checking the special corner cases if and only if they actually arise). My best guess is that this may slow down the algorithm by around 10% or so (there is no change in cache behavior, so the impact should be minor). Either way, for a fair comparison of algorithms, it should either be assumed that none of the algorithms need any standardization of input, or all of them get the input in their respectively required format ;)
I see. I tried a simple scan over the text, incrementing every symbol smaller than 255 by one (leaving '\0' for the sentinels), but the code crashed as well.
Thank you anyway, you helped a lot. The algorithm is really fast; we are trying to reduce the workspace needed to compute the LA (with a semi-external memory approach). Also, the memory we measured for LA-plain was not always about 5n bytes: for some inputs it needed about 6 bytes per symbol (for example: cere, fib41, influenza). Any idea why?
All the best!
Hi Jonas,
I had the following problems when running `make check`:

/home/felipelouza/nearest-smaller-suffixes/benchmark/test/test_xss.cpp: In lambda function:
/home/felipelouza/nearest-smaller-suffixes/benchmark/test/test_xss.cpp:73:31: error: ‘get’ is not a member of ‘xss’
   73 |   auto pss_and_nss = xss::get<xss::PSS, xss::NSS>(t.data(), t.size());
      |                               ^~~
...
/usr/include/c++/14/bits/stl_pair.h:1250:5: note: ‘std::get’
/home/felipelouza/nearest-smaller-suffixes/benchmark/test/test_xss.cpp:73:40: error: ‘PSS’ is not a member of ‘xss’
   73 |   auto pss_and_nss = xss::get<xss::PSS, xss::NSS>(t.data(), t.size());
      |                                        ^~~
/home/felipelouza/nearest-smaller-suffixes/benchmark/test/test_xss.cpp:74:34: error: ‘get’ is not a member of ‘xss’
   74 |   auto pss_and_lyndon = xss::get<xss::PSS, xss::LYNDON>(t.data(), t.size());
      |                                  ^~~
Do you have any hints on what might be happening?
Thank you!