Closed jtrmal closed 7 years ago
Thank you, no worries! I'll squash it into two commits, if you don't mind, for C code and CMake.
I could have sworn I implemented the bit counting functions... But cannot find the implementation. MS compiler has these builtins for the same CPU instructions, only named differently.
Is memory mapping important? I do not remember it used in kaldi. Or is it actually used by ConstFst?
Sure, go ahead -- I don't mind.
Ad bitcounting -- I think where possible I used the MS counterpart. There was one function that I think didn't exist and for that I think I used Paul Dixon's implementation from openfstwin 1.4.x ( I hope I mentioned his name in the code -- if not, then I will fix that). I'm afk, so I cannot check myself, sorry. I think memory mapping is not used in kaldi y,
On Wed, Nov 1, 2017 at 10:58 PM, Kirill Katsnelson <notifications@github.com
wrote:
Thank you, no worries! I'll squash it into two commits, if you don't mind, for C code and CMake.
I could have sworn I implemented the bit counting functions... But cannot find the implementation. MS compiler has these builtins for the same CPU instructions, only named differently.
Is memory mapping important? I do not remember it used in kaldi. Or is it actually used by ConstFst?
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/kkm000/openfst/pull/2#issuecomment-341303934, or mute the thread https://github.com/notifications/unsubscribe-auth/AKisX5-rythrjD-UQ8oWGdNvVkBYcfOmks5syS_rgaJpZM4QOncR .
I found the bit counting builtins implementation, it was an unrelated project actually. So I'll replace it, there won't be Paul's code, so no problem with the attribution either. _BitCountReverse (slightly different prototype) and __popcnt64. I'm in the moddle of something else, will do tomorrow probably!
The CMake work is not Windows-specific, is it? Maybe we should maintain it on both Unix and Windows branches? It appears to have become quite popular recently. The msbuild files it generates are quite verbose (probably 30x more noise that there should be), but they work! :)
I have a couple of funny incantations for MSBuild to build many single-source executables in one VS project. MSBuild used to be obscure (have you seen any build system that is not?), and I think many underappreciate its power. I used it to compile complex W3C grammars with a 3rd party closed source tool, and these files included many levels of other grammars. MsBuild is even able to intercept and track actual files read and written by a compiler (when you see a *.tlog/ directory, this is the cache of actual file dependencies), to rebuild dependencies of only changed files. Luckily it's open source now, and much better documented!
__builtin_ctzl
(32 bit) implementation, just FYI: https://github.com/kkm000/mitlm/blob/c39ebfe606fe0cbbd4c53f775205573cc808a75b/src/util/BitOps.h#L43. The __builtin_ctzll
is similar, only _BitScanReverse64
for 64-bit.
I integrated your changes, splitting them into two commits. I did some edits, mostly trimming EOL whitespace where it snuck in. The biggest one is the emulation of GNU intrinsics with Microsoft ones. I do not even know if the original code compiles into 32 bit, but I made sure the port does. You can see the differences between your version and mine by clicking on the Resolve Conflicts button right below. Please let me know if you see anything suspicious! I tested the __builtin_ctzll
part by comparing Linux/gcc results with mine, they match where their behavior is defined (it's undefined for the argument equal 0).
Thanks for the CMake work! When I have some free cycles, maybe I should also add compilation of the binaries to the native VS project as well.
Here's the rub. I just imported v1.6.5.1 (I am adding their "rN" release number as another dot). They have split every binary into two files: fstxxx.cc
contains the main()
function that calls fstxxx_main()
, and fstxxx_main.cc
what used to be fstxxx.cc
with the main()
function renamed into fstxxx_main()
. Like this:
$ grep fstrandgen_main **/*.*
src/bin/fstrandgen.cc:int fstrandgen_main(int argc, char **argv);
src/bin/fstrandgen.cc:int main(int argc, char **argv) { return fstrandgen_main(argc, argv); }
src/bin/fstrandgen-main.cc:int fstrandgen_main(int argc, char **argv) {
With src/bin/fstrandgen.cc
now containing the flags only and the main()
function:
#include <unistd.h>
#include <climits>
#include <ctime>
#include <fst/flags.h>
DEFINE_int32(max_length, INT32_MAX, "Maximum path length");
DEFINE_int32(npath, 1, "Number of paths to generate");
DEFINE_int32(seed, time(nullptr) + getpid(), "Random seed");
DEFINE_string(select, "uniform",
"Selection type: one of: "
" \"uniform\", \"log_prob\" (when appropriate),"
" \"fast_log_prob\" (when appropriate)");
DEFINE_bool(weighted, false,
"Output tree weighted by path count vs. unweighted paths");
DEFINE_bool(remove_total_weight, false,
"Remove total weight when output weighted");
int fstrandgen_main(int argc, char **argv);
int main(int argc, char **argv) { return fstrandgen_main(argc, argv); }
and the other file with the rest. This is that "oh crap"moment but this probably needs some accounting for in the CMake scripts :(
NEWS snippet does not say anything:
* Prevents unreachable code generation in libfstscript (1.6.5)
* Adds move constructors for non-trivial weight types (1.6.5)
* Standardizes method names for tuple weight types (1.6.5)
* Eliminates undefined behavior in weight hashing (1.6.5)
* Optimizes binary search in SortedMatcher (1.6.5)
* Adds SetWeight (1.6.5)
Added one file: src/include/fst/set-weight.h
Look at the orig/1.6 branch.
thanks, will look into it, hopefully, it will be easy to deal with. y.
On Sat, Nov 4, 2017 at 10:05 AM, Kirill Katsnelson <notifications@github.com
wrote:
Here's the rub. I just imported v1.6.5.1 (I am adding their "rN" release number http://www.openfst.org/twiki/bin/view/FST/FstDownload as another dot). They have split every binary into two files: fstxxx.cc contains the main() function that calls fstxxx_main(), and fstxxx_main.cc what used to be fstxxx.cc with the main() function renamed into fstxxx_main(). Like this:
$ grep fstrandgen_main /. src/bin/fstrandgen.cc:int fstrandgen_main(int argc, char argv); src/bin/fstrandgen.cc:int main(int argc, char argv) { return fstrandgen_main(argc, argv); } src/bin/fstrandgen-main.cc:int fstrandgen_main(int argc, char argv) {
With src/bin/fstrandgen.cc now containing the flags only and the main() function:
include
include
include
include <fst/flags.h>
DEFINE_int32(max_length, INT32_MAX, "Maximum path length");DEFINE_int32(npath, 1, "Number of paths to generate");DEFINE_int32(seed, time(nullptr) + getpid(), "Random seed");DEFINE_string(select, "uniform", "Selection type: one of: " " \"uniform\", \"log_prob\" (when appropriate)," " \"fast_log_prob\" (when appropriate)");DEFINE_bool(weighted, false, "Output tree weighted by path count vs. unweighted paths");DEFINE_bool(remove_total_weight, false, "Remove total weight when output weighted"); int fstrandgen_main(int argc, char argv); int main(int argc, char argv) { return fstrandgen_main(argc, argv); }
and the other file with the rest. This is that "oh crap"moment but this probably needs some accounting for in the CMake scripts :(
NEWS snippet does not say anything:
* Prevents unreachable code generation in libfstscript (1.6.5) * Adds move constructors for non-trivial weight types (1.6.5) * Standardizes method names for tuple weight types (1.6.5) * Eliminates undefined behavior in weight hashing (1.6.5) * Optimizes binary search in SortedMatcher (1.6.5) * Adds SetWeight (1.6.5)
Added one file: src/include/fst/set-weight.h
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/kkm000/openfst/pull/2#issuecomment-341899368, or mute the thread https://github.com/notifications/unsubscribe-auth/AKisX7leg2APFMPxXDXN4fnJ19zOzNf8ks5szG8NgaJpZM4QOncR .
I merged the changes locally, but not yet pushing them here, so you still can review the "conflicts" which are the differences between your changes an mine. Please just let me know when/if you have any comments, and I'll push the updated win/1.6 branch from where you can go on.
I think you can merge and I will deal with it :) y.
On Sat, Nov 4, 2017 at 10:34 AM, Kirill Katsnelson <notifications@github.com
wrote:
I merged the changes locally, but not yet pushing them here, so you still can review the "conflicts" which are the differences between your changes an mine. Please just let me know when/if you have any comments, and I'll push the updated win/1.6 branch from where you can go on.
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/kkm000/openfst/pull/2#issuecomment-341901358, or mute the thread https://github.com/notifications/unsubscribe-auth/AKisX4sRR6UjpyGcD27qrG8t3vT_5Lscks5szHYGgaJpZM4QOncR .
There you go! Thanks! Just open a new PR when you have the CMakefile changes?
yep, will do. y.
On Sat, Nov 4, 2017 at 10:42 AM, Kirill Katsnelson <notifications@github.com
wrote:
There you go! Thanks! Just open a new PR when you have the CMakefile changes?
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/kkm000/openfst/pull/2#issuecomment-341901917, or mute the thread https://github.com/notifications/unsubscribe-auth/AKisX_gt_11_3QFf6EedViZn-rBXT9P3ks5szHfVgaJpZM4QOncR .
These are my changes. Sorry, it's not split into more logical PR (could do next week).