kkm000 / openfst

Port of the OpenFST library to Windows
http://www.openfst.org/
Other
69 stars 38 forks source link

Win fixes #2

Closed jtrmal closed 7 years ago

jtrmal commented 7 years ago

These are my changes. Sorry, it's not split into more logical PR (could do next week).

kkm000 commented 7 years ago

Thank you, no worries! I'll squash it into two commits, if you don't mind, for C code and CMake.

I could have sworn I implemented the bit counting functions... But cannot find the implementation. MS compiler has these builtins for the same CPU instructions, only named differently.

Is memory mapping important? I do not remember it used in kaldi. Or is it actually used by ConstFst?

jtrmal commented 7 years ago

Sure, go ahead -- I don't mind.

Ad bitcounting -- I think where possible I used the MS counterpart. There was one function that I think didn't exist and for that I think I used Paul Dixon's implementation from openfstwin 1.4.x ( I hope I mentioned his name in the code -- if not, then I will fix that). I'm afk, so I cannot check myself, sorry. I think memory mapping is not used in kaldi y,

On Wed, Nov 1, 2017 at 10:58 PM, Kirill Katsnelson <notifications@github.com

wrote:

Thank you, no worries! I'll squash it into two commits, if you don't mind, for C code and CMake.

I could have sworn I implemented the bit counting functions... But cannot find the implementation. MS compiler has these builtins for the same CPU instructions, only named differently.

Is memory mapping important? I do not remember it used in kaldi. Or is it actually used by ConstFst?

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/kkm000/openfst/pull/2#issuecomment-341303934, or mute the thread https://github.com/notifications/unsubscribe-auth/AKisX5-rythrjD-UQ8oWGdNvVkBYcfOmks5syS_rgaJpZM4QOncR .

kkm000 commented 7 years ago

I found the bit counting builtins implementation, it was an unrelated project actually. So I'll replace it, there won't be Paul's code, so no problem with the attribution either. _BitCountReverse (slightly different prototype) and __popcnt64. I'm in the moddle of something else, will do tomorrow probably!

The CMake work is not Windows-specific, is it? Maybe we should maintain it on both Unix and Windows branches? It appears to have become quite popular recently. The msbuild files it generates are quite verbose (probably 30x more noise that there should be), but they work! :)

I have a couple of funny incantations for MSBuild to build many single-source executables in one VS project. MSBuild used to be obscure (have you seen any build system that is not?), and I think many underappreciate its power. I used it to compile complex W3C grammars with a 3rd party closed source tool, and these files included many levels of other grammars. MsBuild is even able to intercept and track actual files read and written by a compiler (when you see a *.tlog/ directory, this is the cache of actual file dependencies), to rebuild dependencies of only changed files. Luckily it's open source now, and much better documented!

kkm000 commented 7 years ago

__builtin_ctzl (32 bit) implementation, just FYI: https://github.com/kkm000/mitlm/blob/c39ebfe606fe0cbbd4c53f775205573cc808a75b/src/util/BitOps.h#L43. The __builtin_ctzll is similar, only _BitScanReverse64 for 64-bit.

kkm000 commented 7 years ago

I integrated your changes, splitting them into two commits. I did some edits, mostly trimming EOL whitespace where it snuck in. The biggest one is the emulation of GNU intrinsics with Microsoft ones. I do not even know if the original code compiles into 32 bit, but I made sure the port does. You can see the differences between your version and mine by clicking on the Resolve Conflicts button right below. Please let me know if you see anything suspicious! I tested the __builtin_ctzll part by comparing Linux/gcc results with mine, they match where their behavior is defined (it's undefined for the argument equal 0).

Thanks for the CMake work! When I have some free cycles, maybe I should also add compilation of the binaries to the native VS project as well.

kkm000 commented 7 years ago

Here's the rub. I just imported v1.6.5.1 (I am adding their "rN" release number as another dot). They have split every binary into two files: fstxxx.cc contains the main() function that calls fstxxx_main(), and fstxxx_main.cc what used to be fstxxx.cc with the main() function renamed into fstxxx_main(). Like this:

$ grep fstrandgen_main **/*.*
src/bin/fstrandgen.cc:int fstrandgen_main(int argc, char **argv);
src/bin/fstrandgen.cc:int main(int argc, char **argv) { return fstrandgen_main(argc, argv); }
src/bin/fstrandgen-main.cc:int fstrandgen_main(int argc, char **argv) {

With src/bin/fstrandgen.cc now containing the flags only and the main() function:

#include <unistd.h>

#include <climits>
#include <ctime>

#include <fst/flags.h>

DEFINE_int32(max_length, INT32_MAX, "Maximum path length");
DEFINE_int32(npath, 1, "Number of paths to generate");
DEFINE_int32(seed, time(nullptr) + getpid(), "Random seed");
DEFINE_string(select, "uniform",
              "Selection type: one of: "
              " \"uniform\", \"log_prob\" (when appropriate),"
              " \"fast_log_prob\" (when appropriate)");
DEFINE_bool(weighted, false,
            "Output tree weighted by path count vs. unweighted paths");
DEFINE_bool(remove_total_weight, false,
            "Remove total weight when output weighted");

int fstrandgen_main(int argc, char **argv);

int main(int argc, char **argv) { return fstrandgen_main(argc, argv); }

and the other file with the rest. This is that "oh crap"moment but this probably needs some accounting for in the CMake scripts :(

NEWS snippet does not say anything:

    * Prevents unreachable code generation in libfstscript (1.6.5)
    * Adds move constructors for non-trivial weight types (1.6.5)
    * Standardizes method names for tuple weight types (1.6.5)
    * Eliminates undefined behavior in weight hashing (1.6.5)
    * Optimizes binary search in SortedMatcher (1.6.5)
    * Adds SetWeight (1.6.5)

Added one file: src/include/fst/set-weight.h

Look at the orig/1.6 branch.

jtrmal commented 7 years ago

thanks, will look into it, hopefully, it will be easy to deal with. y.

On Sat, Nov 4, 2017 at 10:05 AM, Kirill Katsnelson <notifications@github.com

wrote:

Here's the rub. I just imported v1.6.5.1 (I am adding their "rN" release number http://www.openfst.org/twiki/bin/view/FST/FstDownload as another dot). They have split every binary into two files: fstxxx.cc contains the main() function that calls fstxxx_main(), and fstxxx_main.cc what used to be fstxxx.cc with the main() function renamed into fstxxx_main(). Like this:

$ grep fstrandgen_main /. src/bin/fstrandgen.cc:int fstrandgen_main(int argc, char argv); src/bin/fstrandgen.cc:int main(int argc, char argv) { return fstrandgen_main(argc, argv); } src/bin/fstrandgen-main.cc:int fstrandgen_main(int argc, char argv) {

With src/bin/fstrandgen.cc now containing the flags only and the main() function:

include

include

include

include <fst/flags.h>

DEFINE_int32(max_length, INT32_MAX, "Maximum path length");DEFINE_int32(npath, 1, "Number of paths to generate");DEFINE_int32(seed, time(nullptr) + getpid(), "Random seed");DEFINE_string(select, "uniform", "Selection type: one of: " " \"uniform\", \"log_prob\" (when appropriate)," " \"fast_log_prob\" (when appropriate)");DEFINE_bool(weighted, false, "Output tree weighted by path count vs. unweighted paths");DEFINE_bool(remove_total_weight, false, "Remove total weight when output weighted"); int fstrandgen_main(int argc, char argv); int main(int argc, char argv) { return fstrandgen_main(argc, argv); }

and the other file with the rest. This is that "oh crap"moment but this probably needs some accounting for in the CMake scripts :(

NEWS snippet does not say anything:

* Prevents unreachable code generation in libfstscript (1.6.5)
* Adds move constructors for non-trivial weight types (1.6.5)
* Standardizes method names for tuple weight types (1.6.5)
* Eliminates undefined behavior in weight hashing (1.6.5)
* Optimizes binary search in SortedMatcher (1.6.5)
* Adds SetWeight (1.6.5)

Added one file: src/include/fst/set-weight.h

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/kkm000/openfst/pull/2#issuecomment-341899368, or mute the thread https://github.com/notifications/unsubscribe-auth/AKisX7leg2APFMPxXDXN4fnJ19zOzNf8ks5szG8NgaJpZM4QOncR .

kkm000 commented 7 years ago

I merged the changes locally, but not yet pushing them here, so you still can review the "conflicts" which are the differences between your changes an mine. Please just let me know when/if you have any comments, and I'll push the updated win/1.6 branch from where you can go on.

jtrmal commented 7 years ago

I think you can merge and I will deal with it :) y.

On Sat, Nov 4, 2017 at 10:34 AM, Kirill Katsnelson <notifications@github.com

wrote:

I merged the changes locally, but not yet pushing them here, so you still can review the "conflicts" which are the differences between your changes an mine. Please just let me know when/if you have any comments, and I'll push the updated win/1.6 branch from where you can go on.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/kkm000/openfst/pull/2#issuecomment-341901358, or mute the thread https://github.com/notifications/unsubscribe-auth/AKisX4sRR6UjpyGcD27qrG8t3vT_5Lscks5szHYGgaJpZM4QOncR .

kkm000 commented 7 years ago

There you go! Thanks! Just open a new PR when you have the CMakefile changes?

jtrmal commented 7 years ago

yep, will do. y.

On Sat, Nov 4, 2017 at 10:42 AM, Kirill Katsnelson <notifications@github.com

wrote:

There you go! Thanks! Just open a new PR when you have the CMakefile changes?

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/kkm000/openfst/pull/2#issuecomment-341901917, or mute the thread https://github.com/notifications/unsubscribe-auth/AKisX_gt_11_3QFf6EedViZn-rBXT9P3ks5szHfVgaJpZM4QOncR .