kaldi-asr / kaldi

kaldi-asr/kaldi is the official location of the Kaldi project.
http://kaldi-asr.org

OpenFst no longer produces a fst/types.h file since 1.8 #4712

Closed auroraanon38 closed 2 years ago

auroraanon38 commented 2 years ago

According to the 1.8 news, OpenFst no longer produces types.h shims, which I have verified to be true.

This prevents building Kaldi against a recent OpenFst version, namely 1.8.2, and produces errors like this:

In file included from ../base/kaldi-error.h:34,
                 from ../base/kaldi-common.h:35,
                 from ../matrix/matrix-common.h:26,
                 from ../matrix/packed-matrix.h:25,
                 from ../matrix/tp-matrix.h:26,
                 from tp-matrix.cc:21:
../base/kaldi-types.h:44:10: fatal error: fst/types.h: No such file or directory
   44 | #include <fst/types.h>
      |          ^~~~~~~~~~~~~
compilation terminated.

I have also verified that 1.7.9 does produce such a file.
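
One way to tolerate both header layouts, sketched here as an assumption rather than as the fix the Kaldi maintainers adopted, is to probe for the header with `__has_include` and fall back to `<cstdint>` aliases when it is missing. The namespace `kaldi_sketch` and the macro `KALDI_SKETCH_HAVE_FST_TYPES_H` are hypothetical names used only for illustration, and the sketch assumes that `fst/types.h`, where present, declares `int32` and `int64` at global scope.

```cpp
// Sketch of a guarded include; not the actual contents of base/kaldi-types.h.
#include <cstdint>

// __has_include is standard in C++17 and offered as an extension by recent
// GCC and Clang in earlier modes, so test for the operator itself first.
#if defined(__has_include)
#  if __has_include(<fst/types.h>)
#    include <fst/types.h>  // Shim header shipped by older OpenFst releases.
#    define KALDI_SKETCH_HAVE_FST_TYPES_H 1
#  endif
#endif

namespace kaldi_sketch {  // Hypothetical namespace, for illustration only.
#ifdef KALDI_SKETCH_HAVE_FST_TYPES_H
// Assumption: the shim declares these aliases at global scope.
using ::int32;
using ::int64;
#else
// No shim header found: declare the fixed-width aliases directly.
using int32 = std::int32_t;
using int64 = std::int64_t;
#endif
}  // namespace kaldi_sketch
```

With a guard like this, the same translation unit should compile whether or not the shim header is installed, although the other API changes mentioned in this thread would still need separate handling.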

auroraanon38 commented 2 years ago

I have begun some work on getting the code to build with OpenFst 1.8, though some things need discussion with this project's main contributors and maintainers. Function signatures and naming schemes have changed.

kkm000 commented 2 years ago

@auroraanon38, please be very careful with this. There was a long discussion and, IIRC, a decision to upgrade to a 1.7.x version that broke the public API in a minor update. Search the issues and tickets. I can help you, but only in a few hours.

What is the motivation for the upgrade? Will it give us a better functionality or a significant performance boost? I'm just asking, I do not know.

1.8 is quite a major change, and you mentioned that C++ is not your first language. FST is a heavily templated API, and you will need to hack the code that consumes it.

kkm000 commented 2 years ago

Here are must-read references related to the issue, possibly not in the best order. There are multiple issues that must be taken care of.

It's generally a good idea to search issues and PRs, open or closed, before committing to a chunk of work.

auroraanon38 commented 2 years ago

kkm000 writes:

> @auroraanon38, please be very careful with this. There was a long discussion and, IIRC, a decision to upgrade to a 1.7.x version that broke the public API in a minor update. Search the issues and tickets. I can help you, but only in a few hours.
>
> What is the motivation for the upgrade? Will it give us a better functionality or a significant performance boost? I'm just asking, I do not know.

I lack a compelling motivation from the point of view of the Kaldi project.

Effectively, Kaldi is a transitive dependency for my project which is intended to rely on Vosk for offline speech recognition. Kaldi hadn't been building for a year in my intended toolchain (Guix), so I figured I'd fix that.

I have since repaired the build dependent on OpenFst 1.7.3 in said toolchain.

I then decided to see if I could get it to build with the more recent versions of its dependencies, perhaps even following upstream releases.

> 1.8 is quite a major change, and you mentioned that C++ is not your first language. FST is a heavily templated API, and you will need to hack the code that consumes it.

Indeed, I have since realized that it wouldn't be a simple matter of correcting a few minor renames and that the structure has been modified to a non-trivial degree.

Compounded with OpenFst's lack of an available version-controlled copy with commit notes that could help explain the changes, I'm left struggling to actually fix anything.

Unfortunately, this issue is also redundant: it is a subset of the issues and pull requests you referenced.

kkm000 commented 2 years ago

> I lack a compelling motivation from the point of view of the Kaldi project.

Most motivation is from outside the project. Generally, we try to accommodate all reasonable requests, and this is a primary driver for changes. I think you do have one. Kaldi as a library is fully supported, and you see that this issue is coming up once in a while.

The big deal here is not so much the internal changes as the breakage of the binaries' command-line API, which is used in scripts. I do not remember exactly, but at least one binary lost a switch (or had it renamed). This could potentially break a lot of scripts, so we need to work around that issue, too. Changing the egs is simple, but there are many more scripts out there.

Another thing is that the latest OpenFst requires C++17. This is OK: we're compatible with C++17 and even C++20, thanks to @jtrmal, but we cannot yet require C++17. This means the default should currently be the latest C++14-compatible version, and we must support both versions.
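
The constraint above can be expressed as a small compile-time check. This is only a sketch under stated assumptions: `OpenFstSeries`, `NewestUsableSeries`, and `kSeriesForThisCompiler` are hypothetical names that exist neither in Kaldi nor in OpenFst, and the sketch assumes that the 1.8 series is where the C++17 requirement begins.

```cpp
// Hypothetical labels for the two OpenFst series discussed in this thread.
enum class OpenFstSeries { k1_7, k1_8 };

// Returns the newest series that a compiler reporting the given __cplusplus
// value could build, assuming the 1.8 series needs C++17 (201703L) or later.
constexpr OpenFstSeries NewestUsableSeries(long cplusplus_value) {
  return cplusplus_value >= 201703L ? OpenFstSeries::k1_8
                                    : OpenFstSeries::k1_7;
}

// Evaluated for whatever language mode this translation unit is compiled in.
constexpr OpenFstSeries kSeriesForThisCompiler =
    NewestUsableSeries(__cplusplus);
```

A real build would more likely make this decision in the configure step, but the check shows why a C++14 default rules out the newer series.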

Let me put it higher on my list. Unfortunately, the key players shuffled between jobs and countries during the last two years, so there is a backlog. I'm planning to spend much more time on it. I've all but abandoned even maintenance, except on minor occasions, as time allowed.

> Compounded with OpenFst's lack of an available version-controlled copy

You can diff the release history and compare revisions on the original branch, which goes back to v1.6.1: https://github.com/kkm000/openfst/tree/original

Every original release is on this branch and is tagged as orig/x.x.x.r, where r is the revision number. The revision is now always 1, but that has not always been the case on their wiki. The content is unmodified, except for added maintainer files. NB: this branch is not the default one.

kkm000 commented 2 years ago

> I have since repaired the build dependent on OpenFst 1.7.3 in said toolchain.

So can you use it, even with a temporary patch, or not at all? Does Vosk require 1.8?

auroraanon38 commented 2 years ago

kkm000 writes:

> I have since repaired the build dependent on OpenFst 1.7.3 in said toolchain.
>
> So can you use it, even with a temporary patch, or not at all? Does Vosk require 1.8?

I had temporarily paused efforts on Vosk while trying to update Kaldi, but have since returned to it.

I cannot yet confirm whether it requires a newer version or not; its build system will need some work on my part so that I can actually build and test it with Guix.

I will keep you updated.

auroraanon38 commented 2 years ago

Well, that didn't take long. I can now tell you for sure that nothing further needs to be done on upstream Kaldi for Vosk.

The Vosk developers decided to fork both OpenFst and Kaldi and to support only those forks.

I'll have to use Vosk's versions exclusively.

kkm000 commented 2 years ago

Glad you can keep going!

A question: what is special about Guix? You mention a toolchain, but from my quick googling, it looks more like a Linux distro. Does it support only one toolchain? My main workhorse is Debian, and I usually use the GCC that comes with it plus two or three Clang versions. I used to use Intel icc, although they moved to a modified Clang, too, which is no different for Kaldi than LLVM's baseline Clang (it can offload computation to GPU or FPGA, but this is not something we're using). But Debian does not do well with multiple GCC versions installed at once. Is it something like this? And what is the toolchain?

auroraanon38 commented 2 years ago

kkm000 writes:

> Glad you can keep going!

Yeah, at least I have a fairly clear idea of the way forward, since there is only one supported way.

> A question: what is special about Guix? You mention a toolchain, but from my quick googling, it looks more like a Linux distro. Does it support only one toolchain? My main workhorse is Debian, and I usually use the GCC that comes with it plus two or three Clang versions. I used to use Intel icc, although they moved to a modified Clang, too, which is no different for Kaldi than LLVM's baseline Clang (it can offload computation to GPU or FPGA, but this is not something we're using). But Debian does not do well with multiple GCC versions installed at once. Is it something like this? And what is the toolchain?

Guix is first and foremost a source-based functional package manager written in (Guile) Scheme, with the explicit goal of facilitating reproducible builds (among other things).

It supports a number of different build systems, as you would expect of any package manager. You can, in fact, write a package definition that invokes the build steps manually in Scheme. This covers packages that lack a common build system, lack one currently supported by Guix (though you're invited to add such support), or lack a build system altogether.

Especially relevant for you is that Guix supports having multiple versions of the programs you use. Packages are addressed via hashes of their inputs and definition, so different definitions that are properly packaged for Guix will never conflict or interfere with each other. This involves a bit of setup so that the various parts can find their dependencies. That setup usually means working around quirks of individual projects' build systems, which is most of the difficulty in packaging new projects or debugging them, and it is where my toolchain difficulties mostly came in.
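
The addressing idea described above can be illustrated with a toy sketch. It is not Guix's actual algorithm: Guix derives its store paths from a cryptographic hash, whereas this sketch assumes a simple FNV-1a hash as a stand-in, and `Fnv1a` and `PackageKey` are hypothetical names used only for illustration.

```cpp
#include <cstdint>

// Toy 64-bit FNV-1a hash, written recursively so that it is a valid C++11
// constexpr function. `seed` lets one hash be chained into the next.
constexpr std::uint64_t Fnv1a(const char* text,
                              std::uint64_t seed = 14695981039346656037ULL) {
  return *text == '\0'
             ? seed
             : Fnv1a(text + 1,
                     (seed ^ static_cast<unsigned char>(*text)) *
                         1099511628211ULL);
}

// Hypothetical package key: it covers both the package definition and its
// inputs, so changing either one is expected to yield a different key.
constexpr std::uint64_t PackageKey(const char* definition,
                                   const char* inputs) {
  return Fnv1a(inputs, Fnv1a(definition));
}
```

Because the key depends on both arguments, a call such as `PackageKey("openfst-1.7.3", "gcc-10")` yields a different value from the same definition built with `"gcc-12"`, which is how two variants could sit side by side without overwriting each other. The version strings here are illustrative only.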

To work with various versions, you would just open a shell in an ad-hoc profile with the right programs and dependencies via guix shell (previously guix environment).

GNU Guix System is the distro built around Guix, with its own init/service daemon, the Shepherd. However, Guix (like Nix) actively supports use on foreign distros, which is how I tend to use it.

Guix introduction

Guix was inspired by Nix.

kkm000 commented 2 years ago

@auroraanon38, thanks for the detailed explanation, that's an interesting approach. From everything I know, building is hard. There are many configure+make systems: automake, CMake, Bazel, MSBuild, you name it. All of them have major drawbacks. Since a build is inherently top-down, declarative and immutable, using a functional language probably makes sense.

I'm closing this as a duplicate then.

X-Ref: #4096
X-Ref: #4533
X-Ref: #4565