kpu / kenlm

KenLM: Faster and Smaller Language Model Queries
http://kheafield.com/code/kenlm/
Other
2.5k stars 513 forks source link

Downstream changes from Mozilla DeepSpeech to lm/interpolate/CMakeLists.txt submitting to upstream #319

Closed KathyReid closed 3 years ago

KathyReid commented 3 years ago

Hi there KenLM folks, Firstly a huge thanks for all your work on the kenlm project, and for making it open source. Thank you.

We incorporate kenlm into Mozilla DeepSpeech and have made some minor amendments to a CMakesList.txt file that tries to install eigen. We've added some checking and user assistance, and are providing this in case you'd like to pull it in upstream. I've created a patchFile (att'd) and you can also see the changes here;

https://github.com/KathyReid/kenlm/tree/patch-1

I felt an Issue would be easier to cherry-pick than a PR, but also happy to open a PR if desired.

https://github.com/mozilla/DeepSpeech/pull/3480

patchFile.txt

Kind regards, Kathy

kpu commented 3 years ago

@JackBoosY You requested an explicit Eigen flag in https://github.com/microsoft/vcpkg/pull/13692/ . This proposed change does Eigen detection. I like detection better. Any comments?

kpu commented 3 years ago

Hi @KathyReid, thanks for the code. Does Mozilla actually use the interpolation program? It doesn't seem to have many users.

KathyReid commented 3 years ago

Good question, @reuben will know more - I only picked up a 404ing link in our local use of kenlm, and he suggested to raise the PR upstream instead.

JackBoosY commented 3 years ago

@kpu @KathyReid

  1. I couldn't find the option ENABLE_INTERPOLATE, maybe it's removed. And I think that's a bad idea.
  2. It is not a good thing to automatically enable/disable features based on the results of finding dependencies, because it will destroy the user's expectations, and the user can only find this in the configuration log.
  3. Usually, we are used to using target_compile_options / add_compile_options to add C_FLAGS / CXX_FLAGS instead of adding these flags to CMAKE_C_FLAGS / CMAKE_CXX_FLAGS.

That's all my suggestions.

reuben commented 3 years ago

Sorry that patch is not what I intended to be upstreamed. @KathyReid surfaced a problem in our repo where the error message instructed users to download the eigen source code from a URL that no longer existed, but that URL is no longer included in KenLM master. We vendor a specific version of KenLM in DeepSpeech, commit b9f35777d112ce2fc10bd3986302517a16dc3883 which was before all the changes requested by @JackBoosY. So I guess there's nothing to upstream, we should just update our vendored KenLM to get rid of the invalid URL.

reuben commented 3 years ago

@KathyReid sorry for wasting your time here I should have checked upstream first to see if it still had the problem.

kpu commented 3 years ago

Indeed the incorrect URL is gone.

KathyReid commented 3 years ago

@reuben not wasting my time at all! Useful learning exercise. Thanks too @kpu for your assistance.