mozilla / DeepSpeech

DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers.
Mozilla Public License 2.0

ArchLinux PKGBUILDs for native client and python bindings #979

Open · stes opened this issue 6 years ago

stes commented 6 years ago

I created (unofficial) PKGBUILD files for Arch Linux, which can be downloaded here:

https://github.com/stes/arch-deepspeech

If you'd like to include such files in the main repository or publish them to the Arch User Repository, I am happy to submit a pull request.

lissyx commented 6 years ago

@AtosNicoS I know nothing about ArchLinux, but I'd like to give it a try on TaskCluster. How should I use your PKGBUILD file to produce an installable package?

AtosNicoS commented 6 years ago

First you need a basic Arch Linux installation; I hope your TaskCluster provides such an image. Otherwise you have to read the wiki on how to install an Arch Linux system. I could also provide you with some installer scripts, but this is not advised.

Then you need to install the general development environment:

sudo pacman -S base-devel devtools --needed

Also edit or create your makepkg config so builds use multiple processors:

nano ~/.makepkg.conf
---------------------------
MAKEFLAGS="-j$(nproc)"

Then download the PKGBUILD and the patch into the same folder and run extra-x86_64-build. It will compile the code and produce a package. Install it with sudo pacman -U <name>.pkg.tar.xz. I hope that helps.
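
Put together, the whole flow looks roughly like this (a sketch; the actual package file name will differ):

# one-time setup
sudo pacman -S base-devel devtools --needed
echo 'MAKEFLAGS="-j$(nproc)"' >> ~/.makepkg.conf

# run inside the folder containing the PKGBUILD and the patch
extra-x86_64-build
sudo pacman -U deepspeech-*.pkg.tar.xz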

lissyx commented 6 years ago

Thanks @AtosNicoS ! We can use any DockerHub image, so I can use https://hub.docker.com/r/base/archlinux/ :-).

lissyx commented 6 years ago

@AtosNicoS I'm unable to install devtools package: https://taskcluster-artifacts.net/PCVDMDu8RkeznwJAPNY6yA/0/public/logs/live_backing.log

I'm using the archimg/base-devel:2018.04.01 Docker image: https://github.com/archimg/archlinux/blob/master/Dockerfiles/basement/Dockerfile.base-devel

And the payload:

 pacman --noconfirm -Syyu && pacman --noconfirm -S --needed git devtoools && adduser --system --home /home/build-user build-user && cd /home/build-user/ && echo -e \"#!/bin/bash\\nset -xe\\n env && id && git clone --quiet https://github.com/lissyx/DeepSpeech.git ~/DeepSpeech/ds/ && cd ~/DeepSpeech/ds && git checkout --quiet 9bc2d682e5cb17d46b79feeaa1c3515bdb6b5d3d\" > /tmp/clone.sh && chmod +x /tmp/clone.sh && sudo -H -u build-user /bin/bash /tmp/clone.sh && true && sudo -H -u build-user --preserve-env /bin/bash /home/build-user/DeepSpeech/ds/packages/archlinux/build.sh && sudo -H -u build-user /bin/bash /home/build-user/DeepSpeech/ds/packages/archlinux/package.sh\n
AtosNicoS commented 6 years ago

You used three 'o's: devtoools, but it is devtools. (It was my fault; I've edited my post above now.)

Also, there is no need to create a build user etc. Devtools will handle everything for you and build you a clean package in a chroot with only the dependencies specified in the PKGBUILD.

I have not used such Docker images before, but to me it sounds wrong to build packages differently from what I suggested. The base-devel image is a correct choice though :)

lissyx commented 6 years ago

@AtosNicoS Good catch. What's wrong with the way I'm building it? I'm using the steps you documented :)

lissyx commented 6 years ago

@AtosNicoS Okay, it's a dead end: Docker won't allow the mount: https://tools.taskcluster.net/groups/CqfVCTZIT3usOxv3oNOGqQ/tasks/CqfVCTZIT3usOxv3oNOGqQ/runs/0/logs/public%2Flogs%2Flive.log#L857

AtosNicoS commented 6 years ago

Let me comment on your "payload", split across multiple lines:

pacman --noconfirm -Syyu
pacman --noconfirm -S --needed git devtools

# Okay, I misunderstood why you added a new user, but it might be correct, as you should not start devtools (extra-x86_64-build) as root.
adduser --system --home /home/build-user build-user
cd /home/build-user/

# What is this used for? Debugging?
echo -e \"#!/bin/bash\\nset -xe\\n env
id

git clone --quiet https://github.com/lissyx/DeepSpeech.git ~/DeepSpeech/ds/
cd ~/DeepSpeech/ds

# I am not sure what this command does. It does a checkout but redirects it into a script!? Why are you doing this?
git checkout --quiet 9bc2d682e5cb17d46b79feeaa1c3515bdb6b5d3d\" > /tmp/clone.sh
chmod +x /tmp/clone.sh
sudo -H -u build-user /bin/bash /tmp/clone.sh

# What is the purpose of this true command?
true

# Link to your scripts: https://github.com/mozilla/DeepSpeech/commit/9bc2d682e5cb17d46b79feeaa1c3515bdb6b5d3d
# Those look good so far.
sudo -H -u build-user --preserve-env /bin/bash /home/build-user/DeepSpeech/ds/packages/archlinux/build.sh
sudo -H -u build-user /bin/bash /home/build-user/DeepSpeech/ds/packages/archlinux/package.sh\n

So it should work.

Regarding your 2nd comment: hm, that's a pity. You could try to run makepkg directly. This is not advised for building clean packages to distribute, though. It could work if you always reset the Docker image (I am not sure how you handle this; do you destroy it every time?). As a quick'n'dirty test, you could try running makepkg -sri instead of extra-x86_64-build.
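
That is, from the folder containing the PKGBUILD and the patch (-s installs missing dependencies, -r removes the build-time ones afterwards, -i installs the built package):

cd /path/to/pkgbuild-dir   # wherever you downloaded the PKGBUILD and patch
makepkg -sri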

lissyx commented 6 years ago

@AtosNicoS The things you questioned, and the others, are just leftovers because I re-used the template from some other build; it's a quick test :). The Docker image should be clean; there is no re-use between runs. I'll give makepkg -sri a try then :)

lissyx commented 6 years ago

@AtosNicoS Okay, after some extra trial and error I had to make a few changes to the base system and to your PKGBUILD, but this run should start building things properly: https://tools.taskcluster.net/groups/Jb_S9OnrSjWLQwBOi4Ht4g/tasks/VL_FC8BxSn-1_h6Na2NCTg/runs/0

lissyx commented 6 years ago

@AtosNicoS It built successfully :) https://queue.taskcluster.net/v1/task/VL_FC8BxSn-1_h6Na2NCTg/runs/0/artifacts/public/deepspeech-v0.1.1.r67.gae146d0-1-x86_64.pkg.tar.xz

I tend to think the pkgver variable should be removed in favor of the pkgver() function, is that right?

AtosNicoS commented 6 years ago

Great news!

The pkgver variable gets replaced and updated by the pkgver() function at build time. In our case it is more or less useless, as you always reset the Docker container. I think it is still mandatory to specify a pkgver variable, though. Just keep it as it is, or set it to the latest version at which you changed the PKGBUILD.

Have you tried the compiled binary? It works, but it is still super slow, even on my i7. It uses only a single CPU core to calculate the result.

2018-04-25 08:14:09.091573: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
lissyx commented 6 years ago

@AtosNicoS I have not tried that, but since we just pass -O3 it's not surprising :-). I'd like to have pkgver really match the current git tag; should I generate the PKGBUILD file to handle that, or can we do something with the pkgver() function?

Would you be willing to take ownership of that to get the PKGBUILD landed? Or should you just do a PR against @stes's repo?

AtosNicoS commented 6 years ago

Note: I am the same person as @NicoHood, just at work, so you don't get confused. I am doing this mainly at work, but I am also interested in using and packaging it for Arch Linux in my free time.

I have not tried that, but since we just pass -O3 it's not surprising :-).

What is not surprising? That it is not using multiple cores?

I'd like to have pkgver really match the current git tag; should I generate the PKGBUILD file to handle that, or can we do something with the pkgver() function?

When you build the package, the pkgver() function gets called and automatically updates the pkgver variable of the PKGBUILD. However, if you delete the Docker container afterwards, that update gets discarded. It does not really matter what version the pkgver variable holds, as it gets replaced anyway.

Besides, this package is actually a deepspeech-git package. It is only for testing purposes, to test the latest master branch. If you really want to package deepspeech, you build against a fixed tag; that's why I was requesting prereleases. In that case the pkgver is fixed (no function), and the PKGBUILD downloads a .tar file from the GitHub tag and builds that. Everyone else can then reproduce the package, whereas -git packages change too fast and might break.
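For reference, the usual Arch convention for -git packages derives the version from git describe, along the lines of this sketch (the checkout directory depends on the PKGBUILD's source array; this is what produces versions like v0.1.1.r67.gae146d0):

pkgver() {
  cd "$srcdir/$_pkgname"
  # latest tag, number of commits since that tag, short commit hash
  git describe --long --tags | sed 's/\([^-]*-g\)/r\1/;s/-/./g'
}
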

Would you be willing to take ownership of that to get the PKGBUILD landed? Or should you just do a PR against @stes's repo?

We will use the proper way of distributing packages, the AUR (Arch User Repository). It will be these two packages: https://aur.archlinux.org/packages/deepspeech/ https://aur.archlinux.org/packages/deepspeech-git/

The first one would normally be based on 0.1.1. However, I was unable to build that for Arch Linux, which is why I am waiting for the next version. In the meantime we can use the -git package and propose the changes to the AUR maintainer "onny".

The most important aspects for me are now:

lissyx commented 6 years ago

@AtosNicoS I figured you were the same person, just at work :). Regarding the speed: threading depends on TensorFlow, which has two levels of threading, intra-op and inter-op, so it depends heavily on the exact op and its implementation. I've verified on all our builds, and we do leverage multiple cores, but not across the whole process. So I guess once we verify that -lpthread is actually passed correctly in your build and we enable more optimizations, we should see it kicking in :).

Regarding the versions, I know; I'm about to work on that, as the PSU issues seem to be solved. Still, it might be useful to have the deepspeech-git one. I even plan on following what TensorFlow seems to be doing now, that is, pushing alpha/rc versions to the repos (PyPI/NPM) so that people can grab them more easily.

AtosNicoS commented 6 years ago

But where would you add -lpthread? Could you modify the PKGBUILD to match the way you are building deepspeech? This information is missing from the README, and I have no idea how to optimize the builds or build it the same way you do.

lissyx commented 6 years ago

@AtosNicoS In fact, -lpthread should already be added by the tensorflow build files. But you can try to force --linkopt=-lpthread, for example, on the bazel build command line. It's not documented because we had no issue on our side, so I'm just guessing while helping you.
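
Something like this, adapting the bazel invocation from your PKGBUILD (just a guess at where the flag would go; it may already be implied by the TensorFlow build files):

bazel build --config=monolithic -c opt --linkopt=-lpthread //native_client:libdeepspeech.so
# plus the usual --copt optimization flags from the PKGBUILD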

lissyx commented 6 years ago

@AtosNicoS Okay, giving a try by enabling more optimizations: https://tools.taskcluster.net/groups/MaHPfnchSjiWuxBmJfJh_w/tasks/HojdsHLXRWWqyr1i-MRm5g/details

lissyx commented 6 years ago

@AtosNicoS This one should have more optimization enabled :) https://queue.taskcluster.net/v1/task/HojdsHLXRWWqyr1i-MRm5g/runs/0/artifacts/public/deepspeech-v0.1.1.r67.gae146d0-1-x86_64.pkg.tar.xz

AtosNicoS commented 6 years ago

I tried your package, but it still only uses a single CPU core. I also tested the precompiled binary under Ubuntu 17.10, which also uses a single core. Have you verified that it's working on your PC?

lissyx commented 6 years ago

@AtosNicoS I just re-checked, also with latest master on Ubuntu 17.10, and I do see several threads created and running during inference. https://index.taskcluster.net/v1/task/project.deepspeech.deepspeech.native_client.master.cpu/artifacts/public/native_client.tar.xz How do you check on your side? I'm looking at htop during inference.

lissyx commented 6 years ago
TF_CPP_MIN_VLOG_LEVEL=2 ./deepspeech ../models/output_graph.pbmm ../models/alphabet.txt ../audio/ -t 2>&1 | grep -i thread
2018-04-26 08:56:23.173155: I tensorflow/core/common_runtime/local_device.cc:40] Local device intra op parallelism threads: 8
2018-04-26 08:56:23.173424: I tensorflow/core/common_runtime/direct_session.cc:82] Direct session inter op parallelism threads: 8
AtosNicoS commented 6 years ago

I get the same output, but in the GNOME System Monitor I can see that only one core is used. And it takes quite a long time to calculate:

 TF_CPP_MIN_VLOG_LEVEL=2 deepspeech output_graph.pb alphabet.txt lm.binary trie test.wav  -t 2>&1 | grep -i thread
2018-04-26 09:54:48.800322: I tensorflow/core/common_runtime/local_device.cc:40] Local device intra op parallelism threads: 8
2018-04-26 09:54:48.800603: I tensorflow/core/common_runtime/direct_session.cc:82] Direct session inter op parallelism threads: 8

screenshot from 2018-04-26 09-55-14

lissyx commented 6 years ago

@AtosNicoS Can you check with htop as well? What are your system specs, and how long is "quite a long time"?

AtosNicoS commented 6 years ago

Same with htop (Arch Linux): screenshot from 2018-04-26 10-07-26

My system is a Fujitsu P957 desktop with the latest i7-7700, 32 GB RAM and an SSD. I run it either bare metal or inside a VM from Windows (I switch back and forth, but it also happens on bare metal). Details about the PC: http://www.fujitsu.com/de/products/computing/pc/desktops/esprimo-p957-e94/

It takes 14 seconds to decode a 3-second sound file inside the VM, with similar measurements on bare metal. I used this model: https://github.com/ynop/deepspeech-german I will try the 'official' DeepSpeech model, but from my previous experience I think it will not differ in any way. Edit: it also uses only one core, but is faster; it only takes 8 seconds to decode.

I noticed the same behavior on my (completely independent) private Fujitsu E744 laptop with a slightly slower i7 CPU.

lissyx commented 6 years ago

@AtosNicoS Okay, clearly 14 seconds for 3 seconds of audio on an i7-7700 is not good. Since it's slow, can you run htop -p $(pidof deepspeech)? We should be able to see the threads for sure.

Can you also run ldd deepspeech to make sure it's picking up libpthread?
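
For example, in a second terminal while an inference is running:

htop -p $(pidof deepspeech)              # watch the running process (H toggles the threads view)
ldd $(which deepspeech) | grep pthread   # confirm libpthread is linked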

AtosNicoS commented 6 years ago

screenshot from 2018-04-26 10-24-11

$ ldd /usr/bin/deepspeech 
    linux-vdso.so.1 (0x00007ffff5f70000)
    libdeepspeech.so => /usr/lib/libdeepspeech.so (0x00007f580c7bf000)
    libdeepspeech_utils.so => /usr/lib/libdeepspeech_utils.so (0x00007f580e937000)
    libsox.so.3 => /usr/lib/libsox.so.3 (0x00007f580c535000)
    libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x00007f580c1ae000)
    libm.so.6 => /usr/lib/libm.so.6 (0x00007f580be1a000)
    libgcc_s.so.1 => /usr/lib/libgcc_s.so.1 (0x00007f580bc03000)
    libc.so.6 => /usr/lib/libc.so.6 (0x00007f580b848000)
    libdl.so.2 => /usr/lib/libdl.so.2 (0x00007f580b644000)
    libpthread.so.0 => /usr/lib/libpthread.so.0 (0x00007f580b426000)
    /lib64/ld-linux-x86-64.so.2 => /usr/lib64/ld-linux-x86-64.so.2 (0x00007f580e744000)
    libltdl.so.7 => /usr/lib/libltdl.so.7 (0x00007f580b21c000)
    libpng16.so.16 => /usr/lib/libpng16.so.16 (0x00007f580afe6000)
    libz.so.1 => /usr/lib/libz.so.1 (0x00007f580adcf000)
    libmagic.so.1 => /usr/lib/libmagic.so.1 (0x00007f580abad000)
    libgsm.so.1 => /usr/lib/libgsm.so.1 (0x00007f580a9a1000)
    libgomp.so.1 => /usr/lib/libgomp.so.1 (0x00007f580a773000)
lissyx commented 6 years ago

So there are threads, and they are running?! Can you try without the language model?

AtosNicoS commented 6 years ago

I removed the lm.binary and trie from the command line; the German model now finishes instantly and the US model now takes about 5 seconds. It looks like it's multithreading now, but overall CPU usage is still not at 100%.

lissyx commented 6 years ago

@AtosNicoS Well, as I said, we don't directly control the level of parallelism; it depends on the tensorflow ops themselves. I'm surprised the KenLM language model takes that much time for you, but at least it means we have something comparable now.

lissyx commented 6 years ago

@AtosNicoS I've pushed a first tentative 0.2.0-alpha.0 tag now that we have simplified that process :)

AtosNicoS commented 6 years ago

Thanks! I got it building using this PKGBUILD:

pkgname=deepspeech
_pkgname=DeepSpeech
pkgver=0.2.0_alpha.3
pkgrel=1
pkgdesc="A TensorFlow implementation of Baidu's DeepSpeech architecture"
arch=('x86_64')
url="https://github.com/mozilla/DeepSpeech"
license=('MPL2')
makedepends=('bazel' 'python-numpy' 'python-pip' 'python-wheel' 'python-setuptools' 'git')
depends=('python-tensorflow' 'python-scipy' 'sox')
source=("${pkgname}-${pkgver}.tar.gz::https://github.com/mozilla/DeepSpeech/archive/v${pkgver//_/-}.tar.gz"
            "git+https://github.com/mozilla/tensorflow.git#branch=r1.6"
            17508.patch)
sha512sums=('9ee15be1b22a1d327c97f8e94b5b0e3b779c574a150ed1bd97b0d7ccbe583f625df9014debae4daa238dbfcacf4bc1929e4722349055038c8020744a8194d6d3'
            'SKIP'
            '18e3b22e956bdd759480d2e94212eb83d6a59381f34bbc7154cadbf7f42686c2f703cc61f81e6ebeaf1da8dc5de8472e5afc6012abb1720cadb68607fba8e8e1')

prepare() {
  patch -Np1 -i ${srcdir}/17508.patch -d tensorflow
  cd "$srcdir/tensorflow"

  # These environment variables influence the behavior of the configure call below.
  export PYTHON_BIN_PATH=/usr/bin/python
  export USE_DEFAULT_PYTHON_LIB_PATH=1
  export TF_NEED_JEMALLOC=1
  export TF_NEED_KAFKA=0
  export TF_NEED_OPENCL_SYCL=0
  export TF_NEED_GCP=0
  export TF_NEED_HDFS=0
  export TF_NEED_S3=0
  export TF_ENABLE_XLA=1
  export TF_NEED_GDR=0
  export TF_NEED_VERBS=0
  export TF_NEED_OPENCL=0
  export TF_NEED_MPI=0
  export TF_NEED_TENSORRT=0
  export TF_SET_ANDROID_WORKSPACE=0
  ln -sf "../${_pkgname}-${pkgver//_/-}/native_client" ./
}

build() {
  cd "$srcdir/tensorflow"
  export CC_OPT_FLAGS="-march=x86-64"
  export TF_NEED_CUDA=0
  ./configure

  bazel build -c opt --copt=-mtune=generic --copt=-march=x86-64 --copt=-msse --copt=-msse2 --copt=-msse3 --copt=-msse4.1 --copt=-msse4.2 --copt=-mavx --copt="-D_GLIBCXX_USE_CXX11_ABI=0" //native_client:libctc_decoder_with_kenlm.so
  bazel build --config=monolithic -c opt --copt=-mtune=generic --copt=-march=x86-64 --copt=-msse --copt=-msse2 --copt=-msse3 --copt=-msse4.1 --copt=-msse4.2 --copt=-mavx --copt="-D_GLIBCXX_USE_CXX11_ABI=0" --copt=-fvisibility=hidden //native_client:libdeepspeech.so //native_client:deepspeech_utils //native_client:generate_trie

  cd "${srcdir}/${_pkgname}-${pkgver//_/-}/native_client"
  make deepspeech
}

package() {
  cd "${srcdir}/${_pkgname}-${pkgver//_/-}/native_client"
  PREFIX=${pkgdir}/usr make install
}

Any reason why you removed -O3?

I analyzed the deepspeech output now with a predefined "grammar" as a quick test and got pretty good results: https://github.com/mozilla/DeepSpeech/issues/1290#issuecomment-386217745

lissyx commented 6 years ago

@AtosNicoS Thanks! No good reason; we actually don't have the flag on TaskCluster. I'm not sure there's much difference between the -Ox levels; what's likely more important are optimizations such as SSE and AVX. But maybe we should have a look?
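
(Adding it back would just mean another flag on the bazel line, e.g. --copt=-O3 alongside the existing --copt=-msse*/--copt=-mavx flags — a sketch of where it would go, not our current TaskCluster configuration.)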

I'm also wondering if we should investigate this threading stuff with the language model: a quick look shows it should leverage multiple CPUs, but I'm now unsure if we do it properly :)

AtosNicoS commented 6 years ago

The multithreading stuff should be investigated more, of course.

I want to share my recent PKGBUILD, now with Python bindings. Please have a look at it in case I missed anything.

pkgbase=deepspeech
pkgname=('deepspeech' 'python-deepspeech')
_pkgname=DeepSpeech
pkgver=0.2.0_alpha.3
pkgrel=1
pkgdesc="A TensorFlow implementation of Baidu's DeepSpeech architecture"
arch=('x86_64')
url="https://github.com/mozilla/DeepSpeech"
license=('MPL2')
makedepends=('bazel' 'python-numpy' 'python-pip' 'python-wheel' 'python-setuptools' 'git' 'sox' 'swig')
source=("${pkgname}-${pkgver}.tar.gz::https://github.com/mozilla/DeepSpeech/archive/v${pkgver//_/-}.tar.gz"
            "git+https://github.com/mozilla/tensorflow.git#branch=r1.6"
            17508.patch)
sha512sums=('9ee15be1b22a1d327c97f8e94b5b0e3b779c574a150ed1bd97b0d7ccbe583f625df9014debae4daa238dbfcacf4bc1929e4722349055038c8020744a8194d6d3'
            'SKIP'
            '18e3b22e956bdd759480d2e94212eb83d6a59381f34bbc7154cadbf7f42686c2f703cc61f81e6ebeaf1da8dc5de8472e5afc6012abb1720cadb68607fba8e8e1')

prepare() {
  patch -Np1 -i ${srcdir}/17508.patch -d tensorflow
  cd "$srcdir/tensorflow"

  # These environment variables influence the behavior of the configure call below.
  export PYTHON_BIN_PATH=/usr/bin/python
  export USE_DEFAULT_PYTHON_LIB_PATH=1
  export TF_NEED_JEMALLOC=1
  export TF_NEED_KAFKA=0
  export TF_NEED_OPENCL_SYCL=0
  export TF_NEED_GCP=0
  export TF_NEED_HDFS=0
  export TF_NEED_S3=0
  export TF_ENABLE_XLA=1
  export TF_NEED_GDR=0
  export TF_NEED_VERBS=0
  export TF_NEED_OPENCL=0
  export TF_NEED_MPI=0
  export TF_NEED_TENSORRT=0
  export TF_SET_ANDROID_WORKSPACE=0
  ln -sf "../${_pkgname}-${pkgver//_/-}/native_client" ./
}

build() {
  cd "$srcdir/tensorflow"
  export CC_OPT_FLAGS="-march=x86-64"
  export TF_NEED_CUDA=0
  ./configure

  bazel build -c opt --copt=-mtune=generic --copt=-march=x86-64 --copt=-msse --copt=-msse2 --copt=-msse3 --copt=-msse4.1 --copt=-msse4.2 --copt=-mavx --copt="-D_GLIBCXX_USE_CXX11_ABI=0" //native_client:libctc_decoder_with_kenlm.so
  bazel build --config=monolithic -c opt --copt=-mtune=generic --copt=-march=x86-64 --copt=-msse --copt=-msse2 --copt=-msse3 --copt=-msse4.1 --copt=-msse4.2 --copt=-mavx --copt="-D_GLIBCXX_USE_CXX11_ABI=0" --copt=-fvisibility=hidden //native_client:libdeepspeech.so //native_client:deepspeech_utils //native_client:generate_trie

  cd "${srcdir}/${_pkgname}-${pkgver//_/-}/native_client"
  make deepspeech
  make bindings
}

package_deepspeech() {
  depends=('sox')
  cd "${srcdir}/${_pkgname}-${pkgver//_/-}/native_client"
  PREFIX=${pkgdir}/usr make install
}

package_python-deepspeech() {
  pkgdesc="DeepSpeech Python bindings"
  depends=('deepspeech' 'python' 'python-scipy')
  cd "${srcdir}/${_pkgname}-${pkgver//_/-}/native_client"
  PIP_CONFIG_FILE=/dev/null pip install --isolated --root="$pkgdir" --ignore-installed --no-deps dist/deepspeech*.whl

  # Reuse deepspeech .so files
  rm "$pkgdir/usr/bin/deepspeech"
  rm -rf "$pkgdir/usr/lib/python3.6/site-packages/deepspeech/lib"
  ln -s /usr/lib "$pkgdir/usr/lib/python3.6/site-packages/deepspeech/lib"
}
lissyx commented 6 years ago

Thanks! Giving it a try: https://tools.taskcluster.net/groups/acc0nK5mRdqWER8gmF2VCg

lissyx commented 6 years ago

@AtosNicoS It seems to have built properly, but I cannot figure out where the Python wheel was produced?

AtosNicoS commented 6 years ago

@lissyx There is no Python wheel package; you produce two Arch Linux packages, both ending with .pkg.tar.xz. For Arch Linux we do not want wheel packages, as pip is not the preferred way to install packages. You instead build the package and install it yourself; pip is just software that automates the process for distributions that don't provide such Python packages. For Arch Linux, writing packages is extremely simple, so you normally write a quick PKGBUILD whenever possible.

Currently I am building the wheel package, as the Makefile does not give me any other option, and then installing it again with pip into the package. However, this is still not perfect; it would be better to use setuptools to install the package directly.

Here is an example of how it's normally handled: https://git.archlinux.org/svntogit/community.git/tree/trunk/PKGBUILD?h=packages/python-gitpython https://github.com/gitpython-developers/GitPython/blob/master/setup.py

lissyx commented 6 years ago

@AtosNicoS We use setuptools, so I don't know what should be changed for your use case? Anyway, wheel or pkg.tar.xz, I cannot figure out from your PKGBUILD where this is being produced, so that I can expose it as an artifact :)

lissyx commented 6 years ago

@AtosNicoS Re-verifying the thing about threads and KenLM, it looks like there's nothing wrong:

AtosNicoS commented 6 years ago

@AtosNicoS We use setuptools

I missed that. I will try to use setuptools instead then. Sorry, and thanks for the hint!

@AtosNicoS Re-verifying the thing about threads and KenLM, it looks like there's nothing wrong:

I'll try to publish the package on the AUR over the weekend. Maybe other Arch Linux users can test it as well and report more feedback. Thanks for looking into it. :)

AtosNicoS commented 6 years ago

I checked building with setuptools directly. It seems you are setting a lot of helper variables in the Makefile, so we have to place the install calls into the Makefile, not the PKGBUILD. But I want to avoid creating a wheel package and then installing it; it is just a useless step for non-pip users/distribution users.

Old: make/build -> wheel package -> install via pip -> distribution package
New: make/build -> install via setuptools directly -> distribution package

Some of my findings below:

When creating the temp directory you could use mkdir -p; otherwise it will fail if you run make bindings twice. Note: I am not 100% sure if this is correct or if I missed anything here. https://github.com/mozilla/DeepSpeech/blob/4e53683c43443d6c5335ad70e5e2801be395cebb/native_client/definitions.mk#L107


Same for this rm command: use -f to ignore files that do not exist (see the sketch below). Note: I am not 100% sure if this is correct or if I missed anything here. https://github.com/mozilla/DeepSpeech/blob/4e53683c43443d6c5335ad70e5e2801be395cebb/native_client/Makefile#L43

The question I am asking myself is why you remove the .o files at all. Are they even packaged? Shouldn't they be removed earlier, directly inside the make bindings-build target?
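
In shell terms, the idempotent variants of both spots would be (paths are illustrative; the real ones are in the Makefile):

mkdir -p temp_build/python         # no error if the directory already exists
rm -f temp_build/python/*_wrap.o   # no error if the files are already gone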


I would remove this pip install command: https://github.com/mozilla/DeepSpeech/blob/4e53683c43443d6c5335ad70e5e2801be395cebb/native_client/Makefile#L34

Dependencies should be listed in the README. You don't need pip to install dependency packages. I consider packages installed via pip a potential security risk and prefer distribution packages whenever possible (as those can verify GPG signatures if available). The person who builds should install packages the way they prefer; don't use pip directly. For example, in the Arch Linux PKGBUILD we would specify the dependencies directly and remove pip completely.


I am thinking of adding another target to install the bindings directly instead of generating a wheel package first:

bindings-install: bindings-build MANIFEST.in
    cat MANIFEST.in
    rm -f temp_build/python/*_wrap.o
    AS=$(AS) CC=$(CC) CXX=$(CXX) LD=$(LD) CFLAGS="$(CFLAGS)" LDFLAGS="$(LDFLAGS_NEEDED) $(RPATH_PYTHON)" MODEL_LDFLAGS="$(LDFLAGS_DIRS)" UTILS_LDFLAGS="-L${TFDIR}/bazel-bin/native_client" MODEL_LIBS="$(LIBS)" $(PYTHON_PATH) $(NUMPY_INCLUDE) python ./setup.py install --skip-build --optimize=1 $(SETUP_FLAGS)

You can then use the command like this:

# General installation, system wide (not recommended from my point of view)
sudo make bindings-install

# Install into a local directory as non-root user:
SETUP_FLAGS="--root=mypackagedir" make bindings-install

# More specific inside an ArchLinux PKGBUILD
SETUP_FLAGS="--root=${pkgdir}" make bindings-install
lissyx commented 6 years ago

@AtosNicoS Could you take that to Discourse? It's really drifting from the topic, and I've always found GitHub a pain to use for more discussion-oriented stuff, especially quoting.

lissyx commented 6 years ago

@AtosNicoS Proper Python package: https://queue.taskcluster.net/v1/task/N1oZy5VARjCTFM4FBKB67Q/runs/0/artifacts/public/python-deepspeech-0.2.0_alpha.3-1-x86_64.pkg.tar.xz

NicoHood commented 6 years ago

I tried the 0.1.1 model from the GitHub release. It uses a single CPU core and takes 7 seconds to parse a simple audio file. I also used the audio provided by Mozilla in the GitHub release. Can you please also test this on your machine?

$ date && deepspeech model/us/output_graph.pb model/us/alphabet.txt model/us/lm.binary model/us/trie 2830-3980-0043.wav && date
Sat May  5 11:26:47 CEST 2018
TensorFlow: b'v1.6.0-16-gc346f2c8fd'
DeepSpeech: unknown
Warning: reading entire model file into memory. Transform model file into an mmapped graph to reduce heap usage.
2018-05-05 11:26:47.679101: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
experience proves this
Sat May  5 11:26:54 CEST 2018

$ date && deepspeech_python model/us/output_graph.pb 2830-3980-0043.wav model/us/alphabet.txt model/us/lm.binary model/us/trie && date
Sat May  5 11:30:22 CEST 2018
Loading model from file model/us/output_graph.pb
2018-05-05 11:30:22.877855: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Loaded model in 0.309s.
Loading language model from files model/us/lm.binary model/us/trie
Loaded language model in 2.115s.
Running inference.
experience proves this
Inference took 5.351s for 1.975s audio file.
Sat May  5 11:30:30 CEST 2018
NicoHood commented 6 years ago

As another error, I get the following message with the Python deepspeech binary on my personal laptop:

$ date && deepspeech_python model/de/output_graph.pb 2830-3980-0043.wav model/de/alphabet.txt && date
Sat May  5 11:31:39 CEST 2018
Loading model from file model/de/output_graph.pb
2018-05-05 11:31:39.524129: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Loaded model in 0.063s.
Running inference.
2018-05-05 11:31:39.665658: E tensorflow/core/framework/op_segment.cc:53] Create kernel failed: Invalid argument: NodeDef mentions attr 'identical_element_shapes' not in Op<name=TensorArrayV3; signature=size:int32 -> handle:resource, flow:float; attr=dtype:type; attr=element_shape:shape,default=<unknown>; attr=dynamic_size:bool,default=false; attr=clear_after_read:bool,default=true; attr=tensor_array_name:string,default=""; is_stateful=true>; NodeDef: bidirectional_rnn/bw/bw/TensorArray_1 = TensorArrayV3[clear_after_read=true, dtype=DT_FLOAT, dynamic_size=false, element_shape=[?,750], identical_element_shapes=true, tensor_array_name="bidirectional_rnn/bw/bw/dynamic_rnn/input_0", _device="/job:localhost/replica:0/task:0/device:CPU:0"](bidirectional_rnn/bw/bw/TensorArrayUnstack/strided_slice). (Check whether your GraphDef-interpreting binary is up to date with your GraphDef-generating binary.).
2018-05-05 11:31:39.665717: E tensorflow/core/common_runtime/executor.cc:643] Executor failed to create kernel. Invalid argument: NodeDef mentions attr 'identical_element_shapes' not in Op<name=TensorArrayV3; signature=size:int32 -> handle:resource, flow:float; attr=dtype:type; attr=element_shape:shape,default=<unknown>; attr=dynamic_size:bool,default=false; attr=clear_after_read:bool,default=true; attr=tensor_array_name:string,default=""; is_stateful=true>; NodeDef: bidirectional_rnn/bw/bw/TensorArray_1 = TensorArrayV3[clear_after_read=true, dtype=DT_FLOAT, dynamic_size=false, element_shape=[?,750], identical_element_shapes=true, tensor_array_name="bidirectional_rnn/bw/bw/dynamic_rnn/input_0", _device="/job:localhost/replica:0/task:0/device:CPU:0"](bidirectional_rnn/bw/bw/TensorArrayUnstack/strided_slice). (Check whether your GraphDef-interpreting binary is up to date with your GraphDef-generating binary.).
     [[Node: bidirectional_rnn/bw/bw/TensorArray_1 = TensorArrayV3[clear_after_read=true, dtype=DT_FLOAT, dynamic_size=false, element_shape=[?,750], identical_element_shapes=true, tensor_array_name="bidirectional_rnn/bw/bw/dynamic_rnn/input_0", _device="/job:localhost/replica:0/task:0/device:CPU:0"](bidirectional_rnn/bw/bw/TensorArrayUnstack/strided_slice)]]
Error running session: Invalid argument: NodeDef mentions attr 'identical_element_shapes' not in Op<name=TensorArrayV3; signature=size:int32 -> handle:resource, flow:float; attr=dtype:type; attr=element_shape:shape,default=<unknown>; attr=dynamic_size:bool,default=false; attr=clear_after_read:bool,default=true; attr=tensor_array_name:string,default=""; is_stateful=true>; NodeDef: bidirectional_rnn/bw/bw/TensorArray_1 = TensorArrayV3[clear_after_read=true, dtype=DT_FLOAT, dynamic_size=false, element_shape=[?,750], identical_element_shapes=true, tensor_array_name="bidirectional_rnn/bw/bw/dynamic_rnn/input_0", _device="/job:localhost/replica:0/task:0/device:CPU:0"](bidirectional_rnn/bw/bw/TensorArrayUnstack/strided_slice). (Check whether your GraphDef-interpreting binary is up to date with your GraphDef-generating binary.).
     [[Node: bidirectional_rnn/bw/bw/TensorArray_1 = TensorArrayV3[clear_after_read=true, dtype=DT_FLOAT, dynamic_size=false, element_shape=[?,750], identical_element_shapes=true, tensor_array_name="bidirectional_rnn/bw/bw/dynamic_rnn/input_0", _device="/job:localhost/replica:0/task:0/device:CPU:0"](bidirectional_rnn/bw/bw/TensorArrayUnstack/strided_slice)]]
None
Inference took 0.080s for 1.975s audio file.
Sat May  5 11:31:39 CEST 2018

I am using this German language model. The English one provided by Mozilla works properly. But for some reason the "normal" native client without Python bindings works properly with this model on my machine.


The Python tool and the native tool take their command-line parameters in a different order. Those should be made consistent.

lissyx commented 6 years ago

@NicoHood All the tests I could do regarding threading show it works as expected. The error you see is because you trained with TensorFlow >= r1.5 but used the v0.1.1 binaries (TensorFlow r1.4) for inference. The parameter order is the same in all binaries, but it has changed since 0.1.1.

lissyx commented 6 years ago

@NicoHood Please verify threading with TF_CPP_MIN_VLOG_LEVEL=2, as documented earlier.

lissyx commented 6 years ago
alex@portable-alex:~/tmp/deepspeech/cpu-0.1.1$ wget https://index.taskcluster.net/v1/task/project.deepspeech.deepspeech.native_client.v0.1.1.cpu/artifacts/public/native_client.tar.xz && tar xf native_client.tar.xz
--2018-05-05 14:07:46--  https://index.taskcluster.net/v1/task/project.deepspeech.deepspeech.native_client.v0.1.1.cpu/artifacts/public/native_client.tar.xz
Resolving index.taskcluster.net (index.taskcluster.net)… 54.243.65.240, 54.225.111.188, 23.23.146.2
Connecting to index.taskcluster.net (index.taskcluster.net)|54.243.65.240|:443… connected.
HTTP request sent, awaiting response… 303 See Other
Location: https://queue.taskcluster.net/v1/task/bzVx8U5xSgSUisIqdsLFmA/artifacts/public%2Fnative_client.tar.xz [following]
--2018-05-05 14:07:47--  https://queue.taskcluster.net/v1/task/bzVx8U5xSgSUisIqdsLFmA/artifacts/public%2Fnative_client.tar.xz
Resolving queue.taskcluster.net (queue.taskcluster.net)… 54.243.65.240, 54.225.111.188, 23.23.146.2
Connecting to queue.taskcluster.net (queue.taskcluster.net)|54.243.65.240|:443… connected.
HTTP request sent, awaiting response… 303 See Other
Location: https://taskcluster-artifacts.net/bzVx8U5xSgSUisIqdsLFmA/0/public/native_client.tar.xz [following]
--2018-05-05 14:07:48--  https://taskcluster-artifacts.net/bzVx8U5xSgSUisIqdsLFmA/0/public/native_client.tar.xz
Resolving taskcluster-artifacts.net (taskcluster-artifacts.net)… 54.230.76.189
Connecting to taskcluster-artifacts.net (taskcluster-artifacts.net)|54.230.76.189|:443… connected.
HTTP request sent, awaiting response… 200 OK
Length: 10187244 (9.7M) [application/x-xz]
Saving to: 'native_client.tar.xz'

native_client.tar.xz                100%[===================================>]   9.71M  1.89MB/s    in 6.4s

2018-05-05 14:07:55 (1.51 MB/s) - 'native_client.tar.xz' saved [10187244/10187244]

alex@portable-alex:~/tmp/deepspeech/cpu-0.1.1$ ./deepspeech ../models/output_graph.pb ../audio/2830-3980-0043.wav ../models/alphabet.txt -t 
2018-05-05 14:08:42.096843: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
experience proves tis
cpu_time_overall=5.35120 cpu_time_mfcc=0.00319 cpu_time_infer=5.34801
alex@portable-alex:~/tmp/deepspeech/cpu-0.1.1$ ./deepspeech ../models/output_graph.pb ../audio/2830-3980-0043.wav ../models/alphabet.txt ../models/lm.binary ../models/trie -t 
2018-05-05 14:08:58.059388: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
experience proves this
cpu_time_overall=5.51043 cpu_time_mfcc=0.00266 cpu_time_infer=5.50778
alex@portable-alex:~/tmp/deepspeech/cpu-0.1.1$ TF_CPP_MIN_VLOG_LEVEL=2 ./deepspeech ../models/output_graph.pb ../audio/2830-3980-0043.wav ../models/alphabet.txt -t 2>&1 | grep -i thread
2018-05-05 14:09:20.403349: I tensorflow/core/common_runtime/local_device.cc:40] Local device intra op parallelism threads: 8
2018-05-05 14:09:20.403595: I tensorflow/core/common_runtime/direct_session.cc:85] Direct session inter op parallelism threads: 8
lissyx commented 6 years ago

And using time:

alex@portable-alex:~/tmp/deepspeech/cpu-0.1.1$ time ./deepspeech ../models/output_graph.pb ../audio/2830-3980-0043.wav ../models/alphabet.txt -t 
2018-05-05 14:11:26.084127: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
experience proves tis
cpu_time_overall=5.45314 cpu_time_mfcc=0.00314 cpu_time_infer=5.44999

real    0m3,791s
user    0m4,865s
sys 0m0,957s
lissyx commented 6 years ago

screenshot from 2018-05-05 14-12-43