srvk / eesen-transcriber

EESEN based offline transcriber VM using models trained on TEDLIUM and Cantab Research
Apache License 2.0

Running the transcriber without a VM #11

Closed jojo05 closed 8 years ago

jojo05 commented 8 years ago

Hi,

I have Fedora 23 (so I cannot run the Ubuntu VM). Could you please list the basic install steps to get the full transcriber set up? (I will figure out the package dependencies once I start installing.)

Thanks

fmetze commented 8 years ago

Hi,

You should be able to pretty much follow (and, if needed, adapt) the steps in the Vagrantfile, https://github.com/srvk/eesen-transcriber/blob/master/Vagrantfile, after the 'config.vm.provision "shell", inline: <<-SHELL' line. That part is essentially a shell script recipe that you can follow.

Hope this helps! Best,

F.

Florian Metze http://www.cs.cmu.edu/directory/florian-metze Associate Research Professor Carnegie Mellon University
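
To follow the recipe mentioned above outside of Vagrant, the inline provisioning commands first have to be separated from the rest of the Vagrantfile. The following is a minimal sketch, assuming the commands sit between a line containing <<-SHELL and a line containing only SHELL, as the comment describes. The function name extract_provision_script and the output file name provision.sh are made up for illustration.

```shell
#!/bin/sh
# Print the lines of a Vagrantfile that sit between "<<-SHELL" and the
# closing "SHELL" marker. Assumes the heredoc layout described in the thread.
extract_provision_script() {
  # $1: path to a Vagrantfile
  awk '
    /<<-SHELL/                     { inside = 1; next }  # start of the inline script
    inside && /^[ \t]*SHELL[ \t]*$/ { inside = 0; next }  # closing marker
    inside                         { print }             # a provisioning command
  ' "$1"
}

# Usage (file names are examples):
#   extract_provision_script Vagrantfile > provision.sh
```

The extracted commands were written for the Ubuntu guest, so they are better read and adapted step by step than run unchanged on another distribution.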

riebling commented 8 years ago

You don't need Ubuntu in order to run the VM. You'll need Vagrant and VirtualBox, both of which are supported on Fedora (see https://fedoramagazine.org/running-vagrant-fedora-22/ and https://www.virtualbox.org/wiki/Linux_Downloads).

What it sounds like you're asking is how to install the system directly on Fedora, without using virtualization or provisioning tools... this is something we cannot provide support for, because it would entail trying/supporting every OS available. The whole reason the transcriber is provided as a VM is to avoid such OS dependencies.

This is not to say that it's not possible to do so, only that we cannot provide detailed support. If this is what you want to do, you're on your own. I can say that the commands bracketed by SHELL in the Vagrantfile are sufficient to install everything you would need to install 'by hand', save for package naming differences between Fedora and Ubuntu (again, something we don't explicitly support; you would have to figure this out, but there are likely equivalents for everything).

Hope this helps, and sorry to not be of more help

Eric Riebling Interactive Systems Lab er1k@cs.cmu.edu 407 South Craig St.
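
One way to approach the package naming differences mentioned above is to first list every package the provisioning commands install, then look for a Fedora equivalent of each name. The sketch below assumes that the commands have been saved to a file and that packages are installed with apt-get install, which is typical for an Ubuntu guest but is not confirmed in this thread. The function name list_apt_packages and the file name provision.sh are hypothetical.

```shell
#!/bin/sh
# Read a shell script on stdin and print, one per line, every package name
# passed to "apt-get install". Options such as -y are skipped.
list_apt_packages() {
  # Join backslash-continued lines so each command sits on one line.
  sed -e :a -e '/\\$/N; s/\\\n/ /; ta' |
  awk '{
    state = 0                       # 0: outside, 1: saw apt-get, 2: saw install
    for (i = 1; i <= NF; i++) {
      if (state == 0 && $i == "apt-get") { state = 1; continue }
      if (state == 1 && $i == "install") { state = 2; continue }
      if (state == 2) {
        # A command separator ends the package list.
        if ($i == "&&" || $i == "||" || $i == ";" || $i == "|") { state = 0; continue }
        if ($i !~ /^-/) print $i    # skip options, print package names
      }
    }
  }'
}

# Usage (file name is an example):
#   list_apt_packages < provision.sh | sort -u
```

Each resulting name can then be checked on Fedora, for example with dnf search, since the Ubuntu and Fedora names often differ.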

jojo05 commented 8 years ago

Hi, your Vagrant comment makes sense. I tried to run the VM on Fedora (see below for the output). The text below is a copy of a message I sent to noreply@speechkitchen.org before posting here (the message evidently got lost; I hadn't noticed the noreply address when replying to a forum notification).

I imagine the solution involves tweaking the Vagrantfile, but I am no Vagrant or VM guru. Any ideas?


I am running Fedora 23. It seems that the config will only work on an Ubuntu system. Is that correct?

Otherwise, do you have a guide to what I have to configure without VirtualBox?

thanks
jose

    [root@i eesen-transcriber-master]# vagrant up
    Bringing machine 'default' up with 'libvirt' provider...
    ==> default: Box 'ubuntu/trusty64' could not be found. Attempting to find and install...
        default: Box Provider: libvirt
        default: Box Version: >= 0
    ==> default: Loading metadata for box 'ubuntu/trusty64'
        default: URL: https://atlas.hashicorp.com/ubuntu/trusty64
    The box you're attempting to add doesn't support the provider you requested. Please find an alternate box or use an alternate provider. Double-check your requested provider to verify you didn't simply misspell it.

    If you're adding a box from HashiCorp's Atlas, make sure the box is released.

    Name: ubuntu/trusty64
    Address: https://atlas.hashicorp.com/ubuntu/trusty64
    Requested provider: [:libvirt]

jojo05 commented 8 years ago

ok, I figured out a bit more. Seems to be a kernel module issue!

    [root@i eesen-transcriber-master]# vagrant init ubuntu/trusty64; vagrant up --provider virtualbox
    A Vagrantfile has been placed in this directory. You are now ready to vagrant up your first virtual environment! Please read the comments in the Vagrantfile as well as documentation on vagrantup.com for more information on using Vagrant.

    The provider 'virtualbox' that was requested to back the machine 'default' is reporting that it isn't usable on this system. The reason is shown below:

    VirtualBox is complaining that the kernel module is not loaded. Please run VBoxManage --version or open the VirtualBox GUI to see the error message which should contain instructions on how to fix this error.

    [root@i eesen-transcriber-master]# VBoxManage --version
    WARNING: The vboxdrv kernel module is not loaded. Either there is no module available for the current kernel (4.4.9-300.fc23.x86_64) or it failed to load. Please recompile the kernel module and install it by

           sudo /sbin/rcvboxdrv setup

         You will not be able to start VMs until this problem is fixed.
    5.0.16_RPMFusionr105871
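
The warning above concerns the vboxdrv kernel module. Two quick checks that may help narrow such a problem down are whether the CPU advertises hardware virtualization and whether the module is currently loaded. The following is a rough sketch with made-up function names; the optional file arguments exist only so the functions can be tried on sample files.

```shell
#!/bin/sh
# Succeeds if the CPU advertises Intel VT-x (vmx) or AMD-V (svm).
# Note: this shows CPU capability only; the feature may still be switched
# off in the BIOS/UEFI setup.
cpu_has_virt_flags() {
  # $1: cpuinfo file to inspect (defaults to /proc/cpuinfo)
  grep -Eqw 'vmx|svm' "${1:-/proc/cpuinfo}"
}

# Succeeds if the VirtualBox kernel module is currently loaded.
vboxdrv_loaded() {
  # $1: modules list to inspect (defaults to /proc/modules)
  grep -q '^vboxdrv ' "${1:-/proc/modules}"
}

# Usage:
#   cpu_has_virt_flags && echo "CPU supports hardware virtualization"
#   vboxdrv_loaded     || echo "vboxdrv is not loaded"
```

On Fedora, rpm -q "kernel-devel-$(uname -r)" reports whether kernel sources matching the running kernel are installed (assuming the kernel-devel package is what provides them on your system). Once they match, the setup command quoted in the warning, sudo /sbin/rcvboxdrv setup, can rebuild the module.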

jojo05 commented 8 years ago

ok, figured everything out (not being familiar with VMs before):

  • once kernel source matches running kernel, virtualbox takes care of recompiling the modules
  • had to enable virtualization on BIOS/UEFI (VT-x is often off by default on Intel systems)

After lots of output, it seems stuck at the "Cloning into ..." steps. Is this normal?

    ==> default: Cloning into 'lm_build'...
    ==> default: Cloning into 'srvk-eesen-offline-transcriber'...

riebling commented 8 years ago

Good progress! Yes, sorry for not mentioning the prerequisites: virtualization has to be enabled in the BIOS, and the kernel source has to match the running kernel so that the VirtualBox kernel modules can be built.

The cloning-into steps should not take too long; they download files from GitHub repositories. Are you perhaps on a wireless or slow network connection?

jojo05 commented 8 years ago

Success!

I realize vagrant is great. It took 30min+ without feedback (no high CPU consumption either, but I didn't check network traffic). I am on a 300Mbit line.

Now I need to allocate time to experiment with the transcriber. Many thanks! What is the right channel for questions or feedback? I would prefer to use GitHub issues, as they work well.

jojo05 commented 8 years ago

What steps do you suggest so that I can hook up the browser microphone and get real-time transcription?

I can send the PCM samples from the browser to the server every x milliseconds and then get the text back.
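
For a rough sense of how much data each such message would carry, the size of an uncompressed PCM chunk is the sample rate times the bytes per sample times the number of channels times the chunk duration. Below is a small sketch of that arithmetic. The function name pcm_chunk_bytes and the example values (16 kHz, 16-bit, mono, 100 ms) are illustrative assumptions, not settings taken from the transcriber.

```shell
#!/bin/sh
# Size in bytes of one chunk of uncompressed PCM audio.
pcm_chunk_bytes() {
  # $1: sample rate in Hz
  # $2: bits per sample (a multiple of 8)
  # $3: number of channels
  # $4: chunk length in milliseconds
  echo $(( $1 * ($2 / 8) * $3 * $4 / 1000 ))
}

# Example with assumed values: 16 kHz, 16-bit, mono, 100 ms per message.
pcm_chunk_bytes 16000 16 1 100   # prints 3200
```

At these assumed settings the stream comes to about 32 kB per second, so bandwidth is unlikely to be the limiting factor; decoding latency on the server side matters more.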

riebling commented 8 years ago

Agreed, and there's always a chance your github issue will help someone else facing similar problems. Great news to hear.

riebling commented 8 years ago

This system isn't really set up for real-time decoding. Even the Kaldi toolkit, on which most of the code is based, does not favor real-time decoding, though it does give a small example: https://github.com/kaldi-asr/kaldi/tree/master/egs/voxforge/online_demo

This is not to say that it's impossible. We just haven't been putting work into real-time decoding; we have focused instead on pre-recorded data for which there are gold-standard results to compare against, so that we can produce scores, measure, and improve accuracy.

There is the gst-kaldi-nnet2-online repository on Tanel Alumae's GitHub, https://github.com/alumae/gst-kaldi-nnet2-online, which uses a GStreamer plugin to stream audio to a Kaldi decoder. It can work from the command line, and there is even a GUI demo, but I could not say how one might integrate this into a browser.

jojo05 commented 8 years ago

Thanks, you can close the issue. Very pleased with the transcriber VM.

I suggest a wiki page with links to the key papers to understand the transcriber

riebling commented 8 years ago

Thanks