rizar / attention-lvcsr

End-to-End Attention-Based Large Vocabulary Speech Recognition
MIT License

some errors when install kaldi-python #2

Open Entonytang opened 8 years ago

Entonytang commented 8 years ago

Ubuntu 14.04. I used this command (./setup.py install) to set up kaldi-python, and I have already set $KALDI_ROOT. The errors are as follows:

/usr/include/python2.7/numpy/npy_1_7_deprecated_api.h:15:2: warning: #warning "Using deprecated NumPy API, disable it by " "#defining NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-Wcpp]
 #warning "Using deprecated NumPy API, disable it by " \
  ^
/usr/bin/ld: /home/jtang/Kaldi/kaldi-trunk/src/matrix/kaldi-matrix.a(kaldi-matrix.o): relocation R_X86_64_32 against `.rodata' can not be used when making a shared object; recompile with -fPIC
/home/jtang/Kaldi/kaldi-trunk/src/matrix/kaldi-matrix.a: error adding symbols: Bad value
collect2: error: ld returned 1 exit status
make[1]: *** [kaldi_io_internal.so] Error 1
make[1]: Leaving directory `/home/jtang/Attention_ASR/kaldi-python/kaldi_io'
make: *** [all] Error 2

These errors seem to happen while creating kaldi_io_internal.so. If I don't link these .a files ($(KALDI_SRC)/matrix/kaldi-matrix.a, $(KALDI_SRC)/util/kaldi-util.a, $(KALDI_SRC)/base/kaldi-base.a), kaldi_io_internal.so can be created (of course the resulting file can't be used).

rizar commented 8 years ago

As far as I remember, Kaldi has to be compiled differently for the kaldi-python installation to succeed. @dmitriy-serdyuk, @janchorowski, can you please comment on that?

Entonytang commented 8 years ago

Can you tell me how you compiled Kaldi? (In other words: how did you get files like kaldi-matrix.a?)

dmitriy-serdyuk commented 8 years ago

Right, sorry that I didn't mention this. Kaldi should be compiled with the --shared flag:

./configure --shared --use-cuda=no # No need for cuda, we don't train models with kaldi
make
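
For context, a full rebuild along these lines should work; the kaldi-python path and setup command below are taken from the error log and the first message, so treat them as assumptions for your own checkout:

cd $KALDI_ROOT/src
./configure --shared --use-cuda=no  # rebuild Kaldi as shared libraries
make
cd /home/jtang/Attention_ASR/kaldi-python  # assumed path, from the error log above
python setup.py install                    # or the setup.py invocation used originally
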
rizar commented 8 years ago

Could you please change the documentation? I guess it makes sense to do it in our private repository, since we are going to make what we have there the new master pretty soon.

Entonytang commented 8 years ago

After changing the configure command, the problem is solved. At this step: $LVSR/bin/run.py train wsj_paper6 $LVSR/exp/wsj/configs/wsj_paper6.yaml, the default configuration trains the model on the CPU. How do I use the GPU instead?

rizar commented 8 years ago

You can use the GPU in the same way as you usually do with Theano. Please read the Theano documentation.
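
For reference, a minimal sketch of the usual way to select the GPU with old-style Theano flags (the device index and floatX value here are assumptions, not taken from this thread):

THEANO_FLAGS=device=gpu,floatX=float32 $LVSR/bin/run.py train wsj_paper6 $LVSR/exp/wsj/configs/wsj_paper6.yaml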

Entonytang commented 8 years ago

After adding "device =gpu3" while I find GPU Process in GPU 2(device K40).....using default wsj_paper6.yaml..... it costs 65 seconds per steps(1 epoch = 3700 steps), I think this speed is too slow for GPU...... so this speed is right or not , what should I do for speed up the training process and How much time one epoch?

dmitriy-serdyuk commented 8 years ago

As I measured recently, one step was taking about 6 seconds on a Titan X; a K40 was a bit slower, about 8-9 seconds. So probably something is going wrong.

Make sure that Theano prints something like "Using gpu device 1: GeForce GTX TITAN X (CNMeM is enabled)". Another suggestion is to check that you use float32, not float64. I also use the optimizer_excluding=cudnn option since I had some issues with cuDNN.

rizar commented 8 years ago

Also use optimizer=fast_run in your THEANO_FLAGS
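
Putting the suggestions from this thread together, a combined invocation might look like the sketch below (the exact set of flags is an assumption; adjust the device index to your machine):

THEANO_FLAGS=device=gpu3,floatX=float32,optimizer=fast_run,optimizer_excluding=cudnn $LVSR/bin/run.py train wsj_paper6 $LVSR/exp/wsj/configs/wsj_paper6.yaml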

Entonytang commented 8 years ago

Thanks, that solved it. But at step 830 the program stopped without any warnings, while the GPU process is still there and the bokeh-server is also still there. Also, wsj_paper6.yaml doesn't seem to be the setting from the end-to-end attention-based LVCSR paper (250 Bi-GRU units in the paper, while wsj_paper6 has 320).

Epoch 0, step 829 | # | Elapsed Time: 2:09:35


Training status:
  best_valid_per: 1
  best_valid_sequence_log_likelihood: 503.460199693
  epochs_done: 0
  iterations_done: 829
Log records from the iteration 829:
  gradient_norm_threshold: 239.912979126
  max_attended_length: 400.0
  max_attended_mask_length: 400.0
  max_recording_length: 1600.0
  sequence_log_likelihood: 189.054199219
  time_read_data_this_batch: 0.0219719409943
  time_read_data_total: 19.5282828808
  time_train_this_batch: 11.5933840275
  time_train_total: 7709.37198544
  total_gradient_norm: 135.73147583
  total_step_norm: 1.07967531681

Epoch 0, step 830 | # | Elapsed Time: 2:09:46

dmitriy-serdyuk commented 8 years ago

Is there an exception or a core dump? Otherwise, something is wrong with your OS.

Entonytang commented 8 years ago

I don't think so. I used another core and tried again; the result is similar (the best_valid_sequence_log_likelihood is 503.460199693, which is the same as the result after 830 steps). Only pretraining_model.zip, pretraining_log.zip, and pretraining.zip appear in the wsj_paper6 folder. And is wsj_paper6.yaml the right config?

Epoch 0, step 84 | #| Elapsed Time: 0:09:18


Training status:
  best_valid_per: 1
  best_valid_sequence_log_likelihood: 503.460199693
  epochs_done: 0
  iterations_done: 84
Log records from the iteration 84:
  gradient_norm_threshold: 85.4330291748
  max_attended_length: 248.0
  max_attended_mask_length: 248.0
  max_recording_length: 990.0
  sequence_log_likelihood: 264.288513184
  time_read_data_this_batch: 0.0211541652679
  time_read_data_total: 2.17928504944
  time_train_this_batch: 5.36292505264
  time_train_total: 556.870803595
  total_gradient_norm: 109.950737
  total_step_norm: 0.572255551815

However, if I use wsj_paper4.yaml, the training process seems to have no problem.