kaldi-asr / kaldi

kaldi-asr/kaldi is the official location of the Kaldi project.
http://kaldi-asr.org

validate_data_dir.sh has a different output under different environments #4488

Open jyhan03 opened 3 years ago

jyhan03 commented 3 years ago

Hello, when I run /kaldi-root/egs/chime6/s5_track1/run.sh, the program fails in /kaldi-root/egs/wsj/s5/utils/validate_data_dir.sh.

The error occurs at line 129 of validate_data_dir.sh (n_non_print=$(LC_ALL="C.UTF-8" grep -c '[^[:print:][:space:]]' $data/text)), and the error message is "utils/validate_data_dir.sh: text contains 50 lines with non-printable characters".

Furthermore, I ran the command "echo $(LC_ALL="C.UTF-8" grep -c '[^[:print:][:space:]]' $PWD/text)" on the same text file on two servers, one running Ubuntu 18.04 and the other CentOS 7.9. The outputs on the two servers are 0 and 50, respectively.
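One possible explanation for a difference like this, not confirmed anywhere in this thread, is that the C.UTF-8 locale is not installed on one of the hosts. When a requested locale is unavailable, programs typically fall back to the C locale, where bytes above 127 do not count as printable. A minimal sketch for checking this on each server follows; `has_locale` is a hypothetical helper name, not part of Kaldi.

```shell
# Hypothetical helper: succeed if the named locale appears in `locale -a`.
# Installed names may be spelled C.UTF-8 or C.utf8, so normalize both sides.
has_locale() {
  # Lowercase and drop '-' so "C.UTF-8" and "C.utf8" compare equal.
  want=$(printf '%s' "$1" | tr 'A-Z' 'a-z' | tr -d '-')
  installed=$(locale -a 2>/dev/null | tr 'A-Z' 'a-z' | tr -d '-')
  # -x: whole-line match, -F: fixed string (so '.' is not a wildcard).
  printf '%s\n' "$installed" | grep -xF "$want" >/dev/null
}

if has_locale "C.UTF-8"; then
  echo "C.UTF-8 is available on this host"
else
  echo "C.UTF-8 is missing; grep may be running in the plain C locale"
fi
```

If the two servers print different messages, the locale setup is a likely suspect and is worth ruling out before looking at the data itself.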

I also installed the same gcc version on both servers, but the outputs did not change.

In addition, when I changed "LC_ALL=C" to "LC_ALL=" in ./path.sh, the program seems to be able to move on.

Are there any solutions to this problem?

danpovey commented 3 years ago

Run the command yourself and figure out what is on those lines.
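A minimal sketch of how one might inspect those lines: replace `-c` (count) with `-n` (show line numbers) and pipe the result through `cat -v` so control and high bytes become visible. The helper name `show_non_print` and the demo file are assumptions for illustration. On the real data the arguments would be the locale from the script and `$data/text`.

```shell
# Hypothetical helper: print "lineno:content" for every line containing a
# character outside [:print:] and [:space:] under the given locale.
show_non_print() {
  # $1 = locale name, $2 = file to check
  # -a treats the file as text so grep prints lines rather than
  # "Binary file matches".
  LC_ALL="$1" grep -a -n '[^[:print:][:space:]]' "$2"
}

# Demo on a throwaway file: line 2 carries a control byte (0x01).
tmp=$(mktemp)
printf 'utt1 hello world\nutt2 bad\001char\nutt3 fine\n' > "$tmp"
# cat -v renders the control byte as ^A instead of printing it raw.
show_non_print C "$tmp" | cat -v
rm -f "$tmp"
```

Running the same helper with both `C` and `C.UTF-8` on each server shows whether the flagged lines contain real control characters or just ordinary non-ASCII text.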

On Sat, Apr 3, 2021 at 10:24 AM Jyhan @.***> wrote:

Reopened #4488 https://github.com/kaldi-asr/kaldi/issues/4488.


stale[bot] commented 3 years ago

This issue has been automatically marked as stale by a bot solely because it has not had recent activity. Please add any comment (simply 'ping' is enough) to prevent the issue from being closed for 60 more days if you believe it should be kept open.