kaldi-asr / kaldi

kaldi-asr/kaldi is the official location of the Kaldi project.
http://kaldi-asr.org

Deprecated URL in the script kaldi/tools/extras/install_srilm.sh #4771

Open nayanjha16 opened 2 years ago

nayanjha16 commented 2 years ago

The URL in kaldi/tools/extras/install_srilm.sh used to download SRILM (currently http://www.speech.sri.com/projects/srilm/srilm_download.php) seems to be deprecated. Kindly update it with the new one.

jtrmal commented 2 years ago

Hi, I have now tried three times to access the new URL and I still cannot reach it. It seems the whole server is not responding for me. Could someone else try, please? y.

jtrmal commented 2 years ago

I contacted some people at SRI; hopefully they will resolve this. Not our bug, but I'll keep this open to keep track of things.

kkm000 commented 2 years ago

The URL responds now. But this whole automation idea is borked from multiple angles. First, it's in a kinda legal gray area, we should point to the license at the very least. Second, the script does not URL-encode data. wget doesn't care, and we apparently do not require curl, which is a bummer. I'll see what we can do with Python 3. It has urllib, which can properly do it.
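A minimal sketch of the `urllib` approach mentioned above, assuming the download form takes a few free-text fields. The field names and values here are placeholders for illustration, not the ones install_srilm.sh actually posts.

```python
from urllib.parse import urlencode


def build_post_body(fields):
    """Return an application/x-www-form-urlencoded request body as bytes.

    urlencode() percent-escapes reserved characters such as '&', '+' and '@'
    and turns spaces into '+', so a value cannot split or corrupt the form.
    """
    return urlencode(fields).encode("ascii")


# Hypothetical form fields; the real form may use different names.
form = {
    "name": "Jane Doe",
    "org": "R&D Lab",
    "email": "jane+asr@example.org",
    "file": "srilm-1.7.3.tar.gz",
}
body = build_post_body(form)
```

The resulting bytes could be passed as the `data` argument of `urllib.request.urlopen`, which would send them as a POST request. Whether the server accepts such a request is a separate question.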

kkm000 commented 2 years ago

Nice. The honest download of version 1.7.3 through the web site with all form stuff filled out returns 200 OK with a zero-length file. Tarballs for 1.7.2, 1.7.1, 1.6.0 are downloading fine, only 1.7.3 is missing. @jtrmal, if you know who to talk to, could you please let them know?
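Since a 200 OK can come back with an empty body, a status check alone is not enough. Here is a minimal sketch of a sanity check on the downloaded file, assuming it has already been saved locally; the function name is made up for this example and is not part of any Kaldi script.

```python
import os
import tarfile


def is_usable_tarball(path):
    """Return True if `path` exists, is non-empty, and opens as a tar archive."""
    if not os.path.isfile(path) or os.path.getsize(path) == 0:
        # Covers the case above: request succeeded, file is zero-length.
        return False
    # is_tarfile() also handles compressed archives such as .tar.gz.
    return tarfile.is_tarfile(path)
```

An installer could call this right after the download and fail with a clear message instead of a confusing error from the unpacking step.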

As I read the SRILM license, it allows redistribution "with a prominent license notice", nothing unusual. @danpovey, @jtrmal, could we put the source on openslr? Or better yet, on GitHub, since it's source code only? We can pre-apply the configuration, avoiding the sed gymnastics in install_srilm.sh. The license allows this, under the usual source availability requirement. GitHub looks even better w.r.t. visibility of the applied changes. We have OpenFST and sph2pipe on GitHub already.

Tangentially, maybe moving this stuff to the kaldi-asr org is a good idea? Or, even better, create a separate org for Kaldi dependencies?

For SRILM, the only special requirement in its license is to register the URL of the redistribution point:

3.3. Licensee Registration. Before You Distribute [SRILM] under this License, You must first register by sending email [...] to srilm@speech.sri.com, including a statement confirming that you accept [...] this License and [...] identify the URL [used] to make Source Code available.

kkm000 commented 2 years ago

Meanwhile, I googled up this: https://github.com/weimeng23/SRILM :)

danpovey commented 2 years ago

If the license allows that, then yes, I think we could just put a fork on GitHub.

kkm000 commented 2 years ago

@danpovey, I'm thinking of setting up an org, e.g. kaldi-dependencies, with pre-patched and/or fixed dependencies like this. For SRILM in particular, we apply a patch for 1.7.1 and earlier (probably no longer relevant) and always do awk/sed gymnastics on the makefiles. We have openfst (patched for Windows) and sph2pipe (with a makefile) under my own account. This is going to be the third repo. We have ancient dependencies, like sph2pipe, which need to be maintained. I see the number growing over time; they are rarely needed but can't be dropped. Besides, check_dependencies has pointers to mirrors on openslr, and we once had an issue with the main site unresponsive and the backup outdated. GitHub is consistently available.

Or do you think kaldi-asr is a better place? My take: I don't want to clutter it.
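For the main-plus-mirror situation described above, a fallback loop might look roughly like the sketch below. The function names are hypothetical. It only guards against an unresponsive or empty source, and does not detect an outdated mirror, which would need a version or checksum comparison.

```python
import urllib.request


def http_fetch(url, timeout=30):
    """Download `url` and return the response body as bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def fetch_first_available(urls, fetch=http_fetch):
    """Try each URL in order; return (url, data) for the first non-empty result."""
    errors = []
    for url in urls:
        try:
            data = fetch(url)
        except OSError as exc:  # urllib.error.URLError is a subclass of OSError
            errors.append(f"{url}: {exc}")
            continue
        if data:
            return url, data
        errors.append(f"{url}: empty response")
    raise RuntimeError("all sources failed: " + "; ".join(errors))
```

Passing `fetch` as a parameter keeps the loop testable without network access.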

danpovey commented 2 years ago

I think that's a great idea!
If it's on GitHub it will be more future-proof, I think.

jtrmal commented 2 years ago

The link still doesn't work for me. :( I would probably leave the dependencies in kaldi-asr/

kkm000 commented 2 years ago

On second thought, @jtrmal is right. In any case, we have nothing in kaldi-asr but Kaldi anyway. There are orgs with hundreds of repos, and we're talking no more than 5 now, probably under 10 in the future, and that's it. That is far from clutter. GitHub sorts them with the most active first, so Kaldi will show up on top, and it can also be pinned. @danpovey, what's your take?

kkm000 commented 2 years ago

@danpovey?

danpovey commented 2 years ago

Yes, I agree, kaldi-asr is fine.