Open alessiaatunimi opened 5 years ago
The segfault can be fixed by my comment in the other issue https://github.com/tensorflow/lingvo/issues/136#issuecomment-520066943
What about the persistence of the dataset?
You should download the dataset outside of docker, then link it into the docker instance with -v, so the dataset doesn't get removed when you quit docker.
I downloaded the dataset outside docker (by running the first two bash scripts: librispeech.01.download_train.sh and librispeech.02.download_devtest.sh). However, for the other two (librispeech.03.parameterize_train.sh and librispeech.04.parameterize_devtest.sh) I think it's necessary to run them inside a docker container, isn't it? I can't rerun them every time I restart an exited container... Really sorry for the dumb questions; I really appreciate your availability and kindness.
Hm, sorry I've never actually tried the librispeech processing scripts myself :(
I think if you create an empty directory, link it into docker with -v, and then put stuff inside the directory from inside docker, it should still remain even after you exit.
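A minimal sketch of that workflow, assuming a Linux host. The host path $HOME/librispeech and the image tag tensorflow:lingvo are assumptions and do not come from this thread; /tmp/librispeech is the ROOT that librispeech_lib.sh uses. The helper docker_run_cmd is hypothetical and only prints the docker command, so it can be checked before being run.

```shell
#!/usr/bin/env bash
# Assumed host directory that should outlive the container.
HOST_DATA_DIR="${HOST_DATA_DIR:-$HOME/librispeech}"
# Path the librispeech scripts write to inside the container (ROOT).
CONTAINER_DATA_DIR="/tmp/librispeech"
# Assumed image tag; replace it with the tag you actually built.
IMAGE="${IMAGE:-tensorflow:lingvo}"

# Print (do not execute) a docker run command that bind-mounts the host
# directory onto the container path.
docker_run_cmd() {
  printf 'docker run -it -v %s:%s %s bash\n' \
    "$HOST_DATA_DIR" "$CONTAINER_DATA_DIR" "$IMAGE"
}

# Create the empty directory on the host, then show the command to run.
mkdir -p "$HOST_DATA_DIR"
docker_run_cmd
```

Anything the scripts write under /tmp/librispeech inside a container started this way should end up in the host directory, so it is still there after the container exits.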
Jonathan Shen is correct.
I've used the docker container successfully without having to preprocess the librispeech data every time after exiting, since I used the -v option.
-- Daniel Galvez http://danielgalvez.me https://github.com/galv
Jonathan Shen is correct. I've used the docker container successfully without having to preprocess the librispeech data every time after exiting, since I used the -v option. …
Did you:
- download the dataset with librispeech.01.download_train.sh and librispeech.02.download_devtest.sh outside docker
- build a docker container with -v
- run librispeech.03.parameterize_train.sh and librispeech.04.parameterize_devtest.sh inside that docker container?
I'm quite confused; I'd really appreciate your help.
You can download from either inside or outside of docker, but you need to make sure that the mounted directory is outside of the container. So for instance you'd start with -v /tmp/librispeech ( https://github.com/tensorflow/lingvo/blob/master/lingvo/tasks/asr/tools/librispeech_lib.sh#L17 ). After you exit the container, the data will continue to be there.
If I'm not wrong, you're telling me that I have to edit the line in this file https://github.com/tensorflow/lingvo/blob/master/lingvo/tasks/asr/tools/librispeech_lib.sh#L17 where ROOT is specified, changing it from ROOT=/tmp/librispeech to ROOT=-v /tmp/librispeech? Once that is modified, I run the first two scripts again to download the dataset, and the other two inside the docker container. Given that I have the -v option, will what the preprocessing does be permanent?
The -v is in the docker command, as described in https://www.digitalocean.com/community/tutorials/how-to-share-data-between-the-docker-container-and-the-host
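To make the distinction concrete: ROOT in librispeech_lib.sh stays a plain path, while the value after -v is interpreted by docker run. As far as I know, Docker treats the common forms of that value differently. The helper below is a hypothetical illustration (classify_volume_arg is not a Docker command) and assumes Linux-style absolute paths.

```shell
#!/usr/bin/env bash
# Classify the value passed to `docker run -v`.
# Hypothetical helper for illustration; assumes Linux-style paths.
classify_volume_arg() {
  local spec="$1"
  case "$spec" in
    *:*)
      # Everything before the first colon is the source.
      local src="${spec%%:*}"
      case "$src" in
        # An absolute host path means the host directory is mounted.
        /*) echo "bind mount" ;;
        # Any other source is taken as the name of a Docker-managed volume.
        *)  echo "named volume" ;;
      esac
      ;;
    # A lone container path gets an unnamed, Docker-managed volume.
    *) echo "anonymous volume" ;;
  esac
}

classify_volume_arg "/tmp/librispeech"
classify_volume_arg "$HOME/librispeech:/tmp/librispeech"
classify_volume_arg "librispeech_data:/tmp/librispeech"
```

With the host:container form the data lives in a directory you can see on the host. With a lone container path the data goes into an anonymous volume, which I believe is removed together with the container when docker run is given --rm, so the explicit host:container form is probably the safer choice here.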
Hi, I have doubts about downloading and parameterizing the dataset. I successfully ran the four bash scripts in the folder lingvo/tasks/asr/tools. As long as the container stays up (so I can access it with docker exec -it containername and it keeps the same container ID), I can easily find my librispeech folder with all the data I need. But when the docker container goes down and I run it again with docker run, I have everything except the dataset folder. However, when I run the model, the error I get is a segmentation fault; it doesn't say anything about the dataset missing. Can you help me? I tried to commit the container image but it didn't work.
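One likely reason the folder disappears: docker run always creates a new container from the image, so files written inside the previous container are not carried over, whereas docker start restarts an existing, exited container with its files intact. A rough sketch of choosing between the two follows. The container name lingvo and the image tag tensorflow:lingvo are assumptions, and resume_or_create_cmd is a hypothetical helper that only prints the command.

```shell
#!/usr/bin/env bash
# Assumed container name and image tag; adjust to your setup.
CONTAINER_NAME="${CONTAINER_NAME:-lingvo}"
IMAGE="${IMAGE:-tensorflow:lingvo}"

# Given the names of all existing containers (one per line), print the
# docker command that resumes the container or creates it.
resume_or_create_cmd() {
  local existing="$1"
  if printf '%s\n' "$existing" | grep -Fqx -- "$CONTAINER_NAME"; then
    # The exited container still holds its files: restart and attach.
    echo "docker start -ai $CONTAINER_NAME"
  else
    # No container with that name: create one from the image.
    echo "docker run -it --name $CONTAINER_NAME $IMAGE bash"
  fi
}

# Usage (requires docker):
#   eval "$(resume_or_create_cmd "$(docker ps -a --format '{{.Names}}')")"
```

This only helps if the container was not started with --rm, since such a container is deleted when it exits. Mounting a host directory with -v avoids the problem either way.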