rizar / actor-critic-public

The source code for "An Actor Critic Algorithm for Structured Prediction"
MIT License

Running the tests gives a segmentation fault. #1

Closed shivamvats closed 7 years ago

shivamvats commented 7 years ago

@rizar I followed the instructions in the README to run the tests. However, I am getting seg faults for all the tests I tried. My guess is that the code is not getting the correct data. Are there additional steps I need to take to properly set the runtime paths?

rizar commented 7 years ago

By "tests", do you mean the content of "tests" folder, or do you mean the actual experiments? The former was included in the repository by mistake and should be deleted.

shivamvats commented 7 years ago

I am sorry for the confusion. I mean the actual experiments used in the paper, billion_words and ted.

rizar commented 7 years ago

OK, I see. Can you please provide a bit more information about the exact commands that you ran?

shivamvats commented 7 years ago

  1. source env.sh
  2. cd exp/billion_words
  3. $LVSR/bin/run.py train autoencoder3 configs/autoencoder3.yaml
  4. cd ../ted
  5. ./create_dataset.sh (I first set the path to MOSES)
  6. $LVSR/bin/run.py train ted12 config/ted12.yaml

Error:

Using gpu device 0: GeForce GTX 960M (CNMeM is enabled with initial size: 90.0% of memory, cuDNN not available)
Segmentation fault

rizar commented 7 years ago

Hold on, did step 3 work for you? You should have downloaded the data first. I will add instructions on how to download the data today.

rizar commented 7 years ago

https://github.com/rizar/actor-critic-public/blob/master/exp/billion_words/README.md is now updated

shivamvats commented 7 years ago

I did download the data (I put it in lvsr/datasets though).

I still get the same error after putting the data in FUEL_DATA_PATH. I am surprised to see a seg fault with Python. Are you using C wrappers in your code? From fuel's docs, it seems that it needs data as a .hdf5 file. Does it need to be converted to fuel's format?

Also, in the configs for ted the data filename is set as TED/de-en/ted.h5 (I would expect it to be ted.h5).
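
A quick way to check whether a configured filename resolves under FUEL_DATA_PATH is sketched below. This is only an illustration using the standard library: the helper name `locate_dataset` is hypothetical, and treating the variable as a list of directories separated by `os.pathsep` is an assumption rather than something confirmed in this thread.

```python
import os


def locate_dataset(relative_name, env_var="FUEL_DATA_PATH"):
    """Return the first existing file matching relative_name, or None.

    Hypothetical helper: it only mirrors the idea of looking a dataset up
    relative to the directories named in an environment variable.
    """
    raw = os.environ.get(env_var, "")
    # Assumption: several directories may be listed, separated by os.pathsep.
    for directory in raw.split(os.pathsep):
        if not directory:
            continue
        candidate = os.path.join(directory, relative_name)
        if os.path.isfile(candidate):
            return candidate
    return None
```

Calling `locate_dataset("TED/de-en/ted.h5")` with the filename from the TED config would return the full path if the file sits in that subdirectory of the data path, and `None` if it was placed elsewhere (for example directly as `ted.h5`).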

rizar commented 7 years ago

No, there is no C code in our implementation. However, libhdf5 could trigger a segfault. But let's focus on one experiment. Do you have a segfault when you are trying to run on TED or on billion_words?

shivamvats commented 7 years ago

Both.

rizar commented 7 years ago

OK, let's focus on billion words then. In order to help you, I would need to know at which line of lvsr/main.py the segmentation fault happens.
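
One way to find the Python line at which a segmentation fault happens is the `faulthandler` module, which is part of the standard library since Python 3.3. The sketch below is a generic illustration, not something taken from this repository; on an older interpreter a backport would be needed, which is not shown here.

```python
import faulthandler

# Ask the interpreter to dump the Python traceback of every thread to
# stderr if the process receives a fatal signal such as SIGSEGV.
faulthandler.enable(all_threads=True)

# Assumption for illustration: the imports and the call that crash would
# follow here, so the dumped traceback names the offending file and line.
```

On Python 3 the same effect is available without editing any file by starting the interpreter with `python -X faulthandler`.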

shivamvats commented 7 years ago

I figured out the issue. Importing fst is leading to a seg fault. I tried reinstalling pyfst and openfst but it doesn't work. The code should work without fst, right?
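
A crashing import like this can be confirmed without taking down the main process by trying the import in a child interpreter. The following is a minimal sketch with hypothetical helper names (`import_exit_code`, `describe`); it relies on the POSIX convention that `subprocess` reports a child killed by signal N with return code -N.

```python
import subprocess
import sys


def import_exit_code(module_name):
    """Import module_name in a child interpreter and return its exit code."""
    completed = subprocess.run(
        [sys.executable, "-c", "import " + module_name],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return completed.returncode


def describe(code):
    """Turn an exit code from import_exit_code into a short message."""
    if code == 0:
        return "import succeeded"
    if code < 0:
        # Negative codes mean the child was killed by a signal (POSIX only).
        return "killed by signal %d" % -code
    return "import failed with a Python exception"
```

For example, `describe(import_exit_code("fst"))` would report a signal if the import segfaults (SIGSEGV is signal 11 on Linux), and an ordinary failure if the module is simply not installed.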

rizar commented 7 years ago

Yes, this is an obsolete import. Feel free to remove it. I will remove it later as well. Thanks for the heads-up!

shivamvats commented 7 years ago

Thanks a lot for your prompt help! :) Would it be possible for you to give me a rough estimate of the training time? I am using a GTX 960, with 4GB memory.

rizar commented 7 years ago

I am afraid this amount of memory might not be sufficient. On a K80 the pretraining (first actor, then critic) takes about 36 hours, and then you can run the main stage for as long as you want...
