jdxyw opened this issue 6 years ago
I am encountering this issue as well. Running with the edit_logp
config, the process is consistently killed at the same point with the following output:
[localhost] local: wc -l /data/yelp_dataset_large_split/train.tsv
Reading data file.: 20%|#############4
Reading data file.: 26%|#################3
Killed
The same issue is occurring with other configs as well.
I have the same issue. Training is consistently killed.
[localhost] local: wc -l /data/onebillion_split/train.tsv
Reading data file.: 17%|##############1 | 582582/3506331 [02:43<19:10:00, 42.37it/s]
Reading data file.: 17%|##############5 | 594704/3506331 [02:44<39:10, 1238.52it/s]
Killed
Looks like this is a memory issue. I ran it on my cluster and it ran fine.
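You can usually confirm it was the kernel OOM killer by checking the kernel log right after the job dies (e.g. dmesg | grep -i "killed process"). The loader appears to read the whole TSV into memory before training starts. As a rough illustration only (read_tsv_streaming and the psutil monitoring are my own sketch, not code from this repo), you can stream the file and watch resident memory to see where it blows up:

import os
import psutil  # third-party: pip install psutil

def read_tsv_streaming(path):
    """Yield one tab-separated example at a time instead of loading the whole file."""
    proc = psutil.Process(os.getpid())
    with open(path) as f:
        for i, line in enumerate(f):
            if i % 100000 == 0:
                # Print resident set size so you can see memory growth as lines are read.
                print("line %d: rss %.2f GB" % (i, proc.memory_info().rss / 1e9))
            yield line.rstrip("\n").split("\t")

# Hypothetical usage; path taken from the log above.
for example in read_tsv_streaming("/data/onebillion_split/train.tsv"):
    pass  # hand examples to preprocessing here

If the resident size grows linearly with lines read, the options are more RAM (as on the cluster) or a streaming loader.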
@yamsgithub Hello, did you set up this project by running "run_docker.py"? Due to network problems I cannot run it successfully, so I installed all the packages one by one, and now I hit a git-related issue like this:
Traceback (most recent call last):
File "textmorph/edit_model/main.py", line 34, in
It seems like a path problem. However, the issue still exists after I manually create master under refs/heads/.
@Vonzpf Yes. I followed the instructions in the README and didn't have any issues. However, without a GPU the training has been running for 3 days now and is only about 36% complete, so I would recommend using GPUs; hopefully that is faster. This is on the one-billion text dataset.
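In case it helps anyone switching over, the standard PyTorch pattern for picking up a GPU when one is available is just the following (toy model and batch below are placeholders, not this repo's code):

import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = nn.Linear(10, 10).to(device)       # placeholder model, not the editor model
batch = torch.randn(4, 10, device=device)  # placeholder batch
out = model(batch)                         # runs on the GPU if one was found
print(device)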
@yamsgithub Did you load any other modules besides pytorch and python when you ran the code on the cluster?
@luciay I just used the docker image, which sets up all the dependencies. I didn't have to install anything else except docker on my machine.
@luciay If you are running on a cluster, I would recommend creating a virtual environment and letting docker install all the packages in that env.
@yamsgithub Thank you! Luckily, I have solved that problem. This project needs git to record the code's state. I had initialized the repo in my "/neural-editor/" folder but forgot to add and commit the code, so running "git add ." and "git commit" in "/neural-editor/" solved the problem.
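For anyone else hitting the traceback above: the project presumably asks git for the current commit at startup, and before the first commit refs/heads/master does not exist, so that lookup fails. A minimal sketch of that kind of check (my own illustration, not the actual textmorph code):

import subprocess

def current_commit(repo_dir):
    """Return the HEAD commit hash for repo_dir; fails if no commit exists yet."""
    return subprocess.check_output(
        ["git", "rev-parse", "HEAD"], cwd=repo_dir
    ).decode().strip()

# Before "git add ." and "git commit" there is no HEAD to resolve,
# which is why creating the repo alone is not enough.
print(current_commit("/neural-editor"))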
@yamsgithub I spoke with @luciay and she shared her batch script, which runs on the Prince cluster with Singularity instead of Docker, on CPU. I then made some modifications so it runs with GPU on the Prince cluster. You can see my fork here -> https://github.com/JackLangerman/neural-editor
Hope this helps people!
Hi,
My training always gets killed without any error message, like the output below.