tech-srl / code2seq

Code for the model presented in the paper: "code2seq: Generating Sequences from Structured Representations of Code"
http://code2seq.org
MIT License

Aborted (core dumped) when loading pretrained model #101

Closed Sohaib90 closed 2 years ago

Sohaib90 commented 3 years ago

First of all thank you so much for code2seq.

I have a rather odd problem, which might be caused by my environment, but I would still like a second opinion. When I train a code2seq model while also loading a pretrained model, it works fine. But when I do not provide a training dataset and just load the model (for example, to evaluate it or to release it in a smaller size), the program aborts with the following error:

2021-08-10 11:44:55.713136: F tensorflow/core/common_runtime/gpu/gpu_event_mgr.cc:273] Unexpected Event status: 1
Aborted (core dumped)

At first I thought it might be an environment issue, so I reinstalled the environment, but that did not help:

tensorflow-gpu 1.14.0 h0d30ee6_0 defaults
cudatoolkit 10.1.243 h6bb024c_0 defaults
cudnn 7.6.5 cuda10.1_0 defaults

Has anyone else faced the same problem?

Example command on which the problem arises:

python3 -u code2seq.py --load path_to_model/model_iter29 --release

urialon commented 3 years ago

Hi @Sohaib90 , Thank you for your interest in code2seq!

Can you please provide:

  1. Your operating system
  2. Your python version
  3. Does it happen also if you use the flag --predict instead of the --release flag?
  4. Does it happen if you use TensorFlow 1.12 instead of 1.14?
  5. Does it happen if you disable GPU, for example by running with CUDA_VISIBLE_DEVICES='' python3 -u code2seq.py ... ?

Best, Uri
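The CPU-only check in item 5 can also be scripted. The following is a minimal sketch, assuming the script is launched from the repository root; the helper names `cpu_only_env` and `run_release_on_cpu` are made up for illustration and are not part of code2seq. It relies only on the fact that an empty `CUDA_VISIBLE_DEVICES` hides all GPUs from CUDA.

```python
import os
import subprocess


def cpu_only_env(base_env=None):
    """Return a copy of the environment with every CUDA device hidden."""
    env = dict(os.environ if base_env is None else base_env)
    # An empty value means no GPU is visible to CUDA, so TensorFlow
    # falls back to the CPU.
    env["CUDA_VISIBLE_DEVICES"] = ""
    return env


def run_release_on_cpu(model_path):
    """Run the --release command from this issue with the GPU disabled.

    Hypothetical helper: model_path is whatever you pass to --load.
    """
    cmd = ["python3", "-u", "code2seq.py", "--load", model_path, "--release"]
    return subprocess.run(cmd, env=cpu_only_env(), check=False)
```

Calling `run_release_on_cpu("path_to_model/model_iter29")` should behave like prefixing the shell command with `CUDA_VISIBLE_DEVICES=''`.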

Sohaib90 commented 3 years ago

  1. Ubuntu 18
  2. python version 3.6
  3. When using GPU I get a traceback now, which is as follows:

Create a `tf.sparse.SparseTensor` and use `tf.sparse.to_dense` instead.
2021-08-17 11:43:51.175193: W tensorflow/core/framework/allocator.cc:122] Allocation of 806102784 exceeds 10% of system memory.
2021-08-17 11:44:19.267276: W tensorflow/core/framework/allocator.cc:122] Allocation of 806102784 exceeds 10% of system memory.
2021-08-17 11:44:52.016404: W tensorflow/core/framework/allocator.cc:122] Allocation of 806102784 exceeds 10% of system memory.
2021-08-17 11:44:56.527500: W tensorflow/core/framework/allocator.cc:122] Allocation of 806102784 exceeds 10% of system memory.
2021-08-17 11:45:01.210316: W tensorflow/core/framework/allocator.cc:122] Allocation of 806102784 exceeds 10% of system memory.
2021-08-17 11:51:22.784047: W tensorflow/core/common_runtime/bfc_allocator.cc:267] Allocator (GPU_0_bfc) ran out of memory trying to allocate 12.01GiB. Current allocation summary follows.
2021-08-17 11:51:22.784129: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (256): Total Chunks: 19, Chunks in use: 19. 4.8KiB allocated for chunks. 4.8KiB in use in bin. 76B client-requested in use in bin.
2021-08-17 11:51:22.784147: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (512): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-08-17 11:51:22.784160: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (1024): Total Chunks: 1, Chunks in use: 1. 1.2KiB allocated for chunks. 1.2KiB in use in bin. 1.0KiB client-requested in use in bin.
2021-08-17 11:51:22.784175: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (2048): Total Chunks: 2, Chunks in use: 2. 4.0KiB allocated for chunks. 4.0KiB in use in bin. 4.0KiB client-requested in use in bin.
2021-08-17 11:51:22.784188: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (4096): Total Chunks: 1, Chunks in use: 1. 5.0KiB allocated for chunks. 5.0KiB in use in bin. 5.0KiB client-requested in use in bin.
2021-08-17 11:51:22.784199: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (8192): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-08-17 11:51:22.784211: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (16384): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-08-17 11:51:22.784223: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (32768): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-08-17 11:51:22.784235: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (65536): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-08-17 11:51:22.784249: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (131072): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-08-17 11:51:22.784273: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (262144): Total Chunks: 1, Chunks in use: 1. 400.0KiB allocated for chunks. 400.0KiB in use in bin. 400.0KiB client-requested in use in bin.
2021-08-17 11:51:22.784287: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (524288): Total Chunks: 1, Chunks in use: 1. 640.0KiB allocated for chunks. 640.0KiB in use in bin. 640.0KiB client-requested in use in bin.
2021-08-17 11:51:22.784298: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (1048576): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-08-17 11:51:22.784310: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (2097152): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-08-17 11:51:22.784321: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (4194304): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-08-17 11:51:22.784333: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (8388608): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-08-17 11:51:22.784347: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (16777216): Total Chunks: 1, Chunks in use: 1. 26.24MiB allocated for chunks. 26.24MiB in use in bin. 26.24MiB client-requested in use in bin.
2021-08-17 11:51:22.784358: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (33554432): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-08-17 11:51:22.784370: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (67108864): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-08-17 11:51:22.784381: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (134217728): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-08-17 11:51:22.784394: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (268435456): Total Chunks: 1, Chunks in use: 0. 10.16GiB allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-08-17 11:51:22.784406: I tensorflow/core/common_runtime/bfc_allocator.cc:613] Bin for 12.01GiB was 256.00MiB, Chunk State:
2021-08-17 11:51:22.784423: I tensorflow/core/common_runtime/bfc_allocator.cc:619] Size: 10.16GiB | Requested Size: 0B | in_use: 0, prev: Size: 26.24MiB | Requested Size: 26.24MiB | in_use: 1
2021-08-17 11:51:22.784437: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fb0da000000 of size 256
2021-08-17 11:51:22.784447: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fb0da000100 of size 256
2021-08-17 11:51:22.784456: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fb0da000200 of size 256
2021-08-17 11:51:22.784465: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fb0da000300 of size 256
2021-08-17 11:51:22.784474: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fb0da000400 of size 256
2021-08-17 11:51:22.784484: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fb0da000500 of size 256
2021-08-17 11:51:22.784493: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fb0da000600 of size 256
2021-08-17 11:51:22.784502: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fb0da000700 of size 256
2021-08-17 11:51:22.784511: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fb0da000800 of size 256
2021-08-17 11:51:22.784520: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fb0da000900 of size 256
2021-08-17 11:51:22.784529: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fb0da000a00 of size 256
2021-08-17 11:51:22.784538: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fb0da000b00 of size 256
2021-08-17 11:51:22.784547: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fb0da000c00 of size 2048
2021-08-17 11:51:22.784557: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fb0da001400 of size 256
2021-08-17 11:51:22.784565: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fb0da001500 of size 2048
2021-08-17 11:51:22.784574: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fb0da001d00 of size 256
2021-08-17 11:51:22.784583: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fb0da001e00 of size 256
2021-08-17 11:51:22.784592: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fb0da001f00 of size 256
2021-08-17 11:51:22.784602: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fb0da002000 of size 256
2021-08-17 11:51:22.784611: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fb0da002100 of size 5120
2021-08-17 11:51:22.784620: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fb0da003500 of size 256
2021-08-17 11:51:22.784630: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fb0da003600 of size 256
2021-08-17 11:51:22.784639: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fb0da003700 of size 1280
2021-08-17 11:51:22.784649: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fb0da003c00 of size 409600
2021-08-17 11:51:22.784658: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fb0da067c00 of size 655360
2021-08-17 11:51:22.784668: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fb0da107c00 of size 27517440
2021-08-17 11:51:22.784678: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Free at 0x7fb0dbb45e00 of size 10904553472
2021-08-17 11:51:22.784687: I tensorflow/core/common_runtime/bfc_allocator.cc:638] Summary of in-use Chunks by size:
2021-08-17 11:51:22.784698: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 19 Chunks of size 256 totalling 4.8KiB
2021-08-17 11:51:22.784709: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 1 Chunks of size 1280 totalling 1.2KiB
2021-08-17 11:51:22.784719: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 2 Chunks of size 2048 totalling 4.0KiB
2021-08-17 11:51:22.784728: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 1 Chunks of size 5120 totalling 5.0KiB
2021-08-17 11:51:22.784739: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 1 Chunks of size 409600 totalling 400.0KiB
2021-08-17 11:51:22.784749: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 1 Chunks of size 655360 totalling 640.0KiB
2021-08-17 11:51:22.784759: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 1 Chunks of size 27517440 totalling 26.24MiB
2021-08-17 11:51:22.784770: I tensorflow/core/common_runtime/bfc_allocator.cc:645] Sum Total of in-use chunks: 27.27MiB
2021-08-17 11:51:22.784783: I tensorflow/core/common_runtime/bfc_allocator.cc:647] Stats: Limit: 10933151335 InUse: 28597760 MaxInUse: 28597760 NumAllocs: 26 MaxAllocSize: 27517440

2021-08-17 11:51:22.784808: W tensorflow/core/common_runtime/bfc_allocator.cc:271] ___
2021-08-17 11:51:22.784885: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at random_op.cc:202 : Resource exhausted: OOM when allocating tensor with shape[25190712,128] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
Traceback (most recent call last):
  File "/opt/dl/anaconda3/envs/tf112/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1334, in _do_call
    return fn(*args)
  File "/opt/dl/anaconda3/envs/tf112/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1319, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/opt/dl/anaconda3/envs/tf112/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[25190712,128] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
  [[{{node model/NODES_VOCAB/Initializer/random_uniform/RandomUniform}} = RandomUniformT=DT_INT32, _class=["loc:@model/NODES_VOCAB/Assign"], dtype=DT_FLOAT, seed=239, seed2=1058, _device="/job:localhost/replica:0/task:0/device:GPU:0"]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

  4. The trace above was produced with TensorFlow 1.12 rather than TensorFlow 1.14, with the GPU enabled.
  5. When I turn off the GPU, the release command works and saves the release model.

urialon commented 3 years ago

Hi @Sohaib90 , It seems that you are running out of GPU memory. Which GPU are you using, and how much memory does it have? Can you share a screenshot of running: nvidia-smi ?
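The numbers in the log are consistent with this. The sketch below is only arithmetic on values copied from the trace: the tensor shape `[25190712,128]`, the float type (4 bytes per element), and the allocator `Limit` of 10933151335 bytes. The function name `tensor_bytes` is illustrative.

```python
GIB = 1024 ** 3


def tensor_bytes(shape, bytes_per_element=4):
    """Size in bytes of a dense tensor (float32 has 4 bytes per element)."""
    count = 1
    for dim in shape:
        count *= dim
    return count * bytes_per_element


# Shape reported for model/NODES_VOCAB in the OOM message.
needed = tensor_bytes((25190712, 128))
# "Limit" from the allocator stats line in the same log.
limit = 10933151335

print(f"needed: {needed / GIB:.2f} GiB")  # 12.01 GiB, as in the log
print(f"limit:  {limit / GIB:.2f} GiB")   # 10.18 GiB
print("fits on the GPU:", needed <= limit)  # False
```

So this single variable asks for about 12.01 GiB, more than the roughly 10.18 GiB that TensorFlow could use on this device, regardless of what else is loaded.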

Sohaib90 commented 3 years ago

Hello @urialon

Yes I think so too. GPU memory is 10.92GB and I think it runs out of memory when loading a pretrained model. Specifications are:

name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
totalMemory: 10.92GiB freeMemory: 10.78GiB

urialon commented 3 years ago

Can you share a screenshot of running nvidia-smi:

  1. before, and
  2. while loading a model and before it crashes?
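If a screenshot is inconvenient, the same before/while comparison can be captured as text. This is a minimal sketch, assuming `nvidia-smi` is on the PATH; it uses the tool's `--query-gpu` and `--format` options, and the helper names `parse_memory` and `gpu_memory_mib` are made up for illustration.

```python
import subprocess

# Ask nvidia-smi for used/total memory per GPU as bare CSV numbers (MiB).
QUERY = [
    "nvidia-smi",
    "--query-gpu=memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_memory(csv_text):
    """Turn lines like '115, 11178' into (used_mib, total_mib) tuples."""
    rows = []
    for line in csv_text.strip().splitlines():
        used, total = (int(field) for field in line.split(","))
        rows.append((used, total))
    return rows


def gpu_memory_mib():
    """Query nvidia-smi once; returns one (used, total) tuple per GPU."""
    result = subprocess.run(
        QUERY,
        stdout=subprocess.PIPE,
        universal_newlines=True,  # works on Python 3.6, as used in this thread
        check=True,
    )
    return parse_memory(result.stdout)
```

Calling `gpu_memory_mib()` once before starting code2seq and again while the model is loading would show how much memory is already taken by other processes and how quickly usage grows.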

urialon commented 2 years ago

Closing due to inactivity