patrickbryant1 / SpeedPPI

Rapid protein-protein interaction network creation from multiple sequence alignments with Deep Learning

Out of Memory on 2080 Ti #1

Closed: superantichrist closed this issue 1 year ago

superantichrist commented 1 year ago

I ran the all-vs-all pipeline and it completed pred1 and pred2, but after that it failed with an out-of-memory error.
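Not part of the original report, but one mitigation worth trying on an 11 GiB card is unified memory, which lets XLA spill activations to host RAM instead of failing at the GPU limit. A sketch using the environment variables AlphaFold's own run scripts document for long targets; they must be set before jax is imported, and the values are assumptions to tune:

```python
import os

# Sketch (assumption): allow XLA to page GPU allocations into host RAM.
os.environ["TF_FORCE_UNIFIED_MEMORY"] = "1"
# Let the JAX client request up to 4x the physical GPU memory via
# unified memory (value is a tuning assumption, not a SpeedPPI default).
os.environ["XLA_PYTHON_CLIENT_MEM_FRACTION"] = "4.0"

# Only import jax (or run the SpeedPPI scripts) after these are set,
# since the allocator reads them at initialization time.
```

Prediction will be slower when spilling occurs, but long pairs that otherwise OOM may complete.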


Traceback (most recent call last):
  File "./src/run_alphafold_all_vs_all.py", line 306, in <module>
    main(num_ensemble=1,
  File "./src/run_alphafold_all_vs_all.py", line 269, in main
    prediction_result = model_runner.predict(processed_feature_dict)
  File "/m2/SpeedPPI/src/alphafold/model/model.py", line 133, in predict
    result = self.apply(self.params, jax.random.PRNGKey(0), feat)
jaxlib.xla_extension.XlaRuntimeError: RESOURCE_EXHAUSTED: Out of memory while trying to allocate 8130147888 bytes.
BufferAssignment OOM Debugging.
BufferAssignment stats:
        parameter allocation: 2.59GiB
        constant allocation: 38.6KiB
        maybe_live_out allocation: 373.95MiB
        preallocated temp allocation: 7.57GiB
        total allocation: 10.53GiB
        total fragmentation: 162.30MiB (1.51%)
Peak buffers:
    Buffer 1:
            Size: 1.40GiB
            Operator: op_name="jit(apply_fn)/jit(main)/alphafold/while/body/alphafold_iteration/evoformer/__layer_stack_no_state/while/body/extra_msa_stack/outer_product_mean/layer_norm_input/jit(_var)/reduce_sum[axes=(2,)]" source_file="/m2/SpeedPPI/src/alphafold/model/modules.py" source_line=1446
            XLA Label: fusion
            Shape: f32[5120,1148,64]
            ==========================

    Buffer 2:
            Size: 1.17GiB
            Entry Parameter Subshape: f32[11,508,1148,49]
            ==========================

    Buffer 3:
            Size: 643.51MiB
            Operator: op_name="jit(apply_fn)/jit(main)/alphafold/while/body/alphafold_iteration/evoformer/__layer_stack_no_state/while/body/extra_msa_stack/triangle_multiplication_outgoing/mul" source_file="/m2/SpeedPPI/src/alphafold/model/modules.py" source_line=1307
            XLA Label: fusion
            Shape: f32[128,1148,1148]
            ==========================

    Buffer 4:
            Size: 643.51MiB
            Operator: op_name="jit(apply_fn)/jit(main)/alphafold/while/body/alphafold_iteration/evoformer/__layer_stack_no_state/while/body/extra_msa_stack/triangle_multiplication_outgoing/mul" source_file="/m2/SpeedPPI/src/alphafold/model/modules.py" source_line=1307
            XLA Label: fusion
            Shape: f32[128,1148,1148]
            ==========================

    Buffer 5:
            Size: 643.51MiB
            Operator: op_name="jit(apply_fn)/jit(main)/alphafold/while/body/alphafold_iteration/evoformer/__layer_stack_no_state/while/body/extra_msa_stack/triangle_multiplication_outgoing/left_projection/...cb,cd->...db/jit(_einsum)/dot_general[dimension_numbers=(((0,), (1,)), ((), ())) precision=None preferred_element_type=None]" source_file="/m2/SpeedPPI/src/alphafold/model/common_modules.py" source_line=76
            XLA Label: custom-call
            Shape: f32[128,1317904]
            ==========================

    Buffer 6:
            Size: 643.51MiB
            Operator: op_name="jit(apply_fn)/jit(main)/alphafold/while/body/alphafold_iteration/evoformer/__layer_stack_no_state/while/body/extra_msa_stack/triangle_multiplication_outgoing/left_gate/...cb,cd->...db/jit(_einsum)/dot_general[dimension_numbers=(((0,), (1,)), ((), ())) precision=None preferred_element_type=None]" source_file="/m2/SpeedPPI/src/alphafold/model/common_modules.py" source_line=76
            XLA Label: custom-call
            Shape: f32[128,1317904]
            ==========================

    Buffer 7:
            Size: 643.51MiB
            Operator: op_name="jit(apply_fn)/jit(main)/alphafold/while/body/alphafold_iteration/evoformer/__layer_stack_no_state/while/body/extra_msa_stack/triangle_multiplication_outgoing/right_projection/...cb,cd->...db/jit(_einsum)/dot_general[dimension_numbers=(((0,), (1,)), ((), ())) precision=None preferred_element_type=None]" source_file="/m2/SpeedPPI/src/alphafold/model/common_modules.py" source_line=76
            XLA Label: custom-call
            Shape: f32[128,1317904]
            ==========================

    Buffer 8:
            Size: 643.51MiB
            Operator: op_name="jit(apply_fn)/jit(main)/alphafold/while/body/alphafold_iteration/evoformer/__layer_stack_no_state/while/body/extra_msa_stack/triangle_multiplication_outgoing/right_gate/...cb,cd->...db/jit(_einsum)/dot_general[dimension_numbers=(((0,), (1,)), ((), ())) precision=None preferred_element_type=None]" source_file="/m2/SpeedPPI/src/alphafold/model/common_modules.py" source_line=76
            XLA Label: custom-call
            Shape: f32[128,1317904]
            ==========================

    Buffer 9:
            Size: 643.51MiB
            Operator: op_name="jit(apply_fn)/jit(main)/alphafold/while/body/alphafold_iteration/evoformer/__layer_stack_no_state/while/body/extra_msa_stack/triangle_multiplication_outgoing/gating_linear/...cb,cd->...db/jit(_einsum)/dot_general[dimension_numbers=(((0,), (1,)), ((), ())) precision=None preferred_element_type=None]" source_file="/m2/SpeedPPI/src/alphafold/model/common_modules.py" source_line=76
            XLA Label: custom-call
            Shape: f32[128,1317904]
            ==========================

    Buffer 10:
            Size: 643.51MiB
            XLA Label: fusion
            Shape: f32[128,1148,1148]
            ==========================

    Buffer 11:
            Size: 643.51MiB
            Operator: op_name="jit(apply_fn)/jit(main)/alphafold/broadcast_in_dim[shape=(1148, 1148, 128) broadcast_dimensions=()]" source_file="/m2/SpeedPPI/src/alphafold/model/modules.py" source_line=352
            XLA Label: broadcast
            Shape: f32[1148,1148,128]
            ==========================

    Buffer 12:
            Size: 321.75MiB
            Operator: op_name="jit(apply_fn)/jit(main)/alphafold/alphafold_iteration/distogram_head/add" source_file="/m2/SpeedPPI/src/alphafold/model/modules.py" source_line=1372
            XLA Label: fusion
            Shape: f32[1148,1148,64]
            ==========================

    Buffer 13:
            Size: 246.64MiB
            Entry Parameter Subshape: f32[11,5120,1148]
            ==========================

    Buffer 14:
            Size: 246.64MiB
            Entry Parameter Subshape: s32[11,5120,1148]
            ==========================

    Buffer 15:
            Size: 246.64MiB
            Entry Parameter Subshape: f32[11,5120,1148]
            ==========================
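For reference, the peak-buffer sizes in these dumps follow directly from the printed shapes at 4 bytes per f32/s32 element, so the pair representation alone scales quadratically with complex length. A quick sanity check with a hypothetical helper, `buffer_mib` (not part of SpeedPPI):

```python
def buffer_mib(shape, bytes_per_elem=4):
    """Size in MiB of a dense f32/s32 buffer with the given shape."""
    n = 1
    for d in shape:
        n *= d
    return n * bytes_per_elem / 2**20

# Pair-activation buffer for the 1148-residue complex: f32[128,1148,1148]
print(round(buffer_mib((128, 1148, 1148)), 2))  # -> 643.51, matching the dump
# Same activation for the 1286-residue pair: f32[128,1286,1286]
print(round(buffer_mib((128, 1286, 1286)), 2))  # -> 807.52, matching the dump
```

Several such buffers are live at once inside triangle multiplication, which is why the 1148-residue pair already needs a ~10.5 GiB allocation on an 11 GiB 2080 Ti.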

mkdir: cannot create directory ‘./data/dev/all_vs_all/pred3/’: File exists
Running pred 3 out of 5
Evaluating pair 4IFD_C-4IFD_J
2023-04-20 16:17:37.870533: W tensorflow/tsl/framework/cpu_allocator_impl.cc:83] Allocation of 6688115632 exceeds 10% of free system memory.
2023-04-20 16:17:37.941510: W tensorflow/tsl/framework/cpu_allocator_impl.cc:83] Allocation of 6932018392 exceeds 10% of free system memory.
2023-04-20 16:17:43.933476: W tensorflow/tsl/framework/cpu_allocator_impl.cc:83] Allocation of 6932018392 exceeds 10% of free system memory.
2023-04-20 16:17:47.914388: W tensorflow/tsl/framework/cpu_allocator_impl.cc:83] Allocation of 6932018392 exceeds 10% of free system memory.
2023-04-20 16:17:51.906020: W tensorflow/tsl/framework/cpu_allocator_impl.cc:83] Allocation of 6932018392 exceeds 10% of free system memory.
/m2/SpeedPPI/src/alphafold/model/mapping.py:49: FutureWarning: jax.tree_flatten is deprecated, and will be removed in a future release. Use jax.tree_util.tree_flatten instead.
  values_tree_def = jax.tree_flatten(values)[1]
/m2/SpeedPPI/src/alphafold/model/mapping.py:53: FutureWarning: jax.tree_unflatten is deprecated, and will be removed in a future release. Use jax.tree_util.tree_unflatten instead.
  return jax.tree_unflatten(values_tree_def, flat_axes)
/m2/SpeedPPI/src/alphafold/model/mapping.py:124: FutureWarning: jax.tree_flatten is deprecated, and will be removed in a future release. Use jax.tree_util.tree_flatten instead.
  flat_sizes = jax.tree_flatten(in_sizes)[0]
2023-04-20 16:20:11.643339: W external/org_tensorflow/tensorflow/tsl/framework/bfc_allocator.cc:479] Allocator (GPU_0_bfc) ran out of memory trying to allocate 9.53GiB (rounded to 10228218624) requested by op
2023-04-20 16:20:11.643664: W external/org_tensorflow/tensorflow/tsl/framework/bfc_allocator.cc:492] *****___
2023-04-20 16:20:11.649408: E external/org_tensorflow/tensorflow/compiler/xla/pjrt/pjrt_stream_executor_client.cc:2153] Execution of replica 0 failed: RESOURCE_EXHAUSTED: Out of memory while trying to allocate 10228218416 bytes.
BufferAssignment OOM Debugging.
BufferAssignment stats:
        parameter allocation: 2.86GiB
        constant allocation: 38.7KiB
        maybe_live_out allocation: 462.23MiB
        preallocated temp allocation: 9.53GiB
        preallocated temp fragmentation: 784.48MiB (8.04%)
        total allocation: 12.83GiB
        total fragmentation: 1.17GiB (9.13%)
Peak buffers:
    Buffer 1:
            Size: 1.57GiB
            Operator: op_name="jit(apply_fn)/jit(main)/alphafold/while/body/alphafold_iteration/evoformer/__layer_stack_no_state/while/body/extra_msa_stack/outer_product_mean/layer_norm_input/jit(_var)/reduce_sum[axes=(2,)]" source_file="/m2/SpeedPPI/src/alphafold/model/modules.py" source_line=1446
            XLA Label: fusion
            Shape: f32[5120,1286,64]
            ==========================

    Buffer 2:
            Size: 1.31GiB
            Entry Parameter Subshape: f32[11,508,1286,49]
            ==========================

    Buffer 3:
            Size: 807.52MiB
            Operator: op_name="jit(apply_fn)/jit(main)/alphafold/while/body/alphafold_iteration/evoformer/__layer_stack_no_state/while/body/extra_msa_stack/triangle_multiplication_outgoing/mul" source_file="/m2/SpeedPPI/src/alphafold/model/modules.py" source_line=1307
            XLA Label: fusion
            Shape: f32[128,1286,1286]
            ==========================

    Buffer 4:
            Size: 807.52MiB
            Operator: op_name="jit(apply_fn)/jit(main)/alphafold/while/body/alphafold_iteration/evoformer/__layer_stack_no_state/while/body/extra_msa_stack/triangle_multiplication_outgoing/mul" source_file="/m2/SpeedPPI/src/alphafold/model/modules.py" source_line=1307
            XLA Label: fusion
            Shape: f32[128,1286,1286]
            ==========================

    Buffer 5:
            Size: 807.52MiB
            Operator: op_name="jit(apply_fn)/jit(main)/alphafold/while/body/alphafold_iteration/evoformer/__layer_stack_no_state/while/body/extra_msa_stack/triangle_multiplication_outgoing/left_projection/...cb,cd->...db/jit(_einsum)/dot_general[dimension_numbers=(((0,), (1,)), ((), ())) precision=None preferred_element_type=None]" source_file="/m2/SpeedPPI/src/alphafold/model/common_modules.py" source_line=76
            XLA Label: custom-call
            Shape: f32[128,1653796]
            ==========================

    Buffer 6:
            Size: 807.52MiB
            Operator: op_name="jit(apply_fn)/jit(main)/alphafold/while/body/alphafold_iteration/evoformer/__layer_stack_no_state/while/body/extra_msa_stack/triangle_multiplication_outgoing/left_gate/...cb,cd->...db/jit(_einsum)/dot_general[dimension_numbers=(((0,), (1,)), ((), ())) precision=None preferred_element_type=None]" source_file="/m2/SpeedPPI/src/alphafold/model/common_modules.py" source_line=76
            XLA Label: custom-call
            Shape: f32[128,1653796]
            ==========================

    Buffer 7:
            Size: 807.52MiB
            Operator: op_name="jit(apply_fn)/jit(main)/alphafold/while/body/alphafold_iteration/evoformer/__layer_stack_no_state/while/body/extra_msa_stack/triangle_multiplication_outgoing/right_projection/...cb,cd->...db/jit(_einsum)/dot_general[dimension_numbers=(((0,), (1,)), ((), ())) precision=None preferred_element_type=None]" source_file="/m2/SpeedPPI/src/alphafold/model/common_modules.py" source_line=76
            XLA Label: custom-call
            Shape: f32[128,1653796]
            ==========================

    Buffer 8:
            Size: 807.52MiB
            Operator: op_name="jit(apply_fn)/jit(main)/alphafold/while/body/alphafold_iteration/evoformer/__layer_stack_no_state/while/body/extra_msa_stack/triangle_multiplication_outgoing/right_gate/...cb,cd->...db/jit(_einsum)/dot_general[dimension_numbers=(((0,), (1,)), ((), ())) precision=None preferred_element_type=None]" source_file="/m2/SpeedPPI/src/alphafold/model/common_modules.py" source_line=76
            XLA Label: custom-call
            Shape: f32[128,1653796]
            ==========================

    Buffer 9:
            Size: 807.52MiB
            Operator: op_name="jit(apply_fn)/jit(main)/alphafold/while/body/alphafold_iteration/evoformer/__layer_stack_no_state/while/body/extra_msa_stack/triangle_multiplication_outgoing/gating_linear/...cb,cd->...db/jit(_einsum)/dot_general[dimension_numbers=(((0,), (1,)), ((), ())) precision=None preferred_element_type=None]" source_file="/m2/SpeedPPI/src/alphafold/model/common_modules.py" source_line=76
            XLA Label: custom-call
            Shape: f32[128,1653796]
            ==========================

    Buffer 10:
            Size: 807.52MiB
            XLA Label: fusion
            Shape: f32[128,1286,1286]
            ==========================

    Buffer 11:
            Size: 807.52MiB
            Operator: op_name="jit(apply_fn)/jit(main)/alphafold/broadcast_in_dim[shape=(1286, 1286, 128) broadcast_dimensions=()]" source_file="/m2/SpeedPPI/src/alphafold/model/modules.py" source_line=352
            XLA Label: broadcast
            Shape: f32[1286,1286,128]
            ==========================

    Buffer 12:
            Size: 403.76MiB
            Operator: op_name="jit(apply_fn)/jit(main)/alphafold/alphafold_iteration/distogram_head/add" source_file="/m2/SpeedPPI/src/alphafold/model/modules.py" source_line=1372
            XLA Label: fusion
            Shape: f32[1286,1286,64]
            ==========================

    Buffer 13:
            Size: 276.29MiB
            Entry Parameter Subshape: f32[11,5120,1286]
            ==========================

    Buffer 14:
            Size: 276.29MiB
            Entry Parameter Subshape: s32[11,5120,1286]
            ==========================

    Buffer 15:
            Size: 276.29MiB
            Entry Parameter Subshape: f32[11,5120,1286]
            ==========================

Traceback (most recent call last):
  File "./src/run_alphafold_all_vs_all.py", line 306, in <module>
    main(num_ensemble=1,
  File "./src/run_alphafold_all_vs_all.py", line 269, in main
    prediction_result = model_runner.predict(processed_feature_dict)
  File "/m2/SpeedPPI/src/alphafold/model/model.py", line 133, in predict
    result = self.apply(self.params, jax.random.PRNGKey(0), feat)
  File "/home/numu/anaconda3/envs/SpeedPPI/lib/python3.8/site-packages/jax/_src/traceback_util.py", line 162, in reraise_with_filtered_traceback
    return fun(*args, **kwargs)
  File "/home/numu/anaconda3/envs/SpeedPPI/lib/python3.8/site-packages/jax/_src/api.py", line 623, in cache_miss
    out_flat = call_bind_continuation(execute(args_flat))
  File "/home/numu/anaconda3/envs/SpeedPPI/lib/python3.8/site-packages/jax/_src/dispatch.py", line 895, in _execute_compiled
    out_flat = compiled.execute(in_flat)
jax._src.traceback_util.UnfilteredStackTrace: jaxlib.xla_extension.XlaRuntimeError: RESOURCE_EXHAUSTED: Out of memory while trying to allocate 10228218416 bytes.


mkdir: cannot create directory ‘./data/dev/all_vs_all/pred4/’: File exists
Running pred 4 out of 5
Evaluating pair 4IFD_J-4IFD_A
2023-04-20 16:21:01.340203: W tensorflow/tsl/framework/cpu_allocator_impl.cc:83] Allocation of 9673627392 exceeds 10% of free system memory.
2023-04-20 16:21:01.406058: W tensorflow/tsl/framework/cpu_allocator_impl.cc:83] Allocation of 10055011200 exceeds 10% of free system memory.
2023-04-20 16:21:33.834163: W tensorflow/tsl/framework/cpu_allocator_impl.cc:83] Allocation of 10055011200 exceeds 10% of free system memory.
2023-04-20 16:21:39.553369: W tensorflow/tsl/framework/cpu_allocator_impl.cc:83] Allocation of 10055011200 exceeds 10% of free system memory.
2023-04-20 16:21:45.237304: W tensorflow/tsl/framework/cpu_allocator_impl.cc:83] Allocation of 10055011200 exceeds 10% of free system memory.
/m2/SpeedPPI/src/alphafold/model/mapping.py:49: FutureWarning: jax.tree_flatten is deprecated, and will be removed in a future release. Use jax.tree_util.tree_flatten instead.
  values_tree_def = jax.tree_flatten(values)[1]
/m2/SpeedPPI/src/alphafold/model/mapping.py:53: FutureWarning: jax.tree_unflatten is deprecated, and will be removed in a future release. Use jax.tree_util.tree_unflatten instead.
  return jax.tree_unflatten(values_tree_def, flat_axes)
/m2/SpeedPPI/src/alphafold/model/mapping.py:124: FutureWarning: jax.tree_flatten is deprecated, and will be removed in a future release. Use jax.tree_util.tree_flatten instead.
  flat_sizes = jax.tree_flatten(in_sizes)[0]
2023-04-20 16:24:19.842373: W external/org_tensorflow/tensorflow/tsl/framework/bfc_allocator.cc:479] Allocator (GPU_0_bfc) ran out of memory trying to allocate 8.41GiB (rounded to 9031036416) requested by op
2023-04-20 16:24:19.844742: W external/org_tensorflow/tensorflow/tsl/framework/bfc_allocator.cc:492] ****____
2023-04-20 16:24:19.850298: E external/org_tensorflow/tensorflow/compiler/xla/pjrt/pjrt_stream_executor_client.cc:2153] Execution of replica 0 failed: RESOURCE_EXHAUSTED: Out of memory while trying to allocate 9031036208 bytes.
BufferAssignment OOM Debugging.
BufferAssignment stats:
        parameter allocation: 2.78GiB
        constant allocation: 38.6KiB
        maybe_live_out allocation: 436.99MiB
        preallocated temp allocation: 8.41GiB
        total allocation: 11.62GiB
        total fragmentation: 324.96MiB (2.73%)
Peak buffers:
    Buffer 1:
            Size: 1.52GiB
            Operator: op_name="jit(apply_fn)/jit(main)/alphafold/while/body/alphafold_iteration/evoformer/__layer_stack_no_state/while/body/extra_msa_stack/outer_product_mean/layer_norm_input/jit(_var)/reduce_sum[axes=(2,)]" source_file="/m2/SpeedPPI/src/alphafold/model/modules.py" source_line=1446
            XLA Label: fusion
            Shape: f32[5120,1248,64]
            ==========================

    Buffer 2:
            Size: 1.27GiB
            Entry Parameter Subshape: f32[11,508,1248,49]
            ==========================

    Buffer 3:
            Size: 760.50MiB
            Operator: op_name="jit(apply_fn)/jit(main)/alphafold/while/body/alphafold_iteration/evoformer/__layer_stack_no_state/while/body/extra_msa_stack/triangle_multiplication_outgoing/mul" source_file="/m2/SpeedPPI/src/alphafold/model/modules.py" source_line=1307
            XLA Label: fusion
            Shape: f32[128,1248,1248]
            ==========================

    Buffer 4:
            Size: 760.50MiB
            Operator: op_name="jit(apply_fn)/jit(main)/alphafold/while/body/alphafold_iteration/evoformer/__layer_stack_no_state/while/body/extra_msa_stack/triangle_multiplication_outgoing/mul" source_file="/m2/SpeedPPI/src/alphafold/model/modules.py" source_line=1307
            XLA Label: fusion
            Shape: f32[128,1248,1248]
            ==========================

    Buffer 5:
            Size: 760.50MiB
            Operator: op_name="jit(apply_fn)/jit(main)/alphafold/while/body/alphafold_iteration/evoformer/__layer_stack_no_state/while/body/extra_msa_stack/triangle_multiplication_outgoing/left_projection/...cb,cd->...db/jit(_einsum)/dot_general[dimension_numbers=(((0,), (1,)), ((), ())) precision=None preferred_element_type=None]" source_file="/m2/SpeedPPI/src/alphafold/model/common_modules.py" source_line=76
            XLA Label: custom-call
            Shape: f32[128,1557504]
            ==========================

    Buffer 6:
            Size: 760.50MiB
            Operator: op_name="jit(apply_fn)/jit(main)/alphafold/while/body/alphafold_iteration/evoformer/__layer_stack_no_state/while/body/extra_msa_stack/triangle_multiplication_outgoing/left_gate/...cb,cd->...db/jit(_einsum)/dot_general[dimension_numbers=(((0,), (1,)), ((), ())) precision=None preferred_element_type=None]" source_file="/m2/SpeedPPI/src/alphafold/model/common_modules.py" source_line=76
            XLA Label: custom-call
            Shape: f32[128,1557504]
            ==========================

    Buffer 7:
            Size: 760.50MiB
            Operator: op_name="jit(apply_fn)/jit(main)/alphafold/while/body/alphafold_iteration/evoformer/__layer_stack_no_state/while/body/extra_msa_stack/triangle_multiplication_outgoing/right_projection/...cb,cd->...db/jit(_einsum)/dot_general[dimension_numbers=(((0,), (1,)), ((), ())) precision=None preferred_element_type=None]" source_file="/m2/SpeedPPI/src/alphafold/model/common_modules.py" source_line=76
            XLA Label: custom-call
            Shape: f32[128,1557504]
            ==========================

    Buffer 8:
            Size: 760.50MiB
            Operator: op_name="jit(apply_fn)/jit(main)/alphafold/while/body/alphafold_iteration/evoformer/__layer_stack_no_state/while/body/extra_msa_stack/triangle_multiplication_outgoing/right_gate/...cb,cd->...db/jit(_einsum)/dot_general[dimension_numbers=(((0,), (1,)), ((), ())) precision=None preferred_element_type=None]" source_file="/m2/SpeedPPI/src/alphafold/model/common_modules.py" source_line=76
            XLA Label: custom-call
            Shape: f32[128,1557504]
            ==========================

    Buffer 9:
            Size: 760.50MiB
            Operator: op_name="jit(apply_fn)/jit(main)/alphafold/while/body/alphafold_iteration/evoformer/__layer_stack_no_state/while/body/extra_msa_stack/triangle_multiplication_outgoing/gating_linear/...cb,cd->...db/jit(_einsum)/dot_general[dimension_numbers=(((0,), (1,)), ((), ())) precision=None preferred_element_type=None]" source_file="/m2/SpeedPPI/src/alphafold/model/common_modules.py" source_line=76
            XLA Label: custom-call
            Shape: f32[128,1557504]
            ==========================

    Buffer 10:
            Size: 760.50MiB
            XLA Label: fusion
            Shape: f32[128,1248,1248]
            ==========================

    Buffer 11:
            Size: 760.50MiB
            Operator: op_name="jit(apply_fn)/jit(main)/alphafold/broadcast_in_dim[shape=(1248, 1248, 128) broadcast_dimensions=()]" source_file="/m2/SpeedPPI/src/alphafold/model/modules.py" source_line=352
            XLA Label: broadcast
            Shape: f32[1248,1248,128]
            ==========================

    Buffer 12:
            Size: 380.25MiB
            Operator: op_name="jit(apply_fn)/jit(main)/alphafold/alphafold_iteration/distogram_head/add" source_file="/m2/SpeedPPI/src/alphafold/model/modules.py" source_line=1372
            XLA Label: fusion
            Shape: f32[1248,1248,64]
            ==========================

    Buffer 13:
            Size: 268.12MiB
            Entry Parameter Subshape: f32[11,5120,1248]
            ==========================

    Buffer 14:
            Size: 268.12MiB
            Entry Parameter Subshape: s32[11,5120,1248]
            ==========================

    Buffer 15:
            Size: 268.12MiB
            Entry Parameter Subshape: f32[11,5120,1248]
            ==========================

Traceback (most recent call last):
  File "./src/run_alphafold_all_vs_all.py", line 306, in <module>
    main(num_ensemble=1,
  File "./src/run_alphafold_all_vs_all.py", line 269, in main
    prediction_result = model_runner.predict(processed_feature_dict)
  File "/m2/SpeedPPI/src/alphafold/model/model.py", line 133, in predict
    result = self.apply(self.params, jax.random.PRNGKey(0), feat)
  File "/home/numu/anaconda3/envs/SpeedPPI/lib/python3.8/site-packages/jax/_src/traceback_util.py", line 162, in reraise_with_filtered_traceback
    return fun(*args, **kwargs)
  File "/home/numu/anaconda3/envs/SpeedPPI/lib/python3.8/site-packages/jax/_src/api.py", line 623, in cache_miss
    out_flat = call_bind_continuation(execute(*args_flat))
  File "/home/numu/anaconda3/envs/SpeedPPI/lib/python3.8/site-packages/jax/_src/dispatch.py", line 895, in _execute_compiled
    out_flat = compiled.execute(in_flat)
jax._src.traceback_util.UnfilteredStackTrace: jaxlib.xla_extension.XlaRuntimeError: RESOURCE_EXHAUSTED: Out of memory while trying to allocate 9031036208 bytes.
BufferAssignment OOM Debugging.
BufferAssignment stats:
    parameter allocation: 2.78GiB
    constant allocation: 38.6KiB
    maybe_live_out allocation: 436.99MiB
    preallocated temp allocation: 8.41GiB
    total allocation: 11.62GiB
    total fragmentation: 324.96MiB (2.73%)
Peak buffers:
    Buffer 1:
            Size: 1.52GiB
            Operator: op_name="jit(apply_fn)/jit(main)/alphafold/while/body/alphafold_iteration/evoformer/__layer_stack_no_state/while/body/extra_msa_stack/outer_product_mean/layer_norm_input/jit(_var)/reduce_sum[axes=(2,)]" source_file="/m2/SpeedPPI/src/alphafold/model/modules.py" source_line=1446
            XLA Label: fusion
            Shape: f32[5120,1248,64]

    Buffer 2:
            Size: 1.27GiB
            Entry Parameter Subshape: f32[11,508,1248,49]
            ==========================

    Buffer 3:
            Size: 760.50MiB
            Operator: op_name="jit(apply_fn)/jit(main)/alphafold/while/body/alphafold_iteration/evoformer/__layer_stack_no_state/while/body/extra_msa_stack/triangle_multiplication_outgoing/mul" source_file="/m2/SpeedPPI/src/alphafold/model/modules.py" source_line=1307
            XLA Label: fusion
            Shape: f32[128,1248,1248]
            ==========================

    Buffer 4:
            Size: 760.50MiB
            Operator: op_name="jit(apply_fn)/jit(main)/alphafold/while/body/alphafold_iteration/evoformer/__layer_stack_no_state/while/body/extra_msa_stack/triangle_multiplication_outgoing/mul" source_file="/m2/SpeedPPI/src/alphafold/model/modules.py" source_line=1307
            XLA Label: fusion
            Shape: f32[128,1248,1248]
            ==========================

    Buffer 5:
            Size: 760.50MiB
            Operator: op_name="jit(apply_fn)/jit(main)/alphafold/while/body/alphafold_iteration/evoformer/__layer_stack_no_state/while/body/extra_msa_stack/triangle_multiplication_outgoing/left_projection/...cb,cd->...db/jit(_einsum)/dot_general[dimension_numbers=(((0,), (1,)), ((), ())) precision=None preferred_element_type=None]" source_file="/m2/SpeedPPI/src/alphafold/model/common_modules.py" source_line=76
            XLA Label: custom-call
            Shape: f32[128,1557504]
            ==========================

    Buffer 6:
            Size: 760.50MiB
            Operator: op_name="jit(apply_fn)/jit(main)/alphafold/while/body/alphafold_iteration/evoformer/__layer_stack_no_state/while/body/extra_msa_stack/triangle_multiplication_outgoing/left_gate/...cb,cd->...db/jit(_einsum)/dot_general[dimension_numbers=(((0,), (1,)), ((), ())) precision=None preferred_element_type=None]" source_file="/m2/SpeedPPI/src/alphafold/model/common_modules.py" source_line=76
            XLA Label: custom-call
            Shape: f32[128,1557504]
            ==========================

    Buffer 7:
            Size: 760.50MiB
            Operator: op_name="jit(apply_fn)/jit(main)/alphafold/while/body/alphafold_iteration/evoformer/__layer_stack_no_state/while/body/extra_msa_stack/triangle_multiplication_outgoing/right_projection/...cb,cd->...db/jit(_einsum)/dot_general[dimension_numbers=(((0,), (1,)), ((), ())) precision=None preferred_element_type=None]" source_file="/m2/SpeedPPI/src/alphafold/model/common_modules.py" source_line=76
            XLA Label: custom-call
            Shape: f32[128,1557504]
            ==========================

    Buffer 8:
            Size: 760.50MiB
            Operator: op_name="jit(apply_fn)/jit(main)/alphafold/while/body/alphafold_iteration/evoformer/__layer_stack_no_state/while/body/extra_msa_stack/triangle_multiplication_outgoing/right_gate/...cb,cd->...db/jit(_einsum)/dot_general[dimension_numbers=(((0,), (1,)), ((), ())) precision=None preferred_element_type=None]" source_file="/m2/SpeedPPI/src/alphafold/model/common_modules.py" source_line=76
            XLA Label: custom-call
            Shape: f32[128,1557504]
            ==========================

    Buffer 9:
            Size: 760.50MiB
            Operator: op_name="jit(apply_fn)/jit(main)/alphafold/while/body/alphafold_iteration/evoformer/__layer_stack_no_state/while/body/extra_msa_stack/triangle_multiplication_outgoing/gating_linear/...cb,cd->...db/jit(_einsum)/dot_general[dimension_numbers=(((0,), (1,)), ((), ())) precision=None preferred_element_type=None]" source_file="/m2/SpeedPPI/src/alphafold/model/common_modules.py" source_line=76
            XLA Label: custom-call
            Shape: f32[128,1557504]
            ==========================

    Buffer 10:
            Size: 760.50MiB
            XLA Label: fusion
            Shape: f32[128,1248,1248]
            ==========================

    Buffer 11:
            Size: 760.50MiB
            Operator: op_name="jit(apply_fn)/jit(main)/alphafold/broadcast_in_dim[shape=(1248, 1248, 128) broadcast_dimensions=()]" source_file="/m2/SpeedPPI/src/alphafold/model/modules.py" source_line=352
            XLA Label: broadcast
            Shape: f32[1248,1248,128]
            ==========================

    Buffer 12:
            Size: 380.25MiB
            Operator: op_name="jit(apply_fn)/jit(main)/alphafold/alphafold_iteration/distogram_head/add" source_file="/m2/SpeedPPI/src/alphafold/model/modules.py" source_line=1372
            XLA Label: fusion
            Shape: f32[1248,1248,64]
            ==========================

    Buffer 13:
            Size: 268.12MiB
            Entry Parameter Subshape: f32[11,5120,1248]
            ==========================

    Buffer 14:
            Size: 268.12MiB
            Entry Parameter Subshape: s32[11,5120,1248]
            ==========================

    Buffer 15:
            Size: 268.12MiB
            Entry Parameter Subshape: f32[11,5120,1248]
            ==========================

The stack trace below excludes JAX-internal frames. The preceding is the original exception that occurred, unmodified.


The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "./src/run_alphafold_all_vs_all.py", line 306, in <module>
    main(num_ensemble=1,
  File "./src/run_alphafold_all_vs_all.py", line 269, in main
    prediction_result = model_runner.predict(processed_feature_dict)
  File "/m2/SpeedPPI/src/alphafold/model/model.py", line 133, in predict
    result = self.apply(self.params, jax.random.PRNGKey(0), feat)
jaxlib.xla_extension.XlaRuntimeError: RESOURCE_EXHAUSTED: Out of memory while trying to allocate 9031036208 bytes.
BufferAssignment OOM Debugging.
BufferAssignment stats:
    parameter allocation: 2.78GiB
    constant allocation: 38.6KiB
    maybe_live_out allocation: 436.99MiB
    preallocated temp allocation: 8.41GiB
    total allocation: 11.62GiB
    total fragmentation: 324.96MiB (2.73%)
Peak buffers:
    Buffer 1:
            Size: 1.52GiB
            Operator: op_name="jit(apply_fn)/jit(main)/alphafold/while/body/alphafold_iteration/evoformer/__layer_stack_no_state/while/body/extra_msa_stack/outer_product_mean/layer_norm_input/jit(_var)/reduce_sum[axes=(2,)]" source_file="/m2/SpeedPPI/src/alphafold/model/modules.py" source_line=1446
            XLA Label: fusion
            Shape: f32[5120,1248,64]

    Buffer 2:
            Size: 1.27GiB
            Entry Parameter Subshape: f32[11,508,1248,49]
            ==========================

    Buffer 3:
            Size: 760.50MiB
            Operator: op_name="jit(apply_fn)/jit(main)/alphafold/while/body/alphafold_iteration/evoformer/__layer_stack_no_state/while/body/extra_msa_stack/triangle_multiplication_outgoing/mul" source_file="/m2/SpeedPPI/src/alphafold/model/modules.py" source_line=1307
            XLA Label: fusion
            Shape: f32[128,1248,1248]
            ==========================

    Buffer 4:
            Size: 760.50MiB
            Operator: op_name="jit(apply_fn)/jit(main)/alphafold/while/body/alphafold_iteration/evoformer/__layer_stack_no_state/while/body/extra_msa_stack/triangle_multiplication_outgoing/mul" source_file="/m2/SpeedPPI/src/alphafold/model/modules.py" source_line=1307
            XLA Label: fusion
            Shape: f32[128,1248,1248]
            ==========================

    Buffer 5:
            Size: 760.50MiB
            Operator: op_name="jit(apply_fn)/jit(main)/alphafold/while/body/alphafold_iteration/evoformer/__layer_stack_no_state/while/body/extra_msa_stack/triangle_multiplication_outgoing/left_projection/...cb,cd->...db/jit(_einsum)/dot_general[dimension_numbers=(((0,), (1,)), ((), ())) precision=None preferred_element_type=None]" source_file="/m2/SpeedPPI/src/alphafold/model/common_modules.py" source_line=76
            XLA Label: custom-call
            Shape: f32[128,1557504]
            ==========================

    Buffer 6:
            Size: 760.50MiB
            Operator: op_name="jit(apply_fn)/jit(main)/alphafold/while/body/alphafold_iteration/evoformer/__layer_stack_no_state/while/body/extra_msa_stack/triangle_multiplication_outgoing/left_gate/...cb,cd->...db/jit(_einsum)/dot_general[dimension_numbers=(((0,), (1,)), ((), ())) precision=None preferred_element_type=None]" source_file="/m2/SpeedPPI/src/alphafold/model/common_modules.py" source_line=76
            XLA Label: custom-call
            Shape: f32[128,1557504]
            ==========================

    Buffer 7:
            Size: 760.50MiB
            Operator: op_name="jit(apply_fn)/jit(main)/alphafold/while/body/alphafold_iteration/evoformer/__layer_stack_no_state/while/body/extra_msa_stack/triangle_multiplication_outgoing/right_projection/...cb,cd->...db/jit(_einsum)/dot_general[dimension_numbers=(((0,), (1,)), ((), ())) precision=None preferred_element_type=None]" source_file="/m2/SpeedPPI/src/alphafold/model/common_modules.py" source_line=76
            XLA Label: custom-call
            Shape: f32[128,1557504]
            ==========================

    Buffer 8:
            Size: 760.50MiB
            Operator: op_name="jit(apply_fn)/jit(main)/alphafold/while/body/alphafold_iteration/evoformer/__layer_stack_no_state/while/body/extra_msa_stack/triangle_multiplication_outgoing/right_gate/...cb,cd->...db/jit(_einsum)/dot_general[dimension_numbers=(((0,), (1,)), ((), ())) precision=None preferred_element_type=None]" source_file="/m2/SpeedPPI/src/alphafold/model/common_modules.py" source_line=76
            XLA Label: custom-call
            Shape: f32[128,1557504]
            ==========================

    Buffer 9:
            Size: 760.50MiB
            Operator: op_name="jit(apply_fn)/jit(main)/alphafold/while/body/alphafold_iteration/evoformer/__layer_stack_no_state/while/body/extra_msa_stack/triangle_multiplication_outgoing/gating_linear/...cb,cd->...db/jit(_einsum)/dot_general[dimension_numbers=(((0,), (1,)), ((), ())) precision=None preferred_element_type=None]" source_file="/m2/SpeedPPI/src/alphafold/model/common_modules.py" source_line=76
            XLA Label: custom-call
            Shape: f32[128,1557504]
            ==========================

    Buffer 10:
            Size: 760.50MiB
            XLA Label: fusion
            Shape: f32[128,1248,1248]
            ==========================

    Buffer 11:
            Size: 760.50MiB
            Operator: op_name="jit(apply_fn)/jit(main)/alphafold/broadcast_in_dim[shape=(1248, 1248, 128) broadcast_dimensions=()]" source_file="/m2/SpeedPPI/src/alphafold/model/modules.py" source_line=352
            XLA Label: broadcast
            Shape: f32[1248,1248,128]
            ==========================

    Buffer 12:
            Size: 380.25MiB
            Operator: op_name="jit(apply_fn)/jit(main)/alphafold/alphafold_iteration/distogram_head/add" source_file="/m2/SpeedPPI/src/alphafold/model/modules.py" source_line=1372
            XLA Label: fusion
            Shape: f32[1248,1248,64]
            ==========================

    Buffer 13:
            Size: 268.12MiB
            Entry Parameter Subshape: f32[11,5120,1248]
            ==========================

    Buffer 14:
            Size: 268.12MiB
            Entry Parameter Subshape: s32[11,5120,1248]
            ==========================

    Buffer 15:
            Size: 268.12MiB
            Entry Parameter Subshape: f32[11,5120,1248]
            ==========================

mkdir: cannot create directory ‘./data/dev/all_vs_all/pred5/’: File exists
Running pred 5 out of 5
Saved all PPIs before filtering on pDockQ to ./data/dev/all_vs_all/all_ppis_unfiltered.csv
Filtered PPI network on pDockQ>0.5 resulting in 1 interactions.
Saved all PPIs after filtering on pDockQ to ./data/dev/all_vs_all/ppis_filtered.csv
mkdir: cannot create directory ‘./data/dev/all_vs_all/high_confidence_preds/’: File exists
Moved all high confidence predictions to ./data/dev/all_vs_all/high_confidence_preds/

patrickbryant1 commented 1 year ago

Hi, this is related to your infrastructure. Some of your sequence pairs are too large to fit on your GPU, which causes this OOM. Try to figure out which pairs these are and rerun the predictions without them, or use a GPU with more memory.
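One way to identify the problematic pairs ahead of time is to check the combined length of each all-vs-all pair before running predictions, since GPU memory use grows roughly with the square of the total residue count. The sketch below is a minimal, hypothetical example (the sequence names, the helper function, and the 1,100-residue cutoff are all assumptions, the cutoff being a rough guess from the 1,248-residue pair that OOMs in the log above on an 11 GB card); it is not part of SpeedPPI itself.

```python
from itertools import combinations

# Hypothetical example sequences; in practice, load these from your FASTA file.
seqs = {
    "protA": "M" * 400,
    "protB": "M" * 900,
    "protC": "M" * 300,
}

# Rough guess at a safe cap for an 11 GiB 2080 Ti: the 1,248-residue pair
# in the log above runs out of memory, so start around 1,100 total residues
# and adjust empirically for your GPU.
MAX_TOTAL_LEN = 1100

def split_pairs_by_length(seqs, max_total_len=MAX_TOTAL_LEN):
    """Split all-vs-all pairs into those that fit the cap and those that don't."""
    ok, too_big = [], []
    for a, b in combinations(sorted(seqs), 2):
        total = len(seqs[a]) + len(seqs[b])
        (ok if total <= max_total_len else too_big).append((a, b, total))
    return ok, too_big

ok, too_big = split_pairs_by_length(seqs)
print("runnable pairs:", ok)
print("pairs likely to OOM:", too_big)
```

You could then run the oversized pairs separately on a larger-memory GPU, or drop them from the prediction list entirely.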