Closed JonathanALevine closed 4 months ago
@JonathanALevine Do you see the outputs and info statement printed up to the 1000? i.e.
print(f"PyBUDA forward pass ..... {i}")
@milank94 Yes, outputs and infos get printed even up to 10000. See here:
PyBUDA forward pass ..... 9998
2024-03-04 14:07:34.364 | DEBUG | pybuda.run.impl:_run_forward:641 - Running concurrent device forward: TTDevice 'tt0'
2024-03-04 14:07:34.364 | DEBUG | pybuda.device:run_next_command:426 - Received RUN_FORWARD command on TTDevice 'tt0' / 39342
2024-03-04 14:07:34.364 | DEBUG | pybuda.ttdevice:forward:862 - Starting forward on TTDevice 'tt0'
2024-03-04 14:07:34.365 | DEBUG | pybuda.backend:feeder_thread_main:142 - Run feeder thread cmd: fwd
2024-03-04 14:07:34.365 | INFO | Runtime - Running program 'run_fwd_0' with params [("$p_loop_count", "1")]
2024-03-04 14:07:34.365 | DEBUG | pybuda.backend:read_queues:316 - Reading output queue direct_pt.output_add_5
2024-03-04 14:07:34.365 | DEBUG | pybuda.device_connector:pusher_thread_main:159 - Pusher thread pushing tensors
2024-03-04 14:07:34.366 | DEBUG | pybuda.backend:push_to_queues:407 - Pushing to queue input_1
2024-03-04 14:07:34.367 | DEBUG | pybuda.backend:read_queues:376 - Done reading queues
2024-03-04 14:07:34.367 | DEBUG | pybuda.backend:pop_queues:382 - Popping from queue direct_pt.output_add_5
[PyBuda Tensor: tensor([[-0.3359]], dtype=torch.bfloat16, requires_grad=True), DataFormat.Float16_b]
tensor([[0.9505, 0.7036, 0.0594, ..., 0.4409, 0.2454, 0.0907]])
PyBUDA forward pass ..... 9999
2024-03-04 14:07:34.368 | DEBUG | pybuda.run.impl:_run_forward:641 - Running concurrent device forward: TTDevice 'tt0'
2024-03-04 14:07:34.369 | DEBUG | pybuda.device:run_next_command:426 - Received RUN_FORWARD command on TTDevice 'tt0' / 39342
2024-03-04 14:07:34.369 | DEBUG | pybuda.ttdevice:forward:862 - Starting forward on TTDevice 'tt0'
2024-03-04 14:07:34.369 | DEBUG | pybuda.backend:feeder_thread_main:142 - Run feeder thread cmd: fwd
2024-03-04 14:07:34.369 | INFO | Runtime - Running program 'run_fwd_0' with params [("$p_loop_count", "1")]
2024-03-04 14:07:34.369 | DEBUG | pybuda.backend:read_queues:316 - Reading output queue direct_pt.output_add_5
2024-03-04 14:07:34.370 | DEBUG | pybuda.device_connector:pusher_thread_main:159 - Pusher thread pushing tensors
2024-03-04 14:07:34.370 | DEBUG | pybuda.backend:push_to_queues:407 - Pushing to queue input_1
2024-03-04 14:07:34.371 | DEBUG | pybuda.backend:read_queues:376 - Done reading queues
2024-03-04 14:07:34.371 | DEBUG | pybuda.backend:pop_queues:382 - Popping from queue direct_pt.output_add_5
[PyBuda Tensor: tensor([[-0.3320]], dtype=torch.bfloat16, requires_grad=True), DataFormat.Float16_b]
2024-03-04 14:07:34.374 | DEBUG | pybuda.run.impl:_shutdown:1262 - PyBuda shutdown
2024-03-04 14:07:34.374 | DEBUG | pybuda.device:run_next_command:416 - Received SHUTDOWN command on TTDevice 'tt0'
2024-03-04 14:07:34.374 | DEBUG | pybuda.device:run_next_command:419 - Waiting for barrier on TTDevice 'tt0'
2024-03-04 14:07:34.374 | DEBUG | pybuda.device:run_next_command:421 - Shutting down on TTDevice 'tt0'
2024-03-04 14:07:34.374 | DEBUG | pybuda.run.impl:_shutdown:1278 - Waiting until processes done
2024-03-04 14:07:34.375 | DEBUG | pybuda.backend:feeder_thread_main:142 - Run feeder thread cmd: quit
2024-03-04 14:07:34.381 | INFO | Runtime - Waiting for cluster completion
2024-03-04 14:07:34.381 | INFO | PerfPostProcess - Writing the host postprocess report in /tmp/jonathan/cea3632bcf87/perf_results//host/device_alignment_th_3282791031_proc_39342.json
2024-03-04 14:07:34.404 | INFO | Runtime - Closed all devices successfully
2024-03-04 14:07:34.404 | INFO | PerfCheck - Starting performance check for host events
2024-03-04 14:07:34.404 | INFO | PerfCheck - Finished host performance check successfully
2024-03-04 14:07:34.404 | INFO | Debuda - Debug server ended on
2024-03-04 14:07:34.405 | INFO | Always - finish_child_process called on pid 39342
But it might be the way I am running the forward pass. I am reading these docs now: https://docs.tenstorrent.com/tenstorrent/v/tt-buda/pybuda/first_model
Ok awesome -- so that is all normal!
Your program finished and the pybuda_shutdown()
sequence was initiated to close the connection.
If you want to keep the program active i.e. waiting for new inputs, then you either need to setup some sort of continuous loop that waits for new input i.e. While True:
or you can use a serving backend like FastAPI (see example here: https://github.com/tenstorrent/tt-buda-demos/blob/main/first_5_steps/5_serving_tt_models.ipynb)
But why would pybuda_shutdown()
sequence be much longer when its=10000
then when its=1000
?
For example when its=1000
, the program exits:
PyBUDA forward pass ..... 999
2024-03-04 14:29:24.798 | DEBUG | pybuda.run.impl:_run_forward:641 - Running concurrent device forward: TTDevice 'tt0'
2024-03-04 14:29:24.799 | DEBUG | pybuda.device:run_next_command:426 - Received RUN_FORWARD command on TTDevice 'tt0' / 52758
2024-03-04 14:29:24.799 | DEBUG | pybuda.ttdevice:forward:862 - Starting forward on TTDevice 'tt0'
2024-03-04 14:29:24.799 | DEBUG | pybuda.backend:feeder_thread_main:142 - Run feeder thread cmd: fwd
2024-03-04 14:29:24.799 | INFO | Runtime - Running program 'run_fwd_0' with params [("$p_loop_count", "1")]
2024-03-04 14:29:24.799 | DEBUG | pybuda.backend:read_queues:316 - Reading output queue direct_pt.output_add_5
2024-03-04 14:29:24.800 | DEBUG | pybuda.device_connector:pusher_thread_main:159 - Pusher thread pushing tensors
2024-03-04 14:29:24.800 | DEBUG | pybuda.backend:push_to_queues:407 - Pushing to queue input_1
2024-03-04 14:29:24.801 | DEBUG | pybuda.backend:read_queues:376 - Done reading queues
2024-03-04 14:29:24.801 | DEBUG | pybuda.backend:pop_queues:382 - Popping from queue direct_pt.output_add_5
[PyBuda Tensor: tensor([[-0.3320]], dtype=torch.bfloat16, requires_grad=True), DataFormat.Float16_b]
2024-03-04 14:29:24.802 | DEBUG | pybuda.run.impl:_shutdown:1262 - PyBuda shutdown
2024-03-04 14:29:24.803 | DEBUG | pybuda.device:run_next_command:416 - Received SHUTDOWN command on TTDevice 'tt0'
2024-03-04 14:29:24.803 | DEBUG | pybuda.device:run_next_command:419 - Waiting for barrier on TTDevice 'tt0'
2024-03-04 14:29:24.803 | DEBUG | pybuda.device:run_next_command:421 - Shutting down on TTDevice 'tt0'
2024-03-04 14:29:24.803 | DEBUG | pybuda.run.impl:_shutdown:1278 - Waiting until processes done
2024-03-04 14:29:24.804 | DEBUG | pybuda.backend:feeder_thread_main:142 - Run feeder thread cmd: quit
2024-03-04 14:29:24.811 | INFO | Runtime - Waiting for cluster completion
2024-03-04 14:29:24.812 | INFO | PerfPostProcess - Writing the host postprocess report in /tmp/jonathan/cea3632bcf87/perf_results//host/device_alignment_th_3424747444_proc_52758.json
2024-03-04 14:29:24.855 | INFO | Runtime - Closed all devices successfully
2024-03-04 14:29:24.855 | INFO | PerfCheck - Starting performance check for host events
2024-03-04 14:29:24.855 | INFO | PerfCheck - Finished host performance check successfully
2024-03-04 14:29:24.855 | INFO | Debuda - Debug server ended on
2024-03-04 14:29:24.855 | INFO | Always - finish_child_process called on pid 52758
2024-03-04 14:29:24.856 | DEBUG | pybuda.device:atexit_handler:919 - atexit handler called for (TTDevice 'tt0',)
2024-03-04 14:29:24.856 | INFO | Always - finish_child_process called on pid 52758
2024-03-04 14:29:24.856 | DEBUG | pybuda.device:atexit_handler:923 - atexit handler completed
2024-03-04 14:29:25.527 | INFO | Always - finish_child_process called on pid 52709
but when its=10000
, the end shutdown hangs:
PyBUDA forward pass ..... 9999
2024-03-04 14:31:29.522 | DEBUG | pybuda.run.impl:_run_forward:641 - Running concurrent device forward: TTDevice 'tt0'
2024-03-04 14:31:29.522 | DEBUG | pybuda.device:run_next_command:426 - Received RUN_FORWARD command on TTDevice 'tt0' / 54932
2024-03-04 14:31:29.522 | DEBUG | pybuda.ttdevice:forward:862 - Starting forward on TTDevice 'tt0'
2024-03-04 14:31:29.523 | DEBUG | pybuda.backend:feeder_thread_main:142 - Run feeder thread cmd: fwd
2024-03-04 14:31:29.523 | INFO | Runtime - Running program 'run_fwd_0' with params [("$p_loop_count", "1")]
2024-03-04 14:31:29.523 | DEBUG | pybuda.backend:read_queues:316 - Reading output queue direct_pt.output_add_5
2024-03-04 14:31:29.523 | DEBUG | pybuda.device_connector:pusher_thread_main:159 - Pusher thread pushing tensors
2024-03-04 14:31:29.524 | DEBUG | pybuda.backend:push_to_queues:407 - Pushing to queue input_1
2024-03-04 14:31:29.525 | DEBUG | pybuda.backend:read_queues:376 - Done reading queues
2024-03-04 14:31:29.525 | DEBUG | pybuda.backend:pop_queues:382 - Popping from queue direct_pt.output_add_5
[PyBuda Tensor: tensor([[-0.3320]], dtype=torch.bfloat16, requires_grad=True), DataFormat.Float16_b]
2024-03-04 14:31:29.527 | DEBUG | pybuda.run.impl:_shutdown:1262 - PyBuda shutdown
2024-03-04 14:31:29.528 | DEBUG | pybuda.device:run_next_command:416 - Received SHUTDOWN command on TTDevice 'tt0'
2024-03-04 14:31:29.528 | DEBUG | pybuda.device:run_next_command:419 - Waiting for barrier on TTDevice 'tt0'
2024-03-04 14:31:29.528 | DEBUG | pybuda.device:run_next_command:421 - Shutting down on TTDevice 'tt0'
2024-03-04 14:31:29.528 | DEBUG | pybuda.run.impl:_shutdown:1278 - Waiting until processes done
2024-03-04 14:31:29.528 | DEBUG | pybuda.backend:feeder_thread_main:142 - Run feeder thread cmd: quit
2024-03-04 14:31:29.529 | INFO | Runtime - Waiting for cluster completion
2024-03-04 14:31:29.530 | INFO | PerfPostProcess - Writing the host postprocess report in /tmp/jonathan/cea3632bcf87/perf_results//host/device_alignment_th_3671482638_proc_54932.json
2024-03-04 14:31:29.576 | INFO | Runtime - Closed all devices successfully
2024-03-04 14:31:29.576 | INFO | PerfCheck - Starting performance check for host events
2024-03-04 14:31:29.576 | INFO | PerfCheck - Finished host performance check successfully
2024-03-04 14:31:29.576 | INFO | Debuda - Debug server ended on
2024-03-04 14:31:29.577 | INFO | Always - finish_child_process called on pid 54932
But why would
pybuda_shutdown()
sequence be much longer whenits=10000
then whenits=1000
?
My apologies, I miss read the 10,000..
Can you show a snippet of what you see when it is 10,000?
No worries! I added extra infos. See above.
I see, you're missing the last two Debug statements:
DEBUG | pybuda.device:atexit_handler:919 - atexit handler called for (TTDevice 'tt0',)
INFO | Always - finish_child_process called on pid
DEBUG | pybuda.device:atexit_handler:923 - atexit handler completed
INFO | Always - finish_child_process called on pid
Two questions to help debug:
- Does it eventually fully shutdown like the 1000 sample case or is it it forever stuck there?
Running it again, I will let it be for a while (maybe an hour?) and see if the card shuts down on its own.
- If you manually kill the process after it's stuck there, is the card in a bad state i.e. do you need to do a reset?
If I manually kill the process with 'CTRL+C' in terminal, the card seems to be fine. I can rerun the script no problem with its=1000
.
Addtionally I will change how myNet()
is defined to only use PyBUDA and no PyTorch and see if that makes any difference.
- Does it eventually fully shutdown like the 1000 sample case or is it it forever stuck there?
Running it again, I will let it be for a while (maybe an hour?) and see if the card shuts down on its own.
Let it attempt to shutdown on its for over an hour. The card never shutdown.
- If you manually kill the process after it's stuck there, is the card in a bad state i.e. do you need to do a reset?
If I manually kill the process with 'CTRL+C' in terminal, the card seems to be fine. I can rerun the script no problem with
its=1000
.Addtionally I will change how
myNet()
is defined to only use PyBUDA and no PyTorch and see if that makes any difference.
Running inference multiple times like this: https://github.com/JonathanALevine/tt-buda/blob/main/inference_stress_tester.py seems to run as expected.
For context this code is run on a tenstorrent grayskull e150.
If I run this
Everything runs as expected. But If I run this again with
its=10000
, the card seems to hang at the end. I get stuck with these info messages:myNet is defined as: