microsoft / vscode-jupyter

VS Code Jupyter extension
https://marketplace.visualstudio.com/items?itemName=ms-toolsai.jupyter
MIT License

Kernel crashed while executing #10827

Closed n1ghtf4l1 closed 2 years ago

n1ghtf4l1 commented 2 years ago

Environment data

Expected behaviour

When we run the following code, the output is expected to display this: image

Actual behaviour

Instead, this is shown in the Jupyter output when the code is run:

The Kernel crashed while executing code in the the current cell or a previous cell. Please review the code in the cell(s) to identify a possible cause of the failure. Click <a href='https://aka.ms/vscodeJupyterKernelCrash'>here</a> for more info. View Jupyter <a href='command:jupyter.viewOutput'>log</a> for further details.
Canceled future for execute_request message before replies were done  

Steps to reproduce:

  1. Run the following program in a Jupyter notebook:
    
    import torch
    import torch.nn as nn
    import torch.optim as optim
    import torch.nn.functional as F
    import torchvision.transforms as T
    from IPython import display

    import matplotlib.pyplot as plt
    import gym
    import numpy as np


    class CartPoleEnvManager:
        def __init__(self, device) -> None:
            """
            (UPDATE!)
              We're no longer dealing with screen frames to determine the
              environment states, so the class no longer has a current_screen
              attribute. Instead, we now have a current_state attribute,
              which is initialized to None.
            """
            self.device = device
            # `.unwrapped` gives access to the tools that
            # we won't access in the normal initialization
            self.env = gym.make('CartPole-v1').unwrapped
            self.env.reset()  # reset the environment to get the initial observations
            self.current_screen = None  # No longer available
            self.current_state = None
            self.done = False

        def reset(self) -> None:
            """
            (UPDATE!)
              The function now resets `current_state` to the initial observation
              returned by the Gym environment when it's reset, rather than
              resetting the screen as was implemented previously.
            """
            self.current_state = self.env.reset()

        def close(self) -> None:
            self.env.close()

        def render(self, mode='human'):
            return self.env.render(mode)

        def display(self):
            return self.env.display()

        def num_actions_available(self):
            """Check the number of actions available in the environment;
               we have two options [<left, right>].
            """
            return self.env.action_space.n

        def take_action(self, action: torch.Tensor) -> torch.Tensor:
            """
            (UPDATE!)
              Previously, we did not store the state returned by `env.step()`
              since we were not making use of it. Now, since we're using the
              returned state rather than the screen frames, we store the state
              returned by `step()` in the class's `current_state` attribute.
            """
            # _, reward, self.done, _ = self.env.step(action.item())  # .item() refers to the element inside the tensor

            self.current_state, reward, self.done, _ = self.env.step(action.item())
            return torch.tensor([reward], device=self.device)

        def get_state(self):
            """
            (UPDATE!)
              Previously, this function would return the pixel data that resulted
              from the difference of the last two screen frames rendered in the
              environment. Now, we simply return the state of the environment
              that we've stored in the class's current_state attribute.
            """
            if self.done:
                return torch.zeros_like(
                    torch.tensor(self.current_state, device=self.device)
                ).float()
            else:
                return torch.tensor(self.current_state, device=self.device).float()

        def num_state_features(self):
            """
            (NEW!)
              Returns the number of features included in a state returned by the
              Gym environment. This is so that we can know the size of states
              that will be passed to the network as input.
            """
            return self.env.observation_space.shape[0]

        # ------------------------------------------------------------------------------
        """
        (UPDATE!)
         We no longer need any of the functions that we previously used for
         screen processing, and so they can all be deleted. These include:
          - get_screen_height()
          - get_screen_width()
          - get_processed_screen()
          - crop_screen()
          - transform_screen_data()
          - just_starting()
        """

        def image_reset(self) -> None:
            self.env.reset()
            self.current_screen = None  # for resetting the screen to the first screen

        def get_image_state(self):
            """Return the current state of the environment in the form of
               a processed image of the screen.
            """
            if self.just_starting() or self.done:
                # if we're at the start or at the end of the process, we
                # initialize the image as a black screen.
                self.current_screen = self.get_processed_screen()
                black_screen = torch.zeros_like(self.current_screen)
                return black_screen
            else:
                # current screen
                s1 = self.current_screen
                # get a new screen (the next screen)
                s2 = self.get_processed_screen()
                # make the next screen the current screen
                self.current_screen = s2
                # return the difference between two screens to get the current state
                return s2 - s1

        def just_starting(self) -> bool:
            """Check whether the current screen is None or not.
               This helps to check if we're at the start of the environment or not!
            """
            return self.current_screen is None

        def get_screen_height(self) -> int:
            screen = self.get_processed_screen()
            return screen.shape[2]

        def get_screen_width(self) -> int:
            screen = self.get_processed_screen()
            return screen.shape[3]

        def get_processed_screen(self):
            """Get the image's color channels, then transpose them into the order
            of channels by height and width, which is what the PyTorch DQN expects.
            """
            screen = self.render('rgb_array').transpose((2, 0, 1))
            screen = self.crop_screen(screen)
            return self.transform_screen_data(screen)

        def crop_screen(self, screen):
            """Accept a screen and return a cropped version of it."""
            screen_height = screen.shape[1]

            # strip off top and bottom
            top = int(screen_height * 0.4)
            bottom = int(screen_height * 0.8)
            # we cropped 40% of the top of the screen & 20% of the bottom of the screen
            screen = screen[:, top:bottom, :]

            return screen

        def transform_screen_data(self, screen):
            # Convert to float, rescale, convert to tensor.
            # ascontiguousarray returns a contiguous array with the same contents
            # as screen; its values are stored sequentially in memory.
            screen = np.ascontiguousarray(screen, dtype=np.float32) / 255

            screen = torch.from_numpy(screen)

            # Use the `torchvision` package to compose several image transformations
            resize = T.Compose([
                T.ToPILImage(),  # convert the image to a PIL image
                T.Resize((40, 90)),  # resize the image to 40x90
                T.ToTensor()  # convert it to a tensor
            ])
            # unsqueeze adds another dimension, which represents the batch dim,
            # since the processed images will be passed to the DQN in batches.
            return resize(screen).unsqueeze(0).to(self.device)


    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    image_em = CartPoleEnvManager(device)
    image_em.image_reset()
    screen = image_em.render('rgb_array')

    plt.figure()
    plt.imshow(screen)
    plt.title('None-processed Screen Example')
    plt.show()

I have nvcc version 11.7, PyTorch version 1.12.0 (stable), and OpenAI Gym version 0.7.4.
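For anyone filling in the environment data for a similar report, the versions can be collected from inside the same interpreter the kernel uses. This is only a sketch: the `report_versions` helper and its default package list are assumptions made for illustration, not something the extension provides.

```python
import platform
from importlib import metadata


def report_versions(packages=("torch", "torchvision", "gym", "numpy",
                              "matplotlib", "ipykernel")):
    """Map each distribution name to its installed version (hypothetical helper)."""
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in this interpreter's environment.
            versions[name] = "not installed"
    return versions


if __name__ == "__main__":
    for name, version in report_versions().items():
        print(f"{name}: {version}")
```

Running it with the conda environment's Python and again with the global Python makes it easy to see whether the two environments differ.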

## Jupyter Notebook Logs

<details>

<summary>Output for <code>Jupyter</code> in the <code>Output</code> panel (<code>View</code>→<code>Output</code>, change the drop-down in the upper-right of the <code>Output</code> panel to <code>Jupyter</code>)
</summary>

<p>

info 9:48:10.896: Execute Cell 9 /home/arch/Workspace/lunar_lander/lunar_lander.ipynb
info 9:48:10.899: Starting Jupyter Session id = 'startUsingPythonInterpreter:.jvsc74a57bd01aca715c408833f015ec2c50005d8b9b465033a7eca2c0a60c2042d35f67a26e./home/arch/miniconda3/envs/reinforcement-learning/python./home/arch/miniconda3/envs/reinforcement-learning/python.-m#ipykernel_launcher' (Python Path: /home/arch/miniconda3/envs/reinforcement-learning, EnvType: Conda, EnvName: 'reinforcement-learning', Version: 3.8.13) for '/home/arch/Workspace/lunar_lander/lunar_lander.ipynb' (disableUI=false)
[I 09:48:10.950 NotebookApp] Creating new notebook in /lunar_lander
[I 09:48:11.003 NotebookApp] Creating new notebook in
info 9:48:11.23: installMissingDependencies /home/arch/miniconda3/envs/reinforcement-learning/bin/python, ui.disabled=false for resource '/home/arch/Workspace/lunar_lander/lunar_lander.ipynb'
info 9:48:11.64: Process Execution: > ~/miniconda3/envs/reinforcement-learning/bin/python -c "import ipykernel"
~/miniconda3/envs/reinforcement-learning/bin/python -c "import ipykernel"
info 9:48:11.249: Spec argv[0] updated from '/home/arch/miniconda3/envs/reinforcement-learning/bin/python' to '/home/arch/miniconda3/envs/reinforcement-learning/bin/python'
info 9:48:11.456: Registering dummy command feature
error 9:48:11.721: Failed to change kernel, re-throwing [Error]:
    at new r (/home/arch/.vscode/extensions/ms-toolsai.jupyter-2022.6.1101950301/out/extension.node.js:39:27924)
    at new _ (/home/arch/.vscode/extensions/ms-toolsai.jupyter-2022.6.1101950301/out/extension.node.js:2:1918192)
    at /home/arch/.vscode/extensions/ms-toolsai.jupyter-2022.6.1101950301/out/extension.node.js:24:140193
    at processTicksAndRejections (node:internal/process/task_queues:96:5)

FetchError: request to http://localhost:8888/api/sessions?1657858691713 failed, reason: getaddrinfo ENOTFOUND localhost at ClientRequest. (/home/arch/.vscode/extensions/ms-toolsai.jupyter-2022.6.1101950301/out/extension.node.js:39:378131) at ClientRequest.emit (node:events:390:28) at ClientRequest.emit (node:domain:475:12) at Socket.socketErrorListener (node:_http_client:447:9) at Socket.emit (node:events:390:28) at Socket.emit (node:domain:475:12) at emitErrorNT (node:internal/streams/destroy:157:8) at emitErrorCloseNT (node:internal/streams/destroy:122:3) at processTicksAndRejections (node:internal/process/task_queues:83:21) { category: 'unknown', originalException: FetchError: request to http://localhost:8888/api/sessions?1657858691713 failed, reason: getaddrinfo ENOTFOUND localhost at ClientRequest. (/home/arch/.vscode/extensions/ms-toolsai.jupyter-2022.6.1101950301/out/extension.node.js:39:378131) at ClientRequest.emit (node:events:390:28) at ClientRequest.emit (node:domain:475:12) at Socket.socketErrorListener (node:_http_client:447:9) at Socket.emit (node:events:390:28) at Socket.emit (node:domain:475:12) at emitErrorNT (node:internal/streams/destroy:157:8) at emitErrorCloseNT (node:internal/streams/destroy:122:3) at processTicksAndRejections (node:internal/process/taskqueues:83:21) } warn 9:48:11.723: Error occurred while trying to start the kernel, options.disableUI=false [Error]: at new r (/home/arch/.vscode/extensions/ms-toolsai.jupyter-2022.6.1101950301/out/extension.node.js:39:27924) at new _ (/home/arch/.vscode/extensions/ms-toolsai.jupyter-2022.6.1101950301/out/extension.node.js:2:1918192) at /home/arch/.vscode/extensions/ms-toolsai.jupyter-2022.6.1101950301/out/extension.node.js:24:140193 at processTicksAndRejections (node:internal/process/task_queues:96:5)

FetchError: request to http://localhost:8888/api/sessions?1657858691713 failed, reason: getaddrinfo ENOTFOUND localhost at ClientRequest. (/home/arch/.vscode/extensions/ms-toolsai.jupyter-2022.6.1101950301/out/extension.node.js:39:378131) at ClientRequest.emit (node:events:390:28) at ClientRequest.emit (node:domain:475:12) at Socket.socketErrorListener (node:_http_client:447:9) at Socket.emit (node:events:390:28) at Socket.emit (node:domain:475:12) at emitErrorNT (node:internal/streams/destroy:157:8) at emitErrorCloseNT (node:internal/streams/destroy:122:3) at processTicksAndRejections (node:internal/process/task_queues:83:21) { category: 'unknown', originalException: FetchError: request to http://localhost:8888/api/sessions?1657858691713 failed, reason: getaddrinfo ENOTFOUND localhost at ClientRequest. (/home/arch/.vscode/extensions/ms-toolsai.jupyter-2022.6.1101950301/out/extension.node.js:39:378131) at ClientRequest.emit (node:events:390:28) at ClientRequest.emit (node:domain:475:12) at Socket.socketErrorListener (node:_http_client:447:9) at Socket.emit (node:events:390:28) at Socket.emit (node:domain:475:12) at emitErrorNT (node:internal/streams/destroy:157:8) at emitErrorCloseNT (node:internal/streams/destroy:122:3) at processTicksAndRejections (node:internal/process/taskqueues:83:21) } warn 9:48:11.724: Kernel Error, context = start [Error]: at new r (/home/arch/.vscode/extensions/ms-toolsai.jupyter-2022.6.1101950301/out/extension.node.js:39:27924) at new _ (/home/arch/.vscode/extensions/ms-toolsai.jupyter-2022.6.1101950301/out/extension.node.js:2:1918192) at /home/arch/.vscode/extensions/ms-toolsai.jupyter-2022.6.1101950301/out/extension.node.js:24:140193 at processTicksAndRejections (node:internal/process/task_queues:96:5)

FetchError: request to http://localhost:8888/api/sessions?1657858691713 failed, reason: getaddrinfo ENOTFOUND localhost at ClientRequest. (/home/arch/.vscode/extensions/ms-toolsai.jupyter-2022.6.1101950301/out/extension.node.js:39:378131) at ClientRequest.emit (node:events:390:28) at ClientRequest.emit (node:domain:475:12) at Socket.socketErrorListener (node:_http_client:447:9) at Socket.emit (node:events:390:28) at Socket.emit (node:domain:475:12) at emitErrorNT (node:internal/streams/destroy:157:8) at emitErrorCloseNT (node:internal/streams/destroy:122:3) at processTicksAndRejections (node:internal/process/task_queues:83:21) { category: 'unknown', originalException: FetchError: request to http://localhost:8888/api/sessions?1657858691713 failed, reason: getaddrinfo ENOTFOUND localhost at ClientRequest. (/home/arch/.vscode/extensions/ms-toolsai.jupyter-2022.6.1101950301/out/extension.node.js:39:378131) at ClientRequest.emit (node:events:390:28) at ClientRequest.emit (node:domain:475:12) at Socket.socketErrorListener (node:_http_client:447:9) at Socket.emit (node:events:390:28) at Socket.emit (node:domain:475:12) at emitErrorNT (node:internal/streams/destroy:157:8) at emitErrorCloseNT (node:internal/streams/destroy:122:3) at processTicksAndRejections (node:internal/process/task_queues:83:21) } info 9:48:11.749: Process Execution: > ~/miniconda3/envs/reinforcement-learning/bin/python -c "import ipykernel"

~/miniconda3/envs/reinforcement-learning/bin/python -c "import ipykernel"
[W 09:48:11.751 NotebookApp] delete /lunar_lander-jvsc-15f71295-da4a-4342-a214-1fc4dce3d54fa7553eaf-2757-4795-a533-365ad942c266.ipynb
info 9:48:11.917: Dispose Kernel '/home/arch/Workspace/lunar_lander/lunar_lander.ipynb' associated with '/home/arch/Workspace/lunar_lander/lunar_lander.ipynb'
error 9:48:11.918: Error in execution [Error]:
    at new r (/home/arch/.vscode/extensions/ms-toolsai.jupyter-2022.6.1101950301/out/extension.node.js:39:27924)
    at new _ (/home/arch/.vscode/extensions/ms-toolsai.jupyter-2022.6.1101950301/out/extension.node.js:2:1918192)
    at /home/arch/.vscode/extensions/ms-toolsai.jupyter-2022.6.1101950301/out/extension.node.js:24:140193
    at processTicksAndRejections (node:internal/process/task_queues:96:5)

FetchError: request to http://localhost:8888/api/sessions?1657858691713 failed, reason: getaddrinfo ENOTFOUND localhost at ClientRequest. (/home/arch/.vscode/extensions/ms-toolsai.jupyter-2022.6.1101950301/out/extension.node.js:39:378131) at ClientRequest.emit (node:events:390:28) at ClientRequest.emit (node:domain:475:12) at Socket.socketErrorListener (node:_http_client:447:9) at Socket.emit (node:events:390:28) at Socket.emit (node:domain:475:12) at emitErrorNT (node:internal/streams/destroy:157:8) at emitErrorCloseNT (node:internal/streams/destroy:122:3) at processTicksAndRejections (node:internal/process/task_queues:83:21) { category: 'unknown', originalException: FetchError: request to http://localhost:8888/api/sessions?1657858691713 failed, reason: getaddrinfo ENOTFOUND localhost at ClientRequest. (/home/arch/.vscode/extensions/ms-toolsai.jupyter-2022.6.1101950301/out/extension.node.js:39:378131) at ClientRequest.emit (node:events:390:28) at ClientRequest.emit (node:domain:475:12) at Socket.socketErrorListener (node:_http_client:447:9) at Socket.emit (node:events:390:28) at Socket.emit (node:domain:475:12) at emitErrorNT (node:internal/streams/destroy:157:8) at emitErrorCloseNT (node:internal/streams/destroy:122:3) at processTicksAndRejections (node:internal/process/taskqueues:83:21) } error 9:48:11.919: Error in execution (get message for cell) [Error]: at new r (/home/arch/.vscode/extensions/ms-toolsai.jupyter-2022.6.1101950301/out/extension.node.js:39:27924) at new _ (/home/arch/.vscode/extensions/ms-toolsai.jupyter-2022.6.1101950301/out/extension.node.js:2:1918192) at /home/arch/.vscode/extensions/ms-toolsai.jupyter-2022.6.1101950301/out/extension.node.js:24:140193 at processTicksAndRejections (node:internal/process/task_queues:96:5)

FetchError: request to http://localhost:8888/api/sessions?1657858691713 failed, reason: getaddrinfo ENOTFOUND localhost at ClientRequest. (/home/arch/.vscode/extensions/ms-toolsai.jupyter-2022.6.1101950301/out/extension.node.js:39:378131) at ClientRequest.emit (node:events:390:28) at ClientRequest.emit (node:domain:475:12) at Socket.socketErrorListener (node:_http_client:447:9) at Socket.emit (node:events:390:28) at Socket.emit (node:domain:475:12) at emitErrorNT (node:internal/streams/destroy:157:8) at emitErrorCloseNT (node:internal/streams/destroy:122:3) at processTicksAndRejections (node:internal/process/task_queues:83:21) { category: 'unknown', originalException: FetchError: request to http://localhost:8888/api/sessions?1657858691713 failed, reason: getaddrinfo ENOTFOUND localhost at ClientRequest. (/home/arch/.vscode/extensions/ms-toolsai.jupyter-2022.6.1101950301/out/extension.node.js:39:378131) at ClientRequest.emit (node:events:390:28) at ClientRequest.emit (node:domain:475:12) at Socket.socketErrorListener (node:_http_client:447:9) at Socket.emit (node:events:390:28) at Socket.emit (node:domain:475:12) at emitErrorNT (node:internal/streams/destroy:157:8) at emitErrorCloseNT (node:internal/streams/destroy:122:3) at processTicksAndRejections (node:internal/process/task_queues:83:21) }



</p>
</details>
amunger commented 2 years ago

It looks like the jupyter server may have crashed so the kernel can't be found to run the cell. Are you able to run any cell, even a simple print(1)?
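One way to separate a kernel or server problem from a problem in the Python environment itself is to run the same statements in a fresh interpreter outside the notebook. The sketch below assumes it is started with the environment's own Python; `run_isolated` is a hypothetical helper name, and the snippets tried are only examples.

```python
import subprocess
import sys


def run_isolated(code, python=sys.executable, timeout=120):
    """Run `code` in a fresh interpreter; return (exit_code, stdout, stderr)."""
    result = subprocess.run(
        [python, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.returncode, result.stdout, result.stderr


if __name__ == "__main__":
    for snippet in ("print(1)", "import torch"):
        exit_code, out, err = run_isolated(snippet)
        # On POSIX systems a negative exit code means the child process
        # was terminated by a signal rather than by a Python exception.
        status = "ok" if exit_code == 0 else f"failed (exit code {exit_code})"
        print(f"{snippet!r}: {status}")
        if err.strip():
            # Show only the last line of stderr, usually the error summary.
            print("   ", err.strip().splitlines()[-1])
```

If `print(1)` succeeds here but fails in the notebook, the Jupyter server or kernel startup is the more likely culprit; if an import kills the child process, the environment's native libraries deserve a closer look.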

Can you reload vscode, go through the repro steps again and provide the full logs from that session? If the server crashed, it was probably earlier in the logs, and we would want to see the crash message from that.
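The log in the report repeatedly shows `getaddrinfo ENOTFOUND localhost` for requests to `http://localhost:8888`, which means the name `localhost` could not be resolved at that moment. A quick check of name resolution from Python might look like the sketch below; `can_resolve` is a hypothetical helper, and a resolution failure is only one possible explanation, not a confirmed cause for this issue.

```python
import socket


def can_resolve(host="localhost", port=8888):
    """Return the addresses `host` resolves to, or an empty list on failure."""
    try:
        infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        # Name resolution failed (the Python-side analogue of ENOTFOUND).
        return []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the resolved IP address.
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    addresses = can_resolve()
    if addresses:
        print("localhost resolves to:", ", ".join(addresses))
    else:
        print("localhost does not resolve; check the hosts file or DNS setup")
```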

n1ghtf4l1 commented 2 years ago

> It looks like the jupyter server may have crashed so the kernel can't be found to run the cell. Are you able to run any cell, even a simple print(1)?

No, I was not able to run even print(1) at that point. Thanks, though: I solved the issue after hours of searching Stack Overflow and GitHub. I debugged it and found that the libstdc++.so.6 file in the conda virtual environment where I was running this program did not provide the required version, GLIBCXX_3.4.15.

So I replaced that file with the one from my global environment, i.e. the one in /usr/lib/, because the same program worked fine when I ran it in the global environment. Now it works in the virtual environment as well. Thanks again!
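To compare the two copies of libstdc++.so.6 before replacing anything, the `GLIBCXX_*` version tags embedded in each file can be listed. The sketch below simply scans the file bytes for those tags, a rough stand-in for tools such as `strings`; the helper names and the example paths in the comments are assumptions for illustration.

```python
import re
from pathlib import Path


def glibcxx_versions(lib_path):
    """Return the GLIBCXX version tags found in a shared library, sorted numerically."""
    data = Path(lib_path).read_bytes()
    # Tags look like b"GLIBCXX_3.4.15"; capture only the dotted number.
    tags = {m.decode() for m in re.findall(rb"GLIBCXX_(\d+(?:\.\d+)+)", data)}
    return sorted(tags, key=lambda v: tuple(int(part) for part in v.split(".")))


def missing_versions(env_lib, system_lib):
    """Versions present in system_lib but absent from env_lib."""
    env = set(glibcxx_versions(env_lib))
    return [v for v in glibcxx_versions(system_lib) if v not in env]


# Example call (paths are assumptions; adjust them to your own setup):
# missing_versions(
#     "/home/arch/miniconda3/envs/reinforcement-learning/lib/libstdc++.so.6",
#     "/usr/lib/libstdc++.so.6",
# )
```

A non-empty result means the environment's copy is older than the system one. Overwriting the file, as described above, is one fix; updating the package that ships libstdc++ in the conda environment may be a less invasive alternative.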

amunger commented 2 years ago

Thanks for the update. Glad you got it working.

black0017 commented 2 years ago

Hi,

Uninstalling and re-installing all the Jupyter-related plugins worked in my case.

Same error: The Kernel crashed while executing code in the current cell or a previous cell etc...

Best, N.