kwonathan / language-models-trajectory-generators

MIT License
Issues in running the code #2

Open hars-singh opened 5 months ago

hars-singh commented 5 months ago

Hi, I am trying to run the code, but it stops responding after this output:

[INFO/MainProcess] Finished generating ChatGPT output!
[INFO/MainProcess] Capturing head and wrist camera images...

Thanks, Harsh

kwonathan commented 5 months ago

Hi Harsh,

Thanks for trying out the code! Could you please try terminating the script when it stops responding and paste the traceback info here for me to better understand what might be going on?

Cheers!
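
If Ctrl-C alone does not produce a useful traceback, one option (a minimal sketch, not part of this repository) is Python's standard-library `faulthandler` module, which can dump the stack of every thread once a timeout has elapsed. The helper names `slow_step` and `run_with_watchdog` below are hypothetical and only stand in for whichever call appears to hang.

```python
import faulthandler
import tempfile
import time


def slow_step(seconds):
    """Hypothetical stand-in for a call that appears to hang."""
    time.sleep(seconds)


def run_with_watchdog(step, timeout, log_file):
    """Run `step`; if it is still running after `timeout` seconds,
    write the tracebacks of all threads to `log_file`."""
    faulthandler.dump_traceback_later(timeout, repeat=False, file=log_file)
    try:
        step()
    finally:
        # Disarm the watchdog once the step has finished.
        faulthandler.cancel_dump_traceback_later()


# faulthandler writes to the file descriptor directly, so a real file is needed.
with tempfile.TemporaryFile(mode="w+") as log_file:
    run_with_watchdog(lambda: slow_step(0.5), timeout=0.1, log_file=log_file)
    log_file.seek(0)
    dump = log_file.read()

print(dump)
```

The dump only covers the process in which `dump_traceback_later` was called, so with a multiprocessing set-up it would have to be armed separately in each process of interest.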

hars-singh commented 5 months ago

Hi, please find the traceback below. It stops exactly at "Capturing head and wrist camera images...":

pybullet build time: Nov 28 2023 23:52:03
[INFO/MainProcess] Using GPU.
/home/harsh/miniconda3/envs/lantree_3/lib/python3.9/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3526.)
  return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
final text_encoder_type: bert-base-uncased
Model loaded from /home/harsh/.cache/huggingface/hub/models--ShilongLiu--GroundingDINO/snapshots/a94c9b567a2a374598f05c584e96798a170c56fb/groundingdino_swinb_cogcoor.pth
 => _IncompatibleKeys(missing_keys=[], unexpected_keys=['label_enc.weight', 'bert.embeddings.position_ids'])
Hyperparameters read from the model weights: C^k=64, C^v=512, C^h=64
Single object mode: False
[INFO/EnvProcess] child process calling self.run()
[INFO/EnvProcess] Setting up environment...
startThreads creating 1 threads.
starting thread 0
started thread 0
argc=2
argv[0] = --unused
argv[1] = --start_demo_name=Physics Server
ExampleBrowserThreadFunc started
X11 functions dynamically loaded using dlopen/dlsym OK!
X11 functions dynamically loaded using dlopen/dlsym OK!
Creating context
Created GL 3.3 context
Direct GLX rendering context obtained
Making context current
GL_VENDOR=NVIDIA Corporation
GL_RENDERER=NVIDIA GeForce RTX 4090/PCIe/SSE2
GL_VERSION=3.3.0 NVIDIA 535.161.07
GL_SHADING_LANGUAGE_VERSION=3.30 NVIDIA via Cg compiler
pthread_getconcurrency()=0
Version = 3.3.0 NVIDIA 535.161.07
Vendor = NVIDIA Corporation
Renderer = NVIDIA GeForce RTX 4090/PCIe/SSE2
b3Printf: Selected demo: Physics Server
startThreads creating 1 threads.
starting thread 0
started thread 0
MotionThreadFunc thread started
/home/harsh
ven = NVIDIA Corporation
ven = NVIDIA Corporation
[INFO/MainProcess] Finished setting up environment!
Enter a command: pick the fruit in the middle
[INFO/MainProcess] STARTING TASK...
[INFO/MainProcess] Generating ChatGPT output...
system: You are a sentient AI that can control a robot arm by generating Python code which outputs a list of trajectory points for the robot arm end-effector to follow to complete a given user command. Each element in the trajectory list is an end-effector pose, and should be of length 4, comprising a 3D position and a rotation value.

AVAILABLE FUNCTIONS: You must remember that this conversation is a monologue, and that you are in control. I am not able to assist you with any questions, and you must output the final code yourself by making use of the available information, common sense, and general knowledge. You are, however, able to call any of the following Python functions, if required, as often as you want:

detect_object(object_or_object_part: str) -> None: This function will not return anything, but only print the position, orientation, and dimensions of any object or object part in the environment. This information will be printed for as many instances of the queried object or object part in the environment. If there are multiple objects or object parts to detect, call one function for each object or object part, all before executing any trajectories. The unit is in metres.
execute_trajectory(trajectory: list) -> None: This function will execute the list of trajectory points on the robot arm end-effector, and will also not return anything.
open_gripper() -> None: This function will open the gripper on the robot arm, and will also not return anything.
close_gripper() -> None: This function will close the gripper on the robot arm, and will also not return anything.
task_completed() -> None: Call this function only when the task has been completed. This function will also not return anything.
When calling any of the functions, make sure to stop generation after each function call and wait for it to be executed, before calling another function and continuing with your plan.

ENVIRONMENT SET-UP: The 3D coordinate system of the environment is as follows:

  1. The x-axis is in the horizontal direction, increasing to the right.
  2. The y-axis is in the depth direction, increasing away from you.
  3. The z-axis is in the vertical direction, increasing upwards.

The robot arm end-effector is currently positioned at [0.0, 0.6, 0.55], with the rotation value at 0, and the gripper open. The robot arm is in a top-down set-up, with the end-effector facing down onto a tabletop. The end-effector is therefore able to rotate about the z-axis, from -pi to pi radians. The end-effector gripper has two fingers, and they are currently parallel to the x-axis. The gripper can only grasp objects along sides which are shorter than 0.08. Negative rotation values represent clockwise rotation, and positive rotation values represent anticlockwise rotation. The rotation values should be in radians.

COLLISION AVOIDANCE: If the task requires interaction with multiple objects:

Make sure to consider the object widths, lengths, and heights so that an object does not collide with another object or with the tabletop, unless necessary.
It may help to generate additional trajectories and add specific waypoints (calculated from the given object information) to clear objects and the tabletop and avoid collisions, if necessary.

VELOCITY CONTROL:

The default speed of the robot arm end-effector is 100 points per trajectory.
If you need to make the end-effector follow a particular trajectory more quickly, then generate fewer points for the trajectory, and vice versa.

CODE GENERATION: When generating the code for the trajectory, do the following:

Describe briefly the shape of the motion trajectory required to complete the task.
The trajectory could be broken down into multiple steps. In that case, each trajectory step (at default speed) should contain at least 100 points. Define general functions which can be reused for the different trajectory steps whenever possible, but make sure to define new functions whenever a new motion is required. Output a step-by-step reasoning before generating the code.
If the trajectory is broken down into multiple steps, make sure to chain them such that the start point of trajectory_2 is the same as the end point of trajectory_1 and so on, to ensure a smooth overall trajectory. Call the execute_trajectory function after each trajectory step.
When defining the functions, specify the required parameters, and document them clearly in the code. Make sure to include the orientation parameter.
If you want to print the calculated value of a variable to use later, make sure to use the print function to three decimal places, instead of simply writing the variable name. Do not print any of the trajectory variables, since the output will be too long.
Mark any code clearly with the ```python and ``` tags.

INITIAL PLANNING 1: If the task requires interaction with an object part (as opposed to the object as a whole), describe which part of the object would be most suitable for the gripper to interact with. Then, detect the necessary objects in the environment. Stop generation after this step to wait until you obtain the printed outputs from the detect_object function calls.

INITIAL PLANNING 2: Then, output Python code to decide which object to interact with, if there are multiple instances of the same object. Then, describe how best to approach the object (for example, approaching the midpoint of the object, or one of its edges, etc.), depending on the nature of the task, or the object dimensions, etc. Then, output a detailed step-by-step plan for the trajectory, including when to lower the gripper to make contact with the object, if necessary. Finally, perform each of these steps one by one. Name each trajectory variable with the trajectory number. Stop generation after each code block to wait for it to finish executing before continuing with your plan.

The user command is "pick the fruit in the middle".

assistant: INITIAL PLANNING 1:

The task requires the robot arm to pick a fruit. The gripper should interact with the fruit as a whole, as it is small enough to be grasped entirely.

First, let's detect the fruit in the environment.

detect_object("fruit")

We will wait for the output of the detect_object function to proceed with the next steps.
finish_reason: stop
[INFO/MainProcess] Finished generating ChatGPT output!
[INFO/MainProcess] Capturing head and wrist camera images...
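
For readers following the prompt above, the trajectory format it describes (a list of end-effector poses of length 4, 100 points per step at default speed, with each step starting where the previous one ended) can be sketched roughly as follows. This is an illustrative sketch only: `linear_trajectory` and the target poses `above_fruit` and `grasp` are hypothetical, and in the real pipeline the positions would come from the `detect_object` output rather than being hard-coded.

```python
def linear_trajectory(start, end, num_points=100):
    """Linearly interpolate between two [x, y, z, rotation] poses.

    Returns a list of `num_points` poses, including both endpoints.
    """
    if num_points < 2:
        raise ValueError("num_points must be at least 2")
    trajectory = []
    for i in range(num_points):
        t = i / (num_points - 1)  # goes from 0.0 to 1.0
        trajectory.append([s + t * (e - s) for s, e in zip(start, end)])
    return trajectory


# Initial pose taken from the prompt: position [0.0, 0.6, 0.55], rotation 0.
home = [0.0, 0.6, 0.55, 0.0]
# Hypothetical targets, made up for illustration (metres, radians).
above_fruit = [0.1, 0.5, 0.55, 0.0]
grasp = [0.1, 0.5, 0.05, 0.0]

# Step 1: move horizontally until the gripper is above the object.
trajectory_1 = linear_trajectory(home, above_fruit)
# Step 2: lower the gripper, starting exactly where step 1 ended.
trajectory_2 = linear_trajectory(trajectory_1[-1], grasp)

print(len(trajectory_1), len(trajectory_2))
```

In the set-up described by the prompt, each step would then be passed to `execute_trajectory` in turn; generating fewer points per step would make the motion faster.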

kwonathan commented 5 months ago

Hi Harsh,

Sorry if I wasn't very clear. I meant something like the output below: you should get traceback information when you press Ctrl-C after the program stops responding. If you could copy and paste that here, that would be very helpful.

Thanks!

^C[INFO/EnvProcess] process shutting down
Process EnvProcess:
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/JohnKwon/Downloads/language-models-trajectory-generators/env.py", line 156, in run_simulation_environment
    env.update()
  File "/Users/JohnKwon/Downloads/language-models-trajectory-generators/env.py", line 34, in update
    time.sleep(config.control_dt)
KeyboardInterrupt
[INFO/EnvProcess] process exiting with exitcode 1
numActiveThreads = 0
stopping threads
Thread with taskId 0 exiting
Thread TERMINATED
destroy semaphore
semaphore destroyed
destroy main semaphore
main semaphore destroyed
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /Users/JohnKwon/Downloads/language-models-trajectory-generators/main.py:114  │
│ in <module>                                                                  │
│                                                                              │
│   111 │   │   │   │   │   │   try:                                           │
│   112 │   │   │   │   │   │   │   f = StringIO()                             │
│   113 │   │   │   │   │   │   │   with redirect_stdout(f):                   │
│ ❱ 114 │   │   │   │   │   │   │   │   exec(code)                             │
│   115 │   │   │   │   │   │   except Exception:                              │
│   116 │   │   │   │   │   │   │   error_message = traceback.format_exc()     │
│   117 │   │   │   │   │   │   │   new_prompt += ERROR_CORRECTION_PROMPT.repl │
│ in <module>:2                                                                │
│                                                                              │
│ /Users/JohnKwon/Downloads/language-models-trajectory-generators/api.py:60 in │
│ detect_object                                                                │
│                                                                              │
│    57 │   │   segmentation_texts = [segmentation_text]                       │
│    58 │   │                                                                  │
│    59 │   │   self.logger.info(PROGRESS + "Segmenting head camera image..."  │
│ ❱  60 │   │   model_predictions, boxes, segmentation_texts = models.get_lang │
│    61 │   │   self.logger.info(OK + "Finished segmenting head camera image!" │
│    62 │   │                                                                  │
│    63 │   │   masks = utils.get_segmentation_mask(model_predictions, config. │
│                                                                              │
│ /Users/JohnKwon/Downloads/language-models-trajectory-generators/models.py:20 │
│ in get_langsam_output                                                        │
│                                                                              │
│    17 │                                                                      │
│    18 │   segmentation_texts = " . ".join(segmentation_texts)                │
│    19 │                                                                      │
│ ❱  20 │   masks, boxes, phrases, logits = model.predict(image, segmentation_ │
│    21 │                                                                      │
│    22 │   _, ax = plt.subplots(1, 1 + len(masks), figsize=(5 + (5 * len(mask │
│    23 │   [a.axis("off") for a in ax.flatten()]                              │
│                                                                              │
│ /Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-package │
│ s/lang_sam/lang_sam.py:103 in predict                                        │
│                                                                              │
│   100 │   │   return masks.cpu()                                             │
│   101 │                                                                      │
│   102 │   def predict(self, image_pil, text_prompt, box_threshold=0.3, text_ │
│ ❱ 103 │   │   boxes, logits, phrases = self.predict_dino(image_pil, text_pro │
│   104 │   │   masks = torch.tensor([])                                       │
│   105 │   │   if len(boxes) > 0:                                             │
│   106 │   │   │   masks = self.predict_sam(image_pil, boxes)                 │
│                                                                              │
│ /Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-package │
│ s/lang_sam/lang_sam.py:79 in predict_dino                                    │
│                                                                              │
│    76 │                                                                      │
│    77 │   def predict_dino(self, image_pil, text_prompt, box_threshold, text │
│    78 │   │   image_trans = transform_image(image_pil)                       │
│ ❱  79 │   │   boxes, logits, phrases = predict(model=self.groundingdino,     │
│    80 │   │   │   │   │   │   │   │   │   │    image=image_trans,            │
│    81 │   │   │   │   │   │   │   │   │   │    caption=text_prompt,          │
│    82 │   │   │   │   │   │   │   │   │   │    box_threshold=box_threshold,  │
│                                                                              │
│ /Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-package │
│ s/groundingdino/util/inference.py:68 in predict                              │
│                                                                              │
│    65 │   image = image.to(device)                                           │
│    66 │                                                                      │
│    67 │   with torch.no_grad():                                              │
│ ❱  68 │   │   outputs = model(image[None], captions=[caption])               │
│    69 │                                                                      │
│    70 │   prediction_logits = outputs["pred_logits"].cpu().sigmoid()[0]  # p │
│    71 │   prediction_boxes = outputs["pred_boxes"].cpu()[0]  # prediction_bo │
│                                                                              │
│ /Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-package │
│ s/torch/nn/modules/module.py:1518 in _wrapped_call_impl                      │
│                                                                              │
│   1515 │   │   if self._compiled_call_impl is not None:                      │
│   1516 │   │   │   return self._compiled_call_impl(*args, **kwargs)  # type: │
│   1517 │   │   else:                                                         │
│ ❱ 1518 │   │   │   return self._call_impl(*args, **kwargs)                   │
│   1519 │                                                                     │
│   1520 │   def _call_impl(self, *args, **kwargs):                            │
│   1521 │   │   forward_call = (self._slow_forward if torch._C._get_tracing_s │
│                                                                              │
│ /Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-package │
│ s/torch/nn/modules/module.py:1527 in _call_impl                              │
│                                                                              │
│   1524 │   │   if not (self._backward_hooks or self._backward_pre_hooks or s │
│   1525 │   │   │   │   or _global_backward_pre_hooks or _global_backward_hoo │
│   1526 │   │   │   │   or _global_forward_hooks or _global_forward_pre_hooks │
│ ❱ 1527 │   │   │   return forward_call(*args, **kwargs)                      │
│   1528 │   │                                                                 │
│   1529 │   │   try:                                                          │
│   1530 │   │   │   result = None                                             │
│                                                                              │
│ /Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-package │
│ s/groundingdino/models/GroundingDINO/groundingdino.py:288 in forward         │
│                                                                              │
│   285 │   │                                                                  │
│   286 │   │   if isinstance(samples, (list, torch.Tensor)):                  │
│   287 │   │   │   samples = nested_tensor_from_tensor_list(samples)          │
│ ❱ 288 │   │   features, poss = self.backbone(samples)                        │
│   289 │   │                                                                  │
│   290 │   │   srcs = []                                                      │
│   291 │   │   masks = []                                                     │
│                                                                              │
│ /Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-package │
│ s/torch/nn/modules/module.py:1518 in _wrapped_call_impl                      │
│                                                                              │
│   1515 │   │   if self._compiled_call_impl is not None:                      │
│   1516 │   │   │   return self._compiled_call_impl(*args, **kwargs)  # type: │
│   1517 │   │   else:                                                         │
│ ❱ 1518 │   │   │   return self._call_impl(*args, **kwargs)                   │
│   1519 │                                                                     │
│   1520 │   def _call_impl(self, *args, **kwargs):                            │
│   1521 │   │   forward_call = (self._slow_forward if torch._C._get_tracing_s │
│                                                                              │
│ /Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-package │
│ s/torch/nn/modules/module.py:1527 in _call_impl                              │
│                                                                              │
│   1524 │   │   if not (self._backward_hooks or self._backward_pre_hooks or s │
│   1525 │   │   │   │   or _global_backward_pre_hooks or _global_backward_hoo │
│   1526 │   │   │   │   or _global_forward_hooks or _global_forward_pre_hooks │
│ ❱ 1527 │   │   │   return forward_call(*args, **kwargs)                      │
│   1528 │   │                                                                 │
│   1529 │   │   try:                                                          │
│   1530 │   │   │   result = None                                             │
│                                                                              │
│ /Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-package │
│ s/groundingdino/models/GroundingDINO/backbone/backbone.py:151 in forward     │
│                                                                              │
│   148 │   │   super().__init__(backbone, position_embedding)                 │
│   149 │                                                                      │
│   150 │   def forward(self, tensor_list: NestedTensor):                      │
│ ❱ 151 │   │   xs = self[0](tensor_list)                                      │
│   152 │   │   out: List[NestedTensor] = []                                   │
│   153 │   │   pos = []                                                       │
│   154 │   │   for name, x in xs.items():                                     │
│                                                                              │
│ /Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-package │
│ s/torch/nn/modules/module.py:1518 in _wrapped_call_impl                      │
│                                                                              │
│   1515 │   │   if self._compiled_call_impl is not None:                      │
│   1516 │   │   │   return self._compiled_call_impl(*args, **kwargs)  # type: │
│   1517 │   │   else:                                                         │
│ ❱ 1518 │   │   │   return self._call_impl(*args, **kwargs)                   │
│   1519 │                                                                     │
│   1520 │   def _call_impl(self, *args, **kwargs):                            │
│   1521 │   │   forward_call = (self._slow_forward if torch._C._get_tracing_s │
│                                                                              │
│ /Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-package │
│ s/torch/nn/modules/module.py:1527 in _call_impl                              │
│                                                                              │
│   1524 │   │   if not (self._backward_hooks or self._backward_pre_hooks or s │
│   1525 │   │   │   │   or _global_backward_pre_hooks or _global_backward_hoo │
│   1526 │   │   │   │   or _global_forward_hooks or _global_forward_pre_hooks │
│ ❱ 1527 │   │   │   return forward_call(*args, **kwargs)                      │
│   1528 │   │                                                                 │
│   1529 │   │   try:                                                          │
│   1530 │   │   │   result = None                                             │
│                                                                              │
│ /Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-package │
│ s/groundingdino/models/GroundingDINO/backbone/swin_transformer.py:732 in     │
│ forward                                                                      │
│                                                                              │
│   729 │   │   outs = []                                                      │
│   730 │   │   for i in range(self.num_layers):                               │
│   731 │   │   │   layer = self.layers[i]                                     │
│ ❱ 732 │   │   │   x_out, H, W, x, Wh, Ww = layer(x, Wh, Ww)                  │
│   733 │   │   │                                                              │
│   734 │   │   │   if i in self.out_indices:                                  │
│   735 │   │   │   │   norm_layer = getattr(self, f"norm{i}")                 │
│                                                                              │
│ /Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-package │
│ s/torch/nn/modules/module.py:1518 in _wrapped_call_impl                      │
│                                                                              │
│   1515 │   │   if self._compiled_call_impl is not None:                      │
│   1516 │   │   │   return self._compiled_call_impl(*args, **kwargs)  # type: │
│   1517 │   │   else:                                                         │
│ ❱ 1518 │   │   │   return self._call_impl(*args, **kwargs)                   │
│   1519 │                                                                     │
│   1520 │   def _call_impl(self, *args, **kwargs):                            │
│   1521 │   │   forward_call = (self._slow_forward if torch._C._get_tracing_s │
│                                                                              │
│ /Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-package │
│ s/torch/nn/modules/module.py:1527 in _call_impl                              │
│                                                                              │
│   1524 │   │   if not (self._backward_hooks or self._backward_pre_hooks or s │
│   1525 │   │   │   │   or _global_backward_pre_hooks or _global_backward_hoo │
│   1526 │   │   │   │   or _global_forward_hooks or _global_forward_pre_hooks │
│ ❱ 1527 │   │   │   return forward_call(*args, **kwargs)                      │
│   1528 │   │                                                                 │
│   1529 │   │   try:                                                          │
│   1530 │   │   │   result = None                                             │
│                                                                              │
│ /Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-package │
│ s/groundingdino/models/GroundingDINO/backbone/swin_transformer.py:448 in     │
│ forward                                                                      │
│                                                                              │
│   445 │   │   for blk in self.blocks:                                        │
│   446 │   │   │   blk.H, blk.W = H, W                                        │
│   447 │   │   │   if self.use_checkpoint:                                    │
│ ❱ 448 │   │   │   │   x = checkpoint.checkpoint(blk, x, attn_mask)           │
│   449 │   │   │   else:                                                      │
│   450 │   │   │   │   x = blk(x, attn_mask)                                  │
│   451 │   │   if self.downsample is not None:                                │
│                                                                              │
│ /Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-package │
│ s/torch/_compile.py:24 in inner                                              │
│                                                                              │
│   21 │   │   def inner(*args, **kwargs):                                     │
│   22 │   │   │   import torch._dynamo                                        │
│   23 │   │   │                                                               │
│ ❱ 24 │   │   │   return torch._dynamo.disable(fn, recursive)(*args, **kwargs │
│   25 │   │                                                                   │
│   26 │   │   return inner                                                    │
│   27 │   else:                                                               │
│                                                                              │
│ /Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-package │
│ s/torch/_dynamo/eval_frame.py:328 in _fn                                     │
│                                                                              │
│    325 │   │   │   dynamic_ctx = enable_dynamic(self.dynamic, self.export)   │
│    326 │   │   │   dynamic_ctx.__enter__()                                   │
│    327 │   │   │   try:                                                      │
│ ❱  328 │   │   │   │   return fn(*args, **kwargs)                            │
│    329 │   │   │   finally:                                                  │
│    330 │   │   │   │   set_eval_frame(prior)                                 │
│    331 │   │   │   │   dynamic_ctx.__exit__(None, None, None)                │
│                                                                              │
│ /Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-package │
│ s/torch/_dynamo/external_utils.py:17 in inner                                │
│                                                                              │
│   14 │                                                                       │
│   15 │   @functools.wraps(fn)                                                │
│   16 │   def inner(*args, **kwargs):                                         │
│ ❱ 17 │   │   return fn(*args, **kwargs)                                      │
│   18 │                                                                       │
│   19 │   return inner                                                        │
│   20                                                                         │
│                                                                              │
│ /Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-package │
│ s/torch/utils/checkpoint.py:451 in checkpoint                                │
│                                                                              │
│    448 │   │   │   │   "Passing `context_fn` or `debug` is only supported wh │
│    449 │   │   │   │   "use_reentrant=False."                                │
│    450 │   │   │   )                                                         │
│ ❱  451 │   │   return CheckpointFunction.apply(function, preserve, *args)    │
│    452 │   else:                                                             │
│    453 │   │   gen = _checkpoint_without_reentrant_generator(                │
│    454 │   │   │   function, preserve, context_fn, determinism_check, debug, │
│                                                                              │
│ /Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-package │
│ s/torch/autograd/function.py:539 in apply                                    │
│                                                                              │
│   536 │   │   if not torch._C._are_functorch_transforms_active():            │
│   537 │   │   │   # See NOTE: [functorch vjp and autograd interaction]       │
│   538 │   │   │   args = _functorch.utils.unwrap_dead_wrappers(args)         │
│ ❱ 539 │   │   │   return super().apply(*args, **kwargs)  # type: ignore[misc │
│   540 │   │                                                                  │
│   541 │   │   if cls.setup_context == _SingleLevelFunction.setup_context:    │
│   542 │   │   │   raise RuntimeError(                                        │
│                                                                              │
│ /Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-package │
│ s/torch/utils/checkpoint.py:230 in forward                                   │
│                                                                              │
│    227 │   │   ctx.save_for_backward(*tensor_inputs)                         │
│    228 │   │                                                                 │
│    229 │   │   with torch.no_grad():                                         │
│ ❱  230 │   │   │   outputs = run_function(*args)                             │
│    231 │   │   return outputs                                                │
│    232 │                                                                     │
│    233 │   @staticmethod                                                     │
│                                                                              │
│ /Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-package │
│ s/torch/nn/modules/module.py:1518 in _wrapped_call_impl                      │
│                                                                              │
│   1515 │   │   if self._compiled_call_impl is not None:                      │
│   1516 │   │   │   return self._compiled_call_impl(*args, **kwargs)  # type: │
│   1517 │   │   else:                                                         │
│ ❱ 1518 │   │   │   return self._call_impl(*args, **kwargs)                   │
│   1519 │                                                                     │
│   1520 │   def _call_impl(self, *args, **kwargs):                            │
│   1521 │   │   forward_call = (self._slow_forward if torch._C._get_tracing_s │
│                                                                              │
│ /Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-package │
│ s/torch/nn/modules/module.py:1527 in _call_impl                              │
│                                                                              │
│   1524 │   │   if not (self._backward_hooks or self._backward_pre_hooks or s │
│   1525 │   │   │   │   or _global_backward_pre_hooks or _global_backward_hoo │
│   1526 │   │   │   │   or _global_forward_hooks or _global_forward_pre_hooks │
│ ❱ 1527 │   │   │   return forward_call(*args, **kwargs)                      │
│   1528 │   │                                                                 │
│   1529 │   │   try:                                                          │
│   1530 │   │   │   result = None                                             │
│                                                                              │
│ /Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-package │
│ s/groundingdino/models/GroundingDINO/backbone/swin_transformer.py:296 in     │
│ forward                                                                      │
│                                                                              │
│   293 │   │                                                                  │
│   294 │   │   # FFN                                                          │
│   295 │   │   x = shortcut + self.drop_path(x)                               │
│ ❱ 296 │   │   x = x + self.drop_path(self.mlp(self.norm2(x)))                │
│   297 │   │                                                                  │
│   298 │   │   return x                                                       │
│   299                                                                        │
│                                                                              │
│ /Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-package │
│ s/torch/nn/modules/module.py:1518 in _wrapped_call_impl                      │
│                                                                              │
│   1515 │   │   if self._compiled_call_impl is not None:                      │
│   1516 │   │   │   return self._compiled_call_impl(*args, **kwargs)  # type: │
│   1517 │   │   else:                                                         │
│ ❱ 1518 │   │   │   return self._call_impl(*args, **kwargs)                   │
│   1519 │                                                                     │
│   1520 │   def _call_impl(self, *args, **kwargs):                            │
│   1521 │   │   forward_call = (self._slow_forward if torch._C._get_tracing_s │
│                                                                              │
│ /Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-package │
│ s/torch/nn/modules/module.py:1527 in _call_impl                              │
│                                                                              │
│   1524 │   │   if not (self._backward_hooks or self._backward_pre_hooks or s │
│   1525 │   │   │   │   or _global_backward_pre_hooks or _global_backward_hoo │
│   1526 │   │   │   │   or _global_forward_hooks or _global_forward_pre_hooks │
│ ❱ 1527 │   │   │   return forward_call(*args, **kwargs)                      │
│   1528 │   │                                                                 │
│   1529 │   │   try:                                                          │
│   1530 │   │   │   result = None                                             │
│                                                                              │
│ /Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-package │
│ s/groundingdino/models/GroundingDINO/backbone/swin_transformer.py:39 in      │
│ forward                                                                      │
│                                                                              │
│    36 │   │   self.drop = nn.Dropout(drop)                                   │
│    37 │                                                                      │
│    38 │   def forward(self, x):                                              │
│ ❱  39 │   │   x = self.fc1(x)                                                │
│    40 │   │   x = self.act(x)                                                │
│    41 │   │   x = self.drop(x)                                               │
│    42 │   │   x = self.fc2(x)                                                │
│                                                                              │
│ /Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-package │
│ s/torch/nn/modules/module.py:1518 in _wrapped_call_impl                      │
│                                                                              │
│   1515 │   │   if self._compiled_call_impl is not None:                      │
│   1516 │   │   │   return self._compiled_call_impl(*args, **kwargs)  # type: │
│   1517 │   │   else:                                                         │
│ ❱ 1518 │   │   │   return self._call_impl(*args, **kwargs)                   │
│   1519 │                                                                     │
│   1520 │   def _call_impl(self, *args, **kwargs):                            │
│   1521 │   │   forward_call = (self._slow_forward if torch._C._get_tracing_s │
│                                                                              │
│ /Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-package │
│ s/torch/nn/modules/module.py:1527 in _call_impl                              │
│                                                                              │
│   1524 │   │   if not (self._backward_hooks or self._backward_pre_hooks or s │
│   1525 │   │   │   │   or _global_backward_pre_hooks or _global_backward_hoo │
│   1526 │   │   │   │   or _global_forward_hooks or _global_forward_pre_hooks │
│ ❱ 1527 │   │   │   return forward_call(*args, **kwargs)                      │
│   1528 │   │                                                                 │
│   1529 │   │   try:                                                          │
│   1530 │   │   │   result = None                                             │
│                                                                              │
│ /Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-package │
│ s/torch/nn/modules/linear.py:114 in forward                                  │
│                                                                              │
│   111 │   │   │   init.uniform_(self.bias, -bound, bound)                    │
│   112 │                                                                      │
│   113 │   def forward(self, input: Tensor) -> Tensor:                        │
│ ❱ 114 │   │   return F.linear(input, self.weight, self.bias)                 │
│   115 │                                                                      │
│   116 │   def extra_repr(self) -> str:                                       │
│   117 │   │   return f'in_features={self.in_features}, out_features={self.ou │
╰──────────────────────────────────────────────────────────────────────────────╯
KeyboardInterrupt
[INFO/MainProcess] process shutting down

hars-singh commented 5 months ago

Please find the traceback:

[INFO/MainProcess] Finished generating ChatGPT output!
[INFO/MainProcess] Capturing head and wrist camera images...
^CTraceback (most recent call last):
  File "/home/harsh/Music/language-models-trajectory-generators/main.py", line 117, in <module>
    exec(code)
  File "<string>", line 2, in <module>
  File "/home/harsh/Music/language-models-trajectory-generators/api.py", line 41, in detect_object
    [head_camera_position, head_camera_orientation_q, wrist_camera_position, wrist_camera_orientation_q, env_connection_message] = self.main_connection.recv()
  File "/home/harsh/miniconda3/envs/lantree_3/lib/python3.9/multiprocessing/connection.py", line 250, in recv
    buf = self._recv_bytes()
  File "/home/harsh/miniconda3/envs/lantree_3/lib/python3.9/multiprocessing/connection.py", line 414, in _recv_bytes
    buf = self._recv(4)
  File "/home/harsh/miniconda3/envs/lantree_3/lib/python3.9/multiprocessing/connection.py", line 379, in _recv
    chunk = read(handle, remaining)
KeyboardInterrupt
[INFO/MainProcess] process shutting down
[INFO/MainProcess] calling join() for process EnvProcess
^CError in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "/home/harsh/miniconda3/envs/lantree_3/lib/python3.9/multiprocessing/popen_fork.py", line 27, in poll
    pid, sts = os.waitpid(self.pid, flag)
KeyboardInterrupt
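
The traceback shows the main process blocked inside `self.main_connection.recv()` in api.py, waiting for a message from the environment process that never arrives. As a debugging aid, a hang like this can be turned into an explicit error with `Connection.poll(timeout)` from the standard library. The sketch below is only an illustration: `recv_with_timeout` is a hypothetical helper that is not part of the repository, and the timeout values are arbitrary.

```python
import multiprocessing as mp


def recv_with_timeout(connection, timeout_s=30.0):
    """Receive one message, but raise instead of blocking forever."""
    # poll() returns True as soon as data is ready, or False after timeout_s seconds.
    if not connection.poll(timeout_s):
        raise TimeoutError(f"no message from the other process within {timeout_s} s")
    return connection.recv()


if __name__ == "__main__":
    main_end, env_end = mp.Pipe()

    # Normal case: the other end has sent something.
    env_end.send(["placeholder camera pose"])
    print(recv_with_timeout(main_end, timeout_s=1.0))

    # Hang case: nothing is sent, so the call raises after the timeout.
    try:
        recv_with_timeout(main_end, timeout_s=0.5)
    except TimeoutError as err:
        print("timed out:", err)
```

With a wrapper like this around the blocking call, the script would stop with a clear message instead of sitting at "Capturing head and wrist camera images..." until interrupted.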

hars-singh commented 5 months ago

Hi, is there any solution for this? I am trying to debug but have no idea what I should change.

kwonathan commented 5 months ago

Hi Harsh,

Thanks for the traceback information. I've had a look into it, and it looks like the save_image function from torchvision.utils in robot.py is what's causing the problem. I'm not entirely sure why it's not responding after it gets called, and it might be something to do with multiprocessing, but in the meantime, I suggest using Image.save() instead, as in here. I will update the code with this fix tomorrow after some more testing, and having looked into other files where the save_image function is called.

Hope this helps for now!

hars-singh commented 5 months ago

Hi,

I have replaced save_image with the following:

depth_normalized = (depth_array - np.min(depth_array)) / (np.max(depth_array) - np.min(depth_array))
depth_image_8bit = np.uint8(depth_normalized * 255)
depth_image = Image.fromarray(depth_image_8bit, mode='L')
depth_image.save(depth_image_path)

It seems to be working, but I cannot see any task being completed in the renderer, even though the output always reports the task as successful. I tested with "push can right".
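
One thing worth keeping in mind with the snippet above: min-max scaling stretches every frame to 0..255 regardless of its real range, so the saved file no longer holds metric depth, and a completely flat depth map would divide by zero. If later code reads the file back to compute object sizes, that could plausibly give wrong dimensions, although whether that is what happened here is an assumption. A minimal sketch of the same idea with the zero-range case guarded is shown below; `depth_to_uint8` and `save_depth_preview` are hypothetical names, not functions from the repository.

```python
import numpy as np


def depth_to_uint8(depth_array):
    """Min-max scale a depth map to 0..255 for viewing (absolute depth is lost)."""
    depth = np.asarray(depth_array, dtype=np.float64)
    d_min, d_max = float(depth.min()), float(depth.max())
    if d_max == d_min:
        # Flat depth map: avoid dividing by zero and return an all-black image.
        return np.zeros(depth.shape, dtype=np.uint8)
    scaled = (depth - d_min) / (d_max - d_min)
    return np.round(scaled * 255).astype(np.uint8)


def save_depth_preview(depth_array, path):
    """Write the 8-bit preview to disk with Pillow."""
    # Imported here so that the scaling helper can be used without Pillow installed.
    from PIL import Image

    # A 2D uint8 array is saved as a greyscale image.
    Image.fromarray(depth_to_uint8(depth_array)).save(path)
```

A preview like this is fine for looking at the image, but it is not a substitute for storing the depth values themselves.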

kwonathan commented 5 months ago

Hi Harsh,

Thanks for letting me know. Could you please describe what exactly is showing on the renderer and what isn't? Are the robot and the can being loaded before you enter the task command? Are the trajectory points (cyan) and bounding boxes (green) being added to the environment? Or is all of that being loaded, but the robot is just not moving?

It might also help to add a screenshot of the renderer for me to have a look.

Thanks!

hars-singh commented 5 months ago

[Attached GIF: ezgif-3-249302df8b]

Hi, here is what's happening in the environment. The task is "push can to right", and the robot is not able to perform it. I tried other tasks too, but none of them work. However, after 4-5 steps the terminal output shows the task as completed.

kwonathan commented 5 months ago

Thanks for this, and could you please also copy and paste the full outputs here? It might be that the bounding box of the can isn't being calculated correctly and therefore the LLM is generating trajectories which the robot cannot reach, or the bounding box is being calculated correctly, but the LLM itself is generating wrong trajectories (which can happen) which, again, the robot cannot reach.

Thanks again for bringing these issues up and for your patience!

hars-singh commented 5 months ago

Thanks. This one is for the task "shake mustard bottle", which shows the same issue. Please find the output below.

pybullet build time: Nov 28 2023 23:52:03
/home/harsh/miniconda3/envs/lantree_3/lib/python3.9/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3526.)
  return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
final text_encoder_type: bert-base-uncased
Model loaded from /home/harsh/.cache/huggingface/hub/models--ShilongLiu--GroundingDINO/snapshots/a94c9b567a2a374598f05c584e96798a170c56fb/groundingdino_swinb_cogcoor.pth
 => _IncompatibleKeys(missing_keys=[], unexpected_keys=['label_enc.weight', 'bert.embeddings.position_ids'])
Hyperparameters read from the model weights: C^k=64, C^v=512, C^h=64
Single object mode: False
[INFO/EnvProcess] child process calling self.run()
startThreads creating 1 threads.
starting thread 0
started thread 0
argc=2
argv[0] = --unused
argv[1] = --start_demo_name=Physics Server
ExampleBrowserThreadFunc started
X11 functions dynamically loaded using dlopen/dlsym OK!
X11 functions dynamically loaded using dlopen/dlsym OK!
Creating context
Created GL 3.3 context
Direct GLX rendering context obtained
Making context current
GL_VENDOR=NVIDIA Corporation
GL_RENDERER=NVIDIA GeForce RTX 4090/PCIe/SSE2
GL_VERSION=3.3.0 NVIDIA 535.161.07
GL_SHADING_LANGUAGE_VERSION=3.30 NVIDIA via Cg compiler
pthread_getconcurrency()=0
Version = 3.3.0 NVIDIA 535.161.07
Vendor = NVIDIA Corporation
Renderer = NVIDIA GeForce RTX 4090/PCIe/SSE2
b3Printf: Selected demo: Physics Server
startThreads creating 1 threads.
starting thread 0
started thread 0
MotionThreadFunc thread started
/home/harsh/Music/language-models-trajectory-generators
ven = NVIDIA Corporation
ven = NVIDIA Corporation
Enter a command: shake mustard bottle

system: You are a sentient AI that can control a robot arm by generating Python code which outputs a list of trajectory points for the robot arm end-effector to follow to complete a given user command. Each element in the trajectory list is an end-effector pose, and should be of length 4, comprising a 3D position and a rotation value.

AVAILABLE FUNCTIONS: You must remember that this conversation is a monologue, and that you are in control. I am not able to assist you with any questions, and you must output the final code yourself by making use of the available information, common sense, and general knowledge. You are, however, able to call any of the following Python functions, if required, as often as you want:

  1. detect_object(object_or_object_part: str) -> None: This function will not return anything, but only print the position, orientation, and dimensions of any object or object part in the environment. This information will be printed for as many instances of the queried object or object part in the environment. If there are multiple objects or object parts to detect, call one function for each object or object part, all before executing any trajectories. The unit is in metres.
  2. execute_trajectory(trajectory: list) -> None: This function will execute the list of trajectory points on the robot arm end-effector, and will also not return anything.
  3. open_gripper() -> None: This function will open the gripper on the robot arm, and will also not return anything.
  4. close_gripper() -> None: This function will close the gripper on the robot arm, and will also not return anything.
  5. task_completed() -> None: Call this function only when the task has been completed. This function will also not return anything.

When calling any of the functions, make sure to stop generation after each function call and wait for it to be executed, before calling another function and continuing with your plan.

ENVIRONMENT SET-UP: The 3D coordinate system of the environment is as follows:

  1. The x-axis is in the horizontal direction, increasing to the right.
  2. The y-axis is in the depth direction, increasing away from you.
  3. The z-axis is in the vertical direction, increasing upwards.

The robot arm end-effector is currently positioned at [0.0, 0.6, 0.55], with the rotation value at 0, and the gripper open. The robot arm is in a top-down set-up, with the end-effector facing down onto a tabletop. The end-effector is therefore able to rotate about the z-axis, from -pi to pi radians. The end-effector gripper has two fingers, and they are currently parallel to the x-axis. The gripper can only grasp objects along sides which are shorter than 0.08. Negative rotation values represent clockwise rotation, and positive rotation values represent anticlockwise rotation. The rotation values should be in radians.

COLLISION AVOIDANCE: If the task requires interaction with multiple objects:

  1. Make sure to consider the object widths, lengths, and heights so that an object does not collide with another object or with the tabletop, unless necessary.
  2. It may help to generate additional trajectories and add specific waypoints (calculated from the given object information) to clear objects and the tabletop and avoid collisions, if necessary.

VELOCITY CONTROL:

  1. The default speed of the robot arm end-effector is 100 points per trajectory.
  2. If you need to make the end-effector follow a particular trajectory more quickly, then generate fewer points for the trajectory, and vice versa.

CODE GENERATION: When generating the code for the trajectory, do the following:

  1. Describe briefly the shape of the motion trajectory required to complete the task.
  2. The trajectory could be broken down into multiple steps. In that case, each trajectory step (at default speed) should contain at least 100 points. Define general functions which can be reused for the different trajectory steps whenever possible, but make sure to define new functions whenever a new motion is required. Output a step-by-step reasoning before generating the code.
  3. If the trajectory is broken down into multiple steps, make sure to chain them such that the start point of trajectory_2 is the same as the end point of trajectory_1 and so on, to ensure a smooth overall trajectory. Call the execute_trajectory function after each trajectory step.
  4. When defining the functions, specify the required parameters, and document them clearly in the code. Make sure to include the orientation parameter.
  5. If you want to print the calculated value of a variable to use later, make sure to use the print function to three decimal places, instead of simply writing the variable name. Do not print any of the trajectory variables, since the output will be too long.
  6. Mark any code clearly with the ```python and ``` tags.

INITIAL PLANNING 1: If the task requires interaction with an object part (as opposed to the object as a whole), describe which part of the object would be most suitable for the gripper to interact with. Then, detect the necessary objects in the environment. Stop generation after this step to wait until you obtain the printed outputs from the detect_object function calls.

INITIAL PLANNING 2: Then, output Python code to decide which object to interact with, if there are multiple instances of the same object. Then, describe how best to approach the object (for example, approaching the midpoint of the object, or one of its edges, etc.), depending on the nature of the task, or the object dimensions, etc. Then, output a detailed step-by-step plan for the trajectory, including when to lower the gripper to make contact with the object, if necessary. Finally, perform each of these steps one by one. Name each trajectory variable with the trajectory number. Stop generation after each code block to wait for it to finish executing before continuing with your plan.

The user command is "shake mustard bottle".

assistant: INITIAL PLANNING 1:

To shake a mustard bottle, the robot arm needs to grip the bottle by its body, which is usually the widest part of the bottle. This is because the gripper needs to have a firm hold on the bottle to be able to shake it effectively without dropping it.

Let's start by detecting the mustard bottle in the environment.

detect_object("mustard bottle")

finish_reason: stop
/home/harsh/miniconda3/envs/lantree_3/lib/python3.9/site-packages/transformers/modeling_utils.py:907: FutureWarning: The device argument is deprecated and will be removed in v5 of Transformers.
  warnings.warn(
/home/harsh/miniconda3/envs/lantree_3/lib/python3.9/site-packages/torch/utils/checkpoint.py:429: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants.
  warnings.warn(
/home/harsh/miniconda3/envs/lantree_3/lib/python3.9/site-packages/torch/utils/checkpoint.py:61: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
  warnings.warn(

user: Print statement output:
Position of mustard bottle: [0.0, 1.198, 0.57]
Dimensions:
Width: 0.0
Length: 0.003
Height: 0.003
Orientation along shorter side (width): 0.029
Orientation along longer side (length): -1.542

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... To disable this warning, you can either:

Since there is only one mustard bottle in the environment, we don't need to decide which object to interact with.

The best way to approach the mustard bottle is from above, aligning the gripper with the shorter side (width) of the bottle. This is because the gripper can only grasp objects along sides which are shorter than 0.08, and the width of the bottle is 0.0, which is within this limit.

The detailed step-by-step plan for the trajectory is as follows:

  1. Move the end-effector to a point above the mustard bottle, aligning the gripper with the shorter side of the bottle. This will be the start point of the trajectory.
  2. Lower the end-effector to make contact with the mustard bottle.
  3. Close the gripper to grasp the mustard bottle.
  4. Lift the mustard bottle off the tabletop.
  5. Shake the mustard bottle by moving the end-effector up and down a few times.
  6. Lower the mustard bottle back onto the tabletop.
  7. Open the gripper to release the mustard bottle.
  8. Move the end-effector back to its initial position.

Let's start by moving the end-effector to a point above the mustard bottle.

# Define the start and end points of the trajectory
start_point = [0.0, 0.6, 0.55, 0]
end_point = [0.0, 1.198, 0.57 + 0.1, 0.029]  # Add 0.1 to the z-coordinate to approach from above

# Generate the trajectory
trajectory_1 = []
for i in range(100):
    point = [start_point[j] + i/100 * (end_point[j] - start_point[j]) for j in range(4)]
    trajectory_1.append(point)

# Execute the trajectory
execute_trajectory(trajectory_1)

This code will move the end-effector from its current position to a point 0.1 metres above the mustard bottle, aligning the gripper with the shorter side of the bottle. The trajectory is a straight line in the yz-plane. The rotation value changes from 0 to 0.029 to align the gripper with the shorter side of the bottle.

finish_reason: stop
user:

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... To disable this warning, you can either:

# Define the start and end points of the trajectory
start_point = [0.0, 1.198, 0.67, 0.029]
end_point = [0.0, 1.198, 0.57, 0.029]  # Lower the end-effector by 0.1

# Generate the trajectory
trajectory_2 = []
for i in range(100):
    point = [start_point[j] + i/100 * (end_point[j] - start_point[j]) for j in range(4)]
    trajectory_2.append(point)

# Execute the trajectory
execute_trajectory(trajectory_2)

This code will lower the end-effector by 0.1 metres to make contact with the mustard bottle. The trajectory is a straight line in the z-direction. The rotation value remains the same to keep the gripper aligned with the shorter side of the bottle.

finish_reason: stop
user:

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... To disable this warning, you can either:

close_gripper()

This code will close the gripper, allowing it to grasp the mustard bottle.

finish_reason: stop
user:

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... To disable this warning, you can either:

# Define the start and end points of the trajectory
start_point = [0.0, 1.198, 0.57, 0.029]
end_point = [0.0, 1.198, 0.67, 0.029]  # Lift the end-effector by 0.1

# Generate the trajectory
trajectory_3 = []
for i in range(100):
    point = [start_point[j] + i/100 * (end_point[j] - start_point[j]) for j in range(4)]
    trajectory_3.append(point)

# Execute the trajectory
execute_trajectory(trajectory_3)

This code will lift the end-effector by 0.1 metres, taking the mustard bottle with it. The trajectory is a straight line in the z-direction. The rotation value remains the same to keep the gripper aligned with the shorter side of the bottle.

finish_reason: stop
user:

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... To disable this warning, you can either:

# Define the start and end points of the trajectory
start_point = [0.0, 1.198, 0.67, 0.029]
end_point = [0.0, 1.198, 0.57, 0.029]  # Lower the end-effector by 0.1

# Generate the trajectory
trajectory_4 = []
for i in range(100):
    point = [start_point[j] + i/100 * (end_point[j] - start_point[j]) for j in range(4)]
    trajectory_4.append(point)

# Execute the trajectory
execute_trajectory(trajectory_4)

# Define the start and end points of the trajectory
start_point = [0.0, 1.198, 0.57, 0.029]
end_point = [0.0, 1.198, 0.67, 0.029]  # Lift the end-effector by 0.1

# Generate the trajectory
trajectory_5 = []
for i in range(100):
    point = [start_point[j] + i/100 * (end_point[j] - start_point[j]) for j in range(4)]
    trajectory_5.append(point)

# Execute the trajectory
execute_trajectory(trajectory_5)

This code will move the end-effector up and down by 0.1 metres, shaking the mustard bottle. The trajectories are straight lines in the z-direction. The rotation value remains the same to keep the gripper aligned with the shorter side of the bottle.

finish_reason: stop
user:

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... To disable this warning, you can either:

# Define the start and end points of the trajectory
start_point = [0.0, 1.198, 0.67, 0.029]
end_point = [0.0, 1.198, 0.57, 0.029]  # Lower the end-effector by 0.1

# Generate the trajectory
trajectory_6 = []
for i in range(100):
    point = [start_point[j] + i/100 * (end_point[j] - start_point[j]) for j in range(4)]
    trajectory_6.append(point)

# Execute the trajectory
execute_trajectory(trajectory_6)

This code will lower the end-effector by 0.1 metres, placing the mustard bottle back onto the tabletop. The trajectory is a straight line in the z-direction. The rotation value remains the same to keep the gripper aligned with the shorter side of the bottle.

finish_reason: stop
user:

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... To disable this warning, you can either:

open_gripper()

This code will open the gripper, releasing the mustard bottle.

finish_reason: stop
user:

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... To disable this warning, you can either:

# Define the start and end points of the trajectory
start_point = [0.0, 1.198, 0.57, 0.029]
end_point = [0.0, 0.6, 0.55, 0]  # Initial position of the end-effector

# Generate the trajectory
trajectory_7 = []
for i in range(100):
    point = [start_point[j] + i/100 * (end_point[j] - start_point[j]) for j in range(4)]
    trajectory_7.append(point)

# Execute the trajectory
execute_trajectory(trajectory_7)

This code will move the end-effector from its current position back to its initial position. The trajectory is a straight line in the yz-plane. The rotation value changes from 0.029 to 0 to align the gripper with the x-axis.

Now that the task has been completed, let's call the task_completed function.


task_completed()
finish_reason: stop
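
A side observation on the generated code in this output, rather than an explanation of the failure: every step repeats the same interpolation loop, and because `i/100` over `range(100)` tops out at 0.99, each trajectory stops just short of its stated end point, so the next step does not start exactly where the previous one ended. A minimal sketch of a reusable helper that includes both endpoints is given below. The name `linear_trajectory` is hypothetical, and the poses are the ones printed in the output above.

```python
def linear_trajectory(start, end, num_points=100):
    """Return num_points poses from start to end, both endpoints included."""
    if num_points < 2:
        raise ValueError("num_points must be at least 2")
    trajectory = []
    for i in range(num_points):
        t = i / (num_points - 1)  # runs from 0.0 to exactly 1.0
        # (1 - t) * s + t * e gives s at t = 0 and e at t = 1.
        trajectory.append([(1 - t) * s + t * e for s, e in zip(start, end)])
    return trajectory


# Poses are [x, y, z, rotation], as in the generated code above.
above = [0.0, 1.198, 0.67, 0.029]
contact = [0.0, 1.198, 0.57, 0.029]

trajectory_2 = linear_trajectory(above, contact)
# Chained step: starts exactly where trajectory_2 ended.
trajectory_3 = linear_trajectory(trajectory_2[-1], above)
```

Each list would then be passed to `execute_trajectory`, which is left out here because it only exists inside the running system.
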
kwonathan commented 5 months ago

Hi Harsh,

Thanks for this. So it does look like the bounding box isn't being calculated correctly:

Print statement output:
Position of mustard bottle: [0.0, 1.198, 0.57]
Dimensions:
Width: 0.0
Length: 0.003
Height: 0.003
Orientation along shorter side (width): 0.029
Orientation along longer side (length): -1.542

where the dimensions are near zero and the position is 1.198 away from the robot along the y-axis (for reference, the object positions are by default constrained to between 0.4 and 0.8 along the y-axis: see here), which is too far for the robot to reach.
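
The two symptoms described here, near-zero dimensions and a y-position outside the default range, can be checked automatically before any trajectory is planned. The following is a minimal sketch under stated assumptions: the 0.4 to 0.8 range is the one quoted above, while the 5 mm size threshold and the name `detection_problems` are illustrative choices that do not come from the repository.

```python
Y_RANGE = (0.4, 0.8)   # default object y-range quoted in the comment above
MIN_DIMENSION = 0.005  # assumed threshold in metres, not taken from the repository


def detection_problems(position, width, length, height):
    """Return a list of reasons why a detection looks implausible (empty if none)."""
    problems = []
    _, y, _ = position
    if not (Y_RANGE[0] <= y <= Y_RANGE[1]):
        problems.append(f"y={y} is outside {Y_RANGE}")
    for name, value in (("width", width), ("length", length), ("height", height)):
        if value < MIN_DIMENSION:
            problems.append(f"{name}={value} is below {MIN_DIMENSION}")
    return problems


# Values from the print statement output above.
for problem in detection_problems([0.0, 1.198, 0.57], 0.0, 0.003, 0.003):
    print(problem)
```

For the mustard bottle output above, this flags the y-position and all three dimensions.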

I've pushed a fix to the robot.py file which should correctly save the depth image required to calculate the bounding box, so this issue should be fixed now, and the bounding box around the object should be correctly shown in green. Let me know if this is still not working.

Also note that the LLM does not know whether the task succeeded after generating the full trajectory. Success is checked when it calls task_completed(): another instance of the LLM then determines whether the task was successful by looking at the poses of the relevant objects throughout the task execution. If the task failed, the system will re-plan an alternative trajectory. Please let me know if there are any issues with this as well.

Thanks!

kwonathan commented 4 months ago

Hi, just following up to check if the pushed code fixed the issue and you were able to get the system running fully? Thanks!

hars-singh commented 4 months ago

Hi, it is working perfectly now. Thanks, Harsh