tpark94 / speed-ue-cube-baseline

MATLAB Deep Learning Toolbox implementation of CNN used in baseline studies of SPEED-UE-Cube dataset
Creative Commons Attribution 4.0 International

Satellite Pose Estimation Dataset (Guidance) #1

Open dsriaditya999 opened 1 month ago

dsriaditya999 commented 1 month ago

My research group was greatly inspired by your paper and your comprehensive approach to generating synthetic datasets for spacecraft pose estimation using Unreal Engine. We are currently working on a similar project to create a satellite pose estimation dataset within Unreal Engine.

Project Background

We have created a scene in Unreal Engine that includes Earth, a Satellite, the Sun, and Camera Actors. Our goal is to generate a dataset comprising images and corresponding satellite poses. Additionally, we aim to obtain the pixel coordinates (raster space) of specific keypoints on the satellite for each image.

Our Approach

Note: We are running the script in Unreal Editor mode, not in Play (game) mode

For getting images and recording poses along with 3D points, we use the following (sample) code:

import unreal
import os
import json

# To use this script, you need a scene set up in Unreal Engine with a camera
# actor and an object (e.g., a static mesh actor) whose keypoints (sockets)
# you want to track. Pilot the camera and position it manually to capture
# images of the object from different angles. The script captures
# high-resolution screenshots together with the pose data of the camera and
# the object's keypoints, and saves the data to a JSON file for further
# processing.

class Capture:
    """
    Class to capture high-resolution screenshots and pose data of a camera and keypoints on an object in Unreal Engine.
    """
    def __init__(self, output_folder, camera_actor, satellite_actor):
        """
        Initializes the Capture class with specified parameters.

        Args:
        output_folder (str): Path to the directory where screenshots and JSON data will be stored.
        camera_actor (unreal.CameraActor): The camera actor used to take screenshots.
        satellite_actor (unreal.Actor): The actor (object) whose keypoints (sockets) are to be tracked.
        """
        self.output_folder = output_folder
        self.camera_actor = camera_actor
        self.satellite_actor = satellite_actor
        self.data = []

        # Ensure the output directory exists, create if it does not.
        if not os.path.exists(self.output_folder):
            os.makedirs(self.output_folder)

        self.json_path = os.path.join(self.output_folder, 'image_data.json')
        self.load_data()

    def load_data(self):
        """
        Loads existing data from the JSON file if available, otherwise starts with an empty list.
        """
        try:
            if os.path.exists(self.json_path):
                with open(self.json_path, 'r') as json_file:
                    self.data = json.load(json_file)
        except json.JSONDecodeError as e:
            print(f"Error reading JSON file: {e}. Creating a new empty file.")
            self.data = []

    def capture_image_and_data(self, image_name):
        """
        Captures a screenshot and the current pose data, then saves it.

        Args:
        image_name (str): Base name for the screenshot file and associated data.
        """
        try:
            screenshot_path = self.render_image(image_name)
            poses = self.get_camera_and_socket_pos(image_name)
            self.save_image_data(poses)
            print(f"Screenshot {image_name} taken and data saved at {screenshot_path}.")
        except Exception as e:
            print(f"Exception occurred: {e}")

    def render_image(self, image_name):
        """
        Takes a high-resolution screenshot using Unreal's automation library.

        Args:
        image_name (str): Name of the image file to save.

        Returns:
        str: Full path to the saved image file.
        """
        screenshot_path = os.path.join(self.output_folder, f"{image_name}.png")
        unreal.AutomationLibrary.take_high_res_screenshot(1920, 1080, screenshot_path)
        return screenshot_path

    def save_image_data(self, image_data):
        """
        Appends new image data to the list and writes it to the JSON file.

        Args:
        image_data (dict): Data dictionary containing pose information to be saved.
        """
        self.data.append(image_data)
        try:
            with open(self.json_path, 'w') as json_file:
                json.dump(self.data, json_file, indent=4)
        except Exception as e:
            print(f"Failed to write JSON data: {e}")

    def get_camera_and_socket_pos(self, image_name):
        """
        Retrieves the camera and socket positions in world coordinates.

        Args:
        image_name (str): Name of the image to associate with the data.

        Returns:
        dict: Dictionary containing camera and socket positions.
        """
        cam_world_transform = self.camera_actor.get_actor_transform()
        world_camera_position = cam_world_transform.translation
        world_camera_rotation = cam_world_transform.rotation

        sockets_data = []
        static_mesh_component = self.satellite_actor.get_component_by_class(unreal.StaticMeshComponent)
        if static_mesh_component:
            for socket_name in static_mesh_component.get_all_socket_names():
                socket_transform = static_mesh_component.get_socket_transform(socket_name, unreal.RelativeTransformSpace.RTS_WORLD)
                sockets_data.append({
                    "socket_name": str(socket_name),
                    "position": {
                        "x": socket_transform.translation.x,
                        "y": socket_transform.translation.y,
                        "z": socket_transform.translation.z
                    }
                })
        return {
            "image": image_name,
            "camera_position": {"x": world_camera_position.x, "y": world_camera_position.y, "z": world_camera_position.z},
            "camera_rotation": {"x": world_camera_rotation.x, "y": world_camera_rotation.y, "z": world_camera_rotation.z, "w": world_camera_rotation.w},
            "sockets": sockets_data
        }

# Helper function to find an actor by class and name
def get_actor_by_class_and_name(actor_class, actor_name=None):
    """
    Retrieves an actor of a specified class and name from the Unreal Engine world.

    Args:
    actor_class (unreal.Class): The class of the actor to find.
    actor_name (str, optional): The name of the actor to find. If not provided, returns the first found actor.

    Returns:
    unreal.Actor: The first actor found matching the criteria or None if no match is found.
    """
    world = unreal.EditorLevelLibrary.get_editor_world()
    actors = unreal.GameplayStatics.get_all_actors_of_class(world, actor_class)
    for actor in actors:
        if actor_name is None or actor.get_actor_label() == actor_name:
            return actor
    return None

# Main script usage example
output_folder = "Path/to/your/output/folder"
camera_actor = unreal.GameplayStatics.get_actor_of_class(unreal.EditorLevelLibrary.get_editor_world(), unreal.CameraActor)
box_actor = get_actor_by_class_and_name(unreal.StaticMeshActor, "box")

if not camera_actor or not box_actor:
    raise RuntimeError("Camera or satellite actor not found")

capture_instance = Capture(output_folder, camera_actor, box_actor)
capture_instance.capture_image_and_data(image_name='Image_1')

# Retrieve the camera actor by its class and a specific name 'CameraActor'
camera_actor = get_actor_by_class_and_name(unreal.CameraActor, "CameraActor")

# Get the CameraComponent attached to the found camera actor
camera_component = camera_actor.get_component_by_class(unreal.CameraComponent)

# Fetch the camera's view parameters at a given time slice (1/60 second)
cam = camera_component.get_camera_view(1./60.)

# Retrieve the view-projection matrix for the given camera view
mat = unreal.GameplayStatics.get_view_projection_matrix(desired_view=cam)

The returned mat contains the View Matrix, the Projection Matrix, and the combined View-Projection Matrix. We treat the Projection Matrix as holding the camera's "intrinsic" parameters:

import numpy as np

intrinsic_mat = np.array([
    [1.000000, 0.000000, 0.000000, 0.000000],
    [0.000000, 1.777778, 0.000000, 0.000000],
    [0.000000, 0.000000, 0.000000, 1.000000],
    [0.000000, 0.000000, 10.000000, 0.000000]
])
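Reading this matrix under Unreal's row-vector, reversed-Z convention (our interpretation, not something stated in the repo), it encodes a 90° horizontal FOV, a 16:9 aspect ratio, and a near clip plane at 10 units:

```python
import numpy as np

# The projection matrix printed above (row-vector convention, as in
# Unreal's reversed-Z perspective matrix -- our assumption).
P = np.array([
    [1.000000, 0.000000, 0.000000,  0.000000],
    [0.000000, 1.777778, 0.000000,  0.000000],
    [0.000000, 0.000000, 0.000000,  1.000000],
    [0.000000, 0.000000, 10.000000, 0.000000],
])

# Horizontal field of view: P[0, 0] = 1 / tan(hfov / 2)
hfov_deg = 2 * np.degrees(np.arctan(1.0 / P[0, 0]))   # 90 degrees

# Aspect ratio: P[1, 1] / P[0, 0]
aspect = P[1, 1] / P[0, 0]                            # ~1.778, i.e. 1920/1080

# Near clip plane distance (reversed-Z convention): P[3, 2]
near = P[3, 2]                                        # 10 units

print(hfov_deg, aspect, near)
```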

with open(r'Path\to\Folder\image_data.json') as f:
    pose_data = json.load(f)

# Extract quaternion rotation components from the first dictionary in the list
quat = list(pose_data[0]["camera_rotation"].values())

# Convert the quaternion to a rotation matrix using the 'mu' library
# (mathutils, imported as `import mathutils as mu`).
# Note: mathutils.Quaternion expects [w, x, y, z] order, while the JSON stores
# Unreal's [x, y, z, w], hence the reordering below.
rot = np.array(mu.Quaternion([quat[3], quat[0], quat[1], quat[2]]).to_matrix())
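Because the quaternion ordering conventions are easy to get wrong, here is a pure-numpy equivalent (standard unit-quaternion formula, [w, x, y, z] order; the function name is ours) that can be used to cross-check the library output:

```python
import numpy as np

def quat_to_matrix(w, x, y, z):
    """Rotation matrix from a unit quaternion given in [w, x, y, z] order."""
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - z*w),     2*(x*z + y*w)],
        [2*(x*y + z*w),     1 - 2*(x*x + z*z), 2*(y*z - x*w)],
        [2*(x*z - y*w),     2*(y*z + x*w),     1 - 2*(x*x + y*y)],
    ])

# Identity quaternion -> identity matrix
R_id = quat_to_matrix(1.0, 0.0, 0.0, 0.0)

# 90 degree rotation about Z: w = cos(45 deg), z = sin(45 deg);
# it should map the x-axis onto the y-axis.
s = np.sqrt(0.5)
R_z90 = quat_to_matrix(s, 0.0, 0.0, s)
```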

# Extract the position data and reshape it to a column vector (3x1 matrix)
pos = np.array(list(pose_data[0]["camera_position"].values())).reshape(-1, 1)

# Combine the rotation matrix and position vector into a 4x4 extrinsic matrix
# The rotation matrix is transposed to convert it from world-to-camera coordinate system
# A row of [0, 0, 0, 1] is added to the bottom to make it a proper homogeneous transformation matrix
extrinsic_mat = np.hstack((np.vstack((rot.T, pos.T)), np.array([0, 0, 0, 1]).reshape(-1, 1)))

# 'extrinsic_mat' now represents the full world-to-camera transformation matrix, which can be used
# to transform points from the world coordinate system to the camera's coordinate system.

# Combine the extrinsic and intrinsic matrices to form a single transformation matrix
overall_mat = extrinsic_mat @ intrinsic_mat
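Our understanding of the row-vector world-to-camera transform, written out as a minimal numpy sanity check (camera pose values here are made up for illustration). Note that in this convention the translation row should be -C @ R rather than +C, and we suspect this sign/transpose handling may be where we go wrong:

```python
import numpy as np

# Hypothetical camera: positioned at C in world space, with camera-to-world
# rotation R_cw (identity here for simplicity). Names are ours, not from
# the script above.
C = np.array([100.0, 0.0, 0.0])
R_cw = np.eye(3)

# Row-vector world-to-camera matrix: p_cam = (p_world - C) @ R_cw,
# so the top-left block is R_cw and the bottom row is -C @ R_cw
# (using +C in the bottom row is a common source of error).
view = np.eye(4)
view[:3, :3] = R_cw
view[3, :3] = -C @ R_cw

p_world = np.array([150.0, 20.0, 10.0, 1.0])
p_cam = p_world @ view
# p_cam -> [50, 20, 10, 1]: the point is 50 units in front of the camera
```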

# Retrieve the positions of the N sockets recorded in 'pose_data' and convert them into a numpy array.
# Append a column of ones to the position vectors to form homogeneous coordinates for matrix multiplication.
N = len(pose_data[0]["sockets"])
pts_3d = np.hstack((np.array([list(pose_data[0]["sockets"][i]["position"].values()) for i in range(N)]), np.ones((N, 1))))

# Apply the combined transformation matrix to the points
temp_pts = pts_3d @ overall_mat

screen_coords = []
for i in range(N):
    temp_pt = temp_pts[i]
    ndc_x = temp_pt[0] / temp_pt[3]
    ndc_y = temp_pt[1] / temp_pt[3]
    screen_coords.append([
        int((((1 + ndc_x) / 2) * 1920) + 0.5),
        int((((1 - ndc_y) / 2) * 1080) + 0.5),
    ])
screen_coords = np.array(screen_coords)

The math we followed for the code above is described in the following link

Challenge

We are stuck on converting the world coordinates of the keypoints into pixel coordinates. The output of the view * projection multiplication has elements in the thousands, rather than the expected -1 to 1 normalized device coordinates, and simply scaling the values down does not yield reasonable pixel coordinates.
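To make the chain we expect concrete, here is a self-contained numpy sketch of what we believe the full world-to-pixel pipeline should look like (the axis conventions and numbers are our assumptions). The large clip-space values only become -1..1 NDC after the perspective divide by the w component:

```python
import numpy as np

W, H = 1920, 1080

# Camera at the world origin with identity rotation, looking down +X
# (Unreal convention: +X forward, +Y right, +Z up).
p_world = np.array([100.0, 20.0, 10.0])   # 100 ahead, 20 right, 10 up

# 1) World -> view space. Unreal's view matrix also remaps the axes to
#    (right, up, forward) = (Y, Z, X).
p_view = p_world[[1, 2, 0]]               # -> [20, 10, 100]

# 2) View -> clip space with the projection matrix from above
#    (90 deg hfov, 16:9, reversed-Z, near plane 10; row vectors).
P = np.array([
    [1.0, 0.0,      0.0,  0.0],
    [0.0, 1.777778, 0.0,  0.0],
    [0.0, 0.0,      0.0,  1.0],
    [0.0, 0.0,      10.0, 0.0],
])
clip = np.append(p_view, 1.0) @ P         # large values here are normal

# 3) Perspective divide: clip -> NDC in [-1, 1]
ndc = clip[:3] / clip[3]

# 4) NDC -> pixels (y flips: NDC +y is up, raster +y is down)
px = (1 + ndc[0]) / 2 * W
py = (1 - ndc[1]) / 2 * H
print(px, py)                             # approximately (1152.0, 444.0)
```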

Could you please guide us on how to correctly apply the view and projection matrices to transform world coordinates into pixel coordinates, in Unreal Engine?

Your expertise would be invaluable in overcoming these challenges, as we could not find proper documentation on this.

We tried to follow along these resources/forum answers but couldn't project points successfully:

abrahampaul commented 5 hours ago

I am also facing the same issue as @dsriaditya999. @zahrahmed1017 @tpark94 Please help me out in resolving this issue.