The units are whatever you pass as the tag_size argument to Detector.detect. So if you have a 2 cm tag and you pass tag_size=2, the units are centimeters; if you pass tag_size=0.02, the units are meters.
x and y are in the plane of the tag; z is perpendicular to that plane.
The pose is expressed in the coordinate frame of the camera. I found the cv2 tutorial on pose estimation helpful for understanding how to move from the camera coordinate frame to a more interpretable world coordinate frame.
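As a rough sketch of that change of frame: assuming the detection's rotation and translation map tag-frame points into the camera frame (p_cam = R @ p_tag + t), the camera's position in the tag's frame is -R.T @ t. The rotation matrix below is made up for illustration (a tag facing the camera head-on), and the helper name camera_position_in_tag_frame is hypothetical; only pose_t comes from the post.

```python
import numpy as np

# Translation from the post: tag origin expressed in the camera frame.
pose_t = np.array([[-0.16225682], [-0.20048863], [1.01195728]])

# Hypothetical rotation, invented for this example: the tag frame is the
# camera frame rotated 180 degrees about the x axis (tag facing the camera).
pose_R = np.diag([1.0, -1.0, -1.0])


def camera_position_in_tag_frame(R, t):
    """Solve p_cam = R @ p_tag + t for p_tag when p_cam is the camera origin."""
    # R is orthonormal, so its inverse is its transpose.
    return -R.T @ t


cam_in_tag = camera_position_in_tag_frame(pose_R, pose_t)
print(cam_in_tag.ravel())
```

If the tag is fixed at a known spot, the tag frame can serve as the "world" frame, so this gives the camera's position relative to the tag in the same units as tag_size.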
pose_t = [[-0.16225682] [-0.20048863] [ 1.01195728]]
For simplicity: x = -0.16225682, y = -0.20048863, z = 1.01195728
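If all you want is how far the tag is from the camera, the straight-line distance is the Euclidean norm of those three numbers. A minimal sketch using the values above; the result is in whatever unit tag_size was given in:

```python
import math

# Components of pose_t from the post.
x, y, z = -0.16225682, -0.20048863, 1.01195728

# Straight-line distance from the camera origin to the tag center.
distance = math.sqrt(x * x + y * y + z * z)
print(f"{distance:.4f}")
```

For these values the distance comes out only slightly larger than z, because the tag sits close to the optical axis.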