Decouple source data from the detection/classification messages.

Following the discussion in #43, mainly:

Decouple from data: Some messages here include a field carrying some source data. I understand it is very practical on one hand (we do it too in our custom messages), but on the other hand it generates issues like #20 (and possibly future issues like sensor_msgs/PointCloud support etc.). And these have no good solution with the data inside the message except for composition on a higher level. I propose to remove the source data fields from vision_msgs, and possibly leave a note in the message definition that the user should either use a message synchronizer to get the pairs of image-detection, or create a custom type that contains both the image (cloud...) and the detection message. It's easy to use topic_tools/relay_field to extract the desired parts afterwards. There are situations where you do not want to relate the detection to the whole image, but you want to e.g. have the cutout from the image where the detection occured. This would work the same in case 1 detection per image is published (you'd just publish the cutout on a new topic). But it would become a bit more complicated in case the detector can return multiple detections per image and you would use DetectionArray. Here, matching by header would no longer be enough. Or - if taken rigorously - it could be, because making a cutout from an image formally invalidates its frame_id. So each cutout should have its own frame_id, and then you could make the detection-image mapping. Some more rationale: someone might want to run detections on an organized 3D pointcloud from a depth camera and return 2D detections (because he's e.g. only interested in bounding boxes in the RGB in the end). And a practical issue we faced: on a low-bandwidth link, we wanted to send detections, but as I mentioned earlier, the image is already a part of our messages (similarly to how it is done here now). But we figured out we do not want to transmit the image over the slow link. So we ended up not filling any values in the image fields, and now we have detections which lie to be detected on 0x0 px images. That doesn't really seem like a conceptual approach. Still, I'm not really 100% sure on this point, but it seems to me decoupling is the "more correct" way, but has its pros and cons. Bad thing is that using composition to create a DetectionsWithImagesArray message would not subsequently allow to use topic_tools/relay_field, because you would need to relay primitive-array fields, which I guess is not supported.

This PR drops dependency on sensor_msgs.

ros-perception / vision_msgs

Decouple source data from the detection/classification messages. #53