jjshoots / PyFlyt

UAV Flight Simulator for Reinforcement Learning Research
https://taijunjet.com/PyFlyt/documentation.html
MIT License

Using video frame as an input #13

Closed yrik closed 1 year ago

yrik commented 1 year ago

Thank you for the work on this awesome package! Super elegant and clean.

I have a question about the best usage of this package.

My goal is to train a model to fly towards a target using only the video stream and drone sensor values as input, without relying on coordinates. An intermediate model would run object detection/tracking on the video, and the drone would need to fly towards the detected object.

Is it possible to take a video frame, apply object detection, and use the result as an input to the model? What is the best way to implement this using this repo?

And a side question about integration with Betaflight: I want to train a model that uses realistic commands from Betaflight's SITL. What do you think is the best approach?

jjshoots commented 1 year ago

Hi! Thanks for the kind words and interest!

  1. Yes, that's definitely possible. Have a look at this repo for an idea of how to perform object detection/tracking using UAVs. Basically, what I do is train an RL agent on segmentation maps retrieved from the simulation using this line, and then train a separate object detection model on real-world images to predict those segmentation maps.

The entire runtime pipeline would look something like: camera frame -> object detection model -> segmentation map -> RL agent -> flight commands.

And during training, we have two separate regimes: the RL agent is trained in simulation on ground-truth segmentation maps, while the object detection model is trained on real-world images to reproduce those maps (see the sketch below).
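For a concrete picture of where those segmentation maps come from, here is a minimal sketch using PyBullet's camera API directly; the camera pose, image size, and DIRECT backend are placeholder assumptions, not PyFlyt's actual camera configuration:

```python
import numpy as np
import pybullet as p

p.connect(p.DIRECT)  # headless physics server

# Placeholder camera pose; PyFlyt's onboard camera computes its own.
view = p.computeViewMatrix(
    cameraEyePosition=[0.0, 0.0, 1.0],
    cameraTargetPosition=[1.0, 0.0, 1.0],
    cameraUpVector=[0.0, 0.0, 1.0],
)
proj = p.computeProjectionMatrixFOV(fov=90.0, aspect=1.0, nearVal=0.1, farVal=100.0)

# getCameraImage returns RGB, depth, and a segmentation mask in one call.
width, height, rgba, depth, seg = p.getCameraImage(
    128, 128, viewMatrix=view, projectionMatrix=proj
)

# Each pixel of `seg` encodes the ID of the body it belongs to; reshaped,
# it becomes the segmentation map the RL agent consumes as an observation.
seg_map = np.asarray(seg).reshape(height, width)
```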

For more details on how everything is tied together on a real UAV, I'm currently doing a project using Dronekit (it's a bit old but less cumbersome than ROS/PX4 for small systems) with the code here. A good place to start would probably be in this file.
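If it helps, here's a minimal sketch of what the Dronekit side looks like; the connection string is a placeholder for whatever link you use (serial, UDP from SITL, etc.):

```python
from dronekit import VehicleMode, connect

# Placeholder connection string; swap in your actual telemetry link.
vehicle = connect("udp:127.0.0.1:14550", wait_ready=True)

# Basic telemetry reads.
print("Attitude:", vehicle.attitude)   # roll / pitch / yaw
print("Velocity:", vehicle.velocity)   # m/s

# Hand control to an autonomous mode before sending commands.
vehicle.mode = VehicleMode("GUIDED")

vehicle.close()
```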

Alternatively, you could go the full-stack route, where you train the RL agent directly in simulation using domain randomization (a rough sketch of what I mean is below). I attempted this in the past, but found it too hard to get working in real life, since my real-world environment is much more complex than what can feasibly be designed in Bullet under realistic timelines. The code is here should you be interested, but note that the repo is horrifically old and predates PyFlyt by many moons.
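By domain randomization I mean something like the following minimal sketch, where a few physics parameters are perturbed at the start of each episode so the policy can't overfit to one exact simulation; `drone_id` and the parameter ranges are placeholders:

```python
import numpy as np
import pybullet as p

def randomize_dynamics(drone_id: int, rng: np.random.Generator) -> None:
    """Perturb physics parameters at episode reset (placeholder ranges)."""
    p.changeDynamics(
        drone_id,
        -1,  # -1 targets the base link
        mass=rng.uniform(0.8, 1.2),             # kg
        lateralFriction=rng.uniform(0.3, 1.0),
        linearDamping=rng.uniform(0.0, 0.1),
    )
```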

  2. I'm not too familiar with Betaflight SITL (I've only used it out of the box for FPV drones), but I would imagine that if you can get the PWM signals out, you can pass them straight to the underlying drone using set_mode(-1), which sends direct commands to the motors, or use other modes if you wish; a rough sketch is below. Unfortunately, PyFlyt doesn't publish raw gyro and accelerometer data (though this would be an add-able feature), so I'm not sure what kind of sensor data you would be passing to SITL.
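Assuming the PWM values arrive as a 4-element array, a minimal sketch of feeding them into PyFlyt via mode -1 might look like this; the PWM values here are placeholders, and the exact value convention should be checked against the PyFlyt docs:

```python
import numpy as np
from PyFlyt.core import Aviary

# One quadx drone starting at 1 m altitude.
start_pos = np.array([[0.0, 0.0, 1.0]])
start_orn = np.array([[0.0, 0.0, 0.0]])
env = Aviary(start_pos=start_pos, start_orn=start_orn, drone_type="quadx")

env.set_mode(-1)  # mode -1 = direct motor commands

for _ in range(100):
    # Placeholder PWM values; in practice these would come from
    # Betaflight SITL, one value per motor.
    pwm = np.array([0.5, 0.5, 0.5, 0.5])
    env.set_setpoint(0, pwm)  # drone index 0
    env.step()
```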

For ideas about passing data between separate pieces of software, you could use ROS (and the relevant rospy packages) if you're passing around lots of data, or simply use zmq (the pyzmq package in Python) if it's just a few numpy arrays.
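For example, here is a minimal pyzmq sketch (port and payload are placeholders) that ships a numpy array with a small JSON header; in practice the two sockets would live in separate processes:

```python
import numpy as np
import zmq

ctx = zmq.Context()

# Sender side.
push = ctx.socket(zmq.PUSH)
push.bind("tcp://127.0.0.1:5555")

# Receiver side (normally a separate process).
pull = ctx.socket(zmq.PULL)
pull.connect("tcp://127.0.0.1:5555")

arr = np.arange(4, dtype=np.float32)

# Send a JSON header describing the array, then the raw bytes.
push.send_json({"dtype": str(arr.dtype), "shape": arr.shape}, flags=zmq.SNDMORE)
push.send(arr.tobytes())

meta = pull.recv_json()
received = np.frombuffer(pull.recv(), dtype=meta["dtype"]).reshape(meta["shape"])
```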

Let me know if you have any other questions, I'd be happy to assist wherever.