ludwig-ai / ludwig

Low-code framework for building custom LLMs, neural networks, and other AI models

http://ludwig.ai

Apache License 2.0

How to train object detection tasks #501

Open QiuSYang opened 5 years ago

QiuSYang commented 5 years ago

If I need to train a Faster R-CNN model, how should I design my CSV file and YAML file?

w4nderlust commented 5 years ago

This kind of task is not currently supported. There is a WIP PR for a single bounding box, which you may want to use as a temporary workaround if you have a single object to detect. Otherwise, adding an output feature for object detection wouldn't be that complicated, so you may consider contributing it.
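For the single-object case, one possible stopgap that uses only existing feature types is to regress the four box coordinates as separate numerical outputs. This is not what the WIP PR implements; it is only a rough sketch of how the CSV and config could be laid out. The column names, file paths, and coordinate values below are made up for illustration, and the numeric feature type name is an assumption that depends on the Ludwig version (`numerical` in older releases, `number` in later ones).

```python
import csv
import io

# Hypothetical dataset: one image per row, exactly one box per image.
# Column names, paths, and values are invented for this sketch.
COORDS = ["x_min", "y_min", "x_max", "y_max"]
rows = [
    {"image_path": "images/0001.jpg", "x_min": 0.10, "y_min": 0.20, "x_max": 0.55, "y_max": 0.80},
    {"image_path": "images/0002.jpg", "x_min": 0.30, "y_min": 0.05, "x_max": 0.90, "y_max": 0.60},
]


def build_csv(rows):
    """Serialize the rows to CSV text with one column per feature."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["image_path"] + COORDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def build_config(numeric_type="numerical"):
    """Build a config dict: one image input, four numeric outputs.

    numeric_type is an assumption; check the name your Ludwig version expects.
    """
    return {
        "input_features": [{"name": "image_path", "type": "image"}],
        "output_features": [{"name": c, "type": numeric_type} for c in COORDS],
    }


csv_text = build_csv(rows)
config = build_config()
print(csv_text)
print(config)
```

The same dictionary can be written out as the YAML file. This layout cannot express a variable number of boxes per image, which is why multi-object detection needs a dedicated output feature.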

w4nderlust commented 5 years ago

Please leave this open. It's a reminder of a feature request.

QiuSYang commented 5 years ago

@w4nderlust Hello, if I'm doing a multi-object detection task, can I just add an output feature to solve the problem? How should I set up the RPN and the ROI pooling layer? Do I need to add new encoder and decoder modules?

w4nderlust commented 5 years ago

@QiuSYang thank you for your interest in working on this! I believe the first step would be to create a new type of feature, something like BoundingBoxSet or Bounding Boxes. There is an open PR for a single bounding box that you can take inspiration from: https://github.com/uber/ludwig/pull/344. Once that is tested and works, I would take a model, Faster R-CNN for instance, and split it into an encoding part and a decoding part. The decoding part becomes one of the optional decoders of the BoundingBoxSet output feature, while the encoding part goes into a new encoder for the Image input feature. I can help you out with this. I'm going to be traveling next week to present at conferences, but when I'm back I can take a stab at splitting the model into encoder and decoder so that it is easier for you. How does this sound?
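To make the encoder/decoder split concrete, here is a toy sketch of the contract between the two halves: the encoder maps an image to a feature map, and the decoder maps that feature map to a variable-length set of boxes. Every class and name below is hypothetical. None of it is Ludwig's API or an actual Faster R-CNN; average pooling and a score threshold simply stand in for the backbone and the detection head.

```python
from dataclasses import dataclass
from typing import List

Image = List[List[float]]       # grayscale pixels, rows x cols
FeatureMap = List[List[float]]  # coarser grid of activations


@dataclass
class Box:
    x_min: int
    y_min: int
    x_max: int
    y_max: int
    score: float


class ToyImageEncoder:
    """Stand-in for the backbone half: image -> feature map (average pooling)."""

    def __init__(self, cell: int = 2):
        self.cell = cell

    def encode(self, image: Image) -> FeatureMap:
        c = self.cell
        rows, cols = len(image) // c, len(image[0]) // c
        return [
            [
                sum(image[r * c + i][q * c + j] for i in range(c) for j in range(c)) / (c * c)
                for q in range(cols)
            ]
            for r in range(rows)
        ]


class ToyBoxSetDecoder:
    """Stand-in for the detection-head half: feature map -> set of boxes."""

    def __init__(self, cell: int = 2, threshold: float = 0.5):
        self.cell = cell
        self.threshold = threshold

    def decode(self, features: FeatureMap) -> List[Box]:
        c = self.cell
        # Emit one box per grid cell whose activation exceeds the threshold.
        return [
            Box(q * c, r * c, (q + 1) * c, (r + 1) * c, score)
            for r, row in enumerate(features)
            for q, score in enumerate(row)
            if score > self.threshold
        ]


image = [
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
]
features = ToyImageEncoder().encode(image)
boxes = ToyBoxSetDecoder().decode(features)
print(features)
print(boxes)
```

The point of the split is that the decoder only sees the feature map, so the same box-set decoder could sit behind different image encoders, and the number of boxes it returns can differ from image to image.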

QiuSYang commented 5 years ago

That would be great. Thank you very much

w4nderlust commented 4 years ago

Just adding more context: the issue for single object detection is https://github.com/uber/ludwig/issues/331 and this WIP PR starts implementing it: https://github.com/uber/ludwig/pull/344

gustavorps commented 3 years ago

Any updates about this feature?

w4nderlust commented 3 years ago

@gustavorps Unfortunately no. We focused on other aspects in v0.3 and are now focusing on data preprocessing for v0.4. This feature will likely come in v0.5, but we would gladly accept contributions, which may accelerate the process.