thtrieu / darkflow

Translate darknet to tensorflow. Load trained weights, retrain/fine-tune using tensorflow, export constant graph def to mobile devices
GNU General Public License v3.0

Image and Annotation File Structure for own Training #281

Open johaq opened 7 years ago

johaq commented 7 years ago

Hi, I may be blind and just missing the obvious, but is there a tutorial explaining what format images and their annotations have to be in? Darknet says: for fu.png in the train list, provide fu.txt with content classId xmin ymin width height.
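For illustration, a fu.txt in the format just described would hold one line per object, e.g. (values made up):

0 48 240 147 131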

Is there an equivalent for Darkflow?

johaq commented 7 years ago

OK, I found this:

<annotation>
    <folder>VOC2007</folder>
    <filename>000001.jpg</filename>
    <source>
        <database>The VOC2007 Database</database>
        <annotation>PASCAL VOC2007</annotation>
        <image>flickr</image>
        <flickrid>341012865</flickrid>
    </source>
    <owner>
        <flickrid>Fried Camels</flickrid>
        <name>Jinky the Fruit Bat</name>
    </owner>
    <size>
        <width>353</width>
        <height>500</height>
        <depth>3</depth>
    </size>
    <segmented>0</segmented>
    <object>
        <name>dog</name>
        <pose>Left</pose>
        <truncated>1</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>48</xmin>
            <ymin>240</ymin>
            <xmax>195</xmax>
            <ymax>371</ymax>
        </bndbox>
    </object>
    <object>
        <name>person</name>
        <pose>Left</pose>
        <truncated>1</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>8</xmin>
            <ymin>12</ymin>
            <xmax>352</xmax>
            <ymax>498</ymax>
        </bndbox>
    </object>
</annotation>

Are the owner and source tags necessary? And is the content of folder a relative path? What does the depth tag inside size mean?

Matsam195 commented 7 years ago

From my own experiments, only the filename, size, segmented, and object tags are necessary in annotation files. As far as I know, depth is not important, so you can just set it to 1.

jubjamie commented 7 years ago

I made a script today to convert my dataset to what is required for Darkflow, and after analysing the code (and testing), this is all you need. I did have some trouble with filenames and regex errors, so I had to just number all my files like the VOC dataset. See #284 and my answer there for a bit more info.

<?xml version="1.0"?>
<annotation>
    <folder>images</folder>
    <filename>10.jpg</filename>
    <size>
        <width>450</width>
        <height>328</height>
    </size>
    <object>
        <name>pig</name>
        <bndbox>
            <xmin>19</xmin>
            <ymin>84</ymin>
            <xmax>144</xmax>
            <ymax>236</ymax>
        </bndbox>
    </object>
</annotation>

p.s. I don't actually think that folder is required, but I didn't dig far enough to verify that, so I left it in. Happy to help further as I've spent all day trying to understand it myself. And finally have!
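For anyone generating these files from a script, a minimal sketch using only Python's standard library might look like this (the write_annotation helper and its signature are my own invention, not part of darkflow):

import xml.etree.ElementTree as ET

def write_annotation(path, filename, width, height, boxes):
    # boxes: list of (label, xmin, ymin, xmax, ymax) tuples in pixel coordinates
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = filename
    size = ET.SubElement(root, "size")
    ET.SubElement(size, "width").text = str(width)
    ET.SubElement(size, "height").text = str(height)
    for label, xmin, ymin, xmax, ymax in boxes:
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = label
        bndbox = ET.SubElement(obj, "bndbox")
        for tag, value in zip(("xmin", "ymin", "xmax", "ymax"),
                              (xmin, ymin, xmax, ymax)):
            ET.SubElement(bndbox, tag).text = str(value)
    ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)

# Reproduces the example above:
write_annotation("10.xml", "10.jpg", 450, 328, [("pig", 19, 84, 144, 236)])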

abagshaw commented 7 years ago

@jubjamie Pretty sure folder is not required, as you already tell Darkflow where your images are to be found via the --dataset flag.

jubjamie commented 7 years ago

Yeah, I don't think it is either. Oh well, it's in now; it's only 2 lines of code :p

@johaq you can have a look at the tool below to create VOC XML files. I've not tried it myself, but it looks like it should do the job. I will be trying it out soon!

https://github.com/tzutalin/labelImg

johaq commented 7 years ago

@jubjamie Thanks a lot for the info! I also wrote a script that converts my current bounding box files into XMLs. I appear to have the filename error you described too, so I will rename my files.

I think it would be nice to be able to have images in different folders, like class1/images, class2/images, etc. If I understood correctly, all files have to be in the same folder.

jubjamie commented 7 years ago

I believe you do need them in the same folder; however, you may be able to rewrite the filename to include a directory relative to the path set by --dataset. But your annotation will need to change.

Also remember that your images must be annotated with boxes, and those annotations can contain multiple classes for one single image. Having images separated by class is therefore not the most sensible solution here.

aayush-k commented 7 years ago

@johaq could we use your script for converting the bounding box files to xmls? Thank you so much!

johaq commented 7 years ago

@aayush-k https://github.com/CentralLabFacilities/object_recognition/blob/master/scripts/darknet_to_darkflow.py

This is the script I used to convert darknet bounding boxes to darkflow. If you use the debug flag, it shows you the image region of the new bounding boxes, so you can check whether the conversion was successful.

Edit: of course you have to make some adjustments, like the image size and your class labels. You might also need to adapt either your file structure or the way the script parses yours.
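For reference, the core of such a conversion, assuming darknet's usual normalized "class x_center y_center width height" lines, boils down to something like this (function name is illustrative):

def darknet_to_voc(x_center, y_center, w, h, img_width, img_height):
    # Convert normalized center/width/height to pixel-space VOC corners.
    xmin = int(round((x_center - w / 2) * img_width))
    ymin = int(round((y_center - h / 2) * img_height))
    xmax = int(round((x_center + w / 2) * img_width))
    ymax = int(round((y_center + h / 2) * img_height))
    return xmin, ymin, xmax, ymax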

srafay commented 6 years ago

I am a bit confused about one thing. I have 10,000 images of 3 different types of doughnuts, and I want to train my model on them. Do I need to make annotations for all those images (10,000 times, for each doughnut) using https://github.com/tzutalin/labelImg? @johaq @jubjamie @Matsam195

johaq commented 6 years ago

I'm afraid yes, unless you have some information about how the images were taken (e.g. the doughnut is always at roughly the same place).

srafay commented 6 years ago

This is sad; it is kinda impossible, you know... Couldn't I just place my pictures in directories like "train/cats" for cats and "train/dogs" for dogs and have the model automatically learn to classify them? @johaq

johaq commented 6 years ago

Well, if you are not interested in detecting where the doughnuts are in the image, but just which doughnuts are somewhere in the image, then do not use YOLO. If you need that information, then you will need to provide it during training. There is a reason Google makes you fill out all those captchas: getting labeled data is not easy.

srafay commented 6 years ago

Yes, I am only interested in detecting the types of doughnuts present in the image and not where they are in the image. Can you guide me a bit on what I should use instead of YOLO? Reference image: https://thumbs.dreamstime.com/z/donuts-box-full-doughnuts-half-dozen-47805355.jpg Sorry for asking too many questions, I am a newbie :c @johaq

johaq commented 6 years ago

Do all pictures look like this, just different combinations?

srafay commented 6 years ago

I can get images of each type of doughnut if necessary (i.e. only one type of doughnut per image) for training the model. Check this out: https://static1.squarespace.com/static/56def1fc7c65e4eeff27787e/56e17a03c6fc0827ec7c7041/56e18e8327d4bd0fa9a0ca99/1507054854611/FullSizeRender+%2843%29.jpg?format=300w But in the end, I want to run my model on images like this: https://thumbs.dreamstime.com/z/donuts-box-full-doughnuts-half-dozen-47805355.jpg

johaq commented 6 years ago

Is it required to use neural networks? Because this seems like something "traditional" approaches, e.g. an SVM, should be able to handle with less training effort.
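To make "traditional" concrete, a rough sketch of such a pipeline using HOG features and a linear SVM could look like the following (assumes scikit-image >= 0.19 and scikit-learn; everything here is illustrative, not something I have run on doughnuts):

from skimage.feature import hog
from skimage.transform import resize
from sklearn.svm import LinearSVC

def featurize(image):
    # Fixed-size resize so every image yields a feature vector of equal length.
    image = resize(image, (128, 128))
    return hog(image, pixels_per_cell=(16, 16), cells_per_block=(2, 2),
               channel_axis=-1)

# train_images: list of RGB arrays; train_labels: doughnut type per image
# clf = LinearSVC().fit([featurize(img) for img in train_images], train_labels)
# prediction = clf.predict([featurize(test_image)])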

srafay commented 6 years ago

No, it's not a requirement to use neural nets; I just thought it would give me more accuracy. If an SVM does the job, then I am fine with it.

johaq commented 6 years ago

It is really hard to say what is more accurate before trying it out. A possible NN approach I am familiar with is OverFeat, but I'm probably not up to date with the state-of-the-art approaches there. But if you do not have an incredibly large doughnut selection with very nuanced differences, then I think an SVM (or something else) should work out well.

srafay commented 6 years ago

Thanks for guiding me, much appreciated. And one last question: if I ever need to use Darkflow and need to provide annotations, which tool should I use? https://github.com/tzutalin/ImageNet_Utils or https://github.com/tzutalin/labelImg? What's the difference between the two? @johaq

johaq commented 6 years ago

If you are working on your own and value your time, then none of those. In my experience it is just too much work to record and annotate the amount of data necessary manually. What we did was apply a tracker during the recording of data, so we could save the images and bounding boxes together. We also set up a lazy Susan to automate recording. That allowed us to record lots of annotated images in a reasonable amount of time, and still the results were not very good. If you look at the amount and quality of data used for these pre-trained nets, that is just something you cannot achieve on your own.

Since your application is not nearly as complex as the ImageNet challenge, you could probably make it work with a lot less time investment. I would be very interested to see your results if you just annotate 100 images and train with those. Since your images do not have different backgrounds, different scalings of the doughnuts, or the doughnuts from different sides and angles, this might be enough. My advice is to just try things at smaller scales.

srafay commented 6 years ago

Alright, thanks. I will try to annotate 100 images, train the model on them, and then see the results.

srafay commented 6 years ago

Sorry, one last question: do the training images need to be of the same resolution (e.g. each image 600x550), or can they be of different resolutions? @johaq

johaq commented 6 years ago

Different is fine. Just make sure your annotation file correctly states the resolution in the size tag.
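If you write the XML files yourself, one simple way to keep that tag honest is to read each image's actual resolution, e.g. with Pillow (illustrative snippet):

from PIL import Image

with Image.open("10.jpg") as img:
    width, height = img.size  # these go into <width> and <height>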

srafay commented 6 years ago

I have decided to test it out with 600 images (7 classes) for now; the dataset and the annotations are ready. But I need guidance on this: https://github.com/thtrieu/darkflow/issues/420 @johaq

ChiragSoni95 commented 6 years ago

Hello, how can I get the bounding boxes of each object in an image? For example, if I have a document layout image and I want to extract the bounding box coordinates of tables, images, sections, etc., what should I do? Is a script available? I want to create an XML file for my data like the PASCAL VOC one @johaq posted at the top.

johaq commented 6 years ago

@ChiragSoni95 I don't understand. Do you have an already annotated image that you just need to bring into the correct format, or are you looking for a way to create annotations? Your question makes me think you are looking for the latter. If so, use a tool like this: https://github.com/puzzledqs/BBox-Label-Tool and change the code so it saves the annotations in the format mentioned above.

Edit: I just saw that @jubjamie already recommended this: https://github.com/tzutalin/labelImg. Maybe have a look at both and see which one works for you.

absognety commented 5 years ago

So, just to understand it correctly: the annotation file format is .xml only, correct? Or do we convert them into CSV and then the CSV into TFRecord? I want to use this command for training:

python flow --model cfg/tiny-yolo-voc-3c.cfg --load weights/tiny-yolo-voc.weights --train --annotation train/Annotations --dataset train/Images --gpu 1.0 --epochs 300 

Please confirm.

INF800 commented 5 years ago

Why do we use <difficult></difficult>?