pedropro / TACO

🌮 Trash Annotations in Context Dataset Toolkit
http://tacodataset.org
MIT License

Annotate the 3111 other images #13

Closed. hern4ndes closed this issue 4 years ago

hern4ndes commented 4 years ago

Hi @pedropro, fantastic project! I would like to help annotate the other 3000 images, but the online annotation tool on the site is a little slow. I think I can do it faster by running semantic segmentation and then using edge filters to get instance segmentation, or by using your own trained model to pre-annotate the images, similar to semi-supervised learning.

To do that, I wrote a script that requests a new image from your site's server and saves the returned image. However, because the server always returns a random image, I can't collect all 3111 images this way. If you could make the not-yet-annotated images available here on GitHub, I could help with the annotation.

I have been working on garbage detection since last year, but for lack of time I was unable to build a dataset as good as yours. Congratulations on the excellent work!
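For readers following the idea of turning a semantic mask into instances, here is a minimal sketch. It swaps in plain 4-connected component labeling rather than the edge filters mentioned above, and the function name `label_instances` and the list-of-lists mask layout are assumptions made for illustration, not part of TACO.

```python
from collections import deque


def label_instances(mask):
    """Split a binary semantic mask into instances (4-connected components).

    mask: list of rows containing 0 (background) or 1 (trash pixel).
    Returns (labels, count): labels has the same shape as mask, with 0 for
    background and 1..count for each connected region.
    """
    height = len(mask)
    width = len(mask[0]) if height else 0
    labels = [[0] * width for _ in range(height)]
    count = 0
    for y in range(height):
        for x in range(width):
            if not mask[y][x] or labels[y][x]:
                continue  # background, or already assigned to an instance
            count += 1
            labels[y][x] = count
            queue = deque([(y, x)])
            while queue:  # breadth-first flood fill of this region
                cy, cx = queue.popleft()
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    inside = 0 <= ny < height and 0 <= nx < width
                    if inside and mask[ny][nx] and not labels[ny][nx]:
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count
```

Objects that touch in the mask end up merged into one instance with this approach, which is the case where an extra edge-based split (or manual correction) would still be needed.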

pedropro commented 4 years ago

Most images are hosted on Flickr. You can find them by searching for #tacodataset: https://www.flickr.com/search/?text=tacodataset It's pretty easy to write a script to download them, but you need to create an account and use your own API key: https://www.flickr.com/services/api/
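A rough sketch of such a script, using only the Python standard library and Flickr's `flickr.photos.search` REST method. The helper names are made up for illustration, the `api_key` is your own, and `url_o` (the original-size URL) is only returned for photos whose owner allows it, so treat this as a starting point rather than a tested downloader.

```python
import json
import urllib.parse
import urllib.request

FLICKR_REST = "https://api.flickr.com/services/rest/"


def build_search_url(api_key, text="tacodataset", page=1, per_page=500):
    """Build a flickr.photos.search request URL for one page of results."""
    params = {
        "method": "flickr.photos.search",
        "api_key": api_key,
        "text": text,
        "extras": "url_o",    # ask for the original-size image URL
        "per_page": per_page,  # 500 is the documented maximum
        "page": page,
        "format": "json",
        "nojsoncallback": 1,   # plain JSON instead of JSONP
    }
    return FLICKR_REST + "?" + urllib.parse.urlencode(params)


def extract_urls(response):
    """Return (image URLs on this page, total number of pages)."""
    photos = response["photos"]
    urls = [p["url_o"] for p in photos["photo"] if "url_o" in p]
    return urls, photos["pages"]


def fetch_all_urls(api_key):
    """Walk through every result page and collect the image URLs."""
    urls, page, pages = [], 1, 1
    while page <= pages:
        with urllib.request.urlopen(build_search_url(api_key, page=page)) as resp:
            batch, pages = extract_urls(json.load(resp))
        urls.extend(batch)
        page += 1
    return urls
```

Calling `fetch_all_urls("YOUR_KEY")` would return a list of URLs that can then be downloaded one by one.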

hern4ndes commented 4 years ago

Thank you very much, @pedropro. Would it be useful for the project to have the annotations in Pascal VOC format?
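To make the format question concrete, here is a small sketch of what a conversion could look like, assuming the existing annotations use COCO-style `[x, y, width, height]` boxes. The function names, the fixed depth of 3 (RGB), and the reduced set of XML fields are illustrative assumptions. Pascal VOC detection files store boxes only, so segmentation polygons would be lost in such a conversion.

```python
import xml.etree.ElementTree as ET


def coco_bbox_to_voc(bbox):
    """Convert [x, y, width, height] to (xmin, ymin, xmax, ymax)."""
    x, y, w, h = bbox
    return int(round(x)), int(round(y)), int(round(x + w)), int(round(y + h))


def voc_xml(filename, width, height, objects):
    """Build a minimal Pascal VOC annotation string.

    objects: list of (category_name, coco_bbox) pairs for one image.
    """
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = filename
    size = ET.SubElement(root, "size")
    ET.SubElement(size, "width").text = str(width)
    ET.SubElement(size, "height").text = str(height)
    ET.SubElement(size, "depth").text = "3"  # assumption: RGB images
    for name, bbox in objects:
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = name
        box = ET.SubElement(obj, "bndbox")
        tags = ("xmin", "ymin", "xmax", "ymax")
        for tag, value in zip(tags, coco_bbox_to_voc(bbox)):
            ET.SubElement(box, tag).text = str(value)
    return ET.tostring(root, encoding="unicode")
```

For example, `voc_xml("img.jpg", 640, 480, [("Bottle", [10, 20, 30, 40])])` produces one `<object>` entry with a `<bndbox>` spanning from (10, 20) to (40, 60).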

mtourne commented 4 years ago

@hern4ndes I'm also interested in doing something like this, and I've asked Pedro directly for all the image URLs before.

Instead of waiting, I wrote a Puppeteer script to crawl the annotate webpage and collect all 3000+ remaining URLs. I see a lot of Flickr-hosted images and also a lot of AWS-hosted ones. I'll share them here when I have a list of URLs.

mtourne commented 4 years ago

Here is the script I used to recover some of the images, along with 802 images listed in a JSON file. Over time it gets harder and harder (and wasteful!) to get a random image from the /annotate endpoint that I haven't already seen.

https://gist.github.com/mtourne/21c9ee6d1a55b8b2c5972607eae8aaa5

It would be a lot simpler if @pedropro is keen to share the original list :)
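Why random sampling gets so wasteful can be estimated with the classic coupon collector argument. The sketch below assumes the endpoint picks uniformly at random, with repeats, from a fixed pool of `n` images. Neither assumption is confirmed in this thread, and `expected_requests` is a name made up for this example.

```python
def expected_requests(n, k):
    """Expected number of uniform random draws (with repeats) from a pool of
    n images needed to see k distinct ones.

    Once i distinct images have been seen, each draw is new with probability
    (n - i) / n, so the expected wait for the next new image is n / (n - i).
    """
    if not 0 <= k <= n:
        raise ValueError("k must be between 0 and n")
    return sum(n / (n - i) for i in range(k))


if __name__ == "__main__":
    pool = 3111  # assumption: size of the unannotated pool from the issue title
    for seen in (802, 2000, pool):
        print(seen, round(expected_requests(pool, seen)))
```

Under these assumptions the cost per new image keeps growing: with a single unseen image left, finding it takes `n` requests on average, which is why a published list of URLs is so much cheaper than crawling.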

pedropro commented 4 years ago

Upon reflection, I decided to make these public. If you annotate them using other tools, please consider making the annotations public, and let me know so I can acknowledge your work: https://github.com/pedropro/TACO/tree/master/data

Check the Readme again for meta.