Open PavlosMelissinos opened 7 years ago
I think there are some valuable discussion points and pain points discussed in the tensorflow datasets api https://github.com/tensorflow/tensorflow/issues/7951, and interesting details on the kinds of transformations people have needed.
the _init_( config)
parameter would probably make the most sense simply being kwargs, which is how python handles all parameters to any function as a dictionary anyway.
Hey there, glad to see you so active! 👍
That tf thread seems handy, I could use the perspective.
the init( config) parameter would probably make the most sense simply being kwargs, which is how python handles all parameters to any function as a dictionary anyway.
Well duh, indeed it makes perfect sense. I don't know why that didn't even cross my mind...
Thanks for the suggestions!
@PavlosMelissinos It might be best to email the keras mailing list or post this in keras itself for more feedback
@PavlosMelissinos is your version of the COCO dataset in a condition that can be merged here? My version isn't that great so it would be nice to pull it in!
@ahundt I've improved it a bit but it still needs a lot of cleaning up and I'm not actively working on it anymore. I want to make it more modular and there are a few things that I don't like. I've also pledged to contribute to a keras dataset project with @DEKHTIARJonathan which I've neglected for months (sorry Jonathan!). You can follow that repo if you want and even offer your own implementation if you have one.
The actual implementation for the time being is still on my enet repo, along with some alternative network architectures but they all have to go.
Ok I actually think I will need to use coco for some stuff and I already use keras-contrib for that so I'd like to just replace the current coco loading script with yours, rather than any of the models or anything. Plus, the coco data format supports other datasets too so it will be super convenient!
In that case feel free to take anything you want from my repo and modify it to your needs, I don't think I'll be doing any work on it for the next few days/weeks.
Oh, on a side note, I originally preferred using json files to define dataset parameters but not anymore. One of the first things I'd change now is turn it into code ("hard-coded" object constants) to remove a layer of (unnecessary) abstraction & complexity.
Yeah I've found that "hard coding" is actually the way to go in python, as long as you separate the APIs so that you can change the settings programmatically too. Essentially python becomes your programming language and your config file language, then your commits naturally keep track of exactly what you ran. :-)
Hey there, so I've been silently working on a few ideas I've had about a sensible dataset format and I'm currently looking for some feedback before I open a PR here. Could someone check out my version of the MS-COCO dataset and tell me what I would need to add/change in order to make it worthy of a merge?
For instance:
I'm especially interested in your opinions regarding my use of a configuration file instead of feeding raw parameter values through code. I prefer my way visually and I think it's more readable but on the other hand it might be harder to debug.
Thanks a lot, I'm looking forward to your feedback!
EDIT: My code is for semantic segmentation but I've intentionally kept the title vague, this discussion can accommodate any kind of task.