nutonomy / nuscenes-devkit

The devkit of the nuScenes dataset.
https://www.nuScenes.org

Why not use MongoDB or SQL to manage the data? #928

Closed nnop closed 1 year ago

nnop commented 1 year ago

Thanks for the work on providing such an awesome AD dataset. The schema is essentially a database design, so why not use MongoDB or SQL to manage the data? Querying the data would be quite easy.

Qiang-Xu commented 1 year ago

Hey @nnop, this is a great question. We started with an image dataset internally that was heavily inspired by the COCO dataset, basically using JSON to represent a relational schema. When we began to build fusion datasets, we took the same approach, since our developers were already familiar with the setup and there was existing code that could be reused.
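For readers unfamiliar with the idea, here is a minimal sketch of what "JSON representing a relational schema" can look like: each table is a list of records, each record carries a unique `token`, and references between tables are stored as `<table>_token` fields. The records, the field names other than `token`, and the helper functions below are made up for illustration and are not taken from the devkit.

```python
import json

# Two illustrative "tables" as they might be stored in JSON files.
# All records here are invented for the example.
SAMPLE_JSON = """[
  {"token": "s1", "timestamp": 100},
  {"token": "s2", "timestamp": 200}
]"""
ANNOTATION_JSON = """[
  {"token": "a1", "sample_token": "s1", "category": "car"},
  {"token": "a2", "sample_token": "s1", "category": "pedestrian"},
  {"token": "a3", "sample_token": "s2", "category": "car"}
]"""


def index_by_token(records):
    """Build a primary-key index: token -> record."""
    return {record["token"]: record for record in records}


samples = index_by_token(json.loads(SAMPLE_JSON))
annotations = json.loads(ANNOTATION_JSON)


def sample_of(annotation):
    """Follow the foreign key from an annotation to its sample."""
    return samples[annotation["sample_token"]]


def annotations_for_sample(sample_token):
    """Reverse lookup: all annotations that point at the given sample."""
    return [a for a in annotations if a["sample_token"] == sample_token]
```

The forward lookup is a dictionary access, while the reverse lookup scans the whole list. A database would handle the latter with an index, which is roughly the convenience the question is asking about.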

In general, I think JSON isn't a bad choice because

There are developers from various backgrounds (robotics, ML, engineering, etc.), and I think JSON serves everyone well with very little learning curve.

There are downsides as you may know too:

Internally, we switched to SQLite (which is also portable) late in 2018, with an ORM layer (SQLAlchemy) on top of it so that our developers don't have to bother with SQL most of the time. Unfortunately, nuScenes started before that, and we didn't have the time or bandwidth to bring it to nuScenes. (nuPlan is a modified version of what we have internally today, but a relational schema isn't very scalable for huge datasets.)
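As a rough illustration of the SQLite route, the sketch below loads token-keyed records into an in-memory SQLite database and runs a join. It uses only Python's standard `sqlite3` module rather than SQLAlchemy, and it is not the internal setup described above. The table names, columns, and sample records are assumptions made up for the example.

```python
import sqlite3

# Invented records in the same token-keyed style as the JSON tables.
SAMPLES = [
    {"token": "s1", "timestamp": 100},
    {"token": "s2", "timestamp": 200},
]
ANNOTATIONS = [
    {"token": "a1", "sample_token": "s1", "category": "car"},
    {"token": "a2", "sample_token": "s1", "category": "pedestrian"},
    {"token": "a3", "sample_token": "s2", "category": "car"},
]


def build_db(samples, annotations):
    """Create an in-memory database and load both tables into it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sample (token TEXT PRIMARY KEY, timestamp INTEGER)"
    )
    conn.execute(
        "CREATE TABLE annotation ("
        "token TEXT PRIMARY KEY, "
        "sample_token TEXT REFERENCES sample(token), "
        "category TEXT)"
    )
    # Named placeholders let the dict records be inserted directly.
    conn.executemany(
        "INSERT INTO sample VALUES (:token, :timestamp)", samples
    )
    conn.executemany(
        "INSERT INTO annotation VALUES (:token, :sample_token, :category)",
        annotations,
    )
    conn.commit()
    return conn


def count_by_category(conn, min_timestamp):
    """Count annotations per category for samples at or after min_timestamp."""
    rows = conn.execute(
        "SELECT a.category, COUNT(*) "
        "FROM annotation AS a "
        "JOIN sample AS s ON s.token = a.sample_token "
        "WHERE s.timestamp >= ? "
        "GROUP BY a.category "
        "ORDER BY a.category",
        (min_timestamp,),
    ).fetchall()
    return dict(rows)


conn = build_db(SAMPLES, ANNOTATIONS)
```

With the data in SQLite, a filter plus join plus aggregation is a single query, for example `count_by_category(conn, 150)`. An ORM such as SQLAlchemy would sit on top of tables like these and expose them as Python classes.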

nnop commented 1 year ago

Thanks for sharing your experience. It's great to hear your thoughts and what you have done to manage those large relational datasets. I agree with you. It's better for sharing and for access by people in academia. I've had a look at the ORM layer you implemented in nuPlan. That's a great reference for implementing my own wrapper layer.