restructure images/ghosts/masks hierarchy

kukuruza commented 9 years ago

This is the second step of issue #28.

Rationale:

Want a single file as an image archive, so that it's easier to sync with Google Drive and the server.
Want to be able to easily swap the masks, ghosts, or images dataset used by a .db.

Cons:

It will be human-unfriendly (e.g., to look at a few examples). Can write a small tool to browse it.
The dataset hierarchy will become inconsistent with the LabelMe hierarchy.

Steps. It looks like the task can be decomposed as follows:

1. Remove the ghostfile column. Split each db into images and ghosts.
2. Add a sets table to the database structure and write the paths to the image folders there. Individual images then reference their set.
3. Find a way to compress directories efficiently.
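Steps 1 and 2 can be sketched with Python's built-in sqlite3 module. This is a minimal sketch under stated assumptions: the table names (`sets`, `images`), the columns (`kind`, `folder`, `name`), and the folder paths are hypothetical, not the project's actual schema. It shows what the sets table buys: because every image row references a set, pointing a .db at a different ghosts dataset means updating one row instead of every image path.

```python
import posixpath
import sqlite3

# Hypothetical schema: table and column names are illustrative only.
SCHEMA = """
CREATE TABLE sets (
    id     INTEGER PRIMARY KEY,
    kind   TEXT NOT NULL,   -- e.g. 'images', 'ghosts', 'masks'
    folder TEXT NOT NULL    -- path to the folder holding this set
);
CREATE TABLE images (
    id     INTEGER PRIMARY KEY,
    set_id INTEGER NOT NULL REFERENCES sets(id),
    name   TEXT NOT NULL    -- file name relative to the set's folder
);
"""


def image_paths(conn, kind):
    """Resolve full paths for every image in the set of the given kind."""
    rows = conn.execute(
        "SELECT s.folder, i.name FROM images i "
        "JOIN sets s ON i.set_id = s.id WHERE s.kind = ? ORDER BY i.id",
        (kind,),
    )
    return [posixpath.join(folder, name) for folder, name in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO sets (id, kind, folder) VALUES (1, 'ghosts', 'data/ghosts_v1')")
conn.executemany(
    "INSERT INTO images (set_id, name) VALUES (1, ?)",
    [("000001.jpg",), ("000002.jpg",)],
)
before = image_paths(conn, "ghosts")

# Swapping the ghosts dataset touches a single row in `sets`.
conn.execute("UPDATE sets SET folder = ? WHERE kind = ?", ("data/ghosts_v2", "ghosts"))
after = image_paths(conn, "ghosts")
print(before)
print(after)
```

The image rows never store a folder, so they stay unchanged when the dataset is swapped.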

kukuruza commented 9 years ago

HDF5 in MATLAB and Python does not currently support JPEG compression, so it can't be used for step 3. LMDB, however, supports it in both MATLAB (UNIX only, here: https://github.com/kyamagu/matlab-lmdb) and Python. So we may use LMDB for step 3.

kukuruza commented 9 years ago

Took me 5 more hours to streamline the data flow (separate background, ghosts, sparse-bboxes)

kukuruza commented 9 years ago

Important comparison between HDF5 and LMDB (Caffe-made): https://github.com/NVIDIA/DIGITS/issues/224. LMDB wins by a large margin in compression, and hence in size.

kukuruza commented 9 years ago

Images from all datasets can now be read directly from the source videos (the original video, the video with ghosts and the video with masks).

Advantages of video:
1) No need to duplicate a video into a folder with the same frames as images (that has already saved GBs).
2) Satwik's high-fps videos simply can't be stored as a bunch of JPEGs; they'd take too much space.
3) Syncing with Google Drive / Box / Dropbox is now fast: instead of ~50000 files, we now have ~500.

Disadvantages of video:
1) The backend code (the part you mostly don't see) became more complicated.
2) Random access for writing can't be implemented (we never had to use it before, though).
3) Masks have to be stored as lossy Motion JPEG instead of lossless PNG, so JPEG artifacts now need to be thresholded away with mask = mask > 127;
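To see why a simple threshold is enough, here is a minimal pure-Python sketch of the same operation on a hypothetical decoded mask frame (a 2D list of 0 to 255 grayscale values; the function name `binarize_mask` is illustrative). JPEG artifacts nudge pixel values near 0 and 255, but as long as they stay on the same side of 127 the binary mask is recovered exactly. In NumPy the MATLAB line translates directly to `mask = mask > 127`, which yields a boolean array.

```python
def binarize_mask(frame, threshold=127):
    """Return a boolean mask: True where a pixel is strictly above `threshold`.

    Mirrors the MATLAB line `mask = mask > 127;` for a frame given as
    a list of rows of 0..255 values.
    """
    return [[value > threshold for value in row] for row in frame]


# Hypothetical mask frame after a Motion JPEG round trip: the ideal values
# were 0 and 255, but compression left small errors around both.
noisy = [
    [0, 3, 251],
    [255, 9, 248],
]
mask = binarize_mask(noisy)
print(mask)  # [[False, False, True], [True, False, True]]
```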

Only part of the code was changed (commit 59b1e3905095d22b9920c46715013ccf9626587e). The rest wasn't changed at all and should still work. The folder data/datasets is only used by the old code and can be removed sometime later. Please let me know of any bugs.

kukuruza commented 9 years ago

I haven't heard about any problems, so I'm closing this. Also, please avoid using video2image; let's work with the videos directly when possible.

kukuruza / City-Project

restructure images/ghosts/masks hierarchy #33