Do we really need a separate opencv-demonstrator-data repository?

valera-rozuvan commented 8 years ago

When I created the opencv-demonstrator-data repository, which contains the entire contents of the ocvdemo/data folder, I was thinking along the following lines:

We need to separate the code and the assets (images, and later videos) into separate packages, because the images (and videos) will most likely not change much over time, while the code will. On a Linux-based OS, if we have separate packages for the assets and the compiled program, you will not have to download the assets again each time you update the program with a package manager. You will just need to upgrade the opencv-demonstrator package.
When you clone the opencv-demonstrator GitHub project, you get only the source code, so the cloning time is short.

However, I didn't take into account that the ocvdemo/data folder also contains the .xml files for the MMI (basically the GUI layout configuration). Those are required to build the project, so we need to bring all of the .xml files back into the opencv-demonstrator repository.

What about the image assets? Do we keep them in a separate GitHub repository and introduce a new folder called assets for them? Or do we scrap the whole idea of a separate repository for the images (and later videos) and bring back the entire data folder as it was initially?

So we need to decide between 3 possible options.

Option 1

Bring back ocvdemo/data with only the .xml files. The images will stay in a separate GitHub repository called opencv-demonstrator-assets and will go into the ocvdemo/assets folder for the complete build. One good point in favor of this approach, raised by @juliena82 in an e-mail discussion, is this:

Still, I think that your idea to have an assets repository is a good idea and should be kept!

In fact, we could store there the original big data files that are candidates for use in the demonstrator (before being downsampled / reduced / recoded in the final ocvdemo/data folder).

This way, we would always have access to the original images or videos (even big ones).

Option 2

Bring back the ocvdemo/data folder completely, i.e. as it was in the beginning: all of the .xml files and images come along with the sources in the ocvdemo/data folder. Delete the opencv-demonstrator-data GitHub repository.

Option 3

Like Option 2, but we create another GitHub repository, opencv-demonstrator-assets, and use it as a place to store high-resolution images (and later videos). It can be used by package managers to create a separate package for the assets of this project.

So, @shervinemami and @juliena82, let's vote on the best way forward! Please leave a comment on this issue with your choice = )

juliena82 commented 8 years ago

As already discussed by mail, I would vote for option 3, with the caveat that we can keep the additional repo that Valera has already created instead of creating a new one, and with the clarification that all data necessary for the program (including downsampled versions of the images) would be in the main repository. For completeness, I include my last mail below:

This is a small project. My philosophy is to always keep things as simple as possible (and I often fail at it, but I try). Having to manage 2 repositories is an added complexity: is it really worth it for the separation of data and code? Think about a newcomer: he has to do 2 checkouts... :( Think about us: we will have to do twice as many pushes in the future.
Take OpenCV, for example. This is not a small project. It is a big project. And even they have only one repository, containing all the sources, images, and videos for the samples (look at https://github.com/Itseez/opencv/tree/master/samples/data for instance). If the OpenCV developers have done it this way, I think that we should not be ashamed to do the same ;)
You make a point when you speak of the potential problem of big files. But I see it the other way: having only one repository will force us to be reasonable about data size. For instance, when I included the HDR demonstration, I first took the original images, but afterwards I downsized them and recoded them as JPG so that they take up less space.

So I will let you decide, but if you take some time to think about this, I would be very glad...

My credo: always simpler (that's also why I think there is much cleanup to do in the sources).

PS:

Still, I think that your idea to have an assets repository is a good idea and should be kept!

In fact, we could store there the original big data files that are candidates for use in the demonstrator (before being downsampled / reduced / recoded in the final ocvdemo/data folder).

This way, we would always have access to the original images or videos (even big ones).

What do you think of this?

valera-rozuvan commented 8 years ago

I also believe that the way to go is with Option 3.

In the main repository, opencv-demonstrator, we should have all the files necessary for the build, along with down-sampled images (and later videos). We will keep the opencv-demonstrator-data repository and use it to store other image and video assets at their full resolution.

We can also include a small script in the main repository to update the data folder with the latest and greatest from our separate data repository.
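A minimal sketch of what such a script might look like, assuming Python and a plain git checkout of the data repository. Nothing here is settled: the repository URL is a placeholder (no URL has been given in this thread), and the function names fetch_data_repo and sync_data are made up for illustration.

```python
import filecmp
import shutil
import subprocess
from pathlib import Path

# Placeholder URL (assumption): replace with the real data repository address.
DATA_REPO_URL = "https://example.com/opencv-demonstrator-data.git"


def fetch_data_repo(checkout_dir, url=DATA_REPO_URL):
    """Clone the data repository, or fast-forward an existing checkout."""
    checkout_dir = Path(checkout_dir)
    if (checkout_dir / ".git").is_dir():
        subprocess.run(
            ["git", "-C", str(checkout_dir), "pull", "--ff-only"], check=True
        )
    else:
        subprocess.run(
            ["git", "clone", "--depth", "1", url, str(checkout_dir)], check=True
        )


def sync_data(src_dir, dst_dir):
    """Copy new or changed files from src_dir into dst_dir.

    Returns the relative paths (POSIX style) of the files that were copied.
    Anything under a .git directory is skipped.
    """
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    copied = []
    for src in sorted(src_dir.rglob("*")):
        rel = src.relative_to(src_dir)
        if src.is_dir() or ".git" in rel.parts:
            continue
        dst = dst_dir / rel
        # Skip files whose contents are already identical in the destination.
        if dst.exists() and filecmp.cmp(src, dst, shallow=False):
            continue
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)
        copied.append(rel.as_posix())
    return copied
```

It could be used by calling fetch_data_repo on a scratch directory and then sync_data from that directory into ocvdemo/data. Only new or modified files are copied, so running it again is cheap.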

valera-rozuvan commented 8 years ago

@juliena82 I will add the data folder back to the opencv-demonstrator repository tonight. The README.md and .gitignore files will need to be updated. After that, I will close this issue.

valera-rozuvan commented 8 years ago

Issue resolved in commit bb7177d7cfa18fc2a7d3dab48369b56c886fa106.

shervinemami commented 8 years ago

Wow, you guys resolved the problem while I was sleeping (I'm in Australia), that's fast! I agree with how you decided to do it. I'll just quickly mention that OpenCV actually takes the same approach: they have the main "opencv" repo that has everything you need to build OpenCV, but they also have a separate "opencv-testdata" repo with the huge image & video files (hundreds of MB!) that the developers use for automatic unit testing of all the code each night. So I agree that the main repo should have everything needed to build the project, and there can be the large "data" repo just for developers.

tsdconseil / opencv-demonstrator

Do we really need a separate opencv-demonstrator-data repository? #1