kuzeko / graph-databases-testsuite

Docker Images, installation scripts, and testing & benchmarking suite for Graph Databases
https://graphbenchmark.com
MIT License

Embed lids. #6

Open MartinBrugnara opened 6 years ago

MartinBrugnara commented 6 years ago

While it is good to have the samples saved in the repo, it would be better to embed the per-database mapping (lids) into the images themselves.

Furthermore, we could automate the procedure: if the image does not have the lids for the current sample_set, it just runs the lids mapping query and commits itself (well, actually the Python script does it). Saving the lids as <hash_of_sample>.json (which should include the dataset name or hash, and the sample data) would also allow us to store multiple lids in an image and use the appropriate one when needed.

This would avoid further headaches due to commit merging and image migration.

kuzeko commented 6 years ago

I agree the lids in <hash_of_sample>.json should be inside the image, I guess right after loading. We may need to check that 2 images are using the same set of ids, but this is simple if we refer to the same sample.json file and check that all samples are in the <hash_of_sample>.json, right?
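A minimal sketch of that coverage check, in Python. The file layouts are assumptions, not something the thread specifies: the sample file is taken to be a flat JSON list of sample ids, and the lids file a JSON object mapping each sample id to its database-local id. `missing_samples` is a hypothetical helper name.

```python
import json


def missing_samples(samples, lids):
    """Return the sample ids that have no entry in the lids mapping.

    Assumed layouts (hypothetical): `samples` is a list of ids,
    `lids` is a dict keyed by sample id.
    """
    # JSON object keys are always strings, so compare on str(sample).
    return sorted(str(s) for s in samples if str(s) not in lids)


# Example with inline data standing in for sample.json and a lids file.
samples = json.loads("[1, 2, 3]")
lids = json.loads('{"1": "v[10]", "2": "v[11]"}')
print(missing_samples(samples, lids))
```

An empty result for every image would mean that all images cover the same sample set.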

MartinBrugnara commented 6 years ago

It may be simpler than that:

We may just have a directory in the image, let's say "/lids/", where we store all the lids ever computed for that image.

The lids file would then be called something like <dataset>-<hash_of_sample_file>.json. This way we can check whether the correct file exists before starting the set of queries; if it does not, we execute the mapper query.
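The lookup could be sketched roughly as follows, in Python. Several things here are assumptions for illustration only: SHA-256 as the hash of the sample file, the helper names `lids_path` and `ensure_lids`, and `run_mapper`, a placeholder callable standing in for whatever executes the mapper query and returns the mapping.

```python
import hashlib
import json
from pathlib import Path


def lids_path(lids_dir, dataset, sample_file):
    """Build <lids_dir>/<dataset>-<hash_of_sample_file>.json."""
    # SHA-256 is an assumption; any stable content hash would do.
    digest = hashlib.sha256(Path(sample_file).read_bytes()).hexdigest()
    return Path(lids_dir) / f"{dataset}-{digest}.json"


def ensure_lids(lids_dir, dataset, sample_file, run_mapper):
    """Load the cached lids if present, otherwise compute and store them.

    `run_mapper` is a hypothetical callable that takes the sample file
    and returns a JSON-serializable mapping.
    """
    path = lids_path(lids_dir, dataset, sample_file)
    if path.exists():
        # Lids were already computed for this dataset and sample file.
        return json.loads(path.read_text())
    lids = run_mapper(sample_file)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(lids))
    return lids
```

Because the file name depends on the content of the sample file, a different sample file for the same dataset gets its own lids file and does not overwrite an earlier one.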

This would mean that the loading/sampling step becomes two independent steps, where the second may be executed more than once. Consider the case where you want to use different samples.json files for the same dataset, for example if you are reproducing someone else's experiments and comparing them with yours, i.e. effectively comparing the influence of the sample set.

kuzeko commented 6 years ago

I like the idea. I have the feeling that we are creating some complications somewhere, but I cannot see what/where. So let's try it this way.