Self contained testing.mock_data for TFDS datasets

tensorflow / graphics

TensorFlow Graphics: Differentiable Graphics Layers for TensorFlow

Apache License 2.0


Self contained testing.mock_data for TFDS datasets #351

Status: Open. taiya opened this issue 4 years ago.

taiya commented 4 years ago

Currently, storing dataset info in tensorflow_graphics/datasets/testing/metadata breaks the "self-contained" style we are following for TFDS datasets stored within TFG.

We should modify the logic of the mock run so that tensorflow_graphics/datasets/testing/metadata/model_net40/1.0.0/dataset_info.json can be stored in something like tensorflow_graphics/datasets/model_net40/dataset_info.json.
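One way the proposed lookup could work is sketched below. It is a minimal stdlib-only example built around `find_dataset_info`, a hypothetical helper that is not an existing TFDS or TFG API. The helper first checks the self-contained location inside the dataset's own folder. Only if that file is missing does it fall back to the current shared `<metadata_root>/<name>/<version>/dataset_info.json` layout.

```python
import os
import tempfile
from typing import Optional


def find_dataset_info(dataset_dir: str, legacy_root: str, name: str, version: str) -> Optional[str]:
    """Return the path to dataset_info.json, or None if no candidate exists.

    Hypothetical helper for illustration only; not part of TFDS or TFG.
    """
    # Proposed self-contained location, e.g.
    # tensorflow_graphics/datasets/model_net40/dataset_info.json
    self_contained = os.path.join(dataset_dir, "dataset_info.json")
    if os.path.isfile(self_contained):
        return self_contained
    # Current shared location, e.g.
    # tensorflow_graphics/datasets/testing/metadata/model_net40/1.0.0/dataset_info.json
    legacy = os.path.join(legacy_root, name, version, "dataset_info.json")
    if os.path.isfile(legacy):
        return legacy
    return None


if __name__ == "__main__":
    # Demo: only the legacy file exists, so the fallback path is returned.
    with tempfile.TemporaryDirectory() as root:
        legacy_root = os.path.join(root, "testing", "metadata")
        legacy_dir = os.path.join(legacy_root, "model_net40", "1.0.0")
        os.makedirs(legacy_dir)
        open(os.path.join(legacy_dir, "dataset_info.json"), "w").close()
        print(find_dataset_info(os.path.join(root, "model_net40"), legacy_root, "model_net40", "1.0.0"))
```

Preferring the per-dataset file while keeping the fallback would let datasets migrate one at a time, without moving every dataset_info.json at once.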

CC'ing @rsepassi: could you provide some pointers on how to fix this?

rsepassi commented 4 years ago

cc @Conchylicultor

More info here would be helpful.

Is it that you'd like for the filepath to be different? What do you mean by the "self-contained" style? What's the "mock run"?

taiya commented 4 years ago

In the main TFDS repo, checksums are stored in tensorflow_datasets/checksums, while here we store them in the dataset folder, e.g. tensorflow_graphics/datasets/modelnet40/checksums.

That is, in TFG, all of the files for a dataset are contained within its folder (and subfolders). I worked with Etienne to get this workflow up and running a few weeks ago.

@jackd has now also added tfds.testing.mock_data in #310, but doing so requires:
1) the creation of a dataset_info.json file (which is not great for contributors, but "ok" for now);
2) the dataset_info.json files for all datasets to live within a single shared folder; in our case we are using tensorflow_graphics/datasets/testing/metadata.

rsepassi commented 4 years ago

Ok, so it sounds like you and @Conchylicultor have discussed this previously; what was the conclusion of that discussion, and was any work considered or started? (If there's a related GitHub issue, please link it; Etienne can also chime in here.)

taiya commented 4 years ago

No, only discussed with Etienne, not @Conchylicultor.

There are two separate things that would be nice:
1) give the user the ability to choose where the metadata is stored (similar to tfds.download.add_checksums_dir(_CHECKSUM_DIR), I guess);
2) allow a mock_data workflow where the data is defined programmatically within the test file (so as not to depend on deployed data).
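A test-local mock-data generator, in the spirit of the second proposal, could look like the sketch below. It is a minimal stdlib-only example, and every name in it is hypothetical: `make_mock_examples` and the `(shape, dtype)` spec format are not TFDS APIs, and the example spec does not claim to match the real ModelNet40 features. The test file declares the feature shapes and dtypes, and deterministic fake examples are generated from that declaration, so no dataset_info.json or deployed data is needed.

```python
import random
from typing import Any, Dict, List, Sequence, Tuple

# Hypothetical spec format: feature name -> (shape, dtype). Not a TFDS type.
FeatureSpec = Dict[str, Tuple[Tuple[int, ...], str]]


def _random_array(shape: Sequence[int], dtype: str, rng: random.Random) -> Any:
    """Build a nested list with the given shape, filled with random values."""
    if not shape:
        if dtype == "float32":
            return rng.random()  # float in [0.0, 1.0)
        if dtype == "int64":
            return rng.randint(0, 9)  # small non-negative int, e.g. a label
        raise ValueError(f"unsupported dtype: {dtype}")
    return [_random_array(shape[1:], dtype, rng) for _ in range(shape[0])]


def make_mock_examples(spec: FeatureSpec, num_examples: int, seed: int = 0) -> List[Dict[str, Any]]:
    """Generate num_examples fake examples matching spec; same seed, same output."""
    rng = random.Random(seed)
    return [
        {name: _random_array(shape, dtype, rng) for name, (shape, dtype) in spec.items()}
        for _ in range(num_examples)
    ]


if __name__ == "__main__":
    # Hypothetical point-cloud-style spec defined directly in the test file.
    spec = {"points": ((4, 3), "float32"), "label": ((), "int64")}
    for example in make_mock_examples(spec, num_examples=2, seed=1):
        print(len(example["points"]), example["label"])
```

In a real TFDS test, examples like these would still have to be turned into a `tf.data.Dataset`. Some TFDS versions expose an `as_dataset_fn` argument on `tfds.testing.mock_data` for swapping in custom generation, but the exact signature should be checked against the installed version.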

Feel free to reach out on GVC if you want to discuss in detail ;)

Conchylicultor commented 4 years ago

For info, @Conchylicultor == Etienne.

I agree with those two points. I tried to answer by mail with more context. I can try to look at point 1 by the end of this week or early next week.