Open Conchylicultor opened 4 years ago
Hi, is this issue still open? Can I work on this?
I think this could still be worked on. As a first step, I think we could add a standalone script/function `generate_statistics(builder, dst_dir)` which generates the statistics and saves them in `dst_dir / statistics.json`.
This way we don't add any more complexity in the core API. Then we can think more about better exposing the statistics directly inside the core API.
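To make the proposal concrete, here is a rough, stdlib-only sketch of what such a standalone function could look like. Everything in it is an assumption for illustration: the builder is duck-typed through a hypothetical `split_names` attribute and `iter_examples(split)` method (a real TFDS builder would need a small adapter), and the statistics are simple count/min/max/mean summaries rather than TFDV output.

```python
import json
import pathlib
import tempfile
from typing import Any, Dict, Iterable, Mapping


def compute_split_statistics(examples: Iterable[Mapping[str, Any]]) -> Dict[str, Any]:
    """Counts examples and tracks min/max/mean for each scalar numeric feature."""
    num_examples = 0
    features: Dict[str, Dict[str, float]] = {}
    for example in examples:
        num_examples += 1
        for name, value in example.items():
            # Only plain int/float scalars are summarized; bools are skipped.
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                continue
            stats = features.setdefault(
                name, {"count": 0, "sum": 0.0, "min": value, "max": value}
            )
            stats["count"] += 1
            stats["sum"] += value
            stats["min"] = min(stats["min"], value)
            stats["max"] = max(stats["max"], value)
    # Replace the running sum with the mean once all examples are seen.
    for stats in features.values():
        stats["mean"] = stats.pop("sum") / stats["count"]
    return {"num_examples": num_examples, "features": features}


def generate_statistics(builder, dst_dir) -> pathlib.Path:
    """Writes per-split statistics to `dst_dir / statistics.json`.

    Assumption: `builder` only needs `split_names` and `iter_examples(split)`.
    These are hypothetical names, not part of the TFDS API.
    """
    dst_dir = pathlib.Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    all_stats = {
        split: compute_split_statistics(builder.iter_examples(split))
        for split in builder.split_names
    }
    path = dst_dir / "statistics.json"
    path.write_text(json.dumps(all_stats, indent=2, sort_keys=True))
    return path


class ToyBuilder:
    """Hypothetical in-memory stand-in for a dataset builder."""

    split_names = ["train", "test"]
    _data = {
        "train": [{"label": 1, "text": "a"}, {"label": 3, "text": "b"}],
        "test": [{"label": 2, "text": "c"}],
    }

    def iter_examples(self, split):
        return iter(self._data[split])


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(generate_statistics(ToyBuilder(), tmp).read_text())
```

Keeping the function outside the builder, as suggested above, means the core API stays unchanged and the statistics step can be run (or re-run) independently of dataset preparation.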
Currently, there is an option to generate TFDV statistics for FACET during `download_and_prepare`. However, it would be best to separate those two steps. Then we should add a new script in `scripts/generate_statistics.py` which generates statistics for one dataset and exports them to `gs://`. We would automatically generate statistics for all datasets. When a user pre-loads the `DatasetInfo`, it would also fetch the `statistics.json` from GCS, so the user could visualize FACET for all default TFDS datasets with the existing `tfds.show_statistics`.
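The read side could be sketched along these lines. This is only an illustration under stated assumptions: `load_statistics` is a hypothetical helper, not an existing TFDS function, and a local directory stands in for the GCS location. The point is that a missing `statistics.json` should be treated as "no statistics yet" rather than as an error, since not every dataset would have the file.

```python
import json
import pathlib
import tempfile
from typing import Any, Dict, Optional


def load_statistics(info_dir) -> Optional[Dict[str, Any]]:
    """Returns the parsed `statistics.json` from `info_dir`, or None if absent.

    Assumption: `info_dir` is a local path standing in for the remote
    location where the dataset info files would be stored.
    """
    path = pathlib.Path(info_dir) / "statistics.json"
    if not path.exists():
        # Statistics are optional: datasets without the file still load.
        return None
    return json.loads(path.read_text())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(load_statistics(tmp))  # No file has been written yet.
        (pathlib.Path(tmp) / "statistics.json").write_text('{"train": {}}')
        print(load_statistics(tmp))
```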