tensorflow / model-card-toolkit

A toolkit that streamlines and automates the generation of model cards
https://www.tensorflow.org/responsible_ai/model_card_toolkit/guide
Apache License 2.0

Difficulty in viewing dataset plots that have long text and numerous items #295

Open jeongukjae opened 1 year ago

jeongukjae commented 1 year ago

What happened?

When I render a model card containing a histogram with very long label text and many items, the resulting plot is hard to read.

For instance, rendering string statistics with 50 numbered lorem ipsum buckets produces a model card like the one shown below.

[Screenshot 2023-08-02, 12:12:56 PM] [Screenshot 2023-08-02, 12:13:02 PM]

The labels overlap and are difficult to read.

What is the expected behavior?

Clearer plots.

I think it would be beneficial for model-card-toolkit to limit the number of words per label and the number of items shown when generating histogram plots.

https://github.com/tensorflow/model-card-toolkit/blob/74d7e6d8d3163b830711b226491ccd976a2d7018/model_card_toolkit/utils/graphics.py#L52-L91
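To make the suggestion concrete, here is a minimal sketch of the kind of limiting I have in mind. It is not part of model-card-toolkit: the helper names `truncate_label` and `limit_buckets`, and the defaults `max_items=20` and `max_chars=40`, are hypothetical and only illustrate the idea. It assumes the buckets have already been pulled out of the statistics proto as `(label, sample_count)` pairs.

```python
from typing import List, Sequence, Tuple


def truncate_label(label: str, max_chars: int = 40) -> str:
    """Shorten a label to at most max_chars characters.

    The middle of the label is replaced by "..." so that both the start
    and the end stay visible. (Hypothetical helper, not a toolkit API.)
    """
    if max_chars < 4:
        raise ValueError("max_chars must be at least 4")
    if len(label) <= max_chars:
        return label
    keep = max_chars - 3      # characters left over after the "..." marker
    head = keep // 2          # characters kept from the start
    tail = keep - head        # characters kept from the end (always >= 1)
    return label[:head] + "..." + label[-tail:]


def limit_buckets(
    buckets: Sequence[Tuple[str, float]],
    max_items: int = 20,
    max_chars: int = 40,
) -> List[Tuple[str, float]]:
    """Keep the max_items largest buckets and shorten their labels."""
    top = sorted(buckets, key=lambda b: b[1], reverse=True)[:max_items]
    return [(truncate_label(label, max_chars), count) for label, count in top]


if __name__ == "__main__":
    # Same shape as the reproduction data: 50 long labels that differ only
    # in a trailing number.
    long_text = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. " * 5
    buckets = [(f"{long_text}{i}", 1000 + i * 100) for i in range(50)]
    for label, count in limit_buckets(buckets, max_items=5, max_chars=30):
        print(f"{count:>6}  {label}")
```

Truncating in the middle rather than at the end is deliberate: in the reproduction, every label shares the same long prefix and differs only in the trailing number, so cutting off the end would collapse all labels into the same string.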

How can we reproduce the problem?

Run the following code to reproduce the images above:

from tensorflow_metadata.proto.v0 import statistics_pb2

from model_card_toolkit import model_card
from model_card_toolkit.utils import tf_graphics

lorem = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed efficitur, enim sit amet ultrices malesuada, lorem augue rhoncus quam, sit amet ullamcorper dolor ligula quis est. Sed tempor blandit pharetra. Aenean facilisis eu lacus non molestie. Sed enim turpis, semper vel gravida sed, egestas at lacus. Class aptent taciti sociosqu ad litora torquent per conubia nostra, per inceptos himenaeos. Aliquam at libero posuere, dapibus tellus at, aliquet ipsum. Fusce quis ante nec neque interdum mollis mattis vitae ante. Curabitur aliquet enim enim, ac porttitor nibh lobortis nec. Nam id gravida ex. Donec mi magna, fermentum ac pulvinar vitae, cursus vel odio."
feature = statistics_pb2.FeatureNameStatistics()
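# append() adds the name as a single path step; extend() on a bare string
# would add one step per character.
feature.path.step.append("string_feature")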
feature.type = statistics_pb2.FeatureNameStatistics.STRING
for i in range(50):
    bucket = feature.string_stats.rank_histogram.buckets.add()
    bucket.label = f"{lorem} {i}"
    bucket.sample_count = 1000 + i * 100

feature_stats = statistics_pb2.DatasetFeatureStatistics()
feature_stats.features.add().CopyFrom(feature)
datasets = statistics_pb2.DatasetFeatureStatisticsList()
datasets.datasets.add().CopyFrom(feature_stats)

mc = model_card.ModelCard()
tf_graphics.annotate_dataset_feature_statistics_plots(
    mc, [datasets]
)

mc.render(
    template_path="model_card_toolkit/template/html/default_template.html.jinja",
    output_path="sample/model_card.html"
)

Model Card Toolkit Version

2.0.0

Python Version

3.8.10

Platforms

docker

Relevant log output

No response