plotly / plotly.py

The interactive graphing library for Python :sparkles: This project now includes Plotly Express!
https://plotly.com/python/
MIT License
15.98k stars 2.53k forks

make generated plot HTML reproducible #3393

Open achimgaedke opened 2 years ago

achimgaedke commented 2 years ago

The generated HTML/JS contains UUIDs produced by Python's built-in uuid.uuid4. By design, these are random, so they differ on every run.

But if you generate documents as part of a pipeline (e.g. via https://dvc.org/), it would be great to have them reproducible down to the byte.

import plotly.express as px

fig = px.scatter(x=range(10), y=range(10))
fig.write_html("file1.html")
fig.write_html("file2.html")

The two files will be identical except for the UUIDs.
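One way to check that claim without plotly installed is to mask anything UUID-shaped and compare what is left. This is only a sketch: `mask_uuids` and `same_except_uuids` are hypothetical helper names, and it assumes the UUIDs appear in the canonical hyphenated form. The two strings stand in for the contents of file1.html and file2.html.

```python
import re
import uuid

# Matches the canonical 8-4-4-4-12 hexadecimal form of a UUID.
UUID_RE = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}",
    re.IGNORECASE,
)


def mask_uuids(html: str) -> str:
    """Replace every UUID-shaped token with a fixed placeholder."""
    return UUID_RE.sub("<uuid>", html)


def same_except_uuids(a: str, b: str) -> bool:
    """Return True if the two documents differ only in UUID-shaped tokens."""
    return mask_uuids(a) == mask_uuids(b)


# Stand-ins for the two generated files: same markup, fresh UUID each time.
first = f'<div id="{uuid.uuid4()}" class="plotly-graph-div"></div>'
second = f'<div id="{uuid.uuid4()}" class="plotly-graph-div"></div>'
print(same_except_uuids(first, second))
```

For real output, the same comparison could be run on the text read from the two HTML files.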

I "solved" this problem by monkey-patching the relevant function:

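# Monkey patch: replace uuid.uuid4 with a seeded, deterministic version.
# Note: random.Random.randbytes requires Python 3.9 or later.
import random
import uuid

import plotly.express as px

uuid4_rnd = random.Random(0)

def uuid4_seeded():
    """Generate a version-4 UUID from a seeded random.Random instance."""
    return uuid.UUID(bytes=uuid4_rnd.randbytes(16), version=4)

uuid.uuid4 = uuid4_seeded

fig = px.scatter(x=range(10), y=range(10))
fig.write_html("file.html")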

Re-executing this snippet in a fresh interpreter produces byte-identical files.

It would be great if plotly let users swap in a single function (or set a seed for an internal UUID generator) instead of having to hijack the Python built-in.
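Until something like that exists, the patch can at least be scoped so the built-in is restored afterwards. A minimal sketch, assuming the library looks up `uuid.uuid4` through the module at call time (which the monkey patch above relies on as well); `seeded_uuid4` is a hypothetical name, not a plotly API, and `randbytes` needs Python 3.9 or later.

```python
import contextlib
import random
import uuid


@contextlib.contextmanager
def seeded_uuid4(seed=0):
    """Temporarily replace uuid.uuid4 with a seeded, deterministic version."""
    rnd = random.Random(seed)
    original = uuid.uuid4

    def uuid4_seeded():
        # Same construction as the monkey patch, but with a local generator.
        return uuid.UUID(bytes=rnd.randbytes(16), version=4)

    uuid.uuid4 = uuid4_seeded
    try:
        yield
    finally:
        # Always restore the built-in, even if the body raises.
        uuid.uuid4 = original


with seeded_uuid4(0):
    run_a = uuid.uuid4()
with seeded_uuid4(0):
    run_b = uuid.uuid4()
print(run_a == run_b)
```

The `fig.write_html("file.html")` call would go inside the `with` block; outside it, `uuid.uuid4` behaves as usual.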

achimgaedke commented 2 years ago

A minimal implementation would replace the two uuid.uuid4 calls in the plotly.py code base with

import random
import uuid

# Draws from the global random state, so random.seed() controls the result.
uuid.UUID(bytes=random.randbytes(16), version=4)

With that change, calling random.seed(42) before writing the figure would make the output reproducible.
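The effect of seeding can be checked in isolation, without plotly. In this sketch, `uuid4_from_global_random` is a hypothetical stand-in for the proposed replacement call, not an existing plotly function; `random.randbytes` needs Python 3.9 or later.

```python
import random
import uuid


def uuid4_from_global_random():
    """Build a version-4 UUID from the global random module state."""
    return uuid.UUID(bytes=random.randbytes(16), version=4)


# Seeding twice with the same value should replay the same UUID sequence.
random.seed(42)
first_run = [uuid4_from_global_random() for _ in range(2)]

random.seed(42)
second_run = [uuid4_from_global_random() for _ in range(2)]

print(first_run == second_run)
```

One trade-off of this approach is that the UUIDs then share the global random state, so any other code that draws from `random` between the seed call and the write would shift the sequence.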

gvwilson commented 3 weeks ago

see #1968