Potential performance improvements?

pfbuxton commented 5 years ago

I have profiled a simple heatmap here:

profile_code.py

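# Note: werkzeug.contrib was removed in Werkzeug 1.0; on newer versions
# import ProfilerMiddleware from werkzeug.middleware.profiler instead.
from werkzeug.contrib.profiler import ProfilerMiddleware
from app import server

server.config['PROFILE'] = True
# restrictions=[30] limits the report to the 30 most expensive entries
server.wsgi_app = ProfilerMiddleware(server.wsgi_app, restrictions=[30])
server.run(debug=True)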

app.py

import numpy as np

import dash
import dash_core_components as dcc
import dash_html_components as html
import plotly.graph_objects as go

import flask

# Heatmap
Z = np.random.rand(1000,1000)

server = flask.Flask(__name__)
app = dash.Dash(__name__, server=server)

app.layout = html.Div(children=[
    dcc.Graph(
        id='example-graph',
        figure=dict(
            data=[go.Heatmap(
                z=Z
            )],
            layout=dict()
        )
    )
])

Result (Python 3.7 Windows):

         1068 function calls (1060 primitive calls) in 4.032 seconds

   Ordered by: cumulative time
   List reduced from 243 to 30 due to restriction <30>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    4.032    4.032 C:\Python37-32\lib\site-packages\werkzeug\contrib\profiler.py:95(runapp)
        1    0.000    0.000    4.032    4.032 C:\Python37-32\lib\site-packages\flask\app.py:2262(wsgi_app)
        1    0.000    0.000    4.032    4.032 C:\Python37-32\lib\site-packages\flask\app.py:1801(full_dispatch_request)
        1    0.000    0.000    2.473    2.473 C:\Python37-32\lib\site-packages\flask\app.py:1779(dispatch_request)
        1    0.002    0.002    2.472    2.472 C:\Python37-32\lib\site-packages\dash\dash.py:467(serve_layout)
      2/1    0.013    0.006    2.462    2.462 C:\Python37-32\lib\json\__init__.py:183(dumps)
        1    0.002    0.002    2.451    2.451 C:\Python37-32\lib\site-packages\_plotly_utils\utils.py:35(encode)
        2    0.000    0.000    1.995    0.998 C:\Python37-32\lib\json\encoder.py:182(encode)
        2    1.856    0.928    1.980    0.990 C:\Python37-32\lib\json\encoder.py:204(iterencode)
        1    0.000    0.000    1.559    1.559 C:\Python37-32\lib\site-packages\flask\app.py:1818(finalize_request)
        1    0.000    0.000    1.559    1.559 C:\Python37-32\lib\site-packages\flask\app.py:2091(process_response)
        1    0.000    0.000    1.559    1.559 C:\Python37-32\lib\site-packages\flask_compress.py:78(after_request)
        1    0.000    0.000    1.557    1.557 C:\Python37-32\lib\site-packages\flask_compress.py:113(compress)
        1    0.001    0.001    1.553    1.553 C:\Python37-32\lib\gzip.py:247(write)
        1    1.535    1.535    1.535    1.535 {method 'compress' of 'zlib.Compress' objects}
        1    0.000    0.000    0.452    0.452 C:\Python37-32\lib\json\__init__.py:299(loads)
        1    0.000    0.000    0.452    0.452 C:\Python37-32\lib\json\decoder.py:332(decode)
        1    0.452    0.452    0.452    0.452 C:\Python37-32\lib\json\decoder.py:343(raw_decode)
        4    0.000    0.000    0.124    0.031 C:\Python37-32\lib\site-packages\_plotly_utils\utils.py:66(default)
        1    0.000    0.000    0.093    0.093 C:\Python37-32\lib\site-packages\_plotly_utils\utils.py:123(encode_as_list)
        1    0.093    0.093    0.093    0.093 {method 'tolist' of 'numpy.ndarray' objects}
        1    0.000    0.000    0.017    0.017 C:\Python37-32\lib\site-packages\_plotly_utils\utils.py:131(encode_as_sage)
        3    0.000    0.000    0.017    0.006 C:\Python37-32\lib\site-packages\_plotly_utils\optional_imports.py:15(get_module)
        1    0.000    0.000    0.016    0.016 C:\Python37-32\lib\importlib\__init__.py:109(import_module)
      2/1    0.000    0.000    0.016    0.016 <frozen importlib._bootstrap>:994(_gcd_import)
      2/1    0.000    0.000    0.016    0.016 <frozen importlib._bootstrap>:978(_find_and_load)
      2/1    0.000    0.000    0.016    0.016 <frozen importlib._bootstrap>:948(_find_and_load_unlocked)
        1    0.000    0.000    0.016    0.016 <frozen importlib._bootstrap>:211(_call_with_frames_removed)
        1    0.000    0.000    0.016    0.016 <frozen importlib._bootstrap>:882(_find_spec)
        1    0.000    0.000    0.016    0.016 <frozen importlib._bootstrap_external>:1272(find_spec)

Results with Python 2.7 on Linux are almost identical.

Looking through the profile, the main cost is creating the JSON: encode in C:\Python37-32\lib\site-packages\_plotly_utils\utils.py takes 2.45 s of the 4.03 s total, and gzip compression of the response accounts for most of the rest (about 1.56 s). I know that orjson (Python 3 only) can be faster than Python's default json module. Would you expect switching to orjson to improve performance, and would it be possible to implement?

Thanks for any insight.
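
For anyone who wants to look at the encoding cost without running Dash, here is a minimal sketch using only the standard library. It is an assumption-laden stand-in, not the measurement above: the nested list `rows` plays the role of `Z.tolist()`, and it is smaller than 1000x1000 so that it runs quickly.

```python
import cProfile
import io
import json
import pstats
import random

# Stand-in for Z.tolist(): a nested list of Python floats.
rows = [[random.random() for _ in range(200)] for _ in range(200)]

profiler = cProfile.Profile()
profiler.enable()
encoded = json.dumps({"z": rows})
profiler.disable()

# Print the ten most expensive entries, ordered by cumulative time.
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(10)
print(report.getvalue())
```

Absolute timings will differ from the Dash profile, since this skips the Plotly encoder and the gzip step.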

pfbuxton commented 5 years ago

Looking into this further: in this part of the encoder, the JSON is created, loaded back and then dumped a second time, just to convert NaN to null: https://github.com/plotly/plotly.py/blob/4a63c3a6a7de9b56336c2b27b9d121fc27b1d4bb/packages/python/plotly/_plotly_utils/utils.py#L35-L64
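
As a rough illustration of that round trip (a sketch with the standard json module only, not the actual Plotly code; the name `strict_encode` is made up for this example):

```python
import json


def strict_encode(obj):
    """Hypothetical helper mimicking the dump / load / dump pattern."""
    # First pass: the default encoder emits NaN / Infinity, which is not valid JSON.
    first_pass = json.dumps(obj)
    # Second pass: parse the text again, mapping NaN, Infinity and -Infinity to None.
    cleaned = json.loads(first_pass, parse_constant=lambda constant: None)
    # Third pass: serialize the cleaned structure once more.
    return json.dumps(cleaned, separators=(",", ":"))


result = strict_encode({"z": [1.0, float("nan"), float("inf")]})
print(result)
```

Every value is serialized twice and parsed once, which matches the extra `loads` and second `dumps` call visible in the profile.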

I tried a very unsafe method:

encoded_o = super(PlotlyJSONEncoder, self).encode(o).replace('NaN', 'null')
return encoded_o

and found that the time went from 4 seconds down to 2.5 seconds, so there looks to be potential for a good performance improvement.

(this method is unsafe because if you wanted to have NaN in any part of the page, then it would be converted to null)
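
A small sketch of that failure mode, using the standard json module and a made-up payload in which a title string happens to contain "NaN":

```python
import json

# Hypothetical payload: one NaN in the data, and the text "NaN" in a label.
payload = {"title": "NaN counts per row", "z": [1.0, float("nan")]}

encoded = json.dumps(payload)
unsafe = encoded.replace("NaN", "null")

decoded = json.loads(unsafe)
print(decoded)
```

The NaN in `z` becomes null as intended, but the title is silently rewritten as well.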

pfbuxton commented 5 years ago

A safe (and fast!) solution is to make the object JSON compliant when converting it to a list, here:

    @staticmethod
    def encode_as_list(obj):
        """Attempt to use `tolist` method to convert to normal Python list."""
        if not hasattr(obj, "tolist"):
            raise NotEncodable
        # Assumes numpy is importable as `np` in this module.
        if isinstance(obj, np.ndarray) and obj.dtype in (np.float32, np.float64):
            # Replace NaN and +/- infinity with None so the output is valid JSON.
            # Other dtypes may need handling too.
            return np.where(np.isfinite(obj), obj, None).tolist()
        return obj.tolist()

(You have to be a bit careful with the NumPy dtypes, as text is allowed to contain nan and inf.)

Then you would simply do a single return (no re-loading JSON and re-exporting):

encoded_o = super(PlotlyJSONEncoder, self).encode(o)
return encoded_o

Would it be possible to implement my solution?
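
A standalone sketch of the idea, outside the encoder class, assuming numpy is installed. The function name `to_json_compliant_list` is hypothetical, and `allow_nan=False` is only there to make the standard encoder raise if any NaN or infinity slipped through:

```python
import json

import numpy as np


def to_json_compliant_list(arr):
    """Hypothetical helper: ndarray -> nested list, NaN and +/-inf become None."""
    if np.issubdtype(arr.dtype, np.floating):
        # Cast to object so that None can sit alongside the floats.
        out = arr.astype(object)
        out[~np.isfinite(arr)] = None
        return out.tolist()
    # Non-float arrays are passed through unchanged.
    return arr.tolist()


z = np.array([[1.0, np.nan], [np.inf, 4.0]])
encoded = json.dumps({"z": to_json_compliant_list(z)}, allow_nan=False)
print(encoded)
```

This needs a single encoding pass, but the object-dtype conversion has a cost of its own, so the overall gain would need to be measured on the real 1000x1000 case.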

gvwilson commented 5 months ago

Hi - we are trying to tidy up the stale issues and PRs in Plotly's public repositories so that we can focus on things that are still important to our community. Since this one has been sitting for several years, I'm going to close it; if it is still a concern, please add a comment letting us know what recent version of our software you've checked it with so that I can reopen it and add it to our backlog. Thanks for your help - @gvwilson

plotly / plotly.py

Potential performance improvements? #1842