Closed: pfbuxton closed this issue 5 months ago.
Looking more into this, the slow part appears to be where the JSON is created, loaded and dumped again to convert NaN to null: https://github.com/plotly/plotly.py/blob/4a63c3a6a7de9b56336c2b27b9d121fc27b1d4bb/packages/python/plotly/_plotly_utils/utils.py#L35-L64
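For reference, the general "encode, load, dump again" technique can be sketched with the standard library alone. This is a simplified illustration, not plotly's actual encoder, and the names `coerce_to_strict` and `encode_with_round_trip` are chosen here for the example. It shows why the approach is expensive: the data is serialized twice and parsed once.

```python
import json


def coerce_to_strict(const):
    """Map the non-standard JSON constants NaN, Infinity, -Infinity to None."""
    if const in ("NaN", "Infinity", "-Infinity"):
        return None
    return const


def encode_with_round_trip(obj):
    # Pass 1: the stdlib encoder emits NaN/Infinity literals (allow_nan=True by default)
    loose = json.dumps(obj)
    # Pass 2: parse the text back; parse_constant turns those literals into None
    strict_obj = json.loads(loose, parse_constant=coerce_to_strict)
    # Pass 3: dump again; None is written as null
    return json.dumps(strict_obj)


print(encode_with_round_trip({"z": [1.0, float("nan"), float("inf")]}))
# {"z": [1.0, null, null]}
```

Strings such as `"NaN"` are left alone, because `parse_constant` is only called for the bare literals.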
I tried a very unsafe method:
def encode(self, o):
    encoded_o = super(PlotlyJSONEncoder, self).encode(o).replace('NaN', 'null')
    return encoded_o
and found the time went from 4 seconds down to 2.5 seconds, so there seems to be potential for a good performance improvement.
(This method is unsafe because any occurrence of the text NaN anywhere in the output, for example inside a title or hover text string, would also be converted to null.)
A safe (and fast!) solution is to make the object JSON compliant when converting it to a list, here:
@staticmethod
def encode_as_list(obj):
    """Attempt to use `tolist` method to convert to normal Python list."""
    if not hasattr(obj, "tolist"):
        raise NotEncodable
    if isinstance(obj, np.ndarray) and np.issubdtype(obj.dtype, np.floating):
        # NaN and +/- infinity are not valid JSON: replace them with None (null).
        # Integer arrays cannot hold NaN/inf, and string arrays may contain "nan".
        obj = np.where(np.isfinite(obj), obj, None)
    return obj.tolist()
(You have to be a bit careful with the NumPy types, as text is allowed to contain "nan" and "inf".)
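The array sanitization in `encode_as_list` can be tried on its own. This is a minimal sketch: `to_json_compliant_list` is a hypothetical helper name, and `np.issubdtype(..., np.floating)` is used here instead of comparing against `'float64'`/`'float32'` so that other float widths are covered too.

```python
import json

import numpy as np


def to_json_compliant_list(arr):
    """Convert a NumPy array to nested lists, replacing NaN and +/-inf with None.

    Only floating-point arrays are touched: integer arrays cannot hold NaN/inf,
    and string arrays may legitimately contain the text "nan" or "inf".
    """
    if np.issubdtype(arr.dtype, np.floating):
        # np.where with a None branch produces an object array, so .tolist()
        # returns Python floats and None (which json writes as null)
        arr = np.where(np.isfinite(arr), arr, None)
    return arr.tolist()


z = np.array([[1.0, np.nan], [np.inf, -np.inf]])
print(json.dumps(to_json_compliant_list(z), allow_nan=False))
# [[1.0, null], [null, null]]
```

Passing `allow_nan=False` makes `json.dumps` raise if any non-finite float slipped through, which is a cheap way to confirm the output is strict JSON.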
Then you would simply do a single return (no re-loading and re-dumping of the JSON):
def encode(self, o):
    encoded_o = super(PlotlyJSONEncoder, self).encode(o)
    return encoded_o
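One caveat worth checking with the single-pass approach: `JSONEncoder.default()` is only called for objects the encoder cannot already handle, so a NaN stored as a plain Python float (or an `np.float64` scalar, which subclasses `float`) never reaches the array sanitization. The sketch below uses a hypothetical `StrictArrayEncoder` rather than plotly's class, and passes `allow_nan=False` so such values raise `ValueError` instead of silently producing invalid JSON.

```python
import json

import numpy as np


class StrictArrayEncoder(json.JSONEncoder):
    """Hypothetical single-pass encoder: sanitize arrays in default(), encode once."""

    def default(self, obj):
        if isinstance(obj, np.ndarray):
            if np.issubdtype(obj.dtype, np.floating):
                # Replace NaN and +/-inf with None so they are written as null
                obj = np.where(np.isfinite(obj), obj, None)
            return obj.tolist()
        return super().default(obj)


def encode_single_pass(obj):
    # allow_nan=False: raise ValueError for NaN/inf floats that bypassed default()
    return json.dumps(obj, cls=StrictArrayEncoder, allow_nan=False)


print(encode_single_pass({"z": np.array([1.0, np.nan])}))
# {"z": [1.0, null]}

try:
    encode_single_pass({"z": [float("nan")]})  # plain list, not an ndarray
except ValueError as err:
    print("not sanitized:", err)
```

So a single-pass encoder would still need some strategy (a check, or a fallback to the round trip) for NaN values that arrive outside NumPy arrays.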
Would it be possible to implement my solution?
Hi - we are trying to tidy up the stale issues and PRs in Plotly's public repositories so that we can focus on things that are still important to our community. Since this one has been sitting for several years, I'm going to close it; if it is still a concern, please add a comment letting us know what recent version of our software you've checked it with so that I can reopen it and add it to our backlog. Thanks for your help - @gvwilson
I have profiled a simple heatmap here:
profile_code.py
app.py
Result (Python 3.7, Windows):
Results with Python 2.7 on Linux are almost identical.
Looking through the profiling, the main cause appears to be creating the JSON, with
C:\Python37-32\lib\site-packages\_plotly_utils\utils.py
taking 2.45 s out of a total of 4 s. I know that orjson (Python 3 only) can be faster than Python's default json module. Would you expect switching to orjson to improve performance, and would it be possible to implement? Thanks for any insight.
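A rough way to test the orjson question before changing plotly itself is a micro-benchmark on heatmap-sized data. This is a sketch under assumptions: the 300 x 300 grid and NaN pattern are arbitrary, it compares against a simplified stdlib round trip rather than plotly's encoder, and it relies on orjson's documented behaviour of returning `bytes` and writing NaN/Infinity as null. It falls back gracefully when orjson is not installed; no timings are claimed here.

```python
import json
import timeit

try:
    import orjson  # third-party, Python 3 only (pip install orjson)
except ImportError:  # keep the sketch runnable without it
    orjson = None

# Heatmap-like payload with some NaN cells (sizes chosen arbitrarily)
z = [[float("nan") if (i + j) % 7 == 0 else float(i * j) for j in range(300)]
     for i in range(300)]
data = {"z": z}


def stdlib_round_trip():
    # Simplified dump/load/dump that turns NaN into null
    loose = json.dumps(data)
    return json.dumps(json.loads(loose, parse_constant=lambda const: None))


t_std = timeit.timeit(stdlib_round_trip, number=5)
print(f"stdlib round trip: {t_std:.3f} s for 5 runs")

if orjson is not None:
    # A single pass: orjson writes NaN/Infinity as null and returns bytes
    t_orjson = timeit.timeit(lambda: orjson.dumps(data), number=5)
    print(f"orjson single pass: {t_orjson:.3f} s for 5 runs")
else:
    print("orjson not installed; skipping comparison")
```

For NumPy arrays, orjson also offers the `orjson.OPT_SERIALIZE_NUMPY` option, which would avoid the `tolist()` conversion; whether that fits plotly's encoder is a separate question.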