sciapp / sampledb

Sample and Measurement Metadata Database
https://scientific-it-systems.iffgit.fz-juelich.de/SampleDB/
MIT License
21 stars 11 forks source link

Datatype based on the plotly JSON format #13

Closed MayerBjoern closed 3 years ago

MayerBjoern commented 3 years ago

We introduced a new datatype based on the plotly JSON data format. The idea is to add the capability to store plots, their raw data and visual reprensentation information in the simplest way possible, without having to introduce and manage an own unique system to store different types of raw data and definitions for a multitude of different plot types.

The plotly_chart datatype simply expects the plotly JSON chart schema (https://plotly.com/chart-studio-help/json-chart-schema/#:~:text=The%20Plotly%20JSON%20visualization%20schema,platform%20that%20understands%20the%20schema.) as its data. The JSON structuce is externally verified via the python plotly module.

The demo dataset contains two Samples that show the new datatype in action.

On top of the plotly related changes we also made two general code changes in the process.

  1. We use demjson to decode the JSON plotly data into a python dict. The reason for this is to not loose any floating point accuracy, since demjson creates decimal objects in python (https://docs.python.org/3/library/decimal.html). For this to work, we also change the SQLAlchemy json_serializer in the init.py, since the default json serializer can't handle decimal objects. The sample data in scripts/demo_data/objects/plotly_example_data2.sampledb.json contains an example that needs the decimal object conversion to not loose floating point accuracy. Maybe in the future we can also use this to set up the pint conversions between regular magnitude and base_units to prevent accuracy loss.

  2. We moved the inclusion of jquery to the head of the HTML, so we have it available earlier then the general script_block at the very end of the page. It is just a little bit more convienient, since I atleast don't know how to modify jinja blocks from within an included template within a child template like the objects/view and objects/forms templates.

FlorianRhiem commented 3 years ago

Hey @MayerBjoern and Rene,

thank you for the pull request! While plots can already be uploaded as images, this combination of the data behind the plot and the plot itself can surely come in useful, especially when the data is inserted via the API. However I don't completely understand why preventing manual input of the data would necessary, though. Could you explain why the plotly_chart field needs to be readonly? If there's a good reason for that, maybe a tooltip could explain this when hovering over the textarea?

Also, it looks as if a user could still enter data by altering the textarea using an inspector tool, and the form data would then be interpreted. Is that intentional?

I've tried the PDF export and first got an error that the psutil package is required, after installing that I got another error that the orca executable is required as well. Going by the plotly documentation it seems that the export should instead be done by Kaleido and after installing Kaleido via pip as well, the PDF export worked :) Can you add the psutil and Kaleido packages and their dependencies to requirements.txt?

As the plot filter / plotly_base64_image_from_json function is only used for PDF export, it should export a vector format instead of a raster format. Here is a side-by-side comparison of SVG export (left) and PNG export (right) that shows the much clearer text:

image

All needed for that is to pass the svg format and use the image/svg+xml mime type:

fig_plot.write_image(image_stream, "svg")
image_stream.seek(0)
return 'data:image/svg+xml;base64,{}'.format(base64.b64encode(image_stream.read()).decode('utf-8'))

Going through the code, the plotly data is almost always used in the encoded form. If I didn't miss anything, the only point where it actually needs to be decoded is during validation, when you check that it is a dictionary. If we treat the plotly data as a black-box text format everywhere else, we could avoid a lot of back and forth between decoding and encoding, and it also wouldn't be necessary to change the SQLAlchemy encoder. If we still decode it during validation, the dict-check could still be done along with possible future restrictions (e.g. if a certain plot type were to be required by the schema). What's your opinion on that, do you see any further pros/cons?

MayerBjoern commented 3 years ago

Hi @FlorianRhiem,

First of all i changed pdf_export to SVG, that's obviously just better. I also added the two packages to the requirements.txt.

Regarding demjson, i removed all depencies on it and the SQLAlchemy Encoder change. Since plotly accepts raw data to be given as either number or string values off-loading the responsibility to ensure floating point accuracy to the plotly data creator by making those values strings, is fine by me. I understand the reluctance for such a widely affecting change, but i just thought it would potentially be a nice addition to internally handle quantity object values as decimal data-types aswell in the future. AFAIK simplejson would work aswell. The plotly data is still a dict object. I just think that if you think about the object data as a whole, it makes more sense to have a complete browsable JSON structure instead of a JSON object with a string value of even more JSON at some point as a value. We don't really use the opportunity to access the plotly data much yet, but we use it in the table and list views to get the chart title as a short text representation without having to display the whole plot immediately.

Preventing manual input is obviously not necessary, the only purpose of the readonly attribute was to prevent accidental changes or deletes in the input field, which is also why we did not even try to prevent tampering with it. The idea was just that the regular user would usually just edit fields surronding that data and not really have to or considering the complexity of the structure even want to edit the plotly JSON itself. I made it editable now and the json inside it readable with indent, but i kept the textarea quite small, you can resize it anyway if need be.

MayerBjoern commented 3 years ago

Just to add more context, obviously you could just upload a plot as an image and create an array for the raw data to reach a somewhat similar representation. The potential downsides with this is that this could heavily bloat the edit form of an object. Besides that the plotly plot obviously offers some really nice zooming and data pointing features.

We want to use this simply as a quick visual representation of the actually uploaded data. For example our end-user will upload a proprietary file of a HPLC measurement through a small client tool we made. This tool will open the file, extract the raw data, create a new corresponding object, pre-fill some object variables based on the contents and name of our file, create the plotly json and then additionally upload the original file to the object. This way you have all the surronding meta informations for a measurement and a quick visual representation of the data without having to download the file again and opening it in a third-party tool. Additionally you potentially also have the entire raw data available (although not trivially accessible inside the plotly object) in case you would for whatever reason someday loose the ability to open the specific attached third-party files.

FlorianRhiem commented 3 years ago

Thank you, @MayerBjoern and Rene!

Together with @danielkaiser I've added some more minor fixes and improvements, and it's now merged into develop, so it'll be part of the 0.19 release.