ispras / lingvodoc-react

Apache License 2.0

More robust parser result handling #1120

Open myrix opened 1 month ago

myrix commented 1 month ago

The current implementation of parser result processing is problematic.

Parser results with disambiguation info are stored as plain-text HTML (see the content attribute of the parserresult DB table), are displayed in the interface as is (https://github.com/ispras/lingvodoc-react/blob/39b00004b5f94014ad0de095fbfd258fcc64bafa/src/components/OdtMarkupModal/index.js#L505), and are modified by directly taking the interface HTML source and saving it as is (https://github.com/ispras/lingvodoc-react/blob/39b00004b5f94014ad0de095fbfd258fcc64bafa/src/components/OdtMarkupModal/index.js#L396).

This is unsafe and leads to problems whenever the interface HTML is modified unintentionally, e.g. by translation extensions or a browser's built-in translation feature, which can corrupt the HTML markup structure of the parser results.

We need to fix this by storing parser result data in an explicit internal representation format, e.g. JSON, on both the backend and the frontend, so that the interface explicitly displays, modifies and saves this representation, ensuring its integrity.
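A minimal Python sketch of the idea, assuming a hypothetical schema (the Analysis, Token and ParserResult classes and their fields are illustrative, not lingvodoc's actual format). The structured data is the single source of truth: disambiguation edits go through an explicit method, HTML is only rendered from the data with all text escaped, and what gets saved is the serialized JSON, validated when it is loaded.

```python
from __future__ import annotations

import html
import json
from dataclasses import asdict, dataclass, field

# Hypothetical schema: class and field names are illustrative assumptions,
# not lingvodoc's actual parser result format.


@dataclass
class Analysis:
    lemma: str
    gloss: str


@dataclass
class Token:
    text: str
    analyses: list[Analysis] = field(default_factory=list)
    selected: int | None = None  # index of the approved analysis, if any


@dataclass
class ParserResult:
    paragraphs: list[list[Token]]

    def to_json(self) -> str:
        # This JSON, not the rendered HTML, is what gets stored and sent back.
        return json.dumps(asdict(self), ensure_ascii=False)

    @classmethod
    def from_json(cls, raw: str) -> ParserResult:
        data = json.loads(raw)
        result = cls([
            [Token(t["text"], [Analysis(**a) for a in t["analyses"]], t.get("selected"))
             for t in para]
            for para in data["paragraphs"]
        ])
        result.validate()  # reject structurally invalid data on load
        return result

    def validate(self) -> None:
        for para in self.paragraphs:
            for tok in para:
                if tok.selected is not None and not 0 <= tok.selected < len(tok.analyses):
                    raise ValueError(f"invalid selection for token {tok.text!r}")

    def select(self, p: int, t: int, index: int) -> None:
        # Disambiguation edits change the data model, never the page's HTML.
        tok = self.paragraphs[p][t]
        if not 0 <= index < len(tok.analyses):
            raise IndexError(f"no analysis {index} for token {tok.text!r}")
        tok.selected = index

    def render_html(self) -> str:
        # HTML is derived output only; token text and attributes are escaped.
        out = []
        for p, para in enumerate(self.paragraphs):
            spans = []
            for t, tok in enumerate(para):
                attrs = f'data-p="{p}" data-t="{t}"'
                if tok.selected is not None:
                    gloss = tok.analyses[tok.selected].gloss
                    attrs += f' title="{html.escape(gloss, quote=True)}"'
                spans.append(f"<span {attrs}>{html.escape(tok.text)}</span>")
            out.append("<p>" + " ".join(spans) + "</p>")
        return "\n".join(out)
```

Since the save path never reads back the page's HTML, a translation extension rewriting the rendered text would change only what the user sees, not what is stored.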

Naturally, all functionality that uses parser results as source data, in particular valency example extraction, should be updated accordingly. It might also be beneficial to store parser results not as whole large JSON documents but separately by paragraph, or even by paragraph and sentence, to simplify processing and editing and, in particular, to minimize data exchange between the frontend and backend when saving disambiguation updates. However, that would require more extensive modifications to the parserresult DB table (and perhaps the introduction of additional helper tables) and to the source code of the corresponding functionality, so it should be carefully considered before deciding whether to go for it.
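If the per-paragraph option is chosen, the minimal-exchange idea could look roughly like the sketch below. It uses an in-memory SQLite database purely for illustration; the parser_result_paragraph helper table, its columns and the function names are hypothetical, and paragraph contents are arbitrary JSON values. The client compares its edited copy with the last saved one and sends only the changed paragraphs, which the server then writes back row by row.

```python
from __future__ import annotations

import json
import sqlite3

# Hypothetical helper table with one row per paragraph of a parser result.
# Names are illustrative assumptions, not lingvodoc's actual schema.
SCHEMA = """
CREATE TABLE parser_result_paragraph (
    result_id INTEGER NOT NULL,
    idx       INTEGER NOT NULL,
    content   TEXT    NOT NULL,  -- JSON for a single paragraph
    PRIMARY KEY (result_id, idx)
)
"""


def store_paragraphs(conn: sqlite3.Connection, result_id: int, paragraphs: list) -> None:
    """Initial storage: one row per paragraph."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO parser_result_paragraph (result_id, idx, content) VALUES (?, ?, ?)",
            [(result_id, i, json.dumps(p, ensure_ascii=False)) for i, p in enumerate(paragraphs)],
        )


def changed_paragraphs(saved: list, edited: list) -> dict[int, object]:
    """Client side: keep only paragraphs that differ from the saved copy.

    Assumes disambiguation never adds or removes paragraphs.
    """
    return {i: new for i, (old, new) in enumerate(zip(saved, edited)) if old != new}


def save_paragraphs(conn: sqlite3.Connection, result_id: int, updates: dict[int, object]) -> None:
    """Server side: overwrite just the paragraphs that were sent."""
    with conn:
        conn.executemany(
            "UPDATE parser_result_paragraph SET content = ? WHERE result_id = ? AND idx = ?",
            [(json.dumps(p, ensure_ascii=False), result_id, i) for i, p in updates.items()],
        )


def load_paragraphs(conn: sqlite3.Connection, result_id: int) -> list:
    rows = conn.execute(
        "SELECT content FROM parser_result_paragraph WHERE result_id = ? ORDER BY idx",
        (result_id,),
    )
    return [json.loads(content) for (content,) in rows]
```

The fixed-paragraph-count assumption fits disambiguation edits; re-parsing a document would presumably need its rows rewritten as a whole instead.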

To some extent, work on this issue may be better done concurrently with other current issues related to the handling of parser results and their derivatives.

vmonakhov commented 1 week ago

The issue is mostly resolved. Main points: