Open andrea103 opened 6 years ago
This is an interesting and deceptive problem. We should do more than what is currently implemented (leaving it to the user to figure out), and we should discuss what is most appropriate.
Here is the problem statement:
READ runs off two main sequence hierarchies: TextPhysical (a sequence of PhysicalLines, each a sequence of syllables/aksara) and Text (a sequence of TextDivisions, each an arbitrary sequence of words). Both have system-defined semantics; they identify the physical layout of the text and the ordered set of words in the text, respectively.
Structural Analysis is a system of semantically tagged sequences of heterogeneous entities. It has an open taxonomy with looser semantics, which makes it harder to know when it is appropriate to modify a sequence.
READ currently has an ownership edit model. You have to be the owner at the edition level to be able to modify any part (tokens, structure, linked segments, etc.) of the edition. The problem arises when you own an edition that uses tokens belonging to another owner (as in a derivative work), where you are allowed to edit via "clone and edit". With the system-defined semantics of the two main sequences, the system is able to propagate changes up this "containment" hierarchy correctly using a "find id and replace" method. While it is possible to find occurrences of a word in other sequences, it's not clear when to replace and when not to. We could define some sort of code or relational hierarchy for the semantic term that signifies "OK to replace on change", "OK to replace on change if owned", or "clone and replace on change".
Stephen A White Digital Humanities Software Engineer/Consultant email: stephenawhite57@gmail.com ph: +39 389 093 2269
The same problem extends to the glossary when dealing with compound splitting. This is a higher priority with respect to the glossary.
Regarding the glossary I cannot edit certain lemmas any more. So I change the priority again...
It might be best to open a new issue for the Glossary or to modify the priority of #897. We also experienced an issue with not being able to edit certain lemmas, which was related to an as yet unexplained ownership issue. So far I have no way of reproducing it, so I had reduced its priority.
Could we agree on "OK to replace on change if owned"?
The "OK to replace on change if owned" approach would seem to be a low-risk way to move the issue forward. I am assuming Steve would traverse the hierarchy of sequences from Analysis down, replacing in any sequences owned by the user. It would seem to address the large majority of cases and leave the others no worse off.
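A minimal sketch of that traversal, to make the proposal concrete. The `Sequence` class, its `owner` and `children` fields, and the function name are all hypothetical illustrations, not READ's actual data model:

```python
from dataclasses import dataclass, field


@dataclass
class Sequence:
    # Hypothetical stand-in for a sequence entity; not READ's real schema.
    seq_id: str
    owner: str
    entity_ids: list                      # token/compound IDs held directly
    children: list = field(default_factory=list)  # nested sub-sequences


def replace_if_owned(seq, old_id, new_id, user):
    """Walk down from a top-level (Analysis) sequence and swap old_id for
    new_id in every sequence owned by `user`. Sequences with another owner
    are left untouched. Returns the IDs of the sequences that were changed."""
    changed = []
    if seq.owner == user and old_id in seq.entity_ids:
        seq.entity_ids = [new_id if e == old_id else e for e in seq.entity_ids]
        changed.append(seq.seq_id)
    for child in seq.children:
        changed.extend(replace_if_owned(child, old_id, new_id, user))
    return changed
```

Sequences owned by someone else keep the old ID, which is the "no worse off" case mentioned above.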
As discussed with Andrea and Steve: We do not currently have a use case for discontinuous sequences, but may in the future. Establish a typology of sequences: (1) continuous, (2) discontinuous. For (1), when a change is made in the middle of sequences, the result automatically still belongs to that sequence. For (2), take the changed parts out of the sequence and let the user make manual adjustments. Since we currently only have sequences of type (1), make keeping changes within a sequence the default for now.
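The two behaviours could be sketched roughly as follows. The function name, the `continuous` flag, and the "needs review" return value are assumptions made for illustration, and the continuous branch assumes the changed IDs sit next to each other in the sequence:

```python
def apply_change(entity_ids, old_ids, new_ids, continuous=True):
    """Return (updated_ids, needs_review).

    continuous=True  (type 1): the new IDs take the place of the old ones,
                               so the result still belongs to the sequence.
    continuous=False (type 2): the changed parts are taken out, and the new
                               IDs are handed back for manual adjustment.
    """
    old = set(old_ids)
    if not continuous:
        return [e for e in entity_ids if e not in old], list(new_ids)

    result, inserted = [], False
    for e in entity_ids:
        if e in old:
            if not inserted:          # insert the replacement once, in place
                result.extend(new_ids)
                inserted = True
        else:
            result.append(e)
    return result, []
```

Defaulting `continuous` to `True` mirrors the suggestion to keep changes within the sequence for now.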
There is likely to be a boundary case where two words span a structure boundary: W1 at the end of S1 and W2 at the start of S2. We currently extend W1 to include W2 and remove W2 from S2.
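For illustration only, the boundary behaviour described here might look like this on plain lists of IDs. The function and the `merged` parameter (the ID of the extended W1) are hypothetical:

```python
def merge_across_boundary(s1, s2, w1, w2, merged):
    """S1 ends with w1 and S2 starts with w2. The extended word (`merged`)
    stays in S1 in w1's position, and w2 is removed from S2.
    Returns the new (s1, s2) without modifying the inputs."""
    if not s1 or s1[-1] != w1 or not s2 or s2[0] != w2:
        raise ValueError("w1 must end S1 and w2 must start S2")
    return s1[:-1] + [merged], s2[1:]
```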
We have made considerable use of discontinuous sequences on 2 of our projects. In all cases the sequences have homogeneous ownership/visibility. I don't have a use case on the horizon that would have mixed ownership/visibility or one that would extend beyond an edition. I probably need to catch up with Steve as to if/where an automated replace might fail in this case.
Splitting a token seems fine, as we would add the resulting tokens to whichever sequences the split token came from. Deleting a token is fine. I figure the problem is where I add a token. If we were to adopt the rule that an added token is added to the sequences that contain the preceding token, then it becomes buyer beware as far as discontinuous sequences are concerned.
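A rough sketch of the three rules, assuming (hypothetically) that sequences are held as a dict mapping a sequence ID to a list of token IDs. It shows why the add rule is "buyer beware" for a discontinuous sequence:

```python
def split_token(sequences, tok, parts):
    """Replace tok with its parts in every sequence that contained it."""
    for ids in sequences.values():
        if tok in ids:
            i = ids.index(tok)
            ids[i:i + 1] = parts


def delete_token(sequences, tok):
    """Remove tok from every sequence that contained it."""
    for ids in sequences.values():
        if tok in ids:
            ids.remove(tok)


def add_token_after(sequences, preceding, new_tok):
    """Proposed rule: add new_tok to every sequence that contains the
    preceding token, right after it."""
    for ids in sequences.values():
        if preceding in ids:
            ids.insert(ids.index(preceding) + 1, new_tok)


seqs = {"division": ["t1", "t2", "t3"], "discontinuous": ["t1", "t3"]}
add_token_after(seqs, "t1", "t1b")
# "t1b" now also sits in the discontinuous sequence, whether or not the
# user wanted it there.
```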
Have added Prakaś to this one as I anticipate that we might want to finesse the spec to encompass discontinuous sequences.
We might need to consider special-casing discontinuous sequences (classing the semantic as "discontinuous") and handling them differently. Do you consider them to always be sequential, or are some just represented as a group of tokens (unordered, with individual roles)?
Will unpack some representative examples when this gets to be a Pri 1.
To be clear, the solution being considered is to replace tokID or cmpID (top level) in seq entities that are editable by the current "edit as" automatically, similar to how it currently works for TextDivision seq types. This is done under the scope of the current edition.
This solution should handle the cases that are primary workflows. Take the case where a user creates a clone: when they make edits to tokens (including syllables), the physical line and text division sequences get cloned and the token changes are recorded in the cloned sequences. In this scenario, the analysis sequences are linked to the newly cloned edition but are not cloned and are therefore not editable. This means that tokens that are also included in any non-editable sequence will not be updated in that sequence. The cloned edition's "edit as" user will therefore need to create a parallel analysis (currently by hand) for token changes to propagate automatically.
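As a sketch of the behaviour described above, under the assumption that each sequence in the edition can report whether it is editable by the current "edit as" user. The `EditionSequence` class and its `editable` flag are hypothetical, not READ's API:

```python
from dataclasses import dataclass


@dataclass
class EditionSequence:
    # Hypothetical stand-in; `editable` means editable by the "edit as" user.
    seq_id: str
    editable: bool
    entity_ids: list


def replace_in_edition(edition_seqs, old_id, new_id):
    """Replace old_id (a tokID or cmpID) with new_id in the editable
    sequences of one edition. Returns (updated, skipped): `skipped` lists
    the non-editable sequences that still reference old_id, i.e. the ones
    that would need a parallel analysis created by hand."""
    updated, skipped = [], []
    for seq in edition_seqs:
        if old_id not in seq.entity_ids:
            continue
        if seq.editable:
            seq.entity_ids = [new_id if e == old_id else e for e in seq.entity_ids]
            updated.append(seq.seq_id)
        else:
            skipped.append(seq.seq_id)
    return updated, skipped
```

Reporting the skipped sequences is an addition for illustration; it would at least tell the user which analysis sequences were left behind after a clone.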
Understood and agreed.
Hi Andrea, after discussions with Steve and Andrew we formed the view that perhaps we might move this from Pri2 and Sev2 to Pri1 and Sev3 as this both requires some more detailed specification (if it is to be more than a partial solution) and that there is a workaround albeit bothersome. If you are agreeable we might handle this instead later in November after our internal V1 release.
Ok
This seems to be a major issue in managing a corpus. Perhaps we need to raise the priority of this back up.
Problem: When a token is modified, it is no longer part of a sequence but has to be added to it again. It should remain in the sequence as before.