saros-project / saros

Open Source IDE plugin for distributed collaborative software development
https://www.saros-project.org
GNU General Public License v2.0
160 stars 52 forks source link

[VSC] Initialize vscode plugin #1111

Closed marvis97 closed 3 years ago

marvis97 commented 3 years ago

In this PR I want to move the first basic files from Michaels saros-vscode prototype into the saros project. In the next days, I will create more and more PRs with little parts of Michaels code. So I can understand it and it is better to review for you. I am happy about any criticism or tips from you.

Thanks

Marvin

srossbach commented 3 years ago

I want a "little" bit more detail on how things work in VSCode before approving anything related to the work of Micheal.

My basic knowledge is: On the client side the Jupiter transformation (the messages received from the server) and applying the transformation have to be atomic, i.e if Jupiter calculate insert A to document Foo in position X then there must not any event that inserts any other character into this position while the calculation and the execution of the result is in progress. Am I right @tobous ?

To complicate things a little bit more VSCode works a little bit different. You do not have access to the IDE (hence you even cannot as this is not a Java application). All interaction is done through the LSP protocol.

So my questions:

  1. If I type something in the editor, does the VSCode logic inserts the character in the editor or does it first sends it to the "server" and only updates the content of the editor based on what the server is sending ?

  2. There is a server and a client part in the Saros VSCode plugin. Which part is responsible for what ?

  3. In which context is the Jupiter transformation applied ? Assuming the VSCode editor only updates its contents by what the server is telling it, it seems that the transformation have to applied in the server context.

However by looking at the current source of Micheals work I was not able to spot the code parts that are responsible for applying these transformations while any pending LSP messages are blocked.

tobous commented 3 years ago

I want a "little" bit more detail on how things work in VSCode before approving anything related to the work of Micheal.

My basic knowledge is: On the client side the Jupiter transformation (the messages received from the server) and applying the transformation have to be atomic, i.e if Jupiter calculate insert A to document Foo in position X then there must not any event that inserts any other character into this position while the calculation and the execution of the result is in progress. Am I right @tobous ?

In short: Yes, not transforming and applying the received Jupiter activity in an atomic step will lead to issues.

The Jupiter activity has to be transformed with and applied to the same state. If there are any user (or automatic) changes to the editor content after the activity was transformed but before it has been applied, the activity will not apply cleanly. This will always lead to a desynchronization in cases where the local change interacts with the received change (because it is located at a position before or overlapping with the received change). And even in cases where the local change is located after the received changed, meaning it won't interfere with the received change, processing them concurrently might still lead to issues when trying to apply the local activity to the other participants due to issues with the vector time (e.g. if the local activity was applied before the remote activity but was processed by Saros after the remote activity had already been applied, leading to it having the wrong vector time; as a result, it would not be transformed correctly on when applied for other clients).

During the discussions with Michael as part of the standup, he claimed to have resolved this issue (or we misunderstood each other on which issue we were each talking about). But, as he never went into detail on how he did it and has not provided his code yet, I don't know what his solution approach is and if it actually works.

From looking at the LSP interfaces and the current stub implementation, the server method provided to react to document changes has neither a return value nor a callback function. Additionally, the documentation for the didChange(...) method also does not mention anything about the client waiting for a server acknowledgement/response.

It seems like the protocol is not designed for direct/blocking reactions to code changes. Other listener methods that are explicitly meant to produce text changes, like formatting requests, return text edits as a CompletableFuture object. This probably ensures that they are applied before any other user actions can be made (or are thrown out if conflicts occur). Otherwise, they would face the same issues.

To complicate things a little bit more VSCode works a little bit different. You do not have access to the IDE (hence you even cannot as this is not a Java application). All interaction is done through the LSP protocol.

So my questions:

@marvis97 My answers here are largely based on assumptions as I was not able to find concrete details in the documentation. As I only have a somewhat limited understanding of the LSP, could you correct me if my assumptions are wrong and add any additional information that is missing or might be helpful?

  1. If I type something in the editor, does the VSCode logic inserts the character in the editor or does it first sends it to the "server" and only updates the content of the editor based on what the server is sending ?

The LSP only offers the method didChange(...), which has the following documentation: "The document change notification is sent from the client to the server to signal changes to a text document." So I would guess that the changes are directly applied in the client and the server is notified afterwards.

  1. There is a server and a client part in the Saros VSCode plugin. Which part is responsible for what ?

From the documentation, the "client" part is the logic that integrates with the IDE, i.e. the part that provides access to the different IDE functionalities and sends notifications for changes or requests in the IDE. The "server" part is the logic that provides additional functionality. This is the part that Saros would run in. The server part is notified by the client part for events in the IDE.

  1. In which context is the Jupiter transformation applied ? Assuming the VSCode editor only updates its contents by what the server is telling it, it seems that the transformation have to applied in the server context.

(The editor is also updated through user actions or automatic IDE actions.) Following the LSP model, the Jupiter activities would have to be processed in the server component. The actual document changes would then have to be sent to the client to apply them. This brings two issues with it:

  1. The server needs access to the current document state to process the Jupiter activities. I am not sure whether the LSP provides a method for the server to request a specific document state from the client (as it is usually meant for reactions to document changes from my understanding). Additionally, these changes to process might regard documents that aren't even open in a local editor. I am not sure whether this is supported by the LSP as it explicitly states that a client can only modify documents that were "claimed" (i.e. opened) locally through the didOpen protocol beforehand. (But maybe this is only a restriction guaranteeing that a didOpen message is always sent before any didChange messages can be sent.)
  2. The server needs a way to send these processed Jupiter activities back to the client so that they can be applied to the local state. I am not sure whether such a server-initiated text change is supported by the client.

The way I see it, we would have to extend the client implementation for all of this to work. Specifically, we would need to add

But this would also mean the we could not rely on the basic client implementations for different IDEs (which is how the LSP is normally used in my understanding) but would have to implement this extended client for every IDE we want to support. @marvis97 Or is there a way of dynamically extending the client interfaces/implementation?

With the server and the client being separate components that communicate through the (local) network, we also face an additional synchronization issue. The client and the server could produce changes at the same time, the server changes caused through received Jupiter activities and the client changes caused through local changes. We would have to make sure that these simultaneous changes are processed correctly and don't lead to any race conditions. Additionally, this split into client and server makes it a lot harder to avoid the loop-back issues that we also had with the other implementations. How do we ensure that the changes caused by applying Jupiter activities processed by the server are not reported back as local changes by the client? Guessing based on the changed content (i.e. holding a list of processed changes and ignoring a client change if it matches an entry in the list) is dangerous as it might lead to dropping valid client activities in some corner cases. Additionally, properly maintaining this list of changes to only drop the loop-back and not valid subsequent activities is a hassle. My guess would be that we don't have access to the underlying client listener infrastructure (or we would have to adjust the listener accordingly) to disable the listener in such cases, so this also would not be an option. Additionally, this approach isn't optimal as well because the LSP client can be shared by multiple servers, meaning disabling the listener for all of them will interfere with the other servers functioning correctly.

However by looking at the current source of Micheals work I was not able to spot the code parts that are responsible for applying these transformations while any pending LSP messages are blocked.

You can find some of Michael's changes when looking at other branches on his fork. I think the newest one that contains additional logic is feature/refactor. But a lot of things were changed and the history is kind of messy, so it is hard to figure out what exactly was changed and why.

marvis97 commented 3 years ago

@tobous I will answere your questions but I need a little bit time to research it

srossbach commented 3 years ago

I already mentioned in another pull request that https://github.com/mschaefer88/saros-vs-code/blob/feature/refactor/lsp/src/saros/lsp/ui/UISynchronizerImpl.java is wrongly implemented.

There are multiple ways of locking the LSP4J message consumer but I guess we do not even have to lock it on that stage. We can apply our "lock" in this class https://github.com/mschaefer88/saros-vs-code/blob/feature/refactor/lsp/src/saros/lsp/extensions/server/DocumentServiceImpl.java

However we still need to know when text edits in VSCode are actually applied. Should not be that hard to figure out, just insert Thread.sleep in the DocumentServiceImpl and look at the results.

srossbach commented 3 years ago

Just wondering if https://visualstudio.microsoft.com/services/live-share/ is build as an LSP server or just as in extension/plugin for the specific IDE ? Anyone knows any details ?

I just looked at the LSP protocol and at least for me it seems there is no way to apply proper atomic transformation and application, i.e even if you tell the client to apply edit X to file Y the client can still refuse to apply the result. You can also refuse to apply a transformation from the client as a server as well. However the client has the "original" copy, every time, so bad luck in this case, basically the server is now out of sync.

I am still wondering how a server should process some refactor stuff when it is the duty of the client to sync the initial files of the local workspace with the server.

tobous commented 3 years ago

Just wondering if https://visualstudio.microsoft.com/services/live-share/ is build as an LSP server or just as in extension/plugin for the specific IDE ? Anyone knows any details ?

They aren't very specific and the sources don't seem to be public "yet" (in 2018 they said it was still "too volatile" to publish and they haven't said anything since as far as I can tell).

But they called it an "VSCode extension", so I would guess that it is directly integrated into the IDE and does not use the LSP.

I just looked at the LSP protocol and at least for me it seems there is no way to apply proper atomic transformation and application, i.e even if you tell the client to apply edit X to file Y the client can still refuse to apply the result. You can also refuse to apply a transformation from the client as a server as well. However the client has the "original" copy, every time, so bad luck in this case, basically the server is now out of sync.

I am still wondering how a server should process some refactor stuff when it is the duty of the client to sync the initial files of the local workspace with the server.

Well, for refactoring/formatting calls, the LSP has separate listener methods that allow the server to return a list of text edits to apply on the client side. Maybe the protocol waits for an answer from the server in such cases. However, I am not sure how this is supposed to work correctly either as one client could theoretically have multiple servers that listen for refactor/format requests.

srossbach commented 3 years ago

Well, for refactoring/formatting calls, the LSP has separate listener methods that allow the server to return a list of text edits to apply on the client side. Maybe the protocol waits for an answer from the server in such cases. However, I am not sure how this is supposed to work correctly either as one client could theoretically have multiple servers that listen for refactor/format requests.

What I meant is: your workspace exists of X files. You do a didOpen call (or whatever this RPC is called). Now how do the server knows if you call "show me all references/implementations" in which files it actually have to perform the lookup ? Of course it could make a private copy but what is validating the files, i.e if they are still in sync when performing such queries ?

To be more specific:

In a worksession I open file X and Y. I alter them both. Now in another worksession I only open file X. The protocol does not specify that the LSP server has to be aware / keep a copy of the client workspace.

TLDR: Microsoft stuff as its best. I worked for more than a year with several MS APIs. Lessons learned: information is spread all around the web, documentation is often ambiguous, poor, or not existing at all.

Drakulix commented 3 years ago

@tobous After looking at the protocol documentation it seems that the LSP can freely send edits to the IDE: https://microsoft.github.io/language-server-protocol/specification#workspace_applyEdit

Different to the "didChange" notification, which contains a "versioned file identifier" the edit is not applied against a specific version of a document, but always against the "current" state (whatever that may be). But at least "applyEdit" returns whenever the edit was successful or not.

So I guess a rather hacky way of doing saros-via-lsp would be to just assume we are fast enough most of the time, that the asynchronous nature of the protocol will not be an issue. Upon a "didChange" notification send those edits to the session host and when you receive jupiterActivities (that are not your own and therefor already done by the IDE) apply them via an "applyEdit" event. When we receive a failure we can assume any other plugin or network delay did interfere and immediately flag the file as out-of-sync in saros (and trigger a full resync).

Even with the vscode extension api there does not seem to be a proper way to block on client edits. In fact the api looks very similar to the LSP protocol: https://code.visualstudio.com/api/references/vscode-api#TextDocumentChangeEvent

@srossbach I would assume an implementation like this could work good enough in most cases, but is definitely a hack. Especially when editing the same position in a file this would likely cause some erratic behavior in VSC. Would you even consider merging something similar?

srossbach commented 3 years ago

Why should I design and build a car when I know that the engine of that car will not/never work as expected ? If the validation of your product failed then it failed, period. If you take a look at the history of Saros/E then there was a reason Jupiter got introduced.

Regarding your current "suggestions", then we can simply throw Jupiter out again and just simply apply the the TextEdits and really hope that latency is "not our duty".

Drakulix commented 3 years ago

Why should I design and build a car when I know that the engine of that car will not/never work as expected ? If the validation of your product failed then it failed, period. If you take a look at the history of Saros/E then there was a reason Jupiter got introduced.

Regarding your current "suggestions", then we can simply throw Jupiter out again and just simply apply the the TextEdits and really hope that latency is "not our duty".

Because this in turn means, that we cannot support any IDE, that does not provide a synchronous text-edit api. And those are getting more an more uncommon as most plugins do not want to block the user and an asynchronous architecture helps with that.

Do not get me wrong, if we want to support those IDEs and Jupiter is just not a good fit, I am open to discuss replacing it with something that is more suitable in the long run. I am just asking, if we cannot find a reasonable solution for short/mid-term with the current architecture.

I mean it is not like I want to ignore Jupiter, my proposed suggestion would still use it (even if only to detect desynchronization as opposed to resolve conflicts).

marvis97 commented 3 years ago

@tobous Sorry for my very late response, but I worked myself into the subject and can hopefully answer your questions. In this PR there are many topics and before I'll start I want to explain some basic architectures and design decisions of Michael, they might be helpful. As I think you told later in this PR, VSCode doesn´t allow any direct access to the UI or anything else. The extensions are running in a seperated process and have, with one exception, no contact to the VSCode client - only if they use the VSCode-Extensions API. The API allows registering commands, changing the theming and many other things - but nothing that influences the performance (like blocking something). Also it seems like, that VSCode and specially the extensions api work completly asynchronous - including lsp. Michael also implemented the Saros - and the VSCode part in two different ways, because he didn't find a good way to use the Java-Saros code in VSCode-TS (according to his own statement). Now the questions (which are not answered by anyone else):

The LSP only offers the method didChange(...), which has the following documentation: "The document change notification is sent from the client to the server to signal changes to a text document." So I would guess that the changes are directly applied in the client and the server is notified afterwards.

That is right. VSCode applies the change and notifies saros via lsp about the change. Also your next assumptions about the client / server architecture are generally right. In detail Michaels Saros-VSCode extension has 4 different areas. In this architecture vscode is the client and consists of the UI implementation (e.g. register different commands and buttons in vscode and the functions to call or apply a textEdit or colouring) and the vscode-lsp part, which implements the lsp communication protocol where vscode e.g. can send his textEdit to saros. The other part is the "standalone" saros implementation in java (based on the saros server) which communicates with vscode about his own java-lsp implementation and is running in a child-process of vscode.

a way to inform the server of document changes while blocking the client until the server has responded,

That is the big problem of vscode, because this will not work (the extensions api has no route to do this)

a way for the server to push changes to the client without a prior request.

I´m not sure why - can u explain it maybe?

But this would also mean the we could not rely on the basic client implementations for different IDEs (which is how the LSP is normally used in my understanding) but would have to implement this extended client for every IDE we want to support. @marvis97 Or is there a way of dynamically extending the client interfaces/implementation?

The basic idea of lsp is, that every programming language is implementing a general lsp-connection and every IDE too (https://code.visualstudio.com/api/language-extensions/language-server-extension-guide). Even if there is any way to rely on the basic implementation, saros has more functions than lsp can handle. What does this mean for implementing saros-lsp in a new IDE (that is how Michael worked)? You search for a default implementation of the IDE/language (in Michaels case, he uses the default Microsoft implementation for vscode and the default Eclipse-java implementation for the server part) and upgradet the default-implementation with the missing saros-features (see Michaels BA-Thesis).

You can find some of Michael's changes when looking at other branches on his fork. I think the newest one that contains additional logic is feature/refactor. But a lot of things were changed and the history is kind of messy, so it is hard to figure out what exactly was changed and why.

As Michael explained his project to me, he told me that the pr/vscode/preview branch is the best one to try his extension. All my tests based on this branch, but I will also test it with the /feature/refactor (which is one month newer).

At this moment, there is in my opinion no way to implement a synchronous vscode-saros extension. But: Victors idea might be a good way to bypass the problem:

So I guess a rather hacky way of doing saros-via-lsp would be to just assume we are fast enough most of the time, that the asynchronous nature of the protocol will not be an issue. Upon a "didChange" notification send those edits to the session host and when you receive jupiterActivities (that are not your own and therefor already done by the IDE) apply them via an "applyEdit" event. When we receive a failure we can assume any other plugin or network delay did interfere and immediately flag the file as out-of-sync in saros (and trigger a full resync).

The big question is: Do we want to support an IDE, which doesn´t support the synchronous saros architecture? I think we should discuss this here and in parallel I can implement Victors way so that we can see, how good this works. That can take a while because the /pr/vscode/preview branch wasn´t stable (in special while changing the same word with different user at the same time) and I have to fix some bugs before I can implement it, but if this works out we might have a good vscode IDE-support.

Now I'm worked into the subject so I'm able to answere you much faster than before:)

tobous commented 3 years ago

a way for the server to push changes to the client without a prior request.

I´m not sure why - can u explain it maybe?

There are two basic cases in which Saros has to act:

  1. When the state of the local IDE changes. This case has already been discussed and is (at least somewhat) covered by the default LSP API: Local changes in the IDE are reported to the Saros LSP server, which can then send them to the other session participants.
  2. When it receives a remote change that is supposed to be applied to the local IDE. In this case, the Saros LSP server also has to react to external network events that are independent of the local IDE and are therefore not directly triggered by an IDE event. This creates the need for the Saros LSP server to be able to push changes to the client/IDE without a prior request.

My confusion stemmed from by base assumption that the LSP only accomodates client->server requests. But, as @Drakulix pointed out, this is not the case:

After looking at the protocol documentation it seems that the LSP can freely send edits to the IDE: https://microsoft.github.io/language-server-protocol/specification#workspace_applyEdit

This is implemented though LanugageClient.applyEdit().

But this would also mean the we could not rely on the basic client implementations for different IDEs (which is how the LSP is normally used in my understanding) but would have to implement this extended client for every IDE we want to support. @marvis97 Or is there a way of dynamically extending the client interfaces/implementation?

The basic idea of lsp is, that every programming language is implementing a general lsp-connection and every IDE too (https://code.visualstudio.com/api/language-extensions/language-server-extension-guide). Even if there is any way to rely on the basic implementation, saros has more functions than lsp can handle. What does this mean for implementing saros-lsp in a new IDE (that is how Michael worked)? You search for a default implementation of the IDE/language (in Michaels case, he uses the default Microsoft implementation for vscode and the default Eclipse-java implementation for the server part) and upgradet the default-implementation with the missing saros-features (see Michaels BA-Thesis).

Ok, this question was due to a general misunderstanding of the LSP architecture on my part. I thought that the LSP runs using a general IDE LSP client that is shared by all LSP servers. But instead, each Server provides its own client. That makes a lot more sense in this context, as modifications of the client implementation could otherwise clash between LSP server applications, making them incompatible.

So I guess a rather hacky way of doing saros-via-lsp would be to just assume we are fast enough most of the time, that the asynchronous nature of the protocol will not be an issue. Upon a "didChange" notification send those edits to the session host and when you receive jupiterActivities (that are not your own and therefor already done by the IDE) apply them via an "applyEdit" event. When we receive a failure we can assume any other plugin or network delay did interfere and immediately flag the file as out-of-sync in saros (and trigger a full resync).

@Drakulix @marvis97 This proposition has to be considered carefully as it can lead to cases that are not easily fixable/recoverable. I am not sure whether it offers an acceptable usability.

In general, when triggering a re-sync due to inconsistencies, this does not guarantee that the recovered state is actually correct. It just guarantees that the state matches for all participants. Additionally, the proposed approach gets much more problematic when the VSCode user is the host of the session. As the host determines the valid state, such resolving such issues through the recovery would force all other session participants to trigger a re-sync (instead of just re-syncing the VSCode user). And, as an added complications, the recovery can even drop recent changes completely in some corner cases, causing recent work to be lost (e.g. if the VSCode user is the host and refuses a remote edit, the re-sync will also drop the refused edit for all other participants).

Furthermore, without the support for atomic/blocking Jupiter transformations, it is not guaranteed that the intention of the local activity created by the VSCode user is preserved. So even if the inconsistency is resolved through a re-sync, it is not guaranteed that the resulting state is "correct"/matches what the VSCode user wanted to do. This would get very annoying as users would have to manually fix these edits which were shifted to the wrong place.

As an example:

Lets assume we are looking at a two-participant session between Alice (host) and Bob. Alice use Saros for Eclipse and Bob uses Saros for VSCode. They are working on a single file that only contains the line Hello world at the start of the session for both participants.

  1. Alice starts by changing the line by adding 'my unique' at position 5, resulting in the line Hello my unique world for Alice. This change is then wrapped in a Jupiter activity and sent to Bob.

  2. Bob, still working on the base content Hello world, makes his own changes. He adds '!' at position 11, resulting in the line Hello world!. This change is then sent to the LSP Server.

  3. Before the change from Bob is received or processed by the LSP server, Alice's change is received and processed. This results in the LSP server (i.e. Saros) now thinking that Bob also has the local state Hello my unique world. The change is then also sent to the LSP client and processed, leading to the actual state Hello my unique world! for Bob.

  4. The LSP server receives Bob's change. As it thinks that Bob has already processed Alice's change, it sees no need to transform it and sends it to Alice unchanged. As a result, Alice receives the activity to insert '!' at position 11, resulting in the line Hello my un!ique world.

For both participants, all activities have been processed. So according to the properties guaranteed by the Jupiter algorithm, the content for Alice and Bob should match. But, as shown above, it does not (Alice having the content Hello my un!ique world and Bob having the content Hello my unique world!). This will be detected by the consistency watchdog, prompting Bob (as the non-host) to perform a re-sync/recovery. This re-sync then forces the corrupted state Hello my un!ique world onto Bob, leading to both participants having a local state that does not match the intention of Bob's edit (which was supposed to add '!' to the end of the line). So even after the recovery, the resulting content is still not as expected and needs to be fixed through user intervention.

While you could try to resolve this mismatch in content in step 3 by querying the actual editor state when processing incoming activities, the example can easily be adjusted so that Bob does his edit directly after the query but before processing the LSP server message containing Alice's change, resulting in the same issues.

This problem only gets worse with more complex or longer edits, as it makes the expected/correct result harder to determine and further separates the occurrence of the incorrect edit from the actual position at which the edit was intended to be made.

In general, this seems to be the exact same problem that is supposed to be resolved by the Jupiter algorithm in the first place. Introducing another asynchronous exchange transferring text edits creates the need for an additional protocol to handle conflicting changes correctly (i.e. introducing an additional version of the Jupiter algorithm or similar for the LSP client/server communication). This would make the LSP server/client logic much more complicated. Furthermore, if we again can't guarantee that events are processed atomically/in a blocking fashion by the LSP client, this would not help much either as similar problems can be caused by incorrect Jupiter transformations at the LSP client side (i.e. if a local edit is made while a change received from the LSP server is being transformed).

Drakulix commented 3 years ago

I do not want to invalidate your claims, as most of them still hold up, but there is one mistake in your example.

  1. The LSP server receives Bob's change. As it thinks that Bob has already processed Alice's change, it sees no need to transform it and sends it to Alice unchanged. As a result, Alice receives the activity to insert '!' at position 11, resulting in the line Hello my un!ique world.

Not necessarily. While applyEdit is always executed against the current editor state, the didChange notifications to the lsp-server are versioned! Which means saros might be able to determine, that Bob's Change was supposed to be processed earlier.

I am not sure, if I understand the Jupiter-Algorithm completely, but I think it should be able to resolve this conflict. My gut-feeling is, that vscode-saros being the host is actually the better case, because of this. Forcing the vscode state onto everybody, but with fewer recovery-steps. But I have no validated any of those claims and I also think that this is not feasible for more complex edits.

I guess the only way to check, if this is feasible at all, would be to build a prototype of the vscode plugin, that is good enough to test the experience and how easy it is to break things, but sadly I expect it to fall apart.

tobous commented 3 years ago

I do not want to invalidate your claims, as most of them still hold up, but there is one mistake in your example.

  1. The LSP server receives Bob's change. As it thinks that Bob has already processed Alice's change, it sees no need to transform it and sends it to Alice unchanged. As a result, Alice receives the activity to insert '!' at position 11, resulting in the line Hello my un!ique world.

Not necessarily. While applyEdit is always executed against the current editor state, the didChange notifications to the lsp-server are versioned! Which means saros might be able to determine, that Bob's Change was supposed to be processed earlier.

I am not sure, if I understand the Jupiter-Algorithm completely, but I think it should be able to resolve this conflict.

@Drakulix I am not sure how the versioning actually works. I skimmed the documentation but could not find any further details. As far as I can tell, the version is just an ID. I could not find a method that allows you to access version-specific content, i.e. to look at previous states of the document. Working under the assumption that it is just an ID for a specific document state (e.g. a hash of the content), it could be useful. We could hold the version ID of the state with which an incoming activity is transformed/applied. This would allow us check how to process local activities, even if multiple incoming Jupiter activities had been processed in the meantime and our changes were applied somewhere in-between.

But it does not help us with more complex setups, i.e. if the local user applies multiple changes which are all only processed after applying remote changes (i.e. if Bob made multiple edits in the above example). To resolve such setups, we would again need to correlate the edits to their timestamp/version and transform them correctly, i.e. apply the Jupiter algorithm at this stage as well.

Additionally, as you pointed out, applyEdit does not have a timestamp. So if we adjust the example so that Bob's state becomes corrupted, we run into the same issue, now without the possibility of using timestamps:

  1. Alice adds '!' at position 11
  2. Bob's LSP Server receives change
  3. Bob's LSP Server processes change
  4. Bob's LSP Server sends change to client
  5. Bob adds 'my unique' at position 5 and sends it to the LSP Server
  6. LSP Client processes received message & adds Alice's '!' at position 5, leading to Hello my un!ique world for Bob (or refuses edit, but as the edit can be applied, why would it?)

We could try to fix this case by then retroactively adjusting Bob's state after his edits were processed by the server, but this would just create even more chances for race conditions with other edits (local or remote).

And even if we could adjust the call to also pass a version to applyEdit in order to avoid the issue, it would still leave us with needing an additional implementation/application of the Jupiter algorithm. Additionally, I don't know whether you can request the content of a file in combination with its version. Otherwise, you could get a race condition where the content has changed between the call to get the content and the call to get the version.

My gut-feeling is, that vscode-saros being the host is actually the better case, because of this. Forcing the vscode state onto everybody, but with fewer recovery-steps.

For this example, it makes no difference who is the host. If you switch the edits (i.e. if Bob adds the 'my unique' and Alice the '!', Alice being the host results in the better outcome. It depends on the specific edits made by the users. And the outcome of the transformations (or at least the consistency of the result) should not be dependent on whether the user is the host or the client.

Additionally, forcing the recovery onto other member (without explicit user approval) is not something that is currently possible in Saros. So every client would have to manually trigger the recovery every time such an issue occurred. And automatically triggering the recovery for other clients without their consent/knowledge isn't a good solution either. Such an action would create a huge opportunity for the loss of content as any changes that are made or that are still in transit after the recovery has been triggered will be dropped by the host. This is due to the recovery resetting the vector time for the recovered client, meaning any activities produced by the client before the reset will no longer be valid.

srossbach commented 3 years ago

If I am not mistaken the LSP-Server is not necessarily the content provider but the LSP-Client manage the actual content.

So performing a recovery is even more complicated as the Saros host should not rely on the document state of its LSP-Server but has to force a resync between its LSP-Client first. This would then require an additional IPC roundtrip where already other changes could be in transmission, e.g the LSP-Server forces a resync with its client but somehow a character insert is currently processed by the LSP server. The general rule for an recovery is that NO Saros user must send any edits during the recovery.

Even in Eclipse it aint that easy to prevent it (we drop the change if it occurs but this can cause other issues in a session with more than 2 users).

marvis97 commented 3 years ago

All of your concerns are absolutely right. I discussed this with Victor, and we agreed, that it need to much time to implement a prototype of the "hacky" way (because I have to fix some bugs before). With all the points we discussed there is actually no way to support Saros with VSCode and my actual theme is done. We'll switch the focus from VSCode to the question, what would be an good architecture for Saros in general to also support asynchronous editors. Thank you for your responses and your expertise. If you want, I can make a PR to remove all VSCode-Stuff from saros/master and you can merge this e.g. after Michaels presentation.

marvis97 commented 3 years ago

I have one more question regarding the functionallity of saros. At this moment, I read the paper about jupiter and understand the architecture in general. Bob is doing a change in his client. The change is applied locally and send to the Jupiter server. In this case, no one else is doing a change and the server doesn´t find a conflict. Now the server has to send the change to the other clients. The question is: does the server send the change to all clients exclusive the source-client? Or does it send to all clients and the source client has to ignore it or something completely different?

tobous commented 3 years ago

I have one more question regarding the functionallity of saros. At this moment, I read the paper about jupiter and understand the architecture in general. Bob is doing a change in his client. The change is applied locally and send to the Jupiter server. In this case, no one else is doing a change and the server doesn´t find a conflict. Now the server has to send the change to the other clients. The question is: does the server send the change to all clients exclusive the source-client? Or does it send to all clients and the source client has to ignore it or something completely different?

Yes, the server filters out the original sender of the change when distributing it to the other session participants.

marvis97 commented 3 years ago

@tobous could you please answere some more questions?

  1. As you said, the server filters out the original sender of the change. Is the server sending an acknowledgement to the original sender if the change is processed (or at any other timepoint after recieving the change)?
  2. How does the jupiter server (and the eclipse and intellij implementation) processes two incoming changes at the same time? Is the server working serial or parallel and if serial, is he blocking something (eg. the file system) to prevent the actual processed change for concurrent processing?
tobous commented 3 years ago

@tobous could you please answere some more questions?

  1. As you said, the server filters out the original sender of the change. Is the server sending an acknowledgement to the original sender if the change is processed (or at any other timepoint after recieving the change)?

No, in general neither the server nor the clients send acknowledgements for received/processed activities.

Deletion activities are the only corner case that actually send acknowledgements. See DeletionAcknowledgmentActivity. This was added as a fix for issues with outdated vector time values for deleted and then re-created files. See PR #714 for more information.

  1. How does the jupiter server (and the eclipse and intellij implementation) processes two incoming changes at the same time? Is the server working serial or parallel and if serial, is he blocking something (eg. the file system) to prevent the actual processed change for concurrent processing?

The Jupiter part of the activity processing is entirely handled by the core. The IDE-specific implementations only handle the logic to actually apply the received changes to the local mode, so they should not be of interest for this question.

Yes, while processing a remote change the Jupiter component is blocking both local changes and processing other remote changes. As a result, it processes the received activities sequentially and also ensures that no local changes can be made while processing a remote activity. Activities that arrive "in parallel" (i.e. changes made by different clients at the same time) and conflict each other are processed using operation transformation to ensure that they are applied correctly.

Being able to ensure that no local changes are being made while processing a remote activity is a vital requirement for this process (see discussion/examples above).

marvis97 commented 3 years ago

Thank you very much for the fast response. As I read in the Jupiter-Paper, they are using in general the dOPT operational transformation algorithm with some improvements. To handle two remote changes as shown in the picture, they save a history of changes for the period of diverged server and client. Did you know, how they or saros in generell know, how long they need the history with no acknowledgements after recieving? Maybe if an new change with actual status arrived at the client?

image (see Paper)

tobous commented 3 years ago

Did you know, how they or saros in generell know, how long they need the history with no acknowledgements after recieving? Maybe if an new change with actual status arrived at the client?

Saros uses the VectorTime for this purpose. The vector time holds the activity count for the server and the client. It just counts up for each applied activity (irrespective of whether the activity was created locally or was created remotely and then applied locally). The vector time is file-specific, meaning each file has its own vector time instance. Furthermore, the vector time is client-specific. Each client only has access to its own vector time but the server has access to all client vector times. This is necessary as the server is the central distributor for all of the activities.

This concept of the vector time is also shown in the diagram you referenced. the 0,0, 1,0, etc. references the activity count of client and server.

So yes, your assumption was correct, the status of the server/clients is implicitly transferred with each activity as it is contained in the vector time. For the clients, they can drop saved activities once the receive a message that contains a vector time from the server that has a higher client vector time than the vector time of the saved activity (as this means that the server has successfully processed it). For the server, it works pretty much the same with the only difference that it has to wait for all clients to reply with a higher server vector time.

marvis97 commented 3 years ago

@tobous Thank you very much for all your responses!