lcpu-club / overleaf

PKU LaTeX
https://latex.pku.edu.cn
GNU Affero General Public License v3.0

Git integration (bidirectional backup) #9

Open: HolgerHuo opened this issue 1 month ago

HolgerHuo commented 1 month ago

https://github.com/ertuil/overleaf

AllanChain commented 2 weeks ago

Summary of the Current Investigations

Git Bridge

In Overleaf, the Git bridge and GitHub sync are distinct features. The Git bridge lets users push to and pull from git.overleaf.com/project_id, whereas GitHub sync synchronizes bidirectionally with a repository on an external hosting platform.

The previously mentioned implementation by USTC provides only a Git bridge. Their olgitbridge pushes and pulls through Overleaf's web APIs, so its server does not need to run on the same machine as Overleaf. However, olgitbridge has certain limitations:

Note that contrary to the cloud version this bridge does not use realtime change operations, any files changed by git will result in a "this file has been changed externally" interruption in the online editor.

Overleaf's own implementation of the Git bridge is also complicated; see https://github.com/overleaf/overleaf/issues/782#issuecomment-693336644

Third Party Data Store (TPDS)

To circumvent the complexity of the Git bridge, we can refer to the implementation of Dropbox synchronization. In Overleaf's codebase, services like Dropbox are referred to as third-party data stores.

We can ignore the fact that the external Gitea server is a Git server and treat it simply as a regular file service.

This way, we can heavily reuse Overleaf's Dropbox synchronization logic. The good news is that the open-source version has not completely removed the Dropbox-related logic, which significantly reduces the workload.
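Treating Gitea as a plain file service could look roughly like the following minimal sketch, which maps file writes and deletes onto Gitea's repository contents API (create with POST, update with PUT plus the file's current blob SHA, delete with DELETE). The class name `GiteaFileStore`, the default branch, and the commit messages are hypothetical; the sketch only builds request descriptions and leaves authentication and sending to an HTTP client.

```python
import base64
from dataclasses import dataclass, field
from typing import Optional
from urllib.parse import quote


@dataclass
class HttpRequest:
    """Description of one HTTP call; sending it is left to an HTTP client."""
    method: str
    path: str
    body: dict = field(default_factory=dict)


class GiteaFileStore:
    """Hypothetical adapter that treats a Gitea repository as a plain file store."""

    def __init__(self, owner: str, repo: str, branch: str = "main"):
        self.base = f"/api/v1/repos/{quote(owner, safe='')}/{quote(repo, safe='')}/contents"
        self.branch = branch

    def _url(self, file_path: str) -> str:
        # quote() keeps "/" by default, so nested project paths map to nested files.
        return f"{self.base}/{quote(file_path.lstrip('/'))}"

    def put_file(self, file_path: str, content: bytes,
                 sha: Optional[str] = None,
                 message: str = "Update from Overleaf") -> HttpRequest:
        body = {
            "content": base64.b64encode(content).decode("ascii"),
            "branch": self.branch,
            "message": message,
        }
        if sha is None:
            # No known blob SHA: create a new file.
            return HttpRequest("POST", self._url(file_path), body)
        # Updating an existing file requires its current blob SHA.
        body["sha"] = sha
        return HttpRequest("PUT", self._url(file_path), body)

    def delete_file(self, file_path: str, sha: str,
                    message: str = "Delete from Overleaf") -> HttpRequest:
        body = {"sha": sha, "branch": self.branch, "message": message}
        return HttpRequest("DELETE", self._url(file_path), body)
```

Renames are not covered here; the simplest mapping in this model would be a delete followed by a create.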

The logic for synchronizing changes from Overleaf to external storage is roughly as follows:

graph LR
    A[Frontend] -->|Document modification| B[Web Service]
    B -->|"/enqueue/web_to_tpds_http_requests"| C[TPDS Service]
    C -->|"Some logic"| D[External data store]

The logical flow from modifying a document in the frontend up to the web service issuing a /enqueue/web_to_tpds_http_requests request is largely intact; the request is triggered by TpdsUpdateSender.js. The TPDS Service itself belongs in the services/third-party-datastore directory and still needs to be implemented.
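A minimal sketch of what the missing TPDS Service could do with each enqueued request is to dispatch on a job type and apply it to a data store. The job schema used here (`type`, `path`, `content`, `startPath`, `endPath`) is hypothetical and does not reproduce the payload TpdsUpdateSender.js actually sends; an in-memory store stands in for Gitea so the sketch runs on its own.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class InMemoryStore:
    """Stand-in for an external data store such as a Gitea repository."""
    files: Dict[str, bytes] = field(default_factory=dict)

    def put_file(self, path: str, content: bytes) -> None:
        self.files[path] = content

    def delete_entity(self, path: str) -> None:
        # Delete the file itself or, for a folder, every file under it.
        prefix = path.rstrip("/") + "/"
        for key in [k for k in self.files if k == path or k.startswith(prefix)]:
            del self.files[key]

    def move_entity(self, start: str, end: str) -> None:
        # Move a single file, or re-root every file under a folder.
        start_prefix = start.rstrip("/") + "/"
        for key in list(self.files):
            if key == start:
                self.files[end] = self.files.pop(key)
            elif key.startswith(start_prefix):
                new_key = end.rstrip("/") + "/" + key[len(start_prefix):]
                self.files[new_key] = self.files.pop(key)


def handle_job(store: InMemoryStore, job: dict) -> None:
    """Apply one enqueued web-to-TPDS job (hypothetical job schema)."""
    kind = job.get("type")
    if kind == "addFile":
        store.put_file(job["path"], job["content"].encode("utf-8"))
    elif kind == "deleteEntity":
        store.delete_entity(job["path"])
    elif kind == "moveEntity":
        store.move_entity(job["startPath"], job["endPath"])
    else:
        raise ValueError(f"unsupported job type: {kind}")
```

Jobs for the same project would need to be applied in order, since a move or delete only makes sense after the preceding writes.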

The logic for synchronizing changes from external storage to Overleaf is roughly as follows:

graph LR
    A[External data store] -->|webhook| B[Web Service]

The main logic is in TpdsUpdateHandler.js. To accept Gitea's webhooks, methods such as TpdsController.updateProjectContents may need to be modified.
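As a sketch of adapting to Gitea's webhook, the functions below turn a Gitea push event into lists of paths to re-fetch and paths to delete, which could then be handed to the existing update logic. They assume the push payload's `ref`, `after`, and per-commit `added`/`modified`/`removed` fields; the function names, the `exists_at_after` callback (for example, backed by a contents lookup at the `after` commit), and the tracked branch name are hypothetical.

```python
from typing import Callable, List, Set, Tuple


def touched_paths(payload: dict) -> Set[str]:
    """Collect every path added, modified, or removed by any commit in a push."""
    paths: Set[str] = set()
    for commit in payload.get("commits", []):
        for key in ("added", "modified", "removed"):
            paths.update(commit.get(key) or [])
    return paths


def plan_updates(payload: dict,
                 exists_at_after: Callable[[str], bool],
                 branch: str = "main") -> Tuple[List[str], List[str]]:
    """Split touched paths into files to re-fetch and files to delete.

    Checking each path against the tree at payload["after"] avoids relying
    on the order in which commits are listed in the payload.
    """
    if payload.get("ref") != f"refs/heads/{branch}":
        return [], []  # ignore pushes to branches that are not synced
    to_fetch: List[str] = []
    to_delete: List[str] = []
    for path in sorted(touched_paths(payload)):
        (to_fetch if exists_at_after(path) else to_delete).append(path)
    return to_fetch, to_delete
```

A file added and removed within the same push ends up in the delete list, which the project can safely ignore if it never had that file.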

HolgerHuo commented 2 weeks ago

Reference:

This may not be portable for other potential users of this instance, i.e., it may not extend to other Git implementations. Anyway, this is yet to be discussed.

AllanChain commented 2 weeks ago

Even if we choose a Git bridge-based solution, we still have to write different logic for each Git hosting platform, because platforms can differ in their webhooks, authentication methods, etc.
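To illustrate how authentication alone already differs per platform, here is a minimal sketch of webhook verification for three hosts. The header names and schemes (Gitea's `X-Gitea-Signature` hex HMAC-SHA256, GitHub's `X-Hub-Signature-256` with a `sha256=` prefix, GitLab's plain `X-Gitlab-Token`) follow those platforms' documented behavior as we understand it; the `verify_webhook` function and its platform strings are hypothetical.

```python
import hashlib
import hmac
from typing import Mapping


def _hmac_sha256_hex(key: bytes, body: bytes) -> str:
    return hmac.new(key, body, hashlib.sha256).hexdigest()


def verify_webhook(platform: str, headers: Mapping[str, str],
                   body: bytes, secret: str) -> bool:
    """Check a webhook's authenticity; each platform signs requests differently.

    `headers` is assumed to use the exact header names below
    (real HTTP frameworks usually provide case-insensitive lookup).
    """
    key = secret.encode("utf-8")
    if platform == "gitea":
        # Gitea: hex HMAC-SHA256 of the raw body.
        received = headers.get("X-Gitea-Signature", "")
        expected = _hmac_sha256_hex(key, body)
    elif platform == "github":
        # GitHub: "sha256=" followed by the hex HMAC-SHA256 of the raw body.
        received = headers.get("X-Hub-Signature-256", "")
        expected = "sha256=" + _hmac_sha256_hex(key, body)
    elif platform == "gitlab":
        # GitLab: the configured secret token is sent verbatim.
        received = headers.get("X-Gitlab-Token", "")
        expected = secret
    else:
        raise ValueError(f"unsupported platform: {platform}")
    # Compare as bytes in constant time (also safe for non-ASCII input).
    return hmac.compare_digest(received.encode("utf-8"), expected.encode("utf-8"))
```

Each new platform would add a branch here and, separately, its own payload parsing.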