zadam / trilium

Build your personal knowledge base with Trilium Notes
GNU Affero General Public License v3.0

(Bug report) Sync anomalies in Trilium Version 0.61.13 #4435

Open Nriver opened 11 months ago

Nriver commented 11 months ago

Trilium Version

0.61.13

What operating system are you using?

Windows

What is your setup?

Local + server sync

Operating System Version

win10

Description

I've been using version 0.61.13 for about a week, and I've noticed significant changes in the sync mechanism, leading to several issues:

  1. Upon upgrading from version 0.60.4 to 0.61.13 on both the server and client, the data sync process involves excessive downloading and uploading of data. Although my database is only 1GB, the network traffic runs to several GB, resembling a long sync in which the entire database is uploaded and downloaded multiple times. I had to wait a very long time for it to finish.

  2. In versions 0.60 and earlier, I could simply delete my client-side database and download the server database for a quick data restoration, with no sync needed since the data was identical. However, in version 0.62, repeating the same process results in the long sync scenario. The only workaround I can find is to delete the local database, perform initialization within Trilium's UI, and wait for the initial sync, which is much faster than that long sync.

  3. Occasionally, the client initiates a long sync, and I'm uncertain which specific action triggers it. It's possible that clipping a page and then deleting it before synchronization, or something related to erasing deleted notes, triggers it. Unfortunately, I can't recall the exact sequence of events.

  4. The long sync coincides with corruption in some Chinese note titles, as detailed in my post at https://github.com/zadam/trilium/issues/4412. I compared the database with the db backup taken hours before it went wrong. I can see the corrupted titles (which belong to different notes than the ones shown in that issue). There are also lots of changes in the entity_change table. (Not sure if this is a side effect of the third issue above.)

Error logs

No response

Nriver commented 10 months ago

I upgraded to version 0.61.15 and then to version 0.62.2. I cleared all entries in the entity_change table, then ran Trilium to reconstruct the records. This reduced the total number of records from 160,000 to 100,000.

After the reconstruction on the server, I deleted the client's database and performed a complete data re-initialization. Everything appeared to function smoothly for the past week.

But it happened again today.

I usually run two clients, one on Linux (A) and the other on Windows (B). Today, one of the clients happened to go through a shorter long sync. Although it did not transfer the whole database, there was still a lot of data transferred in both directions.

What's worse, if I let A finish the sync, B then has to go through the long sync. When B finishes, A has to go through the long sync again. This loop never seems to end, and I did not change any note content during this process.

Nriver commented 10 months ago

I've sent today's log file to your email.

zadam commented 10 months ago

Hi, it looks like I missed this issue when you originally posted it. I will take a look.

zadam commented 10 months ago

The long syncs are likely caused by the sync protocol's error checking believing that there are errors and trying to fix the problem by replaying parts ("sectors") of the database. That's what I saw in today's logs, for one "sector" only.
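
To illustrate the kind of check described above, here is a minimal sketch of comparing per-sector hashes between a client and a server. It is not Trilium's actual code: the thread does not say how sectors are defined, so grouping rows by the first character of the entity ID, the row layout, and the function names `sector_hashes` and `mismatched_sectors` are all assumptions made for illustration.

```python
import hashlib
from collections import defaultdict


def sector_hashes(rows):
    """Group (entity_id, row_hash) pairs into sectors and hash each sector.

    Assumption: a "sector" is identified by the first character of the
    entity ID. The thread does not specify how sectors are really defined.
    """
    sectors = defaultdict(list)
    for entity_id, row_hash in rows:
        sectors[entity_id[0]].append(f"{entity_id}:{row_hash}")
    return {
        sector: hashlib.sha1("\n".join(sorted(items)).encode("utf-8")).hexdigest()
        for sector, items in sectors.items()
    }


def mismatched_sectors(local_rows, remote_rows):
    """Return the sectors whose hashes differ between the two sides."""
    local = sector_hashes(local_rows)
    remote = sector_hashes(remote_rows)
    return sorted(s for s in local.keys() | remote.keys() if local.get(s) != remote.get(s))


# Hypothetical data: one row differs, so only its sector would be replayed.
client = [("a1", "h1"), ("a2", "h2"), ("b1", "h3")]
server = [("a1", "h1"), ("a2", "h2"), ("b1", "DIFFERENT")]
print(mismatched_sectors(client, server))
```

Under this scheme a single differing row forces its whole sector to be replayed, which would be consistent with a sync that moves far more data than the change that caused it.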

One lead I'm following is the UTF encoding. JavaScript uses UTF-16, the data in SQLite is in UTF-8, so there is a conversion. There could be some conversion in the network layer as well. While this should not lead to corruption, it could in theory lead to a different byte-level representation, since in Unicode it's possible to represent equivalent strings using different code points, which would then produce different control hashes.

It's an unconfirmed hypothesis, however normalizing the strings before hash calculation should be done in any case.
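
A minimal sketch of the normalization idea, in Python to match the example later in this thread rather than Trilium's own JavaScript. The function name `content_hash`, the choice of SHA-1, and the choice of NFC are assumptions for illustration; the thread does not say which hash or normalization form Trilium uses.

```python
import hashlib
import unicodedata


def content_hash(text: str, normalize: bool = True) -> str:
    """Hash text as UTF-8, optionally normalizing it to NFC first."""
    if normalize:
        text = unicodedata.normalize("NFC", text)
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


composed = "\u00e9"     # 'é' as a single code point
decomposed = "e\u0301"  # 'e' followed by a combining acute accent

# Without normalization, the two equivalent strings hash differently.
print(content_hash(composed, normalize=False) == content_hash(decomposed, normalize=False))
# With normalization, they hash to the same value.
print(content_hash(composed) == content_hash(decomposed))
```

Both strings render identically, so a hash mismatch of this kind would be invisible when looking at the notes themselves.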

I'm not sure if this could also explain #4412, since different conversion strategies should not lead to invalid characters IMHO.

Nriver commented 10 months ago

The system-level default encoding on most Linux systems is UTF-8, but on the Chinese version of Windows the default encoding is GBK. (Which seems stupid to me; modern systems should use UTF-8.)

In Python, we need to specify the UTF-8 encoding explicitly. Otherwise, it uses the system-level encoding, which may cause encoding/decoding errors.

with open('test.txt', 'w', encoding='utf-8') as file:
    file.write('你好')
# Without an explicit encoding, the UTF-8 content is read with the system
# default (GBK here), which may raise an error or produce garbled output.
with open('test.txt', 'r') as file:
    print(file.read())

The Python code above reads 你好 (hello in Chinese) as 浣犲ソ, which is nonsense.

If I change it to 我们的 (ours in Chinese), it will raise an error.

UnicodeDecodeError: 'gbk' codec can't decode byte 0x84 in position 8: incomplete multibyte sequence

I'm not very familiar with Node.js, but I suspect that if some module uses the system-level encoding instead of UTF-8, something strange may happen.
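
The file-based example above only misbehaves on a machine whose default encoding is GBK. The sketch below reproduces the same mismatch in memory on any system by decoding UTF-8 bytes with an explicitly wrong codec. The helper name `read_utf8_as` is hypothetical and only serves this illustration.

```python
def read_utf8_as(text: str, wrong_encoding: str = "gbk") -> str:
    """Encode text as UTF-8, then decode the bytes with a different codec."""
    return text.encode("utf-8").decode(wrong_encoding)


# Garbled but "successful" decode: the bytes happen to form valid GBK pairs.
print(read_utf8_as("你好"))

# Nine UTF-8 bytes cannot be split into GBK two-byte pairs, so decoding fails.
try:
    read_utf8_as("我们的")
except UnicodeDecodeError as exc:
    print("decode failed:", exc.reason)
```

The first case is the more dangerous one for a sync scenario, because no error is raised and the garbled string can be stored as if it were valid.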

Nriver commented 10 months ago

I've updated to 0.62.3 and the long sync is still there. There must be something we haven't considered yet.

Nriver commented 10 months ago

Today I performed the following tasks on Linux (A). This time only one client was involved.

I took notes as usual, inputting text and images into several different notes. Then I synced with the server manually, and the sync was very short, taking only several seconds.

I had deleted some notes recently that I no longer need, so I used the Erase deleted notes now option in Recent changes, followed by Check database integrity, Find and fix consistency issues, and Vacuum database in the Advanced options page.

After that, the long sync appeared again and took significantly longer than in the last few days. It uploaded and downloaded a substantial amount of data.

Nriver commented 10 months ago

I happened to have manual backups of the client and server from several hours earlier. I restored the backups on both client and server and ran some tests.

  1. Manually sync client and server several times. Each sync takes a few seconds, which is OK.
  2. Hit Erase deleted notes now.
  3. Manual sync becomes a long sync, with huge upload and download.

I also tried Check database integrity, Find and fix consistency issues, or Vacuum database in step 2 instead; none of them affects the sync.

So there must be something wrong in the erase process that triggers the long sync.

I've sent the original logs to your email.

Nriver commented 10 months ago

I searched through the backups and ran some tests on them. I can confirm that Erase deleted notes now followed by a sync triggers corruption in some Chinese note titles.

I also found one tag value that is corrupted, but only one so far.
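
For anyone who wants to repeat this kind of backup comparison, here is a rough sketch that lists notes whose titles differ between two SQLite databases. The table name `notes` and the columns `noteId` and `title` are assumptions about the schema and should be checked against the actual database. The helper names and the in-memory demo data are hypothetical.

```python
import sqlite3


def changed_titles(old_db: sqlite3.Connection, new_db: sqlite3.Connection):
    """Return (noteId, old_title, new_title) for notes whose title changed.

    Assumes a `notes` table with `noteId` and `title` columns.
    """
    old = dict(old_db.execute("SELECT noteId, title FROM notes"))
    new = dict(new_db.execute("SELECT noteId, title FROM notes"))
    return [
        (note_id, old[note_id], new[note_id])
        for note_id in sorted(old.keys() & new.keys())
        if old[note_id] != new[note_id]
    ]


def make_db(rows):
    """Build a small in-memory database for the demonstration below."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE notes (noteId TEXT PRIMARY KEY, title TEXT)")
    conn.executemany("INSERT INTO notes VALUES (?, ?)", rows)
    return conn


before = make_db([("n1", "你好"), ("n2", "notes")])
after = make_db([("n1", "浣犲ソ"), ("n2", "notes")])
print(changed_titles(before, after))
```

With real backups, the two connections would come from `sqlite3.connect()` on copies of the backup files rather than from `make_db`. Working on copies avoids touching the live database.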

zadam commented 10 months ago

Hi @Nriver , thanks for the logs.

I was able to reproduce the long sync - the key was indeed the note erasing, but also the note has to have revisions. This should be resolved in the patch version soon.

However, this alone doesn't explain the corruption of Chinese characters. It might be a consequence of the long sync (data is being overwritten), but how exactly the data gets corrupted is still unclear.

Nriver commented 10 months ago

I can confirm that the long sync after a single client erases deleted notes is fixed after upgrading to 0.62.5. Thank you!

Then I tried using both the Linux and Windows clients today, and the long sync loop still exists. But the amount of data is significantly reduced compared to the previous version.

The corruption of Chinese characters still exists. But the number of corrupted notes has been cut down a lot, from over 20 to approximately 5.

KeeKalm commented 10 months ago

I have also encountered a similar situation: the client keeps initiating sync requests and there are many sync queues. Checking the logs, I found multiple notes with differing hashes. There is also a problem with Chinese text being garbled after synchronization. I now have no choice but to give up syncing my notes.

Trilium Version 0.62.3

justyns commented 10 months ago

I am using 0.62.4 and haven't tried 0.62.5 yet, but I can confirm I keep running into this "long sync" issue as well. I haven't noticed any notes getting corrupted, but I do sometimes get a lot of "recovered - " notes. I suspect it has something to do with killing trilium's connection (putting laptop to sleep, etc) while it's in the middle of one of these syncs, but haven't done much testing.

I do normally use 1 server (linux) and 3 clients: macos, android (via termux), and windows 10.

After reading through this thread, I decided to stick to a single server and client for a few days. I deleted document.db from the (macos) client and let it reinitialize from the server. I'll update if I can reproduce it with a single client. If not, I'll introduce one of the other OS/clients and see what happens.