laurent22 / joplin

Joplin - the privacy-focused note taking app with sync capabilities for Windows, macOS, Linux, Android and iOS.
https://joplinapp.org

Sync silently ignores new files with old timestamps #6517

Open jacquesh opened 2 years ago

jacquesh commented 2 years ago

When syncing to the file system, new files with a timestamp older than the last sync timestamp get silently ignored (instead of being pulled into Joplin).

This situation arises for me because I run Joplin on my PC and phone, have them both sync to the filesystem and then use syncthing to sync the two. syncthing maintains file last-modified metadata across devices, meaning that if my mobile device is not busy syncing (could be that syncing is off, I'm out of network range, off wifi, etc) and I add a note, wait a day and then let it sync, then the file on my PC will have a creation date of right now and a last-modified date that is a day in the past. Joplin ignores it.

Environment

Joplin version: 2.7.15 (although this has been happening for at least a year; I updated recently and I think I was on 2.3.6 before that)

Platform: Desktop, Windows 10 x64 21H2/19044

Steps to reproduce

  1. Set joplin to sync to the filesystem and ensure it is all synced up.
  2. Close Joplin
  3. Create a new note file (I copied an existing one and changed the name & ID in the file to a new hash)
  4. Set the file's last-modified time to before your last Joplin sync
    • e.g. in PowerShell: (Get-Item "c6989cfc4d4d9114591431119fe5e8a5.md").LastWriteTime=("15 May 2022 19:38:00")
  5. Open Joplin & sync
  6. See that it does not pull in the new file
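For anyone reproducing this on a platform without PowerShell, step 4 can be approximated with a short Python sketch. The helper name backdate is made up for illustration; os.utime is the standard-library call that sets a file's access and modification times.

```python
import os
import time
from pathlib import Path


def backdate(path: Path, seconds_ago: float) -> float:
    """Move the file's access and modification times `seconds_ago` into the past.

    Returns the new modification time as a Unix timestamp.
    """
    new_time = time.time() - seconds_ago
    os.utime(path, (new_time, new_time))  # tuple is (atime, mtime)
    return new_time
```

Calling backdate(Path("c6989cfc4d4d9114591431119fe5e8a5.md"), 7 * 24 * 3600) would make the copied note look a week old, which should be older than the last sync in most setups.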

Thoughts following my own short investigation into the synchronisation code

I've looked through the sync code in Joplin and I see it is relying entirely on the last-modified time of the file. In particular, it will ignore any file with a timestamp older than Joplin's last sync time, even if the file is unknown to Joplin. It seems to me that Joplin should instead read in new files even if they're older than the last sync time, but there may well be other factors that I'm not aware of. I know little about the filesystem format, but at the very least Joplin would need to check the contents of each new file to know what it's reading in (my local instance has 200 notes but >500 files, some of which appear to be recordings of edits? Presumably these contain data from tables in the sqlite DB other than the "notes" table?).
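To make the difference concrete, here is a minimal Python sketch of the two selection rules. It is not Joplin's actual code (which is TypeScript); RemoteFile, select_by_time_only and select_including_unknown are hypothetical names, and known_ids stands in for whatever record Joplin keeps of items it has already seen.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class RemoteFile:
    item_id: str   # file name without the ".md" extension
    mtime: float   # last-modified time, Unix seconds


def select_by_time_only(files: Iterable[RemoteFile], last_sync_time: float) -> List[RemoteFile]:
    """Behaviour described in this report: only files newer than the last sync are read."""
    return [f for f in files if f.mtime > last_sync_time]


def select_including_unknown(
    files: Iterable[RemoteFile], last_sync_time: float, known_ids: Set[str]
) -> List[RemoteFile]:
    """Suggested variation: also read any file whose ID has never been seen locally."""
    return [f for f in files if f.mtime > last_sync_time or f.item_id not in known_ids]
```

With the second rule, a file that arrives late with an old timestamp is still picked up once, because its ID is not yet in known_ids. Edits to already-known files are not helped by this.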

Admittedly the same problem can happen with file edits. This is more difficult to resolve since you can't use the existence of the file as proof that a sync is needed. In that case, you might be able to use the file-specific sync time (in the sync_items table, assuming that actually contains what the name suggests) to do a somewhat better job. Short of hashing the file contents, though, I'm not sure there's a great solution for that case. Hashing would be very robust but also far slower than checking file times, and obviously problematic on all of the non-filesystem sync targets, so it is possibly worth doing only for the filesystem sync target. If the conclusion is that this behaviour is a problem, then maybe solving the new-file case is at least a good start?
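A rough Python sketch of the hashing idea, for the filesystem target only. The names content_hash and changed_files are hypothetical, and known_hashes is an assumed mapping from file name to the hash recorded at the previous sync; nothing here reflects how Joplin stores its sync state.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def content_hash(path: Path) -> str:
    """SHA-256 of the file contents, read in chunks to keep memory use flat."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def changed_files(folder: Path, known_hashes: Dict[str, str]) -> List[Path]:
    """Return .md files that are new or whose contents differ from the recorded hash."""
    changed = []
    for path in sorted(folder.glob("*.md")):
        if known_hashes.get(path.name) != content_hash(path):
            changed.append(path)
    return changed
```

This ignores timestamps entirely, so it catches both new files and edits, at the cost of reading every file on every pass.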

I can see about changing my non-joplin file-sync setup so that the file timestamps are updated when the file syncs (which would prevent this issue from happening to me in the first place) but that might not be as easy as one might hope and would only solve the problem with my particular setup (other file-system setups may run into the same issue).

This could possibly be related to #5099 and #6346

tomasz1986 commented 2 years ago

This situation arises for me because I run Joplin on my PC and phone, have them both sync to the filesystem and then use syncthing to sync the two. syncthing maintains file last-modified metadata across devices, meaning that if my mobile device is not busy syncing (could be that syncing is off, I'm out of network range, off wifi, etc) and I add a note, wait a day and then let it sync, then the file on my PC will have a creation date of right now and a last-modified date that is a day in the past. Joplin ignores it.

I've experienced the exact same problem, also when using Syncthing. In my case, I believe this usually happens when Joplin performs synchronisation while Syncthing is still downloading the files, which results in a partial Joplin sync. Then, when trying to sync again, Joplin doesn't detect the notes that have been downloaded in between. I wasn't sure what the problem was about exactly though. Thank you so much for finding the actual culprit.

Is there any workaround that you'd suggest to do at the user's side as a temporary solution? Other than disabling automatic sync completely and relying on manual sync only, of course 😉. When encountering the error myself, I've up to now relied on the "delete local data and re-download from sync target" button, but this requires a manual action, while I'd prefer to rely on an automated solution here.

I think https://discourse.joplinapp.org/t/sync-issues-using-filesystem-sync-and-sync-tool/2652 from 3 years ago (!) talks about the same issue. It mentions touching all files in order to update their timestamps as a possible workaround. Is there any danger in doing just that (e.g. with a periodically run script)?

Edit 1:

I performed the whole touch thing, and after that Joplin was able to discover a dozen items that had been missing, which I wasn't even aware of… I haven't observed any peculiarities after doing it. However, I must mention that the process is slow, especially on low-end hardware and mobile devices, depending on the number of notes.

Just to be thorough, and also because people in that forum thread were asking about how to do touch in Windows, the actual command is copy /b file.md+ (or for /f %a in ('dir /b *.md') do copy /b %a+ to process all the md files).

Edit 2:

I think a possible workaround could be to sync the folder in Syncthing to a different path first, and then use some kind of background script that would make sure that the folder has been synced completely (e.g. if no changes have happened in the last x minutes), and only then copy that folder in a single pass to yet another folder that would be used strictly as the sync target for Joplin. Doing so would eliminate the partial sync issue. Obviously, this is possible only on the desktop, and not with the mobile versions of Joplin, which likely suffer from the same problem. Also, it wouldn't be perfect either, as the synchronisation in Syncthing can still be interrupted, e.g. by a disconnect, and then the sync state will end up being incomplete anyway.
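A minimal Python sketch of such a background script, under several assumptions: the staging folder is treated as flat (subfolders in the real sync target are skipped), "synced completely" is approximated by two identical directory listings taken quiet_seconds apart, and the function names are made up. It only covers the inbound direction, from Syncthing towards Joplin.

```python
import shutil
import time
from pathlib import Path
from typing import Dict, List, Optional, Tuple


def snapshot(folder: Path) -> Dict[str, Tuple[int, int]]:
    """Map each file name to (size, mtime in ns) so two listings can be compared."""
    result = {}
    for path in folder.iterdir():
        if path.is_file():
            st = path.stat()
            result[path.name] = (st.st_size, st.st_mtime_ns)
    return result


def mirror_if_stable(staging: Path, target: Path, quiet_seconds: float) -> Optional[List[str]]:
    """Copy new or changed files to `target` if `staging` did not change while waiting.

    Returns the copied file names, or None if the folder was still changing.
    """
    before = snapshot(staging)
    time.sleep(quiet_seconds)
    if snapshot(staging) != before:
        return None
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in sorted(staging.iterdir()):
        if not src.is_file():
            continue
        dst = target / src.name
        if not dst.exists() or dst.read_bytes() != src.read_bytes():
            shutil.copyfile(src, dst)  # contents only, so the copy gets a fresh mtime
            copied.append(src.name)
    return copied
```

Using shutil.copyfile rather than shutil.copy2 is a deliberate choice in this sketch: copy2 would carry the old modification time over to the target, which is the situation the workaround is trying to avoid.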

Edit 3:

I think I've managed to hack up a PowerShell script that "touches" only newly added files, while skipping everything else.

Get-ChildItem -Path *.md | Where-Object {$_.LastWriteTime -le (Get-Date)} | ForEach-Object {$_.LastWriteTime = (Get-Date).AddYears(1)}

The script checks whether each note's modified time is equal to or older than the current moment and, if so, sets it to 1 year from now. On the first run, this forces Joplin to rescan all notes; on later runs, it only makes Joplin rescan new or modified notes, while skipping those whose date has already been moved into the future.

I think the date can be set even further into the future, so that even once the 1-year period has passed, Joplin will still not be forced to go through those notes, but the value of "1" year just seemed like a safe bet.
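For non-Windows machines, roughly the same logic can be sketched in Python. The function name push_into_future is made up, and the offset is approximated as 365 days rather than PowerShell's calendar-aware AddYears(1).

```python
import os
import time
from pathlib import Path
from typing import List, Optional

ONE_YEAR = 365 * 24 * 3600  # seconds; an approximation of AddYears(1)


def push_into_future(folder: Path, now: Optional[float] = None, offset: float = ONE_YEAR) -> List[Path]:
    """Move the mtime of every .md file that is not already in the future to now + offset.

    Returns the files that were touched, so a second run on an unchanged folder returns [].
    """
    now = time.time() if now is None else now
    touched = []
    for path in sorted(folder.glob("*.md")):
        st = path.stat()
        if st.st_mtime <= now:
            os.utime(path, (st.st_atime, now + offset))  # keep atime, change mtime only
            touched.append(path)
    return touched
```

Run periodically against the sync folder, this should touch only files that arrived or were modified since the previous run, since everything handled earlier already carries a future date.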

trymeouteh commented 2 years ago

This has been a big issue for me when using the file system + syncthing to sync notes. I have been using file system + syncthing to sync my notes for 4 months now, and every month I end up having this issue and having to back up my notes, reset the Joplin apps on all of my devices and redo the sync. It is very irritating.

I would like to see Joplin fix this, as I consider it a major bug. I like Joplin but lately it has not been performing too well. I hope Joplin gets some contributors to help touch up the app, since it is a good app overall and I would like to see it perform to its fullest potential.

Szybet commented 2 years ago

Thanks @tomasz1986 for the investigation, much appreciated.

I have created a Linux version of his command

find -mtime -5 -exec touch {} +

Every file older than 5 days will be re-synced, so it will show up. Also, with syncthing, detecting conflicts doesn't work, but the conflicted files are still there, so:

find . -type f -name '*conflict*' -exec kate {} +

XilinJia commented 1 year ago

Perhaps something related to "Edit 3:" from @tomasz1986 can be implemented in Joplin, as follows:

Instead of touching the files, Joplin can set back the timestamp of the previous scan, so that the next scan (either manual or automatic) is based on the reset timestamp. How to set back the timestamp? I would suggest providing a UI so that, once the user discovers some abnormalities with the sync, they can set the timestamp back to, say, one week ago, or one month ago, etc. Would that be an easy fix?
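A tiny Python sketch of what rewinding the scan timestamp would mean for file selection. The name files_to_rescan and the mtimes mapping (file name to last-modified time) are hypothetical and only illustrate the idea.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def files_to_rescan(mtimes: Dict[str, datetime], last_sync: datetime, lookback: timedelta) -> List[str]:
    """Select files modified after the last sync time moved back by `lookback`."""
    cutoff = last_sync - lookback
    return sorted(name for name, mtime in mtimes.items() if mtime > cutoff)
```

With lookback=timedelta(0) this is the plain "newer than last sync" rule; a lookback of one week widens the window so that files which arrived late with older timestamps are scanned again.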

individual-it commented 1 year ago

@Szybet I think the command find -mtime -5 -exec touch {} + will find and touch all files that were modified within the last 5 days. From the find man page: "File's data was last modified less than, more than or exactly n*24 hours ago". To fix this problem, don't you want to touch all files that haven't been touched recently? For that you would need find -mtime +5 -exec touch {} +. Or do I misunderstand something?

Szybet commented 1 year ago

A numeric argument n can be specified to tests (like -amin, -mtime, -gid, -inum, -links, -size, -uid and -used) as

+n     for greater than n,

-n     for less than n,

n      for exactly n.

yes, you misunderstood
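For reference, the rule quoted above can be written out as a small Python sketch. The function matches_mtime is a made-up helper, not part of find; it assumes GNU find's documented behaviour of measuring age in whole 24-hour periods and ignoring any fractional part.

```python
def matches_mtime(age_seconds: float, spec: str) -> bool:
    """Mimic `find -mtime SPEC` for a file last modified `age_seconds` ago.

    SPEC is "+n" (more than n days), "-n" (less than n days) or "n" (exactly n days).
    """
    age_days = int(age_seconds // 86400)  # whole 24-hour periods, fraction dropped
    if spec.startswith("+"):
        return age_days > int(spec[1:])
    if spec.startswith("-"):
        return age_days < int(spec[1:])
    return age_days == int(spec)
```

Read this way, -mtime -5 selects files modified less than 5 days ago, and -mtime +5 selects files whose age, counted in whole days, is greater than 5.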

laurent22 commented 10 months ago

I add a note, wait a day and then let it sync, then the file on my PC will have a creation date of right now and a last-modified date that is a day in the past.

On the Joplin side, wouldn't the solution be to look at both the creation and last-modified timestamps and use whichever is the latest for comparison?

Would that also address the case where SyncThing and Joplin are synchronising at the same time?
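A minimal Python sketch of "use whichever timestamp is the latest", assuming the platform exposes a creation time at all. effective_timestamp is a hypothetical helper; st_birthtime is only present on some platforms (for example macOS, and Windows on recent Python versions), so the sketch falls back to the modified time when it is missing.

```python
import os


def effective_timestamp(st: os.stat_result) -> float:
    """Return the later of creation time and modification time.

    Falls back to the modification time when no creation time is available.
    """
    birth = getattr(st, "st_birthtime", None)
    if birth is None:
        return st.st_mtime
    return max(st.st_mtime, birth)
```

For a file that was just copied in with a preserved, older modified time, the creation time would be the newer of the two and would therefore win the comparison. Where no creation time exists, the function behaves exactly like the current mtime-only check.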

tomasz1986 commented 10 months ago

Does Android even support file creation time? I don't think it does (e.g. see https://android.stackexchange.com/questions/232633/android-11-supports-creation-time-crtime-btime-for-local-folders-and-files).

Would that also address the case where SyncThing and Joplin are synchronising at the same time?

I think it's a bit more than just syncing at the same time. For example:

  1. Start sync with Syncthing.
  2. The connection drops, leading to partial sync.
  3. Joplin syncs its database.
  4. Syncthing restores the connection later, and syncs the rest of the files.
  5. The newly added files, however, are now completely ignored by Joplin.

I believe this is also why this bug/issue is basically bound to happen at some point.

laurent22 commented 10 months ago

Hmm, but the current logic seems sound to me: Joplin keeps the latest timestamps of the files that have been synced. So let's say we start with these file timestamps: t0, t1, t2.

Joplin will take the list of current files and process this - so it will process the file with timestamp t0, then t1, then t2.

If SyncThing is still synchronising in the background too, any new file it will add will have (or should have) a timestamp that comes after t2 - say t3, then t4.

Once Joplin is done synchronising, it's going to save the timestamp t2. Then next time it synchronises, it will start from there and look at t3, then t4.

For me, this should work provided timestamps are making sense - i.e. if I add a file to a folder, updated and created time should be the current time, not a time in the past. Or if SyncThing is doing something fancy and changing timestamps, at least one of the timestamps should be correct, which is why I suggested looking at both created and updated time.

The connection drops, leading to partial sync.

This should be fine - if Syncthing synchronises later, logically the created files should take the current time, not one in the past. Since it's a new file, it shouldn't have a timestamp in the past.

tomasz1986 commented 10 months ago

I think what happens is basically something like this:

  1. Create a few notes on the desktop. Let's say their modified time is 6:00.
  2. Syncthing manages to synchronise only some of the notes, then it disconnects.
  3. Joplin on Android performs its sync at 6:30, adding the new notes into the database.
  4. Syncthing reconnects at 7:00 and syncs the remaining notes.
  5. The newly synced notes still have their "modified time" at 6:00, but Joplin's last sync was at 6:30, so it ignores everything before that time.

There is no "created time" on Android, only "modified time", and that is synced with the files.
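The timeline above can be replayed with a few lines of Python. This is a toy model, not Joplin's sync code: sync_pass is a made-up function, the cursor is assumed to be the wall-clock time of the previous pass (as in step 5), and the 7:05 second pass is an assumed time after Syncthing reconnects.

```python
from typing import Dict, List, Tuple


def hhmm(text: str) -> int:
    """Convert a clock time such as "6:30" to minutes since midnight."""
    hours, minutes = text.split(":")
    return int(hours) * 60 + int(minutes)


def sync_pass(folder: Dict[str, int], cursor: int, now: int) -> Tuple[List[str], int]:
    """Import files modified after `cursor`; the pass time becomes the new cursor."""
    imported = sorted(name for name, mtime in folder.items() if mtime > cursor)
    return imported, now


def run_scenario() -> Tuple[List[str], List[str]]:
    folder = {"note1.md": hhmm("6:00")}  # step 2: only some notes have arrived
    first, cursor = sync_pass(folder, cursor=0, now=hhmm("6:30"))  # step 3
    folder["note2.md"] = hhmm("6:00")  # step 4: arrives at 7:00, modified time preserved
    second, cursor = sync_pass(folder, cursor, now=hhmm("7:05"))  # step 5
    return first, second
```

run_scenario() returns (["note1.md"], []): the first pass imports note1.md, and the second pass imports nothing because note2.md still carries the 6:00 modified time, which is older than the 6:30 cursor.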

I work around the problem by using the PowerShell script to set the modified time to the future.

I think someone could try to reproduce the problem simply by manually (i.e. without using Syncthing at all) copying some of the MD files first, syncing those with Joplin, then copying the rest and trying to sync again.

SiddharthManthan commented 9 months ago

This issue makes file system sync unusable. Perhaps a fix could be prioritized?

Kamul-PL commented 4 months ago

Has anyone managed to find any sensible (safe and convenient) solution to this problem?

aflip commented 3 months ago

I'm syncing between two linux laptops and i do not face this issue. I manually edited the update time and create time of notes using the GUI, synced, and the notes show up just fine. This seems like it was solved.

tomasz1986 commented 3 months ago

I'm syncing between two linux laptops and i do not face this issue. I manually edited the update time and create time of notes using the GUI, synced, and the notes show up just fine. This seems like it was solved.

The issue isn't about the time inside Joplin, it's about the modified time of the actual MD files on the disk.