scambier / obsidian-text-extractor

A (companion) plugin to facilitate the extraction of text from images (OCR) and PDFs.
GNU General Public License v3.0
346 stars, 19 forks

[BUG] Text Extractor cache does not play well with Obsidian Sync #37

Closed doug-w closed 9 months ago

doug-w commented 1 year ago

Text Extractor with Sync set to sync Core Plugin Settings will constantly oversync the cache files between devices.

Right now I have several thousand files syncing from my iPhone to my Windows desktop:

2023-09-05 17:26 - Server pushed (deleted or renamed) [iPhone] .obsidian/plugins/text-extractor/cache/4bc606059de3c57d21e5f2c8b60bb7a9.json
2023-09-05 17:26 - Deleting .obsidian/plugins/text-extractor/cache/46451ba5314cf72906b85cb3dd2823d3.json
2023-09-05 17:26 - Accepted .obsidian/plugins/text-extractor/cache/46451ba5314cf72906b85cb3dd2823d3.json
2023-09-05 17:26 - Deleting .obsidian/plugins/text-extractor/cache/8790dd9cc6b14fa8d897d770200a77c2.json
2023-09-05 17:26 - Server pushed (deleted or renamed) [iPhone] .obsidian/plugins/text-extractor/cache/2d1787936a36696626e055bcf7a5fc34.json
2023-09-05 17:26 - Accepted .obsidian/plugins/text-extractor/cache/8790dd9cc6b14fa8d897d770200a77c2.json
2023-09-05 17:26 - Deleting .obsidian/plugins/text-extractor/cache/e51e091a19186848251a110e29cf0df8.json
2023-09-05 17:26 - Accepted .obsidian/plugins/text-extractor/cache/e51e091a19186848251a110e29cf0df8.json
2023-09-05 17:26 - Server pushed (deleted or renamed) [iPhone] .obsidian/plugins/text-extractor/cache/bab303aa3fa821c223cea7722567498d.json
2023-09-05 17:26 - Deleting .obsidian/plugins/text-extractor/cache/8d3133a58e624f5829f6c3be56e28a6b.json
2023-09-05 17:26 - Accepted .obsidian/plugins/text-extractor/cache/8d3133a58e624f5829f6c3be56e28a6b.json
2023-09-05 17:26 - Deleting .obsidian/plugins/text-extractor/cache/ad3cadc7f5494af321d55c6a1460b6b9.json
2023-09-05 17:26 - Accepted .obsidian/plugins/text-extractor/cache/ad3cadc7f5494af321d55c6a1460b6b9.json
2023-09-05 17:26 - Server pushed (deleted or renamed) [iPhone] .obsidian/plugins/text-extractor/cache/85bd021edff855f9022cec4ad2837e75.json
2023-09-05 17:26 - Deleting .obsidian/plugins/text-extractor/cache/0a1e9f22809e1ace8e4bfafcdccdbfe4.json

When that's done, the Windows desktop will push them back in.

Given that they're in the .obsidian folder I can't tell the sync plugin to exclude the directory. I'm not sure if there's anything an API developer can do to say don't sync this folder or not?

scambier commented 1 year ago

I don't have Sync (nor an iPhone) to check what's happening exactly, but I find it weird that there's an explicit deletion from the iPhone 🤔 There's maybe some remnant code that misbehaves on mobile, I'll check that.

> I'm not sure if there's anything an API developer can do to say don't sync this folder or not?

Not to my knowledge. Text Extractor's cache is also designed to be syncable, since extraction does not work on mobile.

doug-w commented 1 year ago

Thanks for the fast response, I've disabled text extractor fully for the moment which makes me quite sad.

gratzel commented 1 year ago

Seeing the same problem syncing three Windows desktop PCs. They are fighting over the cache about once a day. It also happens on the iPhone, but I disabled that one. Keeping the text extraction cache local to each machine would be a possible solution.

Rubenkl commented 1 year ago

Yes, same issue (2 Windows laptops + iOS device). For now I have enabled text-extractor on only one of my devices.

Nico-de-Vries commented 11 months ago

I'm experiencing the same issue. I have two Windows systems with Obsidian Sync, and they continuously resend the text extractor cache.

I've conducted further investigation into what's happening here. My set includes 11,115 items, of which 1,010 differ between the systems. The difference lies in the fact that those 1,010 items are successfully OCR-ed on one system (the "text" field of the JSON file contains text), but remain empty on the other (the "text" field of the JSON file is ""). The situation appears random: sometimes the first system performs successful OCR while the second system fails, and sometimes it's the other way around.
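The comparison described above can be sketched roughly as follows. This is only an illustration: the `CacheEntry` shape (just the "text" field mentioned in this thread), the `CacheSnapshot` type, and the `findOcrMismatches` helper are hypothetical names, and loading the JSON files from each device is left out.

```typescript
// Hypothetical shape: only the "text" field is confirmed by this thread.
interface CacheEntry {
  text: string;
}

// Maps a cache file name (for example "<hash>.json") to its parsed content.
type CacheSnapshot = Map<string, CacheEntry>;

// Returns the names of files present on both devices where exactly one
// side has an empty "text" field, i.e. OCR succeeded on one device only.
function findOcrMismatches(a: CacheSnapshot, b: CacheSnapshot): string[] {
  const mismatches: string[] = [];
  for (const [name, entryA] of a) {
    const entryB = b.get(name);
    if (entryB === undefined) continue; // only compare files both devices have
    const emptyA = entryA.text === "";
    const emptyB = entryB.text === "";
    if (emptyA !== emptyB) mismatches.push(name);
  }
  return mismatches;
}

// Made-up sample data, not taken from a real vault.
const deviceA: CacheSnapshot = new Map([
  ["a.json", { text: "invoice 2023" }],
  ["b.json", { text: "" }],
  ["c.json", { text: "same on both" }],
]);
const deviceB: CacheSnapshot = new Map([
  ["a.json", { text: "" }],
  ["b.json", { text: "scanned receipt" }],
  ["c.json", { text: "same on both" }],
]);

const mismatched = findOcrMismatches(deviceA, deviceB);
console.log(mismatched);
```

In the sample data, `a.json` and `b.json` are reported because each was OCR-ed successfully on only one side, which mirrors the "sometimes one system, sometimes the other" pattern.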

To my understanding, this issue could be resolved if the text extractor never replaces a JSON file that has actual text in the "text" field with one that has "" in the "text" field. Then the system that OCRed the image successfully will win and the endless loop will end.
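A minimal sketch of that rule, assuming the plugin can look at the existing cache entry before writing a new one. The `CacheEntry` shape and the `chooseEntryToKeep` function are hypothetical and not part of Text Extractor; whether this alone would stop the loop also depends on how Sync resolves conflicting files, which this thread does not establish.

```typescript
// Hypothetical shape: only the "text" field is confirmed by this thread.
interface CacheEntry {
  text: string;
}

// Decides which version to keep when a write would replace an existing
// cache entry: an entry with actual text is never replaced by an empty one.
function chooseEntryToKeep(
  existing: CacheEntry | undefined,
  incoming: CacheEntry
): CacheEntry {
  if (existing === undefined) return incoming; // nothing cached yet
  const existingHasText = existing.text !== "";
  const incomingHasText = incoming.text !== "";
  if (existingHasText && !incomingHasText) return existing; // successful OCR wins
  return incoming; // otherwise the newer result replaces the old one
}

// Made-up sample entries.
const successful: CacheEntry = { text: "text recognized by OCR" };
const failed: CacheEntry = { text: "" };

console.log(chooseEntryToKeep(successful, failed).text); // keeps the recognized text
console.log(chooseEntryToKeep(failed, successful).text); // takes the recognized text
```

With this rule both devices would settle on the entry that contains text, whichever one produced it.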

nachbelichtet commented 10 months ago

Same problem here. Without Text Extractor enabled, Obsidian syncs notes instantly across all connected devices. With Text Extractor enabled, it syncs hundreds of cache .json files on every start and sync, which causes delays and often problems with the sync of the real "payload". Nevertheless: Thank you very much for your efforts.

scambier commented 10 months ago

(thanks github for automatically closing issues...)

I've published an update (0.5.1) that could fix this issue for some of you.

If this setting (OCR languages) wasn't identical on all devices, there was an automatic cache invalidation that would cause issues during sync.

I disabled this behavior, so now users have to manually clear the cache after changing the OCR languages.

scambier commented 9 months ago

Closing this, reopen if the issue isn't fixed for you