hwiorn opened this issue 2 years ago (status: Open)
Hi, sorry for the late response!

It's actually surprising it takes 17 minutes for 8K notes / 24K URLs -- do you know how many lines these are? Unless your laptop is really weak, I would expect it to index much faster. Maybe you can log indexing times for individual notes, figure out the one that takes longest, and then we can profile it?
Otherwise, you suggest you could do something like: add `last_indexing_time` to the Joplin indexer (currently it's not stored anywhere, but I guess it won't be too hard to store), then take the notes modified between `last_indexing_time` and the current time, extract visits from them, and insert them in the DB.

It kinda makes sense, but one downside is that it's possible that some URLs were removed from a note, and they would still be present in the promnesia database, because the 'interface' of indexers in Promnesia currently only supports adding new visits. So it would trigger some phantom visits. We might think of changing the interface somehow, but I'd much rather speed up the indexer, for simplicity.
My laptop is a Dell Inspiron 7501 (i7-10750H CPU @ 2.60GHz, 16GB RAM). I don't think this laptop is a slow environment. But some machines, such as RPis and AWS Lightsail (1 core), could be slow.
> It's actually surprising it takes 17 minutes for 8K notes / 24K URLs -- do you know how many lines these are? Unless your laptop is really weak, I would expect it to index much faster.
Many notes came from Evernote. I used Joplin as an archiving tool and wrote a journal at work. Some notes are web-clipped, and they seem to contain many useless links. Recently I have been switching from Joplin to org-roam and learning the Zettelkasten method, and I use Joplin as a wayback machine now.
> Maybe you can log indexing times for individual notes, figure out the one that takes longest and then we can profile it?
The Joplin indexer was a proof of concept; it is just an initial version. So I think I can profile the indexing.
> It kinda makes sense, but one downside is that it's possible that some URLs were removed from the note, and they would still be present in promnesia database, because the 'interface' of indexers in Promnesia is currently only supporting adding new visits. So it would trigger some phantom visits.
Right. Incremental and partial updates need at least two pieces of metadata: the last sync time, and a mapping ID between source and target.
> We might think of changing the interface somehow, but I'd much rather speed up the indexer for simplicity.
Yeah, you are right. I can optimize the indexer further. But I think Promnesia needs incremental updates, for slow machines and for indexing efficiently.
> I don't think this laptop is a slow environment
Yep, looks decent, surprising it takes so much time!
> But I think Promnesia needs incremental updates, for slow machines and for indexing efficiently.
Yep, definitely agree it makes sense to make it as fast as we can :) I just mean there is a tradeoff between that and the simplicity of the architecture.
> Right. Incremental and partial updates need at least two pieces of metadata: the last sync time, and a mapping ID between source and target.
Yeah -- the problem is the latter: basically, there is currently no way to tell which file a visit in the database came from. To be more precise, no *reliable* way; there is a `Locator` thing, but it's not guaranteed to be the exact filename.
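For illustration, if each visit row did carry a reliable source key (which, as noted, the current `Locator` does not guarantee), incremental re-indexing of a single note could simply replace that note's rows and the phantom-visit problem would go away. This is NOT promnesia's current schema, just a sketch of the idea:

```python
import sqlite3

# Hypothetical schema: a `visits` table where each row records which source
# file it came from. Re-indexing one source then becomes "delete the stale
# rows for this source, insert the fresh ones".
def reindex_source(conn: sqlite3.Connection, src: str, fresh_urls: list) -> None:
    conn.execute("DELETE FROM visits WHERE src = ?", (src,))
    conn.executemany(
        "INSERT INTO visits (src, url) VALUES (?, ?)",
        [(src, u) for u in fresh_urls],
    )
```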
Maybe a good compromise would be adding cachew support for file-based indexers, so basically each file would have a cache of its Visits (depending on the file timestamp), and it would automatically recompute if necessary. That would allow keeping promnesia itself simple and not worry about selectively removing stuff from the database.
> Maybe a good compromise would be adding cachew support for file-based indexers, so basically each file would have a cache of its Visits (depending on the file timestamp), and it would automatically recompute if necessary. That would allow keeping promnesia itself simple and not worry about selectively removing stuff from the database.
I had already seen cachew, but I thought it was not the right solution for caching. I guess I didn't look closely. Let me add cachew to the indexer.
Related: #243
I have made a Joplin indexer. But there is a problem: the indexer needs an incremental-updating parameter when the database is large. I have 8000+ notes in my Joplin database. The Joplin indexer finds 24000+ URLs which can become `Visit`s. It takes 17 minutes on my laptop.

Joplin has an `update_time` field in the `notes` table. So I think I can implement incremental indexing (updating) in the indexer. However, there is no `overwrite_db` parameter in the Indexer for when a user passes the `--overwrite` parameter and wants to restart the indexing. Or, if `last_indexed_time` in the `promnesia` framework were passed via `iter_all_visits`, it would be much more helpful.