novoid / Memacs

What did I do on February 14th 2007? Visualize your (digital) life in Org-mode
GNU General Public License v3.0

how do you deal with HUGE data? #48

Closed kidd closed 6 years ago

kidd commented 6 years ago

Hey, the idea of Memacs is just beautiful. It's been on my TODO list for some time now, and I decided to give it a go.

As an addition to the existing modules, I created one to fetch visited URLs from Firefox history. The problem is that over the last 6 months I have accumulated more than 52k URLs in Firefox's SQLite database, and although exporting them to a 25 MB Org file using memacs_csv works fine, Emacs struggles even to open it (not to mention integrating it into org-agenda).

Do you use any trick to reduce that footprint? I guess having different files, one per month for example, would help, but if we follow that direction, probably pulling the data needed from sqlite directly and recreating the org file just for the needed days would probably be more functional.
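To make the second option concrete, here is a minimal sketch of pulling only the needed days from SQLite and rendering them as Org headings. It assumes Firefox's history schema (tables `moz_places` and `moz_historyvisits`, with `visit_date` stored as microseconds since the Unix epoch); the function names and the heading layout are hypothetical and are not what any Memacs module actually emits.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Assumption: Firefox history schema, visit_date in microseconds since epoch.
QUERY = """
SELECT v.visit_date, p.url, p.title
FROM moz_historyvisits AS v
JOIN moz_places AS p ON p.id = v.place_id
WHERE v.visit_date >= ? AND v.visit_date < ?
ORDER BY v.visit_date
"""


def to_microseconds(moment):
    """Convert a timezone-aware datetime to microseconds since the epoch."""
    return (moment - EPOCH) // timedelta(microseconds=1)


def visits_as_org(conn, start, end):
    """Return Org headings for visits with start <= visit time < end (UTC)."""
    rows = conn.execute(QUERY, (to_microseconds(start), to_microseconds(end)))
    lines = []
    for visit_date, url, title in rows:
        when = EPOCH + timedelta(microseconds=visit_date)
        stamp = when.strftime("<%Y-%m-%d %a %H:%M>")
        # Fall back to the URL when the page has no stored title.
        lines.append(f"** {stamp} [[{url}][{title or url}]]")
    return "\n".join(lines)


# Hypothetical usage, run against a copy of the history database:
#   conn = sqlite3.connect("places-copy.sqlite")
#   day = datetime(2018, 7, 14, tzinfo=timezone.utc)
#   print(visits_as_org(conn, day, day + timedelta(days=1)))
```

Passing a month-long range instead of a single day would give the one-file-per-month variant. Titles containing square brackets would need escaping before they are safe inside an Org link, which this sketch does not handle.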

Any ideas on that?

alphapapa commented 6 years ago

> probably pulling the data needed from sqlite directly and recreating the org file just for the needed days would probably be more functional.

That's an interesting idea. It's not exactly related to this, but I have a prototype Org indexer package for Emacs that indexes Org entries into a SQLite database, based on John Kitchin's work he published on his blog. If you'd find it helpful in some way, let me know and I'll post it.
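For readers unfamiliar with the idea, indexing Org entries into SQLite can be sketched roughly as follows. This is not the prototype mentioned above (which is Emacs Lisp) and not Kitchin's code; the `headings` table, its columns, and both function names are assumptions made purely for illustration.

```python
import re
import sqlite3

# An Org heading is one or more stars, a space, then the title.
HEADING = re.compile(r"^(\*+) (.*)$")


def index_org_text(conn, filename, text):
    """Store one row per Org heading: file, line number, level, title."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS headings "
        "(file TEXT, line INTEGER, level INTEGER, title TEXT)"
    )
    rows = []
    for number, line in enumerate(text.splitlines(), start=1):
        match = HEADING.match(line)
        if match:
            level = len(match.group(1))
            rows.append((filename, number, level, match.group(2).strip()))
    conn.executemany("INSERT INTO headings VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


def search(conn, term):
    """Return (file, line, title) for headings whose title contains term."""
    return conn.execute(
        "SELECT file, line, title FROM headings "
        "WHERE title LIKE ? ORDER BY file, line",
        (f"%{term}%",),
    ).fetchall()
```

The point of such an index is that a lookup touches only the database, so the editor never has to load the full Org file just to find which headings match.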

JayDugger commented 6 years ago

Yes, please post this prototype Org indexer package.


alphapapa commented 6 years ago

@JayDugger You can see the code here: https://github.com/alphapapa/helm-org-rifle/tree/org-rifle/sandbox

It's very messy, pre-prototype code, but maybe you can get something out of it. It does work, but it's not organized to work as a package with user-facing commands, so you have to eval the code manually.

Note as well that much of it is directly copied from John Kitchin's blog post. I also haven't touched it for a while, so take it for what it's worth. :)

alphapapa commented 6 years ago

P.S. I guess the place to start is here, which appears to be approximately where I left off: https://github.com/alphapapa/helm-org-rifle/blob/org-rifle/sandbox/org-rifle-indexer1.el

JayDugger commented 6 years ago

Thank you.


novoid commented 6 years ago

Hi @kidd,

Well, my answer won't make you happy I guess.

My memacs agenda is slow.

Stats on my memacs files alone:

  710528 headings in   1507643 total lines
      1859 task headings
      708669 non-task headings

  217 open tasks:
      TODO:      108
      STARTED:   107
      WAITING:   2

  1642 finished tasks:
      CANCELLED: 1
      DONE:      1641

... where the task headings within my Memacs files come from very old archived files and are no longer related to my daily workflows.

Read about (early) performance measurements and optimizations: https://github.com/novoid/Memacs/blob/master/docs/performance.org

Probably of more interest is https://github.com/novoid/Memacs/blob/master/docs/FAQs_and_Best_Practices.org which describes some of the mitigations I applied.

Kitchin's work on moving things into SQL is quite interesting. I have not tried it myself. I rarely use my Memacs agenda, so I accept the poor performance.

I'm going to close the issue for now. Please report back on any insights - others may want to know.