karlicoss / promnesia

Another piece of your extended mind
https://beepb00p.xyz/promnesia.html
MIT License

doc: browser-history instructions missing #339

Open ankostis opened 1 year ago

ankostis commented 1 year ago

I noticed that browser-history instructions are completely absent from SOURCES.org, so I could provide an MR (if I figured out how to do it :-)).

Would it suffice to do the following:

  1. run some browserexport commands (which ones?),

  2. configure the respective HPI module, and

  3. somehow (?) instruct promnesia to hook into the above HPI source.

Another question concerns the old txt files generated by the deprecated script: how can they also be parsed into a unified browser history?

purarue commented 1 year ago

For 1, you should use browserexport save -b [browser] -t ~/data/browsing or something like that, see here

Can copy over the example part of the readme:

$ browserexport save -b firefox --to ~/data/browser_history
$ browserexport save -b chrome --to ~/data/browser_history
$ browserexport save -b safari --to ~/data/browser_history
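For 2, a rough sketch of the HPI side, assuming the browser module reads a `browser.export.export_path` setting from your HPI user config (that layout is my recollection of the HPI module docs, so check it against the current documentation before relying on it). The path matches the `--to` directory used in the commands above.

```python
# Fragment for the HPI user config (my/config). The class and attribute
# names below are an assumption about HPI's layout, not a verified copy.
class browser:
    class export:
        # Directory that `browserexport save --to ...` writes databases into.
        export_path = "~/data/browser_history"
```

If the module expects individual files rather than a directory, a glob over that directory is the likely alternative.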

For 3, after my.browser is configured in HPI, the Visit source just needs a quick transformation, like this. That source (currently in my own promnesia module repo) can be copied over/merged here
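To illustrate what "a quick transformation" means here, a minimal self-contained sketch. `HistoryEntry` and `IndexedVisit` are hypothetical stand-ins for the visit type the HPI module yields and the visit type a promnesia source yields; the real classes have their own fields and constructors, so treat this only as the shape of the idea.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, Iterator, Optional


@dataclass(frozen=True)
class HistoryEntry:
    """Hypothetical stand-in for one visit coming out of the HPI browser module."""
    url: str
    dt: datetime
    title: Optional[str] = None


@dataclass(frozen=True)
class IndexedVisit:
    """Hypothetical stand-in for what a promnesia source yields."""
    url: str
    dt: datetime
    context: Optional[str]


def index(history: Iterable[HistoryEntry]) -> Iterator[IndexedVisit]:
    # Map each history entry one-to-one, carrying the page title as context.
    for entry in history:
        yield IndexedVisit(url=entry.url, dt=entry.dt, context=entry.title)
```

In a real source, `index` would take no arguments and pull the history from the HPI module itself.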

karlicoss commented 1 year ago

Yeah, I've been meaning to switch promnesia to use https://github.com/seanbreckenridge/promnesia/blob/master/promnesia_sean/sources/browsing.py. I'll just need to implement a fallback to the old method (in case someone doesn't have HPI configured, similar to what takeout does). I'll also run some final comparisons first, to make sure nothing is lost and there are no timezone issues.

purarue commented 1 year ago

@karlicoss if you'd like, I can make a PR with some instructions for using browserexport here. Should that go in SOURCES.org or somewhere else? edit: perhaps a section in GUIDE.org?

karlicoss commented 1 year ago

@seanbreckenridge thanks, of course would be appreciated! GUIDE.org feels a bit more generic/high level. I think a separate section in SOURCES.org would be good (since the ones that are listed there are just autogenerated from docstrings).

ankostis commented 1 year ago

Given the opportunity: browsers forget their history pretty soon, and extending the history span is the most precious feature of promnesia for me. I would like the instructions to explain in detail how to avoid losing accumulated history entries when updating, and whether duplicates and overlaps are dropped.

Forgive me if what I'm saying doesn't make sense.

karlicoss commented 1 year ago

No, that makes sense @ankostis! It's worth mentioning why one would even bother setting up a promnesia module if the extension can work with local browser history directly. In addition, it works across different devices/browsers/etc. regardless of the way cloud sync is set up.

Promnesia itself isn't dealing with browser history backups etc though -- so I guess the instructions will link to more detailed instructions in https://github.com/seanbreckenridge/browserexport#usage

purarue commented 1 year ago

And whether duplicates and overlaps are dropped.

Duplicates in this case would use the timestamp as part of the unique check, so whenever a new database is backed up to your local data directory (e.g. ~/data/browsing), running promnesia index should pick it up. And if you have the my.browser.active_browser module set up, it'll additionally snapshot your current browser database whenever you run index. I'll expand on this in the docs I PR in a bit.
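As a rough illustration of that unique check, here is a self-contained sketch that merges overlapping snapshots and keeps one visit per (url, timestamp) pair. `Visit` and `merge_visits` are hypothetical names for this example, and the exact key the real code uses may include more than these two fields.

```python
from datetime import datetime, timezone
from typing import Iterable, Iterator, NamedTuple, Set, Tuple


class Visit(NamedTuple):
    """Hypothetical minimal visit: just a URL and when it was visited."""
    url: str
    dt: datetime


def merge_visits(*snapshots: Iterable[Visit]) -> Iterator[Visit]:
    # Yield each (url, dt) pair once, in first-seen order, across all snapshots.
    seen: Set[Tuple[str, datetime]] = set()
    for snapshot in snapshots:
        for visit in snapshot:
            key = (visit.url, visit.dt)
            if key in seen:
                continue  # already emitted from an earlier, overlapping backup
            seen.add(key)
            yield visit
```

Because the timestamp is part of the key, revisiting the same URL later still produces a separate entry; only the overlap between backups is dropped.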

Just as an example (I haven't switched to the new browser module yet, so ignore the promnesia_sean.sources. prefix), the page we are currently on looks like this for me:

[image]

karlicoss commented 1 year ago

I guess if you use active_browser (I haven't set it up yet), you'd have duplicates in the extension coming both from the backend and from the local browser history API (if they have different source names). Maybe worth doing some frontend changes to handle that

purarue commented 1 year ago

Ah, that is true... even my.browser.export might have duplicates (with local browser history API) if you recently backed up a database. I haven't found it to be too bothersome, but might be worth deduping them
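One possible shape for such deduping, sketched in Python for illustration only (the extension frontend itself is not Python, and nothing here reflects existing promnesia code): group visits that share a URL and timestamp, and keep the list of source names rather than showing one row per source. `SourcedVisit` and `collapse_sources` are hypothetical names, and this assumes both sources report identical timestamps for the same visit, which may not hold in practice.

```python
from datetime import datetime, timezone
from typing import Dict, Iterable, List, NamedTuple, Tuple


class SourcedVisit(NamedTuple):
    """Hypothetical visit record tagged with the name of the source it came from."""
    url: str
    dt: datetime
    source: str


def collapse_sources(
    visits: Iterable[SourcedVisit],
) -> Dict[Tuple[str, datetime], List[str]]:
    # Map each (url, dt) pair to the distinct source names that reported it.
    grouped: Dict[Tuple[str, datetime], List[str]] = {}
    for visit in visits:
        sources = grouped.setdefault((visit.url, visit.dt), [])
        if visit.source not in sources:
            sources.append(visit.source)
    return grouped
```

Each key would then be rendered once, with its sources shown as tags on the single row.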