pjrobertson closed this 11 years ago
There was a wrong setting for the Configuration folder which caused my Xcode to not recognize the files; that's why the duplicate imports were there. I tried merging your pull request and couldn't build the plugin at all, but after fixing the setting and clearing Xcode's derived data, it works for me.
Regarding the history, I only read the entries from ~/Library/Application Support/Google/Chrome/Default/History, which is an SQLite db, and it doesn't contain any content, only the URLs and the titles of the pages.
If you look in ~/Library/Application Support/Google/Chrome/Default/, you should see files called History Index YYYY-MM. Those seem to contain the keywords for the pages you have browsed, which leads me to suspect that the search is not based on the actual content saved to disk, but rather on an index of keywords. This makes sense if you think about it: search performance is better and the amount of data you need to store is reduced if you only keep the keywords.
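For anyone who wants to poke at that History file outside the plugin, here is a minimal Python sketch of reading the URLs and titles. The path is the one given above; the table name `urls` and the column names `url` and `title` are assumptions about Chrome's schema that this thread does not confirm, and `read_history` is just a hypothetical helper name. It works on a copy because Chrome may hold a lock on the live file.

```python
import shutil
import sqlite3
import tempfile
from pathlib import Path

# Path taken from the comment above.
HISTORY_DB = Path.home() / "Library/Application Support/Google/Chrome/Default/History"


def read_history(db_path, limit=20):
    """Return up to `limit` (url, title) rows from a copy of the History db."""
    with tempfile.TemporaryDirectory() as tmp:
        # Query a copy, since Chrome may keep the original locked while running.
        copy_path = Path(tmp) / "History"
        shutil.copy2(str(db_path), str(copy_path))
        conn = sqlite3.connect(str(copy_path))
        try:
            # Assumed schema: a `urls` table with `url` and `title` columns.
            return conn.execute(
                "SELECT url, title FROM urls LIMIT ?", (limit,)
            ).fetchall()
        finally:
            conn.close()


if __name__ == "__main__":
    if HISTORY_DB.exists():
        for url, title in read_history(HISTORY_DB):
            print(title, url)
```

If the query fails with a "no such table" error, the schema assumption is wrong for that Chrome version; listing `sqlite_master` would show what is actually there.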
Thanks ndreas, I guess the config file business was the main problem.
RE the history: If you search in the history for some text you know was on a page you previously browsed, the history search shows a snippet of this text. I'm 99% sure the whole content is stored as a string. Here's what I did yesterday:
I was on a sleeper train, no internet, and needed to know when my train got to the destination (this is China, so asking isn't an option since my Chinese is below average :P). I knew what time the train left, so I searched for that: "17:46". It showed a snippet of the text on the page starting from my matched "17:46", then about another 20 words, but not the destination time. I then incrementally searched for the last word in the snippet, moving down the string/page until I found what I wanted. Laborious, I know, but it makes it pretty clear the content is stored as a string somewhere.
I'll look into it, don't worry - but thanks for the info :)
The history stuff sparked my curiosity, so I dug a little deeper. It seems the pages_content table in the History Index files contains the content without HTML tags, not just keywords. I tried searching for an obscure word in my history, got a hit, looked in the index db for the corresponding month, and found the content in pages_content.
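A rough Python sketch of that same check, for reference: it opens each History Index YYYY-MM file and looks for a search term in the pages_content table. Only the table name and the file naming come from this thread; the column layout is not stated here, so the sketch reads every column and matches against any text value instead of assuming column names. `search_index` is a hypothetical helper name, and the sketch assumes the files can be opened directly (quit Chrome or copy them first if they turn out to be locked).

```python
import sqlite3
from pathlib import Path

# Directory taken from the earlier comment in this thread.
CHROME_DIR = Path.home() / "Library/Application Support/Google/Chrome/Default"


def search_index(db_path, term):
    """Return pages_content rows where any text column contains `term`."""
    conn = sqlite3.connect(str(db_path))
    try:
        # Column names are unknown here, so fetch everything and filter in Python.
        rows = conn.execute("SELECT * FROM pages_content").fetchall()
    finally:
        conn.close()
    needle = term.lower()
    return [
        row
        for row in rows
        if any(isinstance(value, str) and needle in value.lower() for value in row)
    ]


if __name__ == "__main__":
    # "????-??" matches the YYYY-MM suffix and skips any "-journal" side files.
    for db in sorted(CHROME_DIR.glob("History Index ????-??")):
        for row in search_index(db, "17:46"):
            print(db.name, row)
```

Filtering in Python is slow for a large index but keeps the sketch independent of the real schema; once the column names are known, a `WHERE ... LIKE ?` query would be the obvious replacement.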
Yep, found it now, cheers. Time to write a quick app so that I can view it. Why is Chrome so stupid that you can't :/
With Safari I resorted to trying to read the preview icons that were created for each page. With Opera… you can tell it to cache everything. So much better.
This bugged me today when I tried to build the plugin. They're all imported by Quicksilver.pch from the Configuration folder.
P.S. I was playing with the plugin to see if you'd worked out how to read the Chrome history cache files. I know that Chrome stores the text for each page visited somewhere, since you can search for any text on a page in the history view (⌘Y). But it bugs me that I can't actually see the text, e.g. for offline browsing. I want to try and find the original content and make it viewable :)