scragg0x / realms-wiki

Git based wiki inspired by Gollum
http://realms.io
GNU General Public License v2.0
833 stars 91 forks source link

Create WikiPage class to encapsulate wiki page functions #147

Closed gazpachoking closed 7 years ago

gazpachoking commented 8 years ago

I was looking in to #145, and discovered that wiki.get_history was very slow when the repo had a lot of commits, even when limited to the first entry, like wiki.get_page was using to fill the page['info'] dict. I also noticed that most usages of get_page didn't even use the provided info dict.

This PR refactors wiki.get_page() to return a new class, WikiPage, which has data, info, and partials properties, lazily loaded when accessed. This means the expensive info property never needs to be loaded when browsing the site normally.

The WikiPage class is also the only thing that touches the cache now, and the data and info property are cached separately.

I'm developing on Windows at the moment, and haven't gotten my testing environment set up yet, so there may be a couple tweaks needed, including this known one:

Other possible improvements: getting the full history is veeeerrrryyy slow when there is a big commit history. Perhaps that should also be cached. It would be nicer if we could figure a way to speed up the first load as well though. A lesser optimization would be to have history load up the info cache, to get the latest commit that affects the file, and walk the ancestry tree from there, would be variable how much that speeds things up, but for files that haven't been touched in a while it could be quite a bit.

gazpachoking commented 8 years ago

@scragg0x How do you feel about this change?

scragg0x commented 8 years ago

I like it, it looks good. I haven't used realms with a large repo so I haven't experienced a slow down but what you say makes sense.

gazpachoking commented 8 years ago

@scragg0x I slipped a commit in there to make the tests work on my Windows box. Hope that isn't over the line :wink:

gazpachoking commented 8 years ago

I believe .data and .info will crash currently if called on a wiki page that doesn't exist. I don't really think that's an issue, but perhaps it should be a standardized error class, rather than just leaking the KeyError from dulwich. You can always test the truthiness of the WikiPage to see if it exists before trying to get .info or .data

I also currently have some asserts in the .write, .rename and maybe a couple other methods, that only lets you call them if the WikiPage sha reference is HEAD. Once again, I think that's good behavior, but perhaps it should be a different error than an AssertionError.

gazpachoking commented 8 years ago

I haven't used realms with a large repo so I haven't experienced a slow down but what you say makes sense.

I migrated our 8 year old trac wiki to git which created 5000 some commits, I might still squash down the history for our production site, (still preparing for the real migration,) but I figured at least regular page loads shouldn't be getting affected no matter how long history is.

I have some ideas for history view optimization as well, but mostly it's to do with using cache smartly, as I think it's inherently slow (at least with dulwich implementation) to search through the whole commit ancestry to find commits affecting given files. I think that can go in a separate PR though.

gazpachoking commented 8 years ago

EDIT: Moved some comments to separate issues for discussion: #148 #149

stevezau commented 8 years ago

+1