storage, indexing/search, backup/restore - just use postgresql?

ThomasWaldmann commented 6 years ago

In the past, we invested a lot of effort into the storage (storing data/metadata into all sorts of backends) and middlewares (indexing/searching [whoosh], namespaces, etc.). Also into serializing the stored data for backup purposes, etc.

We could also implement a rather different approach: Require PostgreSQL (and do not offer an alternative option, for simplicity) and put everything in there.

Most important metadata (like item names, revisions, timestamps) would go into some old-fashioned db columns, less important and custom metadata would go into jsonb fields. We would put text/binary content data into the db (just to do it all in the same way, at least initially).
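To make the column/jsonb split concrete, here is a minimal sketch. The table name, column names and the choice of which keys count as "most important" are assumptions for illustration only; nothing in this proposal fixes a schema yet.

```python
import json

# Hypothetical: which metadata keys get their own "old-fashioned" column.
CORE_COLUMNS = ("itemid", "revid", "name", "mtime")

# Hypothetical table layout; all names are illustrative, not an agreed schema.
REVISIONS_DDL = """
CREATE TABLE revisions (
    revid   text PRIMARY KEY,
    itemid  text NOT NULL,
    name    text NOT NULL,
    mtime   timestamptz NOT NULL,
    meta    jsonb NOT NULL DEFAULT '{}'::jsonb,  -- less important / custom metadata
    content bytea                                -- text/binary content data
);
"""


def split_metadata(meta):
    """Split a metadata dict into (column values, JSON text for the jsonb field)."""
    columns = {key: meta[key] for key in CORE_COLUMNS if key in meta}
    extra = {key: value for key, value in meta.items() if key not in CORE_COLUMNS}
    return columns, json.dumps(extra, sort_keys=True)
```

Anything not listed in `CORE_COLUMNS` ends up in the jsonb field, so custom metadata would not require a schema change.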

Pro:

we can just use it and do not need to maintain or document it
it has a big community of developers/supporters/users
it has many years of development and testing, thus likely fewer bugs than our storage code?
likely all (or most) problems of storing and retrieving data in an efficient way are already solved
same for backups / replication / etc.
a lot of developers / admins know how to deal with pg
less effort for admins who already have pg running
access stored data via custom pg queries

Con:

a bit more effort for small / desktop wikis, you need to install pg first
inefficient for storing (encoded) binary data (would not matter for most [small / mid sized] items, but significant for huge items)
current moin devs do not have much pg experience (tw: little, rh: ?)
big effort of rewriting our current code / test setup

Unclear:

search capabilities: does it offer all we need, so we can get rid of whoosh and just use pg indexing/search/lookup?
rather not use an ORM (offering only the smallest common featureset to support all sorts of DBs?), but talk to pg rather directly
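As a starting point for the search question, this is roughly what talking to pg directly (no ORM) could look like with its built-in full text search functions `to_tsvector`, `plainto_tsquery` and `ts_rank`. The `revisions` table and `content_text` column are hypothetical, the placeholders are in the `%(name)s` style used by psycopg2, and whether this covers everything we use whoosh for is exactly the open question.

```python
def build_search_query(terms, language="english", limit=20):
    """Return (sql, params) for a ranked full text search.

    Only builds the statement; executing it needs a DB driver and a
    table like the hypothetical "revisions" one assumed here.
    """
    sql = """
        SELECT name, revid,
               ts_rank(to_tsvector(%(lang)s::regconfig, content_text),
                       plainto_tsquery(%(lang)s::regconfig, %(terms)s)) AS rank
        FROM revisions
        WHERE to_tsvector(%(lang)s::regconfig, content_text)
              @@ plainto_tsquery(%(lang)s::regconfig, %(terms)s)
        ORDER BY rank DESC
        LIMIT %(limit)s
    """
    params = {"lang": language, "terms": terms, "limit": limit}
    return sql, params
```

The search terms are passed as a parameter, never formatted into the SQL string.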

RogerHaase commented 6 years ago

https://www.postgresql.org/download/windows/ lists 2 installers for windows, EnterpriseDB and BigSQL.

Tried BigSQL first, had an error during installation - too many parameters passed to autostart. Installation completed, but could not start a server. Tried to uninstall several ways, got message that it could not find uninstall.dat. Other users have same issue: https://superuser.com/questions/1358042/uninstall-postgresql-bigsql-distro-10-5-1 Finally just deleted the directory.

On 2nd try, installed EnterpriseDB. This installed OK, but PC was rebooted suddenly after first installation was complete. Was able to uninstall easily using normal windows uninstall. Rebooted and installed EnterpriseDB again - no surprise reboot this time. The Windows Task Manager shows 8 PostgreSQL Servers running. Was able to start psql and pgadmin4. I think I created a database using psql but was unable to see it using pgadmin4. Probably a user error. Starting psql or pgadmin4 seems to start another PostgreSQL server, the Windows Task Manager now shows 11 copies running.

PostgreSQL seems too big and complex. Anyone wanting a desktop wiki should stick to moin 1.9.x.

RogerHaase commented 6 years ago

Rather than build an index option into moin, maybe we could use external options.

Web based wikis

For any web based wiki, there is Google (and maybe other search engines). For a Google search, you need to put a button somewhere:

<input id="googlesearch" name="googlesearch" type="submit" onClick="return searchGoogle();"
value="Google" alt="Google">

and then you need a bit of javascript

// executed when user clicks "Search Google" button, onclick event set in fixedleft.py
function searchGoogle() {
    // redirect wiki search form to google
    'use strict';
    var searchInput = document.getElementById('searchinput'),
        wikiUrl = document.getElementById('wikiUrl');
    if (searchInput && searchInput.value && wikiUrl) {
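        // encodeURIComponent replaces the deprecated escape(), which mishandles non-ASCII input
        window.location = 'https://www.google.com/search?q=' + encodeURIComponent(searchInput.value) +
            '&sitesearch=' + encodeURIComponent(wikiUrl.value);
        // cancel the normal form submit
        return false;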
    }
}

The result is a fast search with very little effort.

Desktop wikis

For desktop wikis on Windows, there is the search index feature. This will be difficult to use because of the randomized file names, but if most wiki content starts with an H1 tag with content similar to the wiki item name, then a persistent user will find the wanted item.

A 1 minute search turned up the ContentIndexer https://blogs.msdn.microsoft.com/adamdwilson/2016/03/30/searching-private-app-data-in-windows-10/ Are there similar options for Linux?

Item saves will be much faster if some background function is used for indexing.
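A rough sketch of what "background function" could mean: the save path only enqueues a job, and a worker thread updates the index later. The class and its in-memory word index are made up for illustration; this is not how moin currently indexes.

```python
import queue
import threading


class BackgroundIndexer:
    """Toy in-memory word index updated by a worker thread (illustration only)."""

    def __init__(self):
        self.index = {}  # word -> set of item names
        self._jobs = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def save(self, name, text):
        """Called on item save: enqueue the indexing job and return immediately."""
        self._jobs.put((name, text))

    def _run(self):
        # Worker loop: take one job at a time and add its words to the index.
        while True:
            name, text = self._jobs.get()
            for word in text.lower().split():
                self.index.setdefault(word, set()).add(name)
            self._jobs.task_done()

    def wait(self):
        """Block until all queued jobs have been indexed."""
        self._jobs.join()
```

The trade-off is that a search right after a save may not see the new content until the worker has caught up.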

ThomasWaldmann commented 6 years ago

google is not an option (many people use non-public wikis). also, we do not just need some web search, we also need to look up itemid/revid by item name. or tags or users or ...
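For illustration, lookups of that kind could be plain queries if the data lived in pg. This sketch assumes a hypothetical `revisions` table with `itemid`, `revid`, `name`, `mtime` columns and a jsonb `meta` column holding a `tags` list; the `@>` jsonb containment operator is standard PostgreSQL, the rest is made up.

```python
import json


def lookup_by_name(name):
    """Return (sql, params) to find the newest itemid/revid for an item name."""
    sql = (
        "SELECT itemid, revid FROM revisions "
        "WHERE name = %s ORDER BY mtime DESC LIMIT 1"
    )
    return sql, (name,)


def lookup_by_tag(tag):
    """Return (sql, params) to find revisions whose jsonb metadata contains a tag."""
    sql = "SELECT itemid, revid FROM revisions WHERE meta @> %s::jsonb"
    return sql, (json.dumps({"tags": [tag]}),)
```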

why did you think pg is too big and complex? i am sure it can be used for big, but i guess we would not fully exploit its capabilities.

RogerHaase commented 6 years ago

I think it is big (on windows) because:

name is BIGSQL, EnterpriseDB (not lite)
disk space is 476 MB
starts 8-11 SQL servers of 1 MB each
requires separate root/admin ID
docs are voluminous, setting up desktop wiki becomes 10x harder
where does user with installation problems go for help?

ThomasWaldmann commented 6 years ago

@RogerHaase I don't care much about the name, but about what it gives to us and where actual problems might be. Especially not-one-time issues. No problem (IMHO) if installing it is a bit more one-time effort if the software then works reliably, offers good speed and features.

I do not consider 500MB on-disk or 100MB memory usage an issue. Nowadays, people play games that take 50GB disk space or run chat clients or browsers that eat multiple GB RAM. Not saying that we should waste resources for no reason, but if we get a good subsystem for it and save work by not having to code/maintain it ourselves, it might be well worth it.

No problem to get support for postgresql, a lot of people use it and they are well-connected to the python community. We of course would offer some scripts that create a new database, initialize tables, etc.

If we would use an ORM like sqlalchemy (which was strongly recommended to me even if I only wanted pg), there would be an easy way to also use sqlite or something (maybe losing some features then, which is why I still think the pg-only might be a good idea, just to get rid of having to support a lot of different stuff).

moinwiki / moin

storage, indexing/search, backup/restore - just use postgresql? #715
