rejectedsoftware / vibenews

Combined web forum and NNTP server implementation for stand-alone newsgroups
GNU Affero General Public License v3.0

"Forum index" link misleading #21

Open luismarques opened 10 years ago

luismarques commented 10 years ago

Inside an individual forum post, the button "Forum index" is misleading. I tend to read (and click) that as an option to return to the "index of posts of the current forum", instead of the actual "index of all forums". Perhaps it could be changed to something more user friendly, like "All forums"?

Also, although logically it makes a lot of sense, I feel that the current (same line) button structure (◀ Forum index, ◀ ) makes me hesitate in my navigation. Could it be replaced by a more tree-like (indented, multi-line) navigation, as can be seen for instance in http://forum.dlang.org/group/digitalmars.D ?

s-ludwig commented 10 years ago

The "Forum index" -> "All forums" change has been done. I agree that this is a better naming choice.

Regarding the navigation style, my plan is to eventually revamp the whole page style and the navigation would be part of that. Until then, at least the HTML for the navigation could be laid out in a way that enables a tree-like styling with customized CSS. Two possibilities come to mind:

  1. use nested <ul> tags
  2. emit style classes "level-0", "level-1", ... for the existing <li> elements

Option two sounds more attractive to me due to keeping backwards compatibility and having a less awkward structure (it's not a tree after all, but just a linear path through the tree). Any thoughts or different ideas?

luismarques commented 10 years ago

Yes, option 2 makes more sense. We have a list of the directions (items) we must take to reach the desired target. How about just using :nth-child(n) instead of emitting element classes?

luismarques commented 10 years ago

Sorry, I meant :nth-of-type(n)

s-ludwig commented 10 years ago

Good idea, I didn't have that on the radar.

s-ludwig commented 10 years ago

BTW, do you think it is the layout that makes you uncomfortable, or rather that the current entry is missing from the navigation line? I think adding the caption of the current page already makes the structure much clearer.

luismarques commented 10 years ago

I think what was happening is that when I wanted to go back to the thread list page I would start to move my eyes to the top of the post page, to find the navigation links, and then start reading (from the left) the line with the navigation links; I would have to skip past the forum list link until I found the thread list link. That, together with the currently deployed confusing name of "Forum index", would make me hesitate and slow down in the navigation.

I think one advantage of having a multi-line/hierarchical/indented navigation would be that when we do this process of starting to look higher in the page for the thread list link we would find it quicker, since the visual pattern is quickly recognized and the link we want is the first one (going from the bottom to the top of the page). The same-line pattern makes as much sense logically, but probably is slower to recognize visually; I found myself having to (more carefully) read the text of the links to know where I wanted to click. Perhaps this was just the fault of the misleading "Forum index" name, but I suspect not. It's hard to tell now since I'm no longer distractedly navigating the site :-)

Indeed, I wonder how hard it would be to run a scientific experiment, where half of the IP addresses would be served a single-line navigation and the other half a hierarchical one, and then measure the time between starting to scroll up and clicking on the navigation link :-). Is it common for web frameworks to include A/B testing, or do people tend to use services like those provided by Google?

Anyway, after fixing the "Forum index" problem the rest is probably not too important, so don't worry too much about it. Still, I like it when the designers really sweat the details, so I guess it's nice we are having this discussion. I guess I can try overriding the CSS locally to see how much of a difference the hierarchical style really makes, after the "Forum index" fix is deployed.

s-ludwig commented 10 years ago

I've updated the site with the new navigation and additional avatar images now. The avatar images still produce too much visual noise, IMO, at least the ones for "last post".

The A/B test as a feature of the web framework is an interesting idea. I have no idea if there is any support for that in the common frameworks, but at the least it would require a very high-level framework. Isn't there some kind of proxy server that randomly switches between two server instances and keeps timing information? I'd imagine that could be a nice replacement for external Google tools or similar. ... but for this small project, a statistically significant result will probably take some months to acquire ;)
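The IP-based split discussed above could be sketched roughly as follows. This is a Python illustration only, not part of vibenews or vibe.d; the function name `ab_variant`, the experiment label, and the variant names are all hypothetical. Hashing the address (instead of picking at random per request) keeps a given visitor on the same navigation style across page loads.

```python
import hashlib


def ab_variant(ip_address: str, experiment: str = "nav-layout") -> str:
    """Deterministically assign an IP address to one of two variants.

    Hypothetical helper: the experiment label is mixed into the hash so
    that different experiments split the visitors differently.
    """
    key = f"{experiment}:{ip_address}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # The lowest bit of the first hash byte acts as a repeatable coin flip.
    return "hierarchical" if digest[0] & 1 else "single-line"


if __name__ == "__main__":
    for ip in ("192.0.2.1", "192.0.2.2", "198.51.100.7"):
        print(ip, ab_variant(ip))
```

Measuring the time between scrolling up and clicking would still need client-side instrumentation, which this sketch leaves out.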

luismarques commented 10 years ago

In the new navigation bar you now have the current item show up in the list. I think that interacts badly with the left triangle (◀). I used to interpret the left triangle as "go back to", but that does not make sense for the last (current) item.

Notice that in the "path bar" of the Finder in OS X you have a right triangle (https://www.google.com/search?q=finder+path+bar&tbm=isch), which is coherent with funneling / diving into / restricting to / going inside, so that confusion doesn't happen; also, the triangle is between the items, not (inside a box) with the item.

By the way, about the "Option two sounds more attractive to me due to keeping backwards compatibility". Do (many) people use this software in other sites? I was thinking of using it for the discussions of the episodes of a podcast that I have planned.

s-ludwig commented 10 years ago

I only know of two instances where it is used, but now I've invalidated my own argument by changing a lot of things in the HTML structure...

Regarding the triangle, try to do a full reload of the page, seems like the old CSS is still cached.

luismarques commented 10 years ago

I see you've been busy! The new layout looks nice, although there seems to be an (unrelated) bug with deleted threads. For instance, a spam thread of "Safe Weight Loss Surgery Severe Obesity by on" shows a reply count of 18446744073709551615 (I guess you used replies == total number of posts - 1, and since the post was deleted it wrapped around and is now ulong.max). Why are deleted threads showing in the post list?

Anyway, as I told you, I'm considering using vibenews, so that's probably going to be three instances now :-) One thing that gives me pause is that the forum does not track unread posts, like forum.dlang.org does (or does it?). I think that's the most important thing missing.

s-ludwig commented 10 years ago

Those spam threads are strange, but they seem to have something to do with restarting the server. I'll investigate that. You are right about the wrapping integer - the thread post count is correctly decremented to zero, but for some reason the thread still isn't deleted from the database.
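The wrap-around can be reproduced in a few lines. This is a Python sketch that only models 64-bit unsigned arithmetic; it assumes, as guessed above, that the reply count is derived as "posts - 1" in a ulong. The function names are hypothetical and this is not the vibenews code.

```python
U64_MASK = (1 << 64) - 1  # models the value range of D's ulong


def reply_count_unchecked(post_count: int) -> int:
    """Compute posts - 1 the way 64-bit unsigned arithmetic would."""
    return (post_count - 1) & U64_MASK


def reply_count_clamped(post_count: int) -> int:
    """Same computation, but an empty thread reports zero replies."""
    return max(post_count - 1, 0)


if __name__ == "__main__":
    print(reply_count_unchecked(0))  # 18446744073709551615, i.e. ulong.max
    print(reply_count_clamped(0))    # 0
```

Clamping only hides the symptom; the thread with zero posts still needs to be removed from the database.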

Tracking read posts would be nice to have, but I didn't have time to implement it. Do you know how exactly Vladimir has done it for the dlang.org forum? Are individual posts/threads tracked, or just the last visit time of the user? Tracking individual threads seems like it could get expensive if the number of tracked threads isn't limited. So maybe storing the information in a cookie with a limited number of threads would be the most practical solution.

luismarques commented 10 years ago

I checked the implementation. It tracks the read status of individual posts. The code is not exactly pretty, and the implementation strategy might surprise you. It implements a bit array (not properly abstracted, mixed with the rest of the code) with the read status of the posts, which is zlib (un)compressed and base64-ed. This data is either stored in a cookie (guest users) or in the database (sqlite3, the database used for all the functionalities).

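public void setRead(size_t post, bool value)
{
    needReadPosts();
    // Index of the byte that holds this post's bit
    auto pos = post/8;
    if (pos >= readPosts.length)
    {
        // Bits beyond the end of the array count as unread, so only grow
        // the array when a bit actually has to be set
        if (value)
            readPosts.length = pos+1;
        else
            return;
    }
    // Mask selecting the post's bit within that byte
    ubyte mask = cast(ubyte)(1 << (post % 8));
    assert(pos < readPosts.length);
    auto pbyte = (cast(ubyte*)readPosts.ptr) + pos;
    if (value)
        *pbyte = *pbyte | mask;
    else
        *pbyte = *pbyte & ~mask;
    // Mark the bit array as changed
    readPostsDirty = true;
}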
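For comparison, here is a compact Python sketch of the same scheme: a bit array of read flags that is zlib-compressed and base64-encoded for storage in a cookie or a database field. It is not the forum.dlang.org code; the class and method names are hypothetical, and only the overall approach is taken from the description above.

```python
import base64
import zlib


class ReadPosts:
    """Bit array with one read/unread flag per post index."""

    def __init__(self, data: bytes = b"") -> None:
        self.bits = bytearray(data)

    def set_read(self, post: int, value: bool) -> None:
        pos, mask = post // 8, 1 << (post % 8)
        if pos >= len(self.bits):
            if not value:
                return  # bits past the end already count as unread
            self.bits.extend(bytes(pos + 1 - len(self.bits)))
        if value:
            self.bits[pos] |= mask
        else:
            self.bits[pos] &= ~mask & 0xFF

    def is_read(self, post: int) -> bool:
        pos = post // 8
        return pos < len(self.bits) and bool(self.bits[pos] & (1 << (post % 8)))

    def serialize(self) -> str:
        """Compress the bit array and encode it as ASCII text."""
        return base64.b64encode(zlib.compress(bytes(self.bits))).decode("ascii")

    @classmethod
    def deserialize(cls, text: str) -> "ReadPosts":
        return cls(zlib.decompress(base64.b64decode(text)))


if __name__ == "__main__":
    posts = ReadPosts()
    posts.set_read(10, True)
    restored = ReadPosts.deserialize(posts.serialize())
    print(restored.is_read(10), restored.is_read(11))
```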

By the way, why did you choose MongoDB for vibenews? If I use vibenews I'll have to have two DBs (Postgres + Mongo) in a server with little memory. Would you accept contributions for multiple DB support?

s-ludwig commented 10 years ago

MongoDB was just the logical choice back then, because a good driver was already written and it doesn't require much join-like functionality. I'm currently heavily working on a DB abstraction and typing layer for another project and that would be a perfect fit to add multi-DB support here. However, I'm still not completely happy with the query/update syntax, so the API isn't really stable yet. Just reimplementing the Controller class in terms of Postgres would definitely be the faster way and if you'd like to do that, I'd accept a pull for that in the meantime.

s-ludwig commented 10 years ago

Hm, but the compression approach seems to be a good idea for storage. The only issues would be really high-volume servers due to the memory requirements for the uncompressed bit field, and currently there is no unique running counter for posts, but rather just a BsonObjectID, so it can't be applied directly. However, there is a running index for each post in a group, so a map for each user mapping from group name to group bitfield should work.

luismarques commented 10 years ago

But why have a single DB field encoding the read status of all the posts for a given user (or even for a given forum, for a given user)? Wouldn't it be better to have something with more granularity? You have to get and put a large object like that in the DB every time you want to update one of the bits.

s-ludwig commented 10 years ago

The idea behind it is that most parts of the bit array will be uniformly zero (unread posts from the past) or uniformly one (read posts from the present) and thus highly compressible, so that this field should usually not be larger than a few hundred bytes or so. It would be interesting to see how the statistics for this look on forum.dlang.org.

My idea would have been to store a single time stamp of the last visit on a per thread basis, but that would typically use up a lot more memory, even if it would save the constant de-/inflate operations.
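The compressibility argument is easy to try out. The Python sketch below compares a bit array made of one long run of zeros followed by one long run of ones with an array whose bits are scattered at random. The post count of 400,000 is an assumed, illustrative figure, and real read patterns will fall somewhere between the two extremes.

```python
import random
import zlib

POSTS = 400_000    # assumed post count, one bit per post
SIZE = POSTS // 8  # uncompressed size in bytes (50_000)

# Best case: old posts all unread (0 bits), recent posts all read (1 bits).
uniform = bytes(SIZE // 2) + b"\xff" * (SIZE // 2)

# Worst case for comparison: read flags scattered at random.
rng = random.Random(0)
scattered = bytes(rng.getrandbits(8) for _ in range(SIZE))

uniform_size = len(zlib.compress(uniform))
scattered_size = len(zlib.compress(scattered))

if __name__ == "__main__":
    print("uniform runs:", uniform_size, "bytes")
    print("random bits: ", scattered_size, "bytes")
```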

luismarques commented 10 years ago

I got the data, I'll produce some statistics.

luismarques commented 10 years ago

About the stats, a detailed analysis will take some time. I did a plot of the bit array, generating huge bitmaps, and visually found at least one issue that might impact the stats (unused post ids, which are probably impacting the compression, due to additional noise). Still, for now let me just share this with you. These are the compressed sizes (in bytes) for all the registered users, in ascending (compressed) size:

[Image "lens4": plot of the compressed sizes in bytes for all registered users, in ascending order]

Other stats did not have quite the behavior I expected; for instance, when you order the entries (users) by their sum of read bits I would expect that the users who have almost all posts read would have higher compression ratios than the median users, who have lots of posts marked as read, but less systematically so (they read them more haphazardly, and therefore I would expect an increase in the Kolmogorov complexity). That doesn't quite happen though. I'll try to finish the stats and write about that.

Anyway, about MongoDB vs PostgreSQL... I was thinking that it would be interesting to try to make a transition using Postgres 9.4, which has jsonb support, which apparently is even faster at handling JSON documents than MongoDB! That would make the transition easier. I'll look into that.

s-ludwig commented 10 years ago

Okay, that looks more than I thought... 25kB max and ~30% of the users above 1kB is quite a lot (actually that would be quite OK for all but very high volume forums, but anyway). Considering that there are about 400k posts on the newsgroup, that would make about 50kB uncompressed. Reaching a mere 50% compression ratio there seems like quite an achievement, even if Gzip doesn't reach the Kolmogorov optimum.

Hm, well, I guess for me personally it would alternatively be acceptable to just store a list of the last 50 visited threads and their visit time stamp (~50 * (4B [post id] + 4B [time stamp]) = ~400B per group or ~50 * (12B [thread id] + 4B [time stamp]) = ~800B total). Any thread not contained in this LRU list would be considered read. This would be much more coarse, but it should still play pretty well with the current linear thread display. It would also support querying individual threads instead of transferring everything at once.
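A minimal Python sketch of such a bounded list, assuming thread ids are strings and time stamps are plain numbers. The class name `RecentThreads` and its methods are hypothetical; nothing like this exists in vibenews yet.

```python
import time
from collections import OrderedDict


class RecentThreads:
    """Remembers the visit time of the most recently visited threads."""

    def __init__(self, capacity: int = 50) -> None:
        self.capacity = capacity
        self.visits = OrderedDict()  # thread id -> last visit time

    def mark_visited(self, thread_id, when=None) -> None:
        self.visits[thread_id] = time.time() if when is None else when
        self.visits.move_to_end(thread_id)
        # Evict the least recently visited threads beyond the capacity.
        while len(self.visits) > self.capacity:
            self.visits.popitem(last=False)

    def has_unread(self, thread_id, last_post_time) -> bool:
        visited = self.visits.get(thread_id)
        # Threads that are not (or no longer) in the list count as read.
        return visited is not None and last_post_time > visited


if __name__ == "__main__":
    recent = RecentThreads(capacity=2)
    recent.mark_visited("thread-a", when=100.0)
    print(recent.has_unread("thread-a", last_post_time=150.0))
```

With 50 entries of 12 + 4 bytes each, the list stays at roughly 800 bytes per user, small enough for a cookie.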

postgres 9.4, which has jsonb support, which apparently is even faster handling json documents than mongodb! That would make the transition easier. I'll look into that.

Do you mean data migration, or just adjusting the code? If the former, I'd say that it should be pretty easy to use the Controller interface to migrate the data regardless of the database representation.