owncloud-archive / news

:newspaper: News app for ownCloud
GNU Affero General Public License v3.0
290 stars 106 forks source link

Article sorting of feeds in folders messed up #836

Open t0mcat1337 opened 9 years ago

t0mcat1337 commented 9 years ago

Hi everybody,

I have multiple feeds organized in folders. When clicking on the folder title all (unread) articles of all feeds in this folder are displayed as expected, BUT: the sort order (newest to oldest) is messed up... (see screenshot). sortorder

Bug or feature? ;)

Environment: Owncloud 8.1.1 (stable) News 5.3.9 Browser: Firefox 40 (or any other) PHP 5.5.9

Greetings T0mc@

BernhardPosselt commented 9 years ago

Feature, articles are sorted by date added (id) which is a requirement for fast auto paging. This happens only in the beginning, afterwards it should basically be sorted by date.

Search the bug tracker for a full explanation ;)

t0mcat1337 commented 9 years ago

So I got u right that articles are not really sorted by date (like a "Select * from ...order by date DESC") when they r displayed but only when they r inserted to the DB what makes the primary key (I think that is what u mean with ID) the sorting thing? If this is right I see that sorting as I would expect in folders is excluded by this design right now. So it is absolutly impossible to make sorting more flexible? Adding and/or create an Index to the date field in the DB is not an option?

I could imagine that more people r confused about this behaviour as they see articles not sorted by date despite the app announces them to be

BernhardPosselt commented 9 years ago

see https://github.com/owncloud/news/issues/115

The issue is basically that the offset is invalidated when you mark items read. The only solution I can see is to load everything which would defeat the purpose of autopaging and make the app very slow

BernhardPosselt commented 9 years ago

https://github.com/owncloud/news/issues/812#issuecomment-112149192

t0mcat1337 commented 9 years ago

I'm currently trying to dig into the news app code in special and Ownclouds AppFramework in common as I have common experiences in PHP/Ajax aso. development for a couple of years now. Actually I'm thinking of something like a concatenated ID Field in the "_news_items" Table.. somehow this way: New_ID = "pub_date-id" ('1439996640-237113') which could be used as the unique id field itself or just something like an "orderHelper" Could this be a valid approach?

I try to make those changes by myself actually but as I've just started right now I'm fighting a little with the AppFramework ("findEntities" ;) )

BernhardPosselt commented 9 years ago

Yes, would be a valid approach :)

BernhardPosselt commented 9 years ago

GZ :)

BernhardPosselt commented 9 years ago

Hm, basically you could also

ORDER BY pub_date, id DESC

I've tried that once, however it didn't work out. Can't remember the catch anymore

BernhardPosselt commented 9 years ago

Ah right, that issue was the offset because you can't do:

WHERE pub_date < 123 OR IF pub_date EQUAL < id

Your approach would solve that

BernhardPosselt commented 9 years ago

Some thoughts: since you are combining 2 64 bit long numbers (20 digits), you'd need to pad them with zeros on both ends to avoid a string sorting like this:

Combining both padded numbers would therefore turn into a 40 characters long string. Your example (1439996640-237113) would therefore need to become 0000000000143999664000000000000000237113.

BernhardPosselt commented 9 years ago

findEntities does nothing more than map the result from the sql query (assoc array) onto the data objects by using the entity's fromRow method: https://github.com/owncloud/core/blob/master/lib/public/appframework/db/mapper.php#L318

To test your stuff, add a new 40 characters wide text field much like the guid_hash https://github.com/owncloud/news/blob/master/appinfo/database.xml#L211 to the same file and increase the version number to trigger the database migration (e.g. 6.0.2): https://github.com/owncloud/news/blob/master/appinfo/info.xml#L10 :

<field>
    <name>sort_order</name>
    <type>text</type>
    <length>40</length>
    <notnull>false</notnull>
</field>

Then add the field in pascalCase to the item entitiy: https://github.com/owncloud/news/blob/master/db/item.php#L59 :

protected $sortOrder

Finally each time when a new item is inserted:

you'd also need to update the field, e.g.:

$item = $this->itemMapper->insert($item);
$item->generateSortOrder();
$item = $this->itemMapper->update($item);

the generateSortOrder method would ofc have to be created on the Item class first and would look something like this:

public function generateSortOrder() {
    $this->sortOrder = str_pad($this->pubDate, 20, STR_PAD_LEFT) . str_pad($this->id, 20, STR_PAD_LEFT);
}

Once you hit the JS part, notify me

t0mcat1337 commented 9 years ago

sorry for my delay but I was late at home yesterday ;) Lets see what u wrote:

Hm, basically you could also ORDER BY pub_date, id DESC I've tried that once, however it didn't work out. Can't remember the catch anymore

This was the first thing I tried, but then the articles weren't sorted by date in the GUI despite they where, when I fired the resulting SQL directly to the DB. So I thought that there must be some mechanism which sorts the results from the DB afterwards (the JSON Object - as seen in Firebug - was sorted by ID) which tooks me to the "findEntities"...

Some thoughts: since you are combining 2 64 bit long numbers (20 digits), you'd need to pad them >with zeros on both ends to avoid a string sorting like this:

1 10 2 3 30 41 5

Combining both padded numbers would therefore turn into a 40 characters long string. Your >example (1439996640-237113) would therefore need to become >0000000000143999664000000000000000237113.

If one don't need a seperator (like the "-") couldn't this field be a BIGINT? And as the first part is the unix epoch which will always be the size of 10 digits (for a very long time ;) ) will this really be necessary?

findEntities does nothing more than map the result from the sql query (assoc array) onto the data >objects by using the entity's fromRow method: https://github.com/owncloud/core/blob/master /lib/public/appframework/db/mapper.php#L318

So "findEntities" doesn't do some sort of sorting? Like "sort by the first field in the resulting dataset"? I'm still wondering why the articels in the GUI where sorted different, than when firing directly to the DB (as described above)...

Thanks a lot for the little documentation about how using this stuff. As I said, never worked with this before ;) Then lets give it a try (hoping I will manage this ;) )

BernhardPosselt commented 9 years ago

BIGINT is usually a 64bit number, check mysql docs:

BIGINT[(M)] [UNSIGNED] [ZEROFILL] A large integer. The signed range is -9223372036854775808 to 9223372036854775807. The unsigned range is 0 to 18446744073709551615.

BernhardPosselt commented 9 years ago

The sorting in the GUI is done using Angular: https://github.com/owncloud/news/blob/master/templates/part.content.php#L24 and https://github.com/owncloud/news/blob/master/js/controller/ContentController.js#L89

In case you order by sortOrder https://github.com/owncloud/news/blob/master/js/controller/ContentController.js#L89 would have to return -sortOrder and sortOrder respectively.

BernhardPosselt commented 9 years ago

The GUI/JS stuff is a bit more complicated, you will run into autopage issues. Check these files:

t0mcat1337 commented 9 years ago

sooo... what I managed so far:

1.) Inserted a new field "sort_helper" to the table oc_news_items (BIGINT, Not signed) 2.) Manually updated this field, setted it to: CONCAT(pub_date,id) for the whole table. Looks like this now f.e.: '1439969520236639' 3.) Made the XML changes u mentioned above and so forced an OC Update 4.) Changed the SQL in "makeSelectQuery" to : ....'ORDER BY items.sort_helper ' . $ordering; 5.) Debug "findAllFolder" by inserting some fwrites and found out, that the resulting object of "findEntitiesIgnoringNegativeLimit" now is in the right chronological order, as the SQL would assume. 6.) Debugged the same way step by step till the index function of the itemcontroller. At this point the sorting is still correct as in the SQL.

BUT: the articles (items) in the GUI stay NOT sorted in chronological order though. So I assume, now something on the client side (some js) is responsible for this re-sorting, right?

t0mcat1337 commented 9 years ago

I just realized, that the "itemController" correctly includes the new field "sortHelper":

.
.
[items] => Array
        (
            [0] => OCA\News\Db\Item Object
                (
                    [guidHash:protected] => 4b4b121f7ac464f0a23ea0e5b56934ef
                    [guid:protected] => 4b4b121f7ac464f0a23ea0e5b56934ef
                   .
                   .
                    [pubDate:protected] => 1439969520
                   .
                   .
                    [lastModified:protected] => 1439973912
                    [sortHelper:protected] => 1439969520236639
                   .
                   .
                )

BUT the JSON Object in Firefox doesn't??

In between times I simply tried to change these lines:

https://github.com/owncloud/news/blob/master/templates/part.content.php#L24 and https://github.com/owncloud/news/blob/master/js/controller/ContentController.js#L89

from "id" to "pubDate" (just to give it a try ;) ) but of course this didn't work... I got an ugly JS Error concerning something in Angular that I'm far away from understanding.

t0mcat1337 commented 9 years ago

Ok.... after fighting a little with this Angular stuff without any really success I simply deleted this:

orderBy:[Content.orderBy()] track by item.id"

here https://github.com/owncloud/news/blob/master/templates/part.content.php#L24

and now - together with my formerly mentioned changes - the GUI seems to respect the chronological order from the origin SQL...

Don't really know if this is a really valid solution... but could be an approach? :S Will hold an eye on it and see how it now behaves...

t0mcat1337 commented 9 years ago

Perhaps simplyfied this a little: By changing the SQL in ItemMapper.makeSelectQuery this way:

...ORDER BY CONCAT(items.pub_date,items.id)

there seems to be no need of an extra field in the DB...

BernhardPosselt commented 9 years ago

Concat could work but you'd still need to pad it with zeroes and I think the sorting by varchar is probably already slow enough.

As for the GUI the orderBy is probably not needed since everything gets reset on reload and we add new stuff in order.

t0mcat1337 commented 9 years ago

A right, CONCATs result is a string.. ok.. then lets cast it before sorting like this:

order by cast(concat(pub_date,id) as UNSIGNED)

Tested it in a little demo table with 2 integer fields and numbers with different length. Should work....

BernhardPosselt commented 9 years ago

Well, that won't work since you're overflowing ;)

Like I said you need to use a varchar, but you need to pad it with 0 to deal with the unwanted string sorting algorithms.

t0mcat1337 commented 9 years ago

Overflowing could be indeed an impact... but IMHO just theoretically because: Unsigned BIGINT in MySQL can be max

18.446.744.073.709.551.615

In my example the casted INT will be:

1.439.969.520.236.639

far away from the maximum.

When I'm thinking right, the last valid ID in the item table could then be

9.999.999.999

Reaching the ID 10.000.000.000 will result in an overflow.

Is it realistic that this will ever happen?

Verified this with my test table (one INT field as PK named 'ID', two BIGINT fields named 'inta' and 'intb') with a few rows of example data like occuring in the oc_news_items. This is my result:

SELECT cast(concat(inta,intb) as UNSIGNED) t,inta,intb 
FROM test
order by t

results in :

t inta intb
144014847223 1440148472 23
1440148472238396 1440148472 238396
1440148472238397 1440148472 238397
14401484722383970 1440148472 2383970
14401484729999999999 1440148472 9999999999
18446744073709551615 1440148472 10000000000

As u can see, the concat overflows in the last row, giving the max possible BIGINT value as result...

Whart do u think about that?

BernhardPosselt commented 9 years ago

Yes an overflow is actually possible on larger installations that run more than 5 years. Out of the roughly 20 digits 10 are occupied for Unix time. The rest is basically 10 billion possible id's. Let's say the install runs 10 years, that's 120 months, so we're left with 80 million ids per month. Assume an install with 20000 users, then you're left with 4000 articles per month and per user. Let's say an average feed has 500 articles per month then only having 8 feeds per user would be enough to make it overflow in 10 years. If there are more feeds than it even happens sooner

BernhardPosselt commented 9 years ago

Another thing: I did some calculations a while ago and I realized that 32 bit integer ids were too little so I upped to 64 bit. Given that unixtime overflows a 32bit integer in 2038 (google 2038 year problem) you can see that this is a bad idea ;D

t0mcat1337 commented 9 years ago

Yes an overflow is actually possible on larger installations that run more than 5 years. Out of the roughly 20 digits 10 are occupied for Unix time. The rest is basically 10 billion possible id's. Let's say the install runs 10 years, that's 120 months, so we're left with 80 million ids per month. Assume an install with 20000 users, then you're left with 4000 articles per month and per user. Let's say an average feed has 500 articles per month then only having 8 feeds per user would be enough to make it overflow in 10 years. If there are more feeds than it even happens sooner

Which leads me to my origin quistion: Is such a scenario really realistic? 10 years run time? Without any re-installation? 20000 users? each with 500 articles per month? What brings me to another question: What about houskeeping? Will the articles stay in the table for ever? When there r really billions of datasets in this table, won't there be other problems to be expected as just "a little overflow during cast"? Or are they DELETEd after beeing read an amount of time later? In this case isn't there a possibility to re-use IDs in MySQL? Can't really remember...

Of course dealing with VARCHAR is a valid way, I'm with u, but it's a little more complicated as u correctly mentioned (filling with 0). I like the KISS principle a lot (Keep it simple, stupid) ;)

As u r the boss of this (btw. fantastic) OC-App u decide which way to go... ;) Just wanted to share my thoughts

t0mcat1337 commented 9 years ago

Another thing: I did some calculations a while ago and I realized that 32 bit integer ids were too little so I upped to 64 bit. Given that unixtime overflows a 32bit integer in 2038 (google 2038 year problem) you can see that this is a bad idea ;D

lol Absolutly ... I wish u, that ur app will be used much longer than 2038 and then this is something what REALLY will happen, with no discussion

BernhardPosselt commented 9 years ago

Yes, scenario is realistic, I know of installs with 10000 users :)

It's better to keep this in mind than to deal with weird bug reports later on ;)

t0mcat1337 commented 9 years ago

Yes, scenario is realistic, I know of installs with 10000 users :)

Ah really? Big GZ for this!

It's better to keep this in mind than to deal with weird bug reports later on ;)

Yes, absolutly correct... just wanted to get an idea of how much the probability is for hitting the overflow... if it where REALLY little it won't be really useful to make more/complicate work than necessary (which then could again include other bugs ;) )

Nevertheless...how does housekeeping work?