mk-fg / feedjack

Feedparser-based feed aggregation django app
BSD 3-Clause "New" or "Revised" License

Feed Groups & other commits from the other fork #5

Open allo- opened 9 years ago

allo- commented 9 years ago

When did the support for feed groups get dropped and why?

mk-fg commented 9 years ago

Hey @allo-!

I don't think I ever got around to actually merging it since our conversation a few years ago - http://fraggod.net/code/fossil/feedjack/tktview?name=6f19c39609 - unfortunately.

Grouping feeds by arbitrary tags still sounds like a good idea, though at this point I suspect it might be a bit difficult to merge, given all the changes in the code, but admittedly I don't remember how it was implemented in your fork at all.

allo- commented 9 years ago

Hmm, i thought they were in the squashed commit, but seems not to be so. I just want to upgrade my feedjack some time without losing this function.

btw: why was it squashed? I think this dropped quite a bit of (git) changelog and attribution to @cato-

mk-fg commented 9 years ago

It was in a fossil branch, and moving that stuff to git was messy, iirc in the end I said "screw it", just made one big diff and applied it as one commit there.

Sucks wrt attribution, but I don't mind rebasing the whole thing and putting whatever info you want there, if you want to. Though I'm a bit unsure about whether it'll be seamless, iirc rebasing huge chunks of commits in git, at least in the past, and with merges there, was always a mess.

Can probably do it right as well, untangling that squash, but it sounds like a bit of work, if it's mostly a matter of attribution, maybe text would suffice?

allo- commented 9 years ago

You might just look into who to attribute for now.

When i start again working on it, i may try to rebase it and/or cherry pick old commits on new ones, but this does not really matter for you. And the squashed commit is not that big, just multiple topics in one commit.

I am currently using feedjack as it is, but i still have in mind to build some per-user function, so it can be used like a google reader replacement. Currently i lack the time for it, but that's why i am looking into your new commit and how it can be merged then. ;)

allo- commented 9 years ago

It's hard to trace the merge, as git cannot easily handle the git/fossil differences. But the "plain" theme contains the templates for feedgroups.

btw. which of our features after the fossil merge were implemented here? I guess stuff like the reworked updater was reworked by you too and so would not need to be merged. But maybe you want to cherry-pick some of the other commits.

Take care, my fork should have the newest versions, but some incomplete commits (i.e. template based on foundation was never finished). On the other hand i think you have merged the "new since" feature, but not the following patches for the feature.

mk-fg commented 9 years ago

Don't think I've looked at the fork's new commits after that initial merge, and I don't think straight-up cherry-picking of any commits is likely to work - as mentioned, due to fairly significant diffs in pretty much every file. So for trivial stuff, it's probably easier to take an idea and re-implement it on a new codebase.

As is probably obvious from the commit history, I'm not working on the project very actively, mostly just getting back to it when updating it to work on new Django, so I'm especially unlikely to go hunting for new features to implement here.

I intend to do that rebase thing, amending the squashed commit though, will check other stuff in the fork as well, but I think that's as far as my plans go here.


Of the stuff I already wanted to do on this project:

All buried under a long list of other stuff I have to do, of course, so no schedule of any kind - maybe won't ever get back even to these items.

allo- commented 9 years ago

I thought you seem so active with your fork and have a lot more commits after the merge than us, so i asked here.

In general i thought about forking your code and trying to add the relevant pieces. But i see that you have the longer commit history since the fork and much more activity. So i would need at least the time to do all the merge in a short time period, to avoid diverging in the meantime again.

My personal list would be this:

Then i would try for new features and bugfixes:

later (what we wanted to implement, but lacked time): A full personal feedreader mode. The admin decides on feeds, users can wish for feeds. Users can then create subscriber objects for feeds and get their personal reader or go public and get their personal planet.

Currently we did nothing for a long time and you seem to be very active, so probably your fork is the useful base for anything further. Which does not mean that all of these features must be in your code, they would first go into a new fork again. It's only about some of the base features / bugfixes where it's interesting if you like to merge them.

mk-fg commented 9 years ago

btw. /since/.../asc/ solves your "read from here" problem, the page numbers there are pretty stable (not 100% if feeds have wrong dates)

That's why I specifically mentioned it there, but you'd also need to have pages based on dates, i.e. so that it's not select * from posts order by date limit 10 offset 20 but select * from posts where date > X and date < Y - that's how you get page-10 loading in less than forever due to SEQ SCANs.
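The difference between the two query shapes can be sketched with the standard-library sqlite3 module. The posts table, its integer date column and the index name are assumptions made for illustration, not Feedjack's actual schema, and whether a real database avoids the scan depends on its planner and indexes.

```python
import sqlite3

# Hypothetical schema: a "posts" table with an indexed integer "date" column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, date INTEGER)")
conn.execute("CREATE INDEX posts_date ON posts (date)")
conn.executemany("INSERT INTO posts (date) VALUES (?)", [(d,) for d in range(100)])

def page_by_offset(limit, offset):
    # The database has to walk past "offset" rows before returning anything.
    rows = conn.execute(
        "SELECT id, date FROM posts ORDER BY date LIMIT ? OFFSET ?",
        (limit, offset))
    return rows.fetchall()

def page_by_range(date_from, date_to):
    # The page is defined by its date bounds, so an index on "date" can be
    # used to find the first matching row directly.
    rows = conn.execute(
        "SELECT id, date FROM posts WHERE date > ? AND date < ? ORDER BY date",
        (date_from, date_to))
    return rows.fetchall()
```

With this sample data, page_by_offset(10, 20) and page_by_range(19, 30) return the same ten rows; only the second form stays cheap as the page gets deeper.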

And yeah, page numbers still not actually stable, plus it should be "desc" order anyway, newest first, so it's better, but still not actually what I'd want to have.

I thought you seem so active with your fork and have a lot more commits after the merge than us, so i asked here.

I actually also appropriated https://pypi.python.org/pypi/Feedjack/

And yeah, lots of stuff done to support newer Django versions and such, plus some optional features I wanted to have, such as filtering, processing, etc.

So i would need at least the time to do all the merge in a short time period, to avoid diverging in the meantime again.

But it's not THAT active! I think there's no rush at all ;)

I.e. last bunch of commits is around May-1, and those were mostly minor bugfixes, and update for Django-1.X finally using argparse - not much of a change at all.

Things which were hard to merge that I was referring to above were built on top of code last touched in year 2008, and it's now 2015, so I'd say unless you plan to only finish it by 2022, everything should apply fine here ;)

get rid of fjcache, use only django.cache

fjcache is actually a django.cache.

Whole thing is a wrapper on django.cache, so that instead of constructing keys everywhere - which is a bad idea, as this process has to be exactly the same, if one hopes to hit same cache - use one module that does that abstraction for you.
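A minimal sketch of that idea, not the actual fjcache code: all key construction lives in one function, and readers and writers go through it. The names make_key, cache_set and cache_get are hypothetical, and a plain dict stands in for the Django cache backend so the example runs on its own.

```python
import hashlib

# Stand-in for a Django cache backend; a plain dict keeps the sketch runnable.
_store = {}

def make_key(kind, *parts):
    # The single place where keys are built, so readers and writers agree.
    raw = ":".join(str(p) for p in parts)
    digest = hashlib.sha256(raw.encode("utf-8")).hexdigest()
    return "feedjack:{}:{}".format(kind, digest)

def cache_set(kind, value, *parts):
    _store[make_key(kind, *parts)] = value

def cache_get(kind, *parts, default=None):
    return _store.get(make_key(kind, *parts), default)
```

A view would then call cache_set("page", html, site_id, page_number) and cache_get("page", site_id, page_number) without ever spelling out the key format itself.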

That said, iirc it's not ideal, as I only touched it briefly and long ago, probably doing some dumb stuff there.

endless scroll, with js history api for not losing the scroll position (not that important)

That's a nice thing indeed, but I think unusable if pages load as slowly as they are now, due to that SEQ SCAN thing I've mentioned - that should probably be fixed first, or maybe it's possible to work around that by pre-caching pages in advance.

user login, with "mark as read" option

That exists in default (bootstrap) theme, but with login to remotestorage, i.e. separate service, that can be hosted anywhere (and by anyone), not having to register on the feedjack instance at all.

I.e. it can use localStorage (if you're only reading in one browser), or sync that to e.g. your own "host.com" instance. Kinda easy to work with and really cool concept, at least in my opinion, but I did neglect it quite a bit, need to update stuff to their newer spec.

And of course, easy to just run on the same Django instance (or host, if using non-django backend impl.) as feedjack and get same "mark as read" across multiple devices, if it's your feeds.

you seem to be very active

I think it's wrong, I'm barely touching the thing, these bunches of commits are months and years apart!!!

allo- commented 9 years ago

Okay, maybe i overestimated your activity ;). At least i guess we should finally have some common base and you fixed and improved a lot of stuff.

I did not really care about optimizing the SQL, yet. I have no long load times here (and on a much weaker server before it was okay, too). But i do not mind having dates instead of page numbers, if needed they can still be split. Personally i use /asc/ mode to catch up with the news, which has correctly increasing page numbers to track your reading position. But some "unread articles" option would be nice and i do not see how you want to do it with localStorage or similar concepts. I want features like "Give me all unread from group 'Blogs'". Across devices, which is the whole point in adding a /user/allo/ view.

Regarding the cache ... i have turned off fjcache in my instance, because i never got the current page content without having to add ?asfasdf to the url. with django.cache lowlevel api it should be okay to cache serverside and send the most current content without such issues.

endless scroll could then be added by some of the existing django-modules for it, some of the easier methods would involve jQuery, which can ajax-load html pages and extract content with a selector. a xml/json api would be easy to add, too.

mk-fg commented 9 years ago

i never got the current page content without having to add ?asfasdf to the url

That means you had client cache, i.e. your browser cached the page, not related to backend and django cache at all, hence changing the URL helps.

Or maybe you have some cache in e.g. nginx, where changing URL also helps.

Can't think how that can be related to django.cache, and again - https://github.com/mk-fg/feedjack/blob/master/feedjack/fjcache.py - that's the whole thing, note that all it does is through django.core.cache.

To "turn off" fjcache in an instance, you'd need to disable Django cache, or rather not enable it, as e.g. described here - https://github.com/mk-fg/feedjack/#configuration :

Feedjack is designed to use Django cache system to store database-intensive data like pages of posts and tagclouds, so it is highly recommended to configure CACHES in django settings (memcached, db, files, etc). Feedjack will try to use cache with "feedjack" alias, falling back to "default" if that one is not defined. https://docs.djangoproject.com/en/dev/topics/cache/ https://docs.djangoproject.com/en/dev/topics/cache/#setting-up-the-cache
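As a sketch of what such a settings fragment might look like: the two backend classes below are standard Django ones, while the LOCATION path is a placeholder and pick_cache_alias is a hypothetical helper that only mirrors the fallback described in the README text, not Feedjack's own code.

```python
# settings.py fragment (sketch); the LOCATION value is a placeholder.
CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.locmem.LocMemCache",
    },
    "feedjack": {
        "BACKEND": "django.core.cache.backends.filebased.FileBasedCache",
        "LOCATION": "/var/tmp/feedjack_cache",
    },
}

def pick_cache_alias(caches):
    # Hypothetical helper: prefer the "feedjack" alias, else use "default".
    return "feedjack" if "feedjack" in caches else "default"
```

Leaving out the "feedjack" entry would make the helper return "default", which matches the documented fallback.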

mk-fg commented 9 years ago

some of the easier methods would involve jQuery

These days all browsers can do load/selector stuff without it just as easily, btw, that wrapper is mostly for ye olde IE6 days, which are - thankfully - gone, I think ;)

allo- commented 9 years ago

I do not think it was a client cache issue, iirc it happened even with force-reload (ctrl+shift+r). But currently i have turned off cache and it works and my feedjack is not really slow.

And fjcache seems to me still too complicated, given what the cache api provides. I used cache.get/cache.set often directly, for client-side cache the decorators or the Middleware are useful.

But this is a minor detail, as long as it works and does not send stale content.

mk-fg commented 9 years ago

And fjcache seems to me still too complicated

Yeah, I think it can be cleaned-up for sure, and either provide orm-like cache_feed() or cache_site() interfaces exactly for what has to be cached, or maybe indeed just thrown away if these caches are updated and read in exactly one place.

mk-fg commented 9 years ago

But this is a minor detail, as long as it works and does not send stale content.

That's actually a question of cache invalidation, I don't think adding ?asfasdf to the url should've fixed that, but I do remember fixing a few of such issues in fjupdate, there definitely was some issue with not invalidating some object caches.

Also, having pages based on either date range or id is cool because you can pretty much cache these forever, or even if not, at least invalidate easily.

for client-side cache the decorators or the Middleware are useful

Client-cache middleware that checks/serves ETag or such headers based on content hash probably isn't of much help, because browser will still essentially wait for whole page to be generated/rendered server-side, just not have to receive it afterwards (and it's probably not a huge html, so no big deal).

I.e. opening page when browser sends request with ETag and such middleware checks it shouldn't really speed up anything, unless your internet connection is 56k modem or something like that.

allo- commented 9 years ago

You can send the client an "Expires" Header with the middleware, if you're sure the page won't change. Then it makes no request at all, if you do not reload. On reload it should send If-Modified-Since and the ETag, on force-reload it should fetch it completely.

mk-fg commented 9 years ago

You can send the client an "Expires" Header with the middleware

Yeah, indeed, it's awesome, only applies to pages that actually don't change though, which default pagination very much isn't, hence my remarks on "having pages based on either date range or id is cool".

And you don't generally do that with middleware, as you have to know exactly which pages can get that treatment, and better be sure of that.

allo- commented 9 years ago

Yeah, but you can (mis)use the middleware to achieve the correct caching: https://github.com/allo-/django-bingo/blob/master/bingo/views.py#L434

This is not actually a middleware there, but using the middleware code in a view as if the middleware were active.

mk-fg commented 9 years ago

Eh, I see now why you've added ?asfasdf to url and it helped ;)

Guess it might help to abuse such middlewares, indeed, provided clients can tolerate some staleness or are trained to use F5.


And wrt whole:

You can send the client an "Expires" Header with the middleware

Yeah, indeed, it's awesome, only applies to pages that actually don't change though, which default pagination very much isn't. And you don't generally do that with middleware, as you have to know exactly which pages can get that treatment, and better be sure of that.

Yeah, but you can (mis)use the middleware to achieve the correct caching

I think what you're calling "correct" here is either not actually correct or just not achievable if pages do change at random (again, doesn't apply to date-range/id-range pages).

I.e. if there is a possibility that client goes to page at time-X, CacheMiddleware serves them Expires: <random-future-time-as-you-have-no-clue-when-this-page-changes>, then goes there again at time-Y and sees stale page, it's not correct.

And this is what will happen if "page update" (e.g. feedjack update) operation was run between X and Y for default pagination. This "page update" can't magically reach into client's browser and invalidate that cache after "Expires" header that CacheMiddleware sets.

Maybe useful, more performant, less demanding on servers, but not "correct" at all!


You can't do it correctly with whatever magic middleware, unless you or that middleware somehow knows for sure that "this page won't change for X seconds in the future". Django middleware can't see the future, so it doesn't.
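The time-X / time-Y scenario can be played through in a small simulation. Server and Browser are hypothetical classes, times are plain integers in seconds, and the 300-second lifetime is an arbitrary choice; the only behavior modeled is that a browser sends no request while its cached copy has not expired.

```python
class Server:
    def __init__(self):
        self.version = 1

    def get_page(self, now, ttl):
        # Returns the current content plus an absolute expiry time,
        # the way an Expires header would.
        return "page v{}".format(self.version), now + ttl

class Browser:
    def __init__(self, server):
        self.server = server
        self.cached = None  # (body, expires_at)

    def visit(self, now, ttl=300):
        if self.cached and now < self.cached[1]:
            return self.cached[0]  # no request is sent at all
        self.cached = self.server.get_page(now, ttl)
        return self.cached[0]
```

If the server's version changes between a visit at time 0 and a visit at time 100, the second visit still returns the old page; only a visit after time 300 sees the update.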

In the code you linked, that if game_expired: line will never be executed when the browser shows user stale page due to Expires: header, as there's no request that triggers it.

I mean, it works for server-side caching just fine, don't get me wrong, but as long as we're talking about purely client-side caches (which I was under the impression we were):

allo- commented 9 years ago

I know what the difference between client caching and serverside caching is ;).

I use feedjack currently with no cache at all. I think some stuff internally should be cached serverside (maybe, depends on the load time), while stuff like maybe some date-based site (long enough in the past, that no site with diverging clock adds new items) may be cached client-side.

In my example you see some client side cache for generated images, which 1) change during a game often, so it's only cached for 5 minutes 2) never change afterwards, so it's cached a long time. This middleware (mis)use enables the different cache times without much own code.
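The two-lifetime idea can be sketched as a plain function, independent of the linked django-bingo code. The function names and the one-year "long" value are assumptions; the sketch uses the Cache-Control max-age directive rather than an Expires date to keep it short.

```python
def cache_max_age(still_changing, short=5 * 60, long=365 * 24 * 3600):
    # Content that may still change gets a short lifetime; finished
    # content can be cached for a long time.
    return short if still_changing else long

def cache_headers(still_changing):
    return {"Cache-Control": "max-age={}".format(cache_max_age(still_changing))}
```

For a running game, cache_headers(True) yields a 300-second lifetime; once the game is over, cache_headers(False) yields the long one.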

And for feedjack the best would be to use no client cache at all, i guess. It IS a dynamic site, and you do not re-visit old pages that often. It's not like images, where there are hundreds on the main page ;).

allo- commented 9 years ago

Anyway. I posted my current plan above, cache was some point, but only if there are (still) problems. the other point is to get the two forks merged in some compatible way, which possibly means to add what's still needed after your work from our forks (mine should be cato- + some more experimental commits on top) to yours, in a way that it's possible to migrate old databases.

allo- commented 9 years ago

What about making the whole pagination date-based, without looking at the cache first (but while the cache works)

create urls "since/yyyy-mm-dd" as pages, get max(20, until yyyy-mm-dd +1) entries on the page, next page link is "yyyy-mm-dd+1". Maybe instead yyyy-mm-dd-hh-mm and split accordingly, so a page has 20 entries.

The user can then create own yyyy-mm-dd-hh-mm urls, which are generated on the fly or browse via the standard ones, which are cached after the first hit.
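A rough sketch of the simpler variant of this proposal: fixed pages of 20 entries addressed by a minute-granularity "since" timestamp. The function name, the URL format string and the in-memory list of posts are all assumptions. Posts that share a minute with the first entry of the next page would show up on both pages, so a real implementation would need a tie-breaker such as the post id.

```python
from datetime import datetime

URL_FORMAT = "%Y-%m-%d-%H-%M"  # assumed format for the "since/..." part

def page_since(posts, since, size=20):
    # posts: list of (datetime, title); returns one page and the next URL.
    newer = sorted(p for p in posts if p[0] >= since)
    page = newer[:size]
    if len(newer) > size:
        # The next page starts at the first entry that did not fit.
        next_url = "since/" + newer[size][0].strftime(URL_FORMAT)
    else:
        next_url = None
    return page, next_url
```

Parsing the timestamp back with datetime.strptime(value, URL_FORMAT) gives the since argument for the following page, so user-made URLs and generated ones go through the same function.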

mk-fg commented 9 years ago

What about making the whole pagination date-based, without looking at the cache first (but while the cache works)

Not sure if it's really a question. Don't think there's any technical difficulty in making any of the things mentioned above work, and naturally doing them one-by-one is a good way to go, just needs some time and effort put into it.

allo- commented 8 years ago

I finally replaced my very old feedjack with a checkout from your version and started to add lost stuff / make it workforme again.

My fork is https://github.com/allo-/feedjack and i activated issues there as well, since it seems you're not very active at the moment. Feel free to pull stuff and if you want to start working on it again we can move to a common project in an "organisation" on github or something like this.

Steps i would like to do:

I really like your update command and will try the filters, those always were features i wanted to have.

By the way, the migration from a very old version via south and then django migrations made quite a few problems, i guess i will remove the south stuff and add a note at which point in the git it was available. I needed to downgrade to django 1.6, randomize some existing guid-fields (you think they are unique? think again), remove tables and fields for the migrations to recreate them, clean contenttypes, etc..

I would think most people who really want to migrate a very old version can checkout the old version before and others are better off starting with a fresh install.

COLABORATI commented 8 years ago

Hi, I would like to know: will the changes by allo- be merged into this repo? If not, this is ok, but then please add some info about where the actual development for this code happens. It would be great for others to know which is the current "hot" feedjack repo. Thanks!

allo- commented 8 years ago

Hi. I am actively working on the code currently, but step by step. The result will finally be hard to merge, as i changed some concepts, urls, etc.

Further i plan to make it pep8 compliant after the refactoring, which will make it very hard to merge this branch. I delay it for old code until other refactoring is done, so there is no big patch between the branches, which would make it hard to trace patches. When i do this, there will be a lot of white space change, which may be possible to filter with "diff -b", but much of it will be a rather ugly patch.

Differences (done / planned):

Finally i plan to have Site views, which are just like a planet and User views, which associate a subset of the feeds with a user account, like a newsreader. If there is time and ambition, a control panel would be nice, where users can suggest new feeds, which can later be subscribed by all Users and Sites and an admin control panel, where the administrator can accept new feeds.

So the data model should be something like:

This means my fork is currently already incompatible with this one, @mk-fg may or may not want to use it later. I started doing those big changes when i heard that he's not very active here at the moment anymore. I cannot guarantee how long i will stay active on my fork, but i have some plans (see the list above and the issues there)

mk-fg commented 8 years ago

Hi, I would like to know: will the changes by allo- be merged into this repo?

As mentioned in mk-fg/feedjack#7 - probably not, this repo is kinda dead.

If not, this is ok, but then please add some info about where the actual development for this code happens.

I'm no authority on the subject (why'd I be?), but at least will add a note to the README about abandonware status, to avoid misleading people like that.

Thanks for suggestion!

It would be great for others to know which is the current "hot" feedjack repo.

As I suggested in comment on allo-/feedjack#14, probably best way to do it would be to have one "true" entry for feedjack in a github org repo, like "feedjack/feedjack", which won't be tied to any specific owner, like mk-fg/feedjack unfortunately is.