gravitystorm / blogs.osm.org

The new feed aggregator for OpenStreetMap
https://blogs.openstreetmap.org/

Let's make OpenStreetMap Blogs Better - Ideas / Todos for 2019 #35

Open geraldb opened 6 years ago

geraldb commented 6 years ago

Hello, as some might know, I'm the Pluto dev and happy to see it at work here at OpenStreetMap Blogs. Always open to making it better - looking at the pages, here are some ideas:

What about generating language-specific pages (e.g. German, Vietnamese, etc.)? All the foreign-language posts come from the OpenStreetMap User's Diaries as far as I can tell. If you can help with adding a language (or lang) attribute to the feed, that would be great! Otherwise (auto-magic) language detection might be an option (though I'd be happier with explicit language/lang attributes).

And these (old) issues look good to work on (in 2018):

Any others?

Your suggestions and ideas welcome!

Happy new year and Prosit 2018! from Vienna, Austria. Cheers.

harry-wood commented 6 years ago

In my opinion this issue https://github.com/gravitystorm/blogs.osm.org/issues/17 ...should be a top priority. We shouldn't be hosting spam. But I'm not sure if that's a pluto coding issue or something else. It depends on which of the suggested approaches we go with.

tomhughes commented 6 years ago

Well yes, obviously #17 would be nice to fix. The problem is that there probably isn't any realistic way to fix it, at least in general.

About the best I can think of is to remember what IDs you saw last time and then see if any have vanished on the next run, taking care only to consider an ID as vanished if you can still see an older one, so that just "dropping off the end" doesn't count.

The alternative would be some hack where we still listed the ID in the feed but with a special flag to say it should be removed.
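A rough Ruby sketch of the "remember the IDs and look for gaps" idea described above. It is not Pluto code: the `vanished_ids` helper is hypothetical, and it assumes each run yields the feed's item IDs ordered newest first.

```ruby
require 'set'

# Hypothetical helper (not part of Pluto): given the item IDs seen on the
# previous run and on the current run, both ordered newest first, return the
# IDs that look deleted rather than merely pushed off the end of the feed.
def vanished_ids(previous_ids, current_ids)
  current = Set.new(current_ids)

  # Position (in the previous run) of the oldest item that is still visible.
  oldest_visible = previous_ids.rindex { |id| current.include?(id) }
  return [] if oldest_visible.nil?   # no overlap at all, so we cannot tell

  # Anything missing that is newer than a still-visible item counts as
  # vanished; anything older has just dropped off the end.
  previous_ids[0...oldest_visible].reject { |id| current.include?(id) }
end

previous = %w[e5 e4 e3 e2 e1]   # newest first
current  = %w[e6 e5 e3]         # e4 removed; e2 and e1 dropped off the end
p vanished_ids(previous, current)   # prints ["e4"]
```

When the two runs share no IDs at all, the sketch reports nothing, since it cannot distinguish deletion from a feed that has simply moved on.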

tomhughes commented 6 years ago

I was looking at adding language details as you suggest, but I don't see anything in http://cyber.harvard.edu/rss/rss.html#hrelementsOfLtitemgt that offers a way to tag languages?

It looks like you can tag a channel with a language but not an item :-(

geraldb commented 6 years ago

On #17, I will try to read up on the notes. I don't know the details about account deletion etc. - I can try to "fix", or better, add the missing author information from #12; then it may be easier to filter / delete the posts from a single multi-user feed.

As a general idea - black list / white list. Once we have the author info one idea (easy fix) is to add a white list or black list - but of course that's additional admin work (e.g. somebody has to keep the white/black list up-to-date). Sorry if I'm thinking out loud.

geraldb commented 6 years ago

About the language tag:

I'd say by now RSS 2.0 is broken / an orphan - that is, the founding fathers have disappeared and don't care to evolve it etc. Anyway, the easy way is to use the allowed extension mechanism with namespaces, which is quite popular, so add to the item atom:lang="de" or atom:lang="en" or such (the irony is of course having to import the atom namespace). Welcome to the real world. Will try to make it work with the feedreader if that's workable.

(Update: I'd say the best namespace might actually be "standard" html, thus html:lang="de" or html:lang="en" or such. Might need a little research / check-up on what's the most popular or "future-proof" way.)

Another option is building a feed for every language. Bingo! I think that's the keep-it-simple idea, e.g. feed.en.xml, feed.de.xml, and so on. Again, sorry for thinking out loud. I don't have a definite answer / fix.

tomhughes commented 6 years ago

Oh you can already get a filtered feed for a given language - see https://www.openstreetmap.org/diary/es/rss for example.

The problem is we'd need to subscribe every possible language separately in the aggregator ;-)

geraldb commented 6 years ago

That would be fantastic. How many languages do you have?

I can add the missing piece - that is, in the planet.ini you can add / set / configure the lang / language as a new key / value pair (it defaults to en), for example.

If you know the language codes I can give you a short five-line script (in ruby) that auto-builds the config entries or something (that you can then cut-n-paste), i.e. then you "just" need to admin the language codes.
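For illustration, a script along those lines might look like the sketch below. It is only a guess at the shape, not the script offered above: the section names, the example language list, and the `lang` key (proposed in this thread, not an existing setting) are assumptions; the feed URL pattern follows the https://www.openstreetmap.org/diary/es/rss example.

```ruby
# Language codes to subscribe to; this list is only an example.
LANGS = %w[en de es fr vi]

# Build one planet.ini section per language code. The section name and the
# `lang` key are assumptions taken from the discussion, not existing settings.
def diary_sections(langs)
  langs.map do |lang|
    <<~INI
      [osm-diary-#{lang}]
        title = OpenStreetMap User's Diaries (#{lang})
        feed  = https://www.openstreetmap.org/diary/#{lang}/rss
        lang  = #{lang}
    INI
  end.join("\n")
end

# Print the sections so they can be pasted into planet.ini.
puts diary_sections(LANGS)
```

With this approach only the list of language codes needs maintaining; the config entries are regenerated from it.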

tomhughes commented 6 years ago

One other thing I'd quite like is to be able to specify a category in the ini file and have it only include posts which have that category?

geraldb commented 6 years ago

Sorry can you explain to make sure I get it right? What category? The category that might be in the feed item?

What you can do now - is use filters in your ini (but these are "full-text" regex search filters) I think you can use exclude and/or include (have to look it up).

tomhughes commented 6 years ago

Well I believe atom can have one or more <category term="xxx" /> elements on each post and I'd like to restrict it to posts with a particular category.

So eg jekyll-feed will add that for each tag you apply to the post so I can limit pluto to only including posts from my blog with an openstreetmap tag.

geraldb commented 6 years ago

I see. Thanks for the update. I added categories (tags) in the feedparser last year, see https://github.com/feedparser/feedparser/blob/master/lib/feedparser/builder/atom.rb#L130

So that shouldn't be too hard. The hard part is to find an "intuitive" setting for the ini. Maybe:

tags: openstreetmap # you can use one or more (if they have no spaces; otherwise we need to use commas or something)

Added it to the todo list.
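As a sketch of how such a category restriction could behave once the tags are parsed, assuming a `tags` value that may be separated by commas or whitespace. The `Item` struct and both helpers are hypothetical stand-ins, not Pluto's or feedparser's actual API.

```ruby
# Minimal stand-in for a parsed feed item; the real item class may differ.
Item = Struct.new(:title, :tags)

# Parse a hypothetical `tags` ini value: split on commas or whitespace.
def parse_tags(value)
  value.to_s.split(/[,\s]+/).reject(&:empty?).map(&:downcase)
end

# Keep only items carrying at least one wanted tag; with no tags configured,
# keep everything.
def filter_by_tags(items, wanted)
  return items if wanted.empty?
  items.select { |item| (item.tags.map(&:downcase) & wanted).any? }
end

items = [
  Item.new("Mapping party recap", %w[OpenStreetMap events]),
  Item.new("My new keyboard",     %w[hardware]),
]
wanted = parse_tags("openstreetmap")
filter_by_tags(items, wanted).each { |item| puts item.title }
# prints "Mapping party recap"
```

Matching here is case-insensitive and treats several tags as "any of"; whether that is the right behaviour is an open design choice.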