Allow station edits somehow again.

Some stations need edits, for example, the icons and title of "Kulturradio". How can I update that? http://www.radio-browser.info/gui/#!/history/960c1c0e-0601-11e8-ae97-52543be04c81

Should become "RBB Kulturradio"
Icon is not shown in the app

@segler-alex

Using AI to determine the format/tags of a particular radio station has its problems. For instance in the US/Canada a particular radio station might normally play a particular genre of music most of the time. But, it might also carry live play by play sports broadcasts of a local sports team. So, if the AI bot is listening when the sports broadcast is on, it might determine that it is a "sports" radio station when in fact it's a mostly a station dedicated to a particular music genre.

So that kind of thing will require human intervention for accuracy.

But @segler-alex you are free to work on whatever it is that you want to work on.

However, given that there is a great deal of interest in this particular project from many people in many parts of the world, what is problematic are very long periods of time when those of us who have been following this project don't know what you are up to or what your are thinking/working on.

So it would be extremely helpful if you could at least pop in maybe once every month or two to let us all know what's up with things. That way the rest of us can move on as well.

Using AI to determine the format/tags of a particular radio station has its problems. For instance in the US/Canada a particular radio station might normally play a particular genre of music most of the time. But, it might also carry live play by play sports broadcasts of a local sports team.

That's definitely a problem. It's similar in Austria and Germany, where many radio stations have country or rock evenings during the workdays and long soccer broadcasts on the weekend. That can't be nailed down by letting an AI listen to 30 seconds of the stream. That's why I suggested to take only metadata samples (specifically the StreamTitle tag) in like every 5 minutes for 24h. Then you have a cross section of what's played. This could be looked up by formulating a question to ChatGPT or a song/artist database, in case there is one that tags artists or songs by genre and is free to use.

Aside from that: I don't think that the radio browser database currently leaves much left to desire in terms of missing stations. I'd say it features almost every web radio station there is. The problem is the quality of the existing database, before thinking about "how could I crawl & AI new pages by only letting the user submit the URL of the website"

Aside from that: I don't think that the radio browser database currently leaves much left to desire in terms of missing stations. I'd say it features almost every web radio station there is. The problem is the quality of the existing database, before thinking about "how could I crawl & AI new pages by only letting the user submit the URL of the website"

There is stuff that is indeed missing, but mostly it's links that were there at one time and broke some years ago when a station changed their streaming URL mostly because they changed streaming media providers and or streaming media formats.

I get that we developers don't like the human aspect ;) and think we can automate everything and that IA is very exciting but that won't solve the database gradual degradation IMHO.

But if you can do it ... great :)

Currently I get a lot of request for edits (mostly logos and stream urls, also a lot of countries are wrong) and have to maintain my own data that I apply over radio-browser data, it seems a bit of wasted efforts to be honest as it does not benefit the whole community.

I get that we developers don't like the human aspect ;) and think we can automate everything and that IA is very exciting but that won't solve the database gradual degradation IMHO.

But if you can do it ... great :)

Currently I get a lot of request for edits (mostly logos and stream urls, also a lot of countries are wrong) and have to maintain my own data that I apply over radio-browser data, it seems a bit of wasted efforts to be honest as it does not benefit the whole community.

If automation can work, that is indeed great. But, in the meantime we are collectively "stuck" with a human created database that has problems, that could be fixed using many of the ideas that have been presented by people here over the last few years.

I am thinking, that perhaps there is at least one person on Planet Earth that @segler-alex trusts who could take over the "human editing" side of things on an interim basis until such time as either a)the new AI system "works" or b) a new "human system" works.

or a song/artist database, in case there is one that tags artists or songs by genre and is free to use.

@terr72 MusicBrainz provides an API, but at first glance it looks like it might require two calls to get the genre (one to lookup the MBID, then another to get the genre for that MBID.)

Very few of the stations I listen to involve music and several involve minority languages. I have little faith in AI being useful in this area.

I’m certainly not against automation (it makes everyone’s life easier), but you don’t have to completely disallow human imput to achieve automation.

@segler-alex is there some short term station edit solution you can help support to improve the database, while we wait for this longer term vision?

Recently discovered Radio Browser, since Radio? Sure! seems to have died (community editable, daily-published dataset for download, many thousands of stations of various languages/genres around the world).

Seems something of a dilemma, if editing is desired but accountability (eg authentication) isn't.

Might I suggest a voting system. Compare how MusicBrainz does things.

It could be possible to have voting without username+password credentials; have the backend try to figure out if votes come from distinct sources (rather than the same person (or bot) multi-voting).

network characteristics: a simple way would be by IP address, but that's easy to circumvent; instead, I suggest by net-block or even AS-number (roughly correlates to ISPs, sort of)
to counter bots, examine the behaviour based on network traffic; >100 votes per second? Likely a bot. <1 vote per second? More likely to be human (or a slow bot, which limits the risk)
could examine the set of HTTP request headers sent by the HTTP UA; consider EFF's Panopticlick as inspiration for ‘profiling’ of browser ‘fingerprints’; though, a non-trivial bot would be able to fake this by simply generating pseudo-random sets of headers per-request
you'll likely want to exclude (for voting, but not read-only access) the likes of
- Tor
- various proprietary VPNs
- public HTTP proxies

In the end, I think a choice must be made over the inherent dilemma/conundrum;

do you want low-barrier-to-editing, but needing more vigilance over malicious edits? Anywhere that allows guest (pseudo-anonymous) editing (eg Wikipedia) faces the problem of frequently having to clean up vandalism (and spam, if one chooses to consider them as separate) — this only works at all, with a large user-base. Even then, WP still has to edit-protect many pages (eg templates which are transcluded in many other places, documentation, and other obvious targets)
do you want a high-quality dataset, with a somewhat higher barrier-to-entry (fewer editors, but more enthusiastic)? — this is the model which many others have adopted; OSM, MusicBrainz, even blogs which accept comments (I recall reading an analysis, once, that by slightly increasing the barrier-to-comment, that there was much less spam and the quality/calibre of comments increased

There is likely no magic solution; only differing trade-offs.

Some of the benefits to allowing editing (as opposed to new entries for each revision):

dataset is much cleaner (community effort)
easy to examine the history of changes
some audio-stream(er)s frequently change their URLs (the actual octet-stream of audio)
some stations cease transmission, and thus need some way of being marked as defunct

Another way, inspired by a few other repos (of public instances of particular software, eg Searx), would be to host the actual database as a git repo, which people can submit pull-requests to, for changes. Thus, you're abstracting/outsourcing the authentication/verification problem to a 3rd-party (whoever hosts the repo; GH, GL, CB, or whoever).

P.S.; seems that others already proposed this: see #95.

Yet another idea; offer multiple datasets.

One being highly-curated, released occasionally (weekly/monthly), for those wanting accuracy/reliability over timeliness
One being frequently-published (daily/hourly), but with the caveat that it hasn't been scrutinised as much

This way, users can choose which they prefer, rather than the host trying to offer a one-size-fits-all dataset.

@kekukui

2. Registration will require a CAPTCHA to prevent a malicious user from creating many accounts.

This is a horribly flawed approach, for many reasons. I'm very much reminded of an articulate dissection I read (entitled: Fuck CAPTCHA), ~~but can't seem to find it again 😒~~. Another rebuttal, instead: My definitive guide to why CAPTCHA sucks.

There's a whole slew of usability and accessibility problems with CAPTCHAs. If this project is user-focused, and intends to be inclusive, then that's completely at odds with using the horror that is CAPTCHA. Don't assume that everyone uses a graphical (visual-rendering) browser; they don't. Learn about screen-readers. For reading material, a quick Web-index query (for “captcha accessibility”) yielded:

CAPTCHA if you can: is it accessible?
AI is making CAPTCHA increasingly cruel for disabled users
Apple are helping eradicate the curse of CAPTCHA
Inaccessibility of CAPTCHA - World Wide Web Consortium (W3C) (yes, the organisation which authors various Web standards which we're all using)
Ah, but there's an ‘accessible’ audio version, you say? Take the AbilityNet CAPTCHA challenge
all of this also assumes that the user has an ECMA-Script (‘JavaScript’) executing user-agent
You (probably) don't need ReCAPTCHA — thanks to Porkepix's comment — given the user-hostility described, I feel obliged to link to Stallman's Reasons to not use Google

There are more fundamental problems;

security: address the actual root problem, rather than how it manifests. It shouldn't matter if an edit comes from a human or a script; what matters is if that edit is of quality or not. Humans can just as easily make bad edits (typos, for example). — address this in a way which is invisible to non-malicious UAs, just as how if you're not trying to defraud an insurer, then you never hear from their fraud investigators (rather than the insurer assuming that everyone is a fraudster (possibly until proven otherwise, until next month)); address it at the network level, too. Lots of clearly-malicious edits coming from the same subnet? Then throttle that subnet, while leaving everyone else alone.
CAPTCHA is merely one way to achieve the same step (not a goal; the goal is to allow good edits while denying bad ones); could, instead, use a simple Web-form to ask a natural-language question which should be trivial for humans, but difficult for bots; something that requires interpretation, or relies on context (both of which bots suck at)
as per the name (CAPTCHA = Completely Automated Process to Tell Computers and Humans Apart), using CAPTCHA won't actually prevent a human from making many accounts (but, once again, back to examining the network traffic, and more generally sanitising inputs; lots of dubious requests coming from the same network? apply restrictions to that network, not everyone)
CAPTCHAs frequently fail, in multiple ways (both false-negatives (allowing spam) and false-positives (blocking humans))
using a CAPTCHA, you're announcing to the world that you have a spam problem and don't know how to deal with it; that's pretty lame, in terms of PR
don't assume that the likes of CAPTCHA stops malice; it's oft been reported that a simple circumvention is to employ low-wage labour (that is, humans) to do your nefarious bidding for you; they'll solve CAPTCHAs all day so long as they're paid
CAPTCHAs become even less effective as the capability of bots increases (so-called ‘AI’ (better called machine-learning), anyone?)
requiring CAPTCHA at account-registration-time doesn't prevent abuse one the account is in use, and one account can still do a lot of damage — back to differentiating good vs bad edits (as opposed to trying to profile users)
focusing on specific accounts (‘just ban him!’) then simply changes the game into one of whack-a-mole over account attrition — again, because of failing to focus on the real problem (which, I accept, is a difficult one; but attempting to evade it definitely won't solve it), and examining the incoming requests on their own merits
there's a more general problem, with the proposed trust model; being a trusted person is then a vulnerability to DB integrity; I've seen it happen in other projects, where a malicious individual posed as someone wanting to help the project, put in the grunt-work, ended up getting admin privilidges, and immeditaley deleted as much as he possibly could — a clearer way of looking at ‘trust’ (from, if I recall, Bruce Schneier) is that a trusted entity is one which can break your security policy. To those unfamiliar, this initially doesn't make sense; why trust something/someone which can do bad things? Surely only trust honourable creatures? Well, that's the point; by trusting them, you're inherently putting them in a position to (in this case) trash the dataset. That can happen accidentally/mistakenly, aside from maliciously. Besides, how are such super-editors selected/vetted, and by whom? Having such accounts, at all, makes their credentials valuable (and thus vulnerable) to would-be vandals. This is avoided, with other approaches (especially those which don't simply shift the problem around, but actually tackle/neutralise it, including possible attack-vectors; moving the gate to the other side of the castle doesn't really help much, since the attacking army will go there (to attack the new gate, instead of the new wall) instead; moving the problem simply causes an attacker to modify their tactics; what you/we want is to force them to have to adopt a different (less effective) strategy (in the castle analogy, the ideal would be to have no gate or other weak-point, else to make it extremely well defended (such that attacking elsewhere than the gate makes sense to the attacker, even if that'd be more troublesome an approach for them (eg, trying to smash through a wall)))).
and just to nuke it from orbit, to be sure: CAPTCHA Be Gone: “CAPTCHA Be Gone solves this problem by securely detecting captchas on webpages, solving them, and copying the result to the user's clipboard in a matter of seconds with the press of a single keystroke. It is not even necessary for the user to know the precise location of the captcha. No data, other than the captcha, is sent out, so the user's personal information, websites, and other identifying data is secure.” But, wait, it gets better! “For $3.50 a month (only $3.00 for those who sign up during the introductory period), CAPTCHA Be Gone will solve an unlimited number of CAPTCHAs.”
oh, wait, it's still not dead/defeated (or $3 a year is too high a price)? Let's bring out the Death Star (source code, as a git repo, for anyone to run locally, which defeats reCAPTCHA), then:
- github.com/ecthros/uncaptcha — “Defeating Google's audio reCaptcha with 85% accuracy.”
- github.com/ecthros/uncaptcha2 — “defeating the latest version of ReCaptcha with 91% accuracy”
- All your CAPTCHAs are belong to … script-kiddies! {Evil laughter}
if you're concerned about commercial competitors trying to sabotage you; well, then they'll have the resources to hire low-wage labour to solve CAPTCHAs for them. When your adversary is a human, how's CAPTCHA gonna help you then? Examples:
- 2Captcha: Captcha Solving Service — you can even sign-up to be one of their CAPTCHA-solving monkeys, for BitCoin rewards … well, at least until they figure out how to get software to do it for them, faster & cheaper than humans
- Death by Captcha: Best CAPTCHA Solver Service
Ah, but what about the normies, I hear you ask. They don't know how to do any of this stuff (compile source-code? What's that mean?!). OK, well then we've got Buster: Captcha solver extension for humans, available for Chrome, Edge and Firefox. Ready to face-palm in despair, yet? 😈😀😂👍

In the case of the threat-model described (a lone vandal attempting to make many accounts), a much better approach would be to throttle (only him) at the network level (while leaving all other non-vandals unaffected). If he persists, then report him to his ISP (a good tool for determining who that is, from the likes of IP address, is HE's BGP info portal). If the ISP turns out to be uncooperative (they've given him a pink-contract, or simply don't care), then throttle the entire ISP (again, at te network level), or forbid edits from its network (but permit read-only access).

The benefits of focusing on the nature of the submitted changes, rather than where they come from, include:

you don't have to worry about abuse of VPNs and other source-masking mechanisms
you catch non-malicious, but questionable, edits (typos, or otherwise controversial)
you avoid playing whack-a-mole at various different levels
when the focus is on evaluating changesets, then anyone can do that (unlike iptables configuration), and can be a community collaborative effort (see above re voting)
avoiding the problem of moderators, since then they become the bottleneck, and are vulnerable (both themselves, and their credentials) to bad actors (social engineering becomes an attack-vector, in that case); this is very much putting too many eggs in one basket, instead of an approach which favours distributed consensus — compare how BOINC projects check that results from volunteer computers are valid; by having a minimum quorum of required agreement; the same work-unit is sent to N computers (let's say 3), and at least a majority (let's say 2, in this case) must concur for the result to be accepted into the project's master science DB; some projects use rather larger numbers, depending on their needs (both how many hosts are given the same work, and what percentage of them must agree for a result to be accepted). Notice the absolute lack of any notion about ‘trusted’ computers/volunteers.

Complaints about invalid stations will employ a CAPTCHA for rate limiting, to prevent abuse of the complaint mechanism.

Why? If you want to rate-limit, then you certainly don't need CAPTCHA. That makes no sense, and is using the wrong tool to solve the wrong problem (assuming that CAPTCHA solves any problem).

Rate-limiting is better addressed at the network, or hosting level. Start by merging #182 !

Interesting, this suggestion of CAPTCHA for rate-limiting; this strongly implies that it's not about distinguishing bots from humans at all, but about deliberately causing user-inconvenience (to put it nicely). That's pretty terrible design. User-hostile, even.

It's difficult to maintain a dataset when there isn't much of a community willing to jump through the artificially-imposed hurdles. Especially if it's Google's reCAPTCHA; you're then requiring anyone who wants to contribute, to agree to contractual terms with Google, which permit it to do whatever it pleases (even if they don't have (or want) a Google account, or otherwise to be subject to Google's privacy-invasion & data-hoarding).

Savvy users (rightly) interpret this as developers telling them to ‘fuck off’. OK, well, if you don't want users, why bother developing/hosting the project in the first place?

I find it revealing that little thought has gone into considering the implications, with CAPTCHA invoked as some kind of anti-spam incantation, hand-wavingly dismissing the problem (deemed solved, somehow). I'm reminded of a satellite radio company which failed; it documents it cited lack of customers as one of the primary problems. Yet, based on the article I was reading about this, the author explained the extreme difficulty he had in becoming a customer (read: giving them money), in significant part because of terrible design of their Webs[h]ite. Subjugating users doesn't end well. Even Microsoft is learning this inevitable lesson.

Be sure that the cure you propose to an ailment isn't worse than the disease itself. Don't become the monster you fight. We have enough of that (and other heavy-handedness) from government, already. Please don't add to it. It's only too easy to do, especially incrementally.

You can add https://nearcyan.com/you-probably-dont-need-recaptcha/ to your resources, focused on the most used one, and also consider the fact that one and every similar ones are illegal in Europe thanks to GDPR: they collect data besides just the security feature for complete other reasons and that's problematic.

For comic relief (because sanity-preservation demands it); if the goal is to vex users plenty, then one could always:

have them choose a password which complies with the requirements+rules of Password Game — even scammers give up on it! (yes, I'm aware of the irony of my linking to a GoogleTube video whlie bashing Google for it's CAPTCHA-surveillance antics; I don't choose where Kitboga posts his exploits, and would recommend using YouTube-DL and whatever your favourite locally-hosted media player is to access it, since it's rather hilarious to watch)
have them solve (all levels of) the NotPron riddle

@Vrihub

we should just set up a new community-managed station database

I've been thinking very similarly, especially (as I mentioned) since Radio? Sure! has died.

I have a bunch of notes on the matter. I was thinking along lines that it should be equivalent to CDDB (which I'm using as a general catch-all term to also include FreeDB, and now GnuDB).

I tire of all the hurdles (to find/acquire stream metadata), and wheel-reinventing (project/DB proliferation), as each player tries to become the central monopoly. Even Radio Browser seems to have removed the possibility of downloading the dataset/database for offline use (based on (a now-defunct link) the post in which I learned of Radio Browser), and prefers to focus on collecting usage metadata instead.

There are disparate lists, all over, but no unified DB which encapsulates them all.

Unfortunately, I'm not in a position to start such a project, for the foreseeable future (life's complicated). However, do you know of anywhere else that such an idea is being discussed? I care more about the principles behind it (community, libre), rather than who runs it, so would be willing to contribute ideas. I also have a somewhat-outdated (again, life's complicated) but non-ancient dataset from Radio? Sure! on storage which I possess, if that'd help to seed a new project.

Honestly/bluntly, this should be a solved problem, by now. So, it's long overdue.

@wolterhv

Firstly, I agree with the sentiment of your overall post

OSM is a far larger project and has a collaborative model. I use it quite often and I have yet to see vandalous edits.

However, as a fellow OSM editor (on-foot surveyor; the only one in my area; when circumstances permit (life's complicated)), I have a few remarks.

I've seen occasional (minor) vandalism. Or, at least, what seemed to be vandalism. I suppose it could've been a gross error by a newbie, who didn't grok the implications of what he was doing.

That leads into my more general point; while malicious changesets are indeed a problem to be concerned about, the far more common case is dubious changes made non-maliciously, especially by newbies who are unfamiliar (with either OSM, or the concepts underpinning it, such as why they shouldn't tag for the renderer and why the data and renderings are separate (akin to HTML versus CSS)).

So, I'd argue that the whole ‘identify the bad editors’ is folly, and futile. Everyone makes mistakes (typos), so a more generalised system, by consensus, which focuses on accepting quality changes (and rejecting dubious ones), covers a multitude of cases (regardless of how they happen).

Instead of a DB host having to re-invent the wheel (ie how to deal with each of these problems), offloading that to a system which has already had to tackle them, and has mechanisms for reviewing changes, and so on, seems wise. Compare using a library of someone else's code; that's then abstracted away for you, and in a sense not your problem (fixes should be applied upstream).

The current hold-up is that the dataset is in a binary database, instead of primarily textual (from which a DB could be generated/updated). Text enables all sorts of benefits. See my comment in #95.

Much more generally, now;

In this model, radio-browser.info would be one of the databases a user can pick, and radio-browser.info clients can let the user select which database they want their radio stations from.

Either that, or (based on comments here) radio-browser being a front-end to a DB hosted elsewhere (because conflict of interest, otherwise).

I'd argue that the design/architecture of the dataset (what metadata is held, in what format, etc.) should be done openly, from scratch. Compare MusicBrainz to CDDB. The former is a whole lot more capable than the latter, likely due to lessons learned from the limitations of CDDB. Plus, the general principle of future extensibility (since future needs are as yet unknown).

Moreover, my ultimate point, here, is that I'm the type who doesn't want to use an ‘app’ at all. I want the dataset/database itself. This is why I was delighted to discover Radio? Sure!, and sad when it shutdown. RadioSure was a basic version (CDDB level, rather than full-blown MusicBrainz; but it worked) of what we're discussing here;

community-maintained
several dozen thousand entries of many languages & genres from around the world — I'd have to check, but I think it was approaching 100,000 entries
queries & edits could be performed via a Web-UI (not to say that I'm anti-API; APIs are sensible)
the entire dataset could be downloaded, easily (via a static URL), which was republished daily. Compressed, it was only a few MiB (even uncompressed, it wasn't exactly large, for what it contained), which was a TSV file (thus, using grep & friends was a no-brainer).

So, I had scripts to fetch & query the dataset, locally, offline. I then used a local media player to actually play the streams of my choosing. Nothing more to it.

So, while I prefer a non-app approach, myself, if others do then that's up to them. The dataset should be libre for anyone to use. My point is simply that those come as a result of it being readily available (eg for my use-case). Folks are then free to use it as they wish; in an app, or however else.

Compare OSM; you can download world.osm and host it yourself, if you so choose. You don't have to interact with OSM at all (other than to fetch the dataset). You can even host your own instance of overpass by fetching the minutely diffs. Querying OSM APIs is a possibility in addition to this, not instead of it.

One of the problems with API-only access is that as demand increases, so must the capacity of the hosting. Hence why, for growing datasets, offering downloads is often more efficient for both parties.

So, for me, unless Radio Browser is gonna become libre in any meaningful sense (publishing the DB/dataset in a machine-readable format, without proprietary encumberments, instead of conducting pervasive survelliance on users), then the DB/dataset must be hosted elsewhere (and regenerated, if really needs be) leaving Radio Browser to be a front-end (Website, app).

Ultimately, the dataset is the crucial part (hence why that's the part which hosts are most hoarding/miserly over).

The problem, if Radio Browser is centralised upon, in its current state, is the same as for RadioSure; what happens if it gets into difficulty later, or its lone developer doesn't have the time (like now, when he's off developing his app, while neglecting the dataset which the app relies on). If the dataset is locked away behind some API, then it could easily be lost, should the site ever go offline (not only am I thinking of RadioSure, here, but also FreeDB more recently). The model which RadioSure seemed to use was that their app was the commercial focus. It required a dataset of stations to drive it, of course, and I suspect that the thinking was to exploit the community to do the maintenance work for them (hence having that part be open). This didn't pan out for them. As soon as the money stopped rolling in, it all disappeared; dataset and all. At least FreeDB made the effort to ensure that archive copies of the last dataset revision were published for others. Sadly, GnuDB seems to not allow downloading of its dataset(!)

My concern, generally, is that unless such fundamentals are fixed, then we'll keep going in circles, re-inventing the wheel.

@Vrihub

we should just set up a new community-managed station database

I've been thinking very similarly, especially (as I mentioned) since Radio? Sure! has died.

I have a bunch of notes on the matter. I was thinking along lines that it should be equivalent to CDDB (which I'm using as a general catch-all term to also include FreeDB, and now GnuDB).

I tire of all the hurdles (to find/acquire stream metadata), and wheel-reinventing (project/DB proliferation), as each player tries to become the central monopoly. Even Radio Browser seems to have removed the possibility of downloading the dataset/database for offline use (based on (a now-defunct link) the post in which I learned of Radio Browser), and prefers to focus on collecting usage metadata instead.

There are disparate lists, all over, but no unified DB which encapsulates them all.

Unfortunately, I'm not in a position to start such a project, for the foreseeable future (life's complicated). However, do you know of anywhere else that such an idea is being discussed? I care more about the principles behind it (community, libre), rather than who runs it, so would be willing to contribute ideas. I also have a somewhat-outdated (again, life's complicated) but non-ancient dataset from Radio? Sure! on storage which I possess, if that'd help to seed a new project.

Honestly/bluntly, this should be a solved problem, by now. So, it's long overdue.

I want to clarify that I did NOT remove the posibility to download the entire database.

https://backups.radio-browser.info/ backups of the last months, daily, full, sql
http://de1.api.radio-browser.info/json/stations full current download in json format, also available in xml

There is NO hidden data. NO data collection. This a hobby project for me. It does not generate money in anyway. I loose money with it. I do it because i like programming. and every hobby costs money. If you want to download the current database in any format, feel free to do it, and use it to kickstart your own database. That was my initial idea, why i started this project to have something completely open and opensource. the software with all libraries i created for it is. https://www.radio-browser.info/faq -> project components the project structure even tries to mirror the decentralized server approach of email, so anybody can run the software and connect to the network. or write something that is compatible to the json format. I hope i could clarify things.

I would say the easiest way to start a new database with all features is to put some xml files on a gitlab project and allow people to do merge requests. then let people download the files from there. Add gitlab pages to the mix to publish the xml files on every merge and you already have a completely open text file way of sharing and managing radio lists.

I did NOT remove the pos[s]ibility to download the entire database

Ah! Excellent. In that case, I withdraw my assertions to the contrary.

Might I suggest

an HTTP redirect from the old download URL, to perhaps a landing page giving info about from where (and basics of how) to download data (either partial, or the complete dataset).
~~Perhaps also a somewhat more prominent pointer, to such info, from the main www.radio-browser.info site.~~ Nevermind; should've checked the FAQ 🤦‍♂️.

Radio-Browser is now immediately much more interesting+promising, to me (in place of the former Radio? Sure!), so long as this trend continues. 👍

There is […] NO data collection.

From API documentation, under HowTo § use it directly § [№ 3] Remember the following things § [2nd bullet point]:

Send /json/url requests for every click the user makes, this helps to mark stations as popular and makes the database more usefull to other people.

Sounds exactly like collection of user (behavioural) data, to me. Covert (non-voluntary) collection, at that.

the project structure even tries to mirror the decentralized server approach of email, so anybody can run the software and connect to the network

Interesting. Good.

I'd be curious about hosting a mirror/node, in future, in that case.

I did NOT remove the pos[s]ibility to download the entire database

Ah! Excellent. In that case, I withdraw my assertions to the contrary.

Might I suggest
* an HTTP redirect from the old download URL, to perhaps a landing page giving info about from where (and basics of how) to download data (either partial, or the complete dataset).
Sorry, you have me at a loss, please explain what you mean with "the old download url". I do not remember any other ones as the ones I gave you.
* ~Perhaps also a somewhat more prominent pointer, to such info, from the main `www.radio-browser.info` site.~ Nevermind; should've checked the [FAQ](//www.radio-browser.info/faq) 🤦‍♂️.
Radio-Browser is now immediately much more interesting+promising, to me (in place of the former Radio? Sure!), so long as this trend continues. 👍

There is […] NO data collection.

From API documentation, under HowTo § use it directly § [№ 3] Remember the following things § [2nd bullet point]:

Send /json/url requests for every click the user makes, this helps to mark stations as popular and makes the database more usefull to other people.

Sounds exactly like collection of user (behavioural) data, to me. Covert (non-voluntary) collection, at that.

the project structure even tries to mirror the decentralized server approach of email, so anybody can run the software and connect to the network

Interesting. Good.

I'd be curious about hosting a mirror/node, in future, in that case.

The API endpoint you are refering to is optional, you do not have to use it if you do not want to. It also does not save per user data, it just adds to the clicks of the station so that all people know which stations are well liked. I never thought this could be seen as collecting behavioural data in a negative way. But I will think about adding a feature to the Android app to allow the user to opt-out of sending this. For your own api uses of course you already can decide to not send it. I created a ticket for it here: https://github.com/segler-alex/RadioDroid/issues/1176

Thank you, that was the intention of programming it this way, so that anybody can add to the network, maybe also only for themselves and their own computer network mirroring. but of course also for other people if you want to.

Might I suggest

an HTTP redirect from the old download URL, to perhaps a landing page giving info about from where (and basics of how) to download data (either partial, or the complete dataset).

Sorry, you have me at a loss, please explain what you mean with "the old download url". I do not remember any other ones as the ones I gave you.

At 2023-09-23T19:50Z segler-alex posted (quoting me (LHC)):

[…] I tire of all the hurdles (to find/acquire stream metadata), and wheel-reinventing (project/DB proliferation), as each player tries to become the central monopoly. Even Radio Browser seems to have removed the possibility of downloading the dataset/database for offline use (based on (a now-defunct link) the post in which I learned of Radio Browser), and prefers to focus on collecting usage metadata instead. […]

In which I supplied a link using the now-defunct download URL, and the source from whence I got said URL in the first place (in case that info is erroneous).

There is […] NO data collection.

From API documentation, under HowTo § use it directly § [№ 3] Remember the following things § [2nd bullet point]:

Send /json/url requests for every click the user makes, this helps to mark stations as popular and makes the database more usefull to other people.

Sounds exactly like collection of user (behavioural) data, to me. Covert (non-voluntary) collection, at that.

The API endpoint you are refer[r]ing to is optional, you do not have to use it if you do not want to. It also does not save per user data, it just adds to the clicks of the station so that all people know which stations are well liked. I never thought this could be seen as collecting behavioural data in a negative way. But I will think about adding a feature to the Android app to allow the user to opt-out of sending this. For your own api uses of course you already can decide to not send it. I created a ticket for it here: segler-alex/RadioDroid#1176

Multiple points, here;

that the reporting end-point is optional, is good 👍 — however, this seems to be down to developers if they implement+use it or not. That's distinct from the user being informed about it. I comment later, about opt-out/opt-in. — Post-script (while proof-reading); I thought of a better way of articulating this; it's optional for developers, which is quite different from it being optional for users. One does not (inherently) necessitate the other. Ultimately, it's not developer data you're gathering, but user-data; the developer is a middle-man.
if it really is optional, then the API documentation should make this very clear, and
- not recommend it
- not suggest that it's ‘good practice’ or similar
- not instruct developers that they should implement it
- insist that any implementation remain user-respecting
- consider making certain ethical requirements part of the terms of use of the API, if such things (openness, federation, etc.) are indeed goals of this project
- require that users at least be informed (especially if opt-out is used)
- strongly recommend (if implementing the reporting mechanism) opt-in, rather than opt-out, for users
that per-user data isn't saved is good, but it's still transmitted over the 'net (in near-realtime), which poses risk, since Snowden (et al.) gifted us the evidence that powerful entities are performing mass-surveillance upon all of us. Plus, anyone else along the path between user and your API server, may capture the packets. GCHQ does a ‘full take’ (unlike the NSA), and keeps packet captures (rather than only the L7 payload). Unless users know to use a VPN, then the likes of their ISP can (and many do, especially in USoA) capture metadata about what their users are accessing. Even if they're using HTTP+TLS for every connection, TLS still reveals server_name, and even if it didn't then DNS queries certainly do. So, while the ISP may not be able to see what you're accessing within a site/host, they can see which hosts you're connecting to (and when, for how long (what duration), how often, how much data you download/upload, and more).
while your intentions may be benign, that the mechanism exists poses risk. Many savvy users have observed all too often how benign/good intentions become corrupted when incentives/interests conflict (especially when there's money to be had)
some examples (via Stallman, no less) of why good-intentioned data-collection is inherently risky;
- during WW2, when the Nazis invaded France, one of the first things they did was go to the telephone exchanges and seize the records of who had been calling who (and how often); this was used (contrary to the original purpose) to determine who the SS should focus their attention on, to find Resistance members and others deemed ‘undesirable’ (Jews, for example)
- pre-WW2, a European country (I forget which, now) kept records of which religion each of its citizens adhered to (I gather that it was a multi-religion nation, so there was no obvious default). The original purpose was so that upon death, they could be buried in accordance with the customs & rituals of their respective religion. However, again when the Nazis invaded, those records were seized. One of the religions was Judaism (another Christianity, and I forget the third). You can guess what the Nazis did with the list of names of who was Jewish.
so, the data existing, at all, is a Moral Hazard, and enables (unintended) misuse by various entities. The rule of unintended consequences.
from a security perspective; just as servers shouldn't trust clients, especially these days clients should also not trust servers (which are controlled by others, rather than themselves)
as mentioned above, the first step to doing things in a transparent, ethical way, is ensuring that users are informed in the first place
as for disabling reporting; an opt-out is required at minimum. However, this still defaults to sending data which the user may be unaware of. The ethical mode is opt-in; thus, those who really do want to supply such data can do so, and are much more aware (by choosing to enable it) of it, than with an opt-out.
I appreciate that you want to make the dataset more useful, but there are other considerations & trade-offs involved; some of which are externalities (others bear the cost, for someone else to benefit; pollution is the classic example of an externality). However, people already know which stations they're interested in. Popularity is only useful if people don't already have criteria for finding new streams. Popularity feeds upon itself, too; the popular streams get more exposure/attention, which then makes them even more popular, and round it goes in a positive-feedback loop.
More generally, I think your approach is flawed; you're conflating the popularity of a station being browsed, with popularity of actually listening to it (more than briefly). Some (many?) users will browse entries which they dismiss/reject, or even if they do sample the audio, it may only be brief if it's not to their liking. — Instead, I suggest (and I stress heavily that this, in particular, should be opt-in only, absolutely not enabled by default) using a system akin to Last.FM (VLC, for example, can (optionally) report “played songs” to a user-configurable host, which implement's the Last.FM protocol/format; I've never used it, myself, and it's kept disabled in the default configuration; hell, VLC even asks, on first-use, consent to fetch metadata over the network). I'm not saying that the Last.FM approach is best; consider it an example of the general concept; feel free to devise your own scheme (though, if sticking with the Last.FM format/protocol/scheme, I would consider contributing my own listening metadata, so long as it's all done ethically (especially remaining opt-in, not opt-out); it's akin to auto-voting in a sense). This would yield (from willing volunteers) actual listening data, which will be rather more accurate to what you're wanting to measure (consider that, so long as the stream URL continues to serve audio, that they've little need to return to the directory listing, even if they're listening a lot; only if the URL changes in future (as some stations have a bad habit of doing)). Compare Websites conflating HTTPd access-log ‘hits’ (which are actually cache-misses, if you think about it) with popularity in various ways (and many even cripple their site (caching, efficiency) to get more metrics, which is never what HTTPd access-logs were for (properly, they were for monitoring server-load, and (technical) performance, by system administrators (most definitely not the marketing/PR/advertising droids)))
listening/browsing data isn't the only way of getting a sense of popularity; could always use a voting system; it needn't be complicated, either. Compare GH's ‘Star’ mechanism for repos.
there's probably yet more factors/considerations/risks that I can't even think of, at the moment (it's late evening, here, and I'm sleep-deprived)
given the implications, I urge more thinking about the likes of security/privacy/anonymity; this comes with trying to run a non-trivial project, as has been learned with attempts to sabotage/vandalise the database/dataset, already
especially since you seem to be in Europe, then you might wanna think about GDPR implications, while you're at it — to my (non-expert, but non-clueless) understanding of Data Protection, even if you're not storing user-data, you're still collecting+processing it. One might also argue that anything short of opt-in is against the requirement to obtain (informed, freely-given/uncoerced) consent. I'm not a lawyer (and most definitely not your lawyer), and so this isn't legal advice (so, you might wanna get some from a licensed attorney), but it might be that such colection+processing means that you're obligated to register as a Data Controller. If your servers are in a different nation/jurisdiction to yourself, then you definitely need to talk to a competent lawyer (with relevant experience; they have different specialties, akin to how medics do) to figure out which (set of) rules apply. Until you have legal clarity, the safest option is to cease collecting+processing user-data (especially covertly). Ensure that you have an updated privacy policy on the Website, too.
there are smart ways to ethically collect anonymous usage data (within limits); consider how the Tor went about gathering usage data of exit-nodes (eg, which nation the circuit came from). One technique was to not require precise counts; counters are only incremented in blocks/batches of 4, so that there's always some uncertainty over the exact count (because that's not needed, and rough metrics are better than none). Tor actually collects a fair bit of network-usage data; peruse metrics.torproject.org. All of it is done in ways that preserve the anonymity of users, though. That's my point, here.
Also consider how the various plague (SARS-CoV-2) tracking/exposure apps worked; there was no centralised reporting (because the designers knew that this would likely be abused), but clever use of BlueTooth & UUIDs. The only central part was each app downloading a small dataset, daily, from some server, which was needed for the app to know if it had encountered anyone who'd been found to have the plague (tested positive). I forget the details, now (but, explainers abound, if one searches for them), but I think the downloaded dataset contained the UUIDs of those who had subsequently tested positive; so the app cross-checked its own cache of recently-encountered UUIDs broadcast via BlueTooth, and if matches were found then an alert was displayed to the user (otherwise it remained silent) urging the user to go be tested themselves. Thus was avoided the high-risk approach of reporting GNSS traces of everyone who installed such an app, to some central server (likely government controlled). Because doing the latter is simply asking for trouble. Privacy is best preserved by not collecting/divulging sensitive data in the first place; can't abuse what isn't made available.
given that this is plenty off-topic from the OP, but barely scratches the surface of security/privacy, I'd be willing to discuss further in a separate ticket/discussion, if anyone wants to start it and point me there 🙂

the project structure even tries to mirror the decentralized server approach of email, so anybody can run the software and connect to the network

Interesting. Good. I'd be curious about hosting a mirror/node, in future, in that case.

Thank you, that was the intention of programming it this way, so that anybody can add to the network, maybe also only for themselves and their own computer network mirroring. but of course also for other people if you want to.

Indeed. I entirely relate. Reminds me of Searx (the hackable metasearch engine).

I'm inclined to host a public instance. When circumstances permit (life's complicated), I intend to host a whole variety of libre public-interest services.

In my locality (a small island), there's no meaningful tech-culture (I don't count techno-peasant ‘consumer’ culture of tablets, surveillance-boxes, and other Big Tech trash as tech-culture; I mean more like hacker-spaces, projects like this, and so on). For example, there are only a handful of OSM mappers here (and I'm the only (recurring) surveyor; the other contributors are armchair-mappers); we're even severely lacking GNSS (GPS, etc.) traces, and Mapillary imagery (which I'll begin addressing, in the near future). It wasn't that long ago, that even basic network services, like (local) public NTP servers, weren't available (and now only one ISP hosts some, but doesn't advertise it). The closest thing to a maker-space (other than private workshops) is one room in the back of the library, with a few machines; all with lots of advertising by the corporate sponsor, and government-run (which I feel misses the point, somewhat). I could go on; you get the idea.

Ironically, though, we have some of the world's best Internet connectivity (if you're willing to pay for the higher tariffs). Fibre-to-the-(home|building|premises|office) (or your part of it, if shared/multi-occupant), with the actual ONT in your site/building, with tariffs up to 1Gbps ingress (100Mbps egress). Even on the minimum domestic/residential (non-business) tariff, although daytime/on-peak (08:00–24:00) usage has a quota of (last I checked) 10GiB (but this can be increased, for a price), overnight/off-peak (00:00–08:00) is truly unlimited; I gather to encourage folks like me to do their bulk-transfers at times when most people aren't needing low-latency for Web-browsing and other interactive tasks. I made the most of it; in the past, in one month, I transferred nearly a whole TiB of data (I was hosting multiple (sometimes popular) services on a 50+10 Mbps WAN link). I think the default/minimum tariff has increased, again, since then. If you wanna pay about double the price of residential, then you can have a business tariff which is always unlimited (with a lower contention-ratio, too). Sadly, most of these sit idle most of the time; not mine, which was kept busy. For those who're curious, I can give pointers to the ISP's AS-number and it's commercial/retail Webshite [sic].

In the past, I was the only instance of a (local) node for an important service/network (which is anonymity-related, so I won't give specifics; but you can probably guess).

So, yes, I have a whole bunch of things on my hosting wish-list. RadioBrowser is now an entry on that list. Thankyou for actually addressing my concerns, and changing my mind about RadioBrowser 👍😀.

I discovered that, oddly, there's a Google cluster, and an Akamai cluster, here. Yet, no hosting of anything important. Besides having networking as one of my specialties anyway, I feel obliged to give libre projects a chance against the proprietary players of Silicon Valley.

What does this novel have to do with "Allow station edits somehow again."?

Why all the fuss about the optional listening reporting when to listen to a radio you need to ... make a network connection anyway. A massive amount of stream urls are not even https.

What does this novel have to do with "Allow station edits somehow again."?

Why all the fuss about the optional listening reporting when to listen to a radio you need to ... make a network connection anyway. A massive amount of stream urls are not even https.

Probably nothing, but maybe a lot. Honestly, at times the big rants about privacy gets on nerves and make no sense, while being sensible. In logic here this rant is sensible, but not exactly to the point in hour glass itself.

So yes, better to open a separate discussion board under https://github.com/segler-alex/RadioDroid/issues/1176.

segler-alex / radiobrowser-api-rust

Allow station edits somehow again. #49

given that this is plenty off-topic from the OP, but barely scratches the surface of security/privacy, I'd be willing to discuss further in a separate ticket/discussion, if anyone wants to start it and point me there 🙂