segler-alex / radiobrowser-api-rust

radio-browser API implementation in rust
GNU Affero General Public License v3.0

Allow station edits somehow again. #49

Closed — Wikinaut closed this issue 11 months ago

Wikinaut commented 4 years ago

Some stations need edits, for example, the icons and title of "Kulturradio". How can I update that? http://www.radio-browser.info/gui/#!/history/960c1c0e-0601-11e8-ae97-52543be04c81

rgctoronto commented 1 year ago

@segler-alex

Using AI to determine the format/tags of a particular radio station has its problems. For instance, in the US/Canada a particular radio station might normally play a particular genre of music most of the time, but it might also carry live play-by-play broadcasts of a local sports team. So, if the AI bot is listening while the sports broadcast is on, it might determine that it is a "sports" radio station when in fact it's mostly a station dedicated to a particular music genre.

So that kind of thing will require human intervention for accuracy.

But @segler-alex you are free to work on whatever it is that you want to work on.

However, given that there is a great deal of interest in this project from many people in many parts of the world, what is problematic are the very long periods of time when those of us who have been following the project don't know what you are up to or what you are thinking/working on.

So it would be extremely helpful if you could at least pop in maybe once every month or two to let us all know what's up with things. That way the rest of us can move on as well.

terr72 commented 1 year ago

> Using AI to determine the format/tags of a particular radio station has its problems. For instance in the US/Canada a particular radio station might normally play a particular genre of music most of the time. But, it might also carry live play by play sports broadcasts of a local sports team.

That's definitely a problem. It's similar in Austria and Germany, where many radio stations have country or rock evenings during the workdays and long soccer broadcasts on the weekend. That can't be nailed down by letting an AI listen to 30 seconds of the stream. That's why I suggested to take only metadata samples (specifically the StreamTitle tag) in like every 5 minutes for 24h. Then you have a cross section of what's played. This could be looked up by formulating a question to ChatGPT or a song/artist database, in case there is one that tags artists or songs by genre and is free to use.

Aside from that: I don't think the radio browser database currently leaves much to be desired in terms of missing stations. I'd say it features almost every web radio station there is. The problem is the quality of the existing database; that comes before thinking about "how could I crawl & AI new pages by only letting the user submit the URL of the website".

rgctoronto commented 1 year ago

> Aside from that: I don't think that the radio browser database currently leaves much left to desire in terms of missing stations. I'd say it features almost every web radio station there is. The problem is the quality of the existing database, before thinking about "how could I crawl & AI new pages by only letting the user submit the URL of the website"

There is stuff that is indeed missing, but mostly it's links that were there at one time and broke some years ago when a station changed its streaming URL, usually because it changed streaming media providers and/or streaming media formats.

conradfr commented 1 year ago

I get that we developers don't like the human aspect ;) and think we can automate everything, and that AI is very exciting, but that won't solve the gradual degradation of the database, IMHO.

But if you can do it ... great :)

Currently I get a lot of requests for edits (mostly logos and stream URLs; also a lot of countries are wrong) and have to maintain my own data that I apply over the radio-browser data. It seems a bit of wasted effort, to be honest, as it does not benefit the whole community.

rgctoronto commented 1 year ago

> I get that we developers don't like the human aspect ;) and think we can automate everything and that IA is very exciting but that won't solve the database gradual degradation IMHO.
>
> But if you can do it ... great :)
>
> Currently I get a lot of request for edits (mostly logos and stream urls, also a lot of countries are wrong) and have to maintain my own data that I apply over radio-browser data, it seems a bit of wasted efforts to be honest as it does not benefit the whole community.

If automation can work, that is indeed great. But, in the meantime we are collectively "stuck" with a human created database that has problems, that could be fixed using many of the ideas that have been presented by people here over the last few years.

I am thinking that perhaps there is at least one person on Planet Earth whom @segler-alex trusts, who could take over the "human editing" side of things on an interim basis until such time as either a) the new AI system "works" or b) a new "human system" works.

RunningDroid commented 1 year ago

> or a song/artist database, in case there is one that tags artists or songs by genre and is free to use.

@terr72 MusicBrainz provides an API, but at first glance it looks like it might require two calls to get the genre (one to lookup the MBID, then another to get the genre for that MBID.)

Moilleadoir commented 1 year ago

Very few of the stations I listen to involve music, and several involve minority languages. I have little faith in AI being useful in this area.

I'm certainly not against automation (it makes everyone's life easier), but you don't have to completely disallow human input to achieve automation.

Dan1234t commented 1 year ago

Good 👍

magic-ian commented 1 year ago

@segler-alex is there some short term station edit solution you can help support to improve the database, while we wait for this longer term vision?

Lee-Carre commented 11 months ago

I recently discovered Radio Browser, since Radio? Sure! (community-editable, daily-published dataset for download, many thousands of stations in various languages/genres around the world) seems to have died.


Seems something of a dilemma, if editing is desired but accountability (eg authentication) isn't.

Might I suggest a voting system. Compare how MusicBrainz does things.

It could be possible to have voting without username+password credentials; have the backend try to figure out if votes come from distinct sources (rather than the same person (or bot) multi-voting).
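One possible (entirely hypothetical) shape for such credential-free dedup: derive a salted, time-boxed fingerprint from connection attributes, and count at most one vote per fingerprint per station. This only raises the cost of multi-voting; it does not prove distinct humans:

```python
import hashlib

def voter_fingerprint(ip: str, user_agent: str, day: str, server_salt: str) -> str:
    """Salted hash of connection attributes.

    The salt stays secret on the server, and the day string rotates the
    fingerprint daily, so raw IPs are never stored and voters cannot be
    tracked long-term.
    """
    material = "|".join((server_salt, day, ip, user_agent))
    return hashlib.sha256(material.encode()).hexdigest()

def record_vote(votes: dict, station_uuid: str, fingerprint: str) -> bool:
    """Count at most one vote per fingerprint per station.

    Returns True if the vote was counted, False if it was a duplicate.
    """
    seen = votes.setdefault(station_uuid, set())
    if fingerprint in seen:
        return False
    seen.add(fingerprint)
    return True
```

Compare MusicBrainz, where edits are applied or rejected by accumulated votes; the sketch above just removes the account requirement at the cost of weaker identity.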


In the end, I think a choice must be made over the inherent dilemma/conundrum;

There is likely no magic solution; only differing trade-offs.


Some of the benefits to allowing editing (as opposed to new entries for each revision):


Another way, inspired by a few other repos (of public instances of particular software, eg Searx), would be to host the actual database as a git repo, which people can submit pull-requests to, for changes. Thus, you're abstracting/outsourcing the authentication/verification problem to a 3rd-party (whoever hosts the repo; GH, GL, CB, or whoever).

P.S.; seems that others already proposed this: see #95.


Yet another idea; offer multiple datasets.

This way, users can choose which they prefer, rather than the host trying to offer a one-size-fits-all dataset.

Lee-Carre commented 11 months ago

@kekukui

> 2. Registration will require a CAPTCHA to prevent a malicious user from creating many accounts.

This is a horribly flawed approach, for many reasons. I'm very much reminded of an articulate dissection I read (entitled: Fuck CAPTCHA), but can't seem to find it again 😒. Another rebuttal, instead: My definitive guide to why CAPTCHA sucks.

There's a whole slew of usability and accessibility problems with CAPTCHAs. If this project is user-focused, and intends to be inclusive, then that's completely at odds with using the horror that is CAPTCHA. Don't assume that everyone uses a graphical (visual-rendering) browser; they don't. Learn about screen-readers. For reading material, a quick Web-index query (for “captcha accessibility”) yielded:

There are more fundamental problems;

In the case of the threat-model described (a lone vandal attempting to make many accounts), a much better approach would be to throttle (only him) at the network level (while leaving all other non-vandals unaffected). If he persists, then report him to his ISP (a good tool for determining who that is, from the likes of an IP address, is HE's BGP info portal). If the ISP turns out to be uncooperative (they've given him a pink contract, or simply don't care), then throttle the entire ISP (again, at the network level), or forbid edits from its network (but permit read-only access).
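For illustration, per-source throttling at the network level might look like this with nftables (a sketch only: the table/chain/meter names, port 443, and the 10-per-minute figure are arbitrary assumptions, not anyone's actual config):

```shell
# Per-source-IP limit on NEW connections to the API port.
# Clients below the rate are untouched; only sources exceeding it are dropped.
nft add table inet filter
nft add chain inet filter input '{ type filter hook input priority 0; policy accept; }'
nft add rule inet filter input tcp dport 443 ct state new \
    meter edit_throttle '{ ip saddr limit rate over 10/minute burst 5 packets }' drop
```

The point being that this penalises only the abusive source, whereas a CAPTCHA taxes every legitimate contributor equally.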

The benefits of focusing on the nature of the submitted changes, rather than where they come from, include:

> Complaints about invalid stations will employ a CAPTCHA for rate limiting, to prevent abuse of the complaint mechanism.

Why? If you want to rate-limit, then you certainly don't need CAPTCHA. That makes no sense, and is using the wrong tool to solve the wrong problem (assuming that CAPTCHA solves any problem).

Rate-limiting is better addressed at the network, or hosting level. Start by merging #182 !

Interesting, this suggestion of CAPTCHA for rate-limiting; this strongly implies that it's not about distinguishing bots from humans at all, but about deliberately causing user-inconvenience (to put it nicely). That's pretty terrible design. User-hostile, even.

It's difficult to maintain a dataset when there isn't much of a community willing to jump through the artificially-imposed hurdles. Especially if it's Google's reCAPTCHA; you're then requiring anyone who wants to contribute, to agree to contractual terms with Google, which permit it to do whatever it pleases (even if they don't have (or want) a Google account, or otherwise to be subject to Google's privacy-invasion & data-hoarding).

Savvy users (rightly) interpret this as developers telling them to ‘fuck off’. OK, well, if you don't want users, why bother developing/hosting the project in the first place?

I find it revealing that little thought has gone into considering the implications, with CAPTCHA invoked as some kind of anti-spam incantation, hand-wavingly dismissing the problem (deemed solved, somehow). I'm reminded of a satellite radio company which failed; it cited lack of customers as one of the primary problems. Yet, based on the article I was reading about this, the author explained the extreme difficulty he had in becoming a customer (read: giving them money), in significant part because of the terrible design of their Webs[h]ite. Subjugating users doesn't end well. Even Microsoft is learning this inevitable lesson.

Be sure that the cure you propose to an ailment isn't worse than the disease itself. Don't become the monster you fight. We have enough of that (and other heavy-handedness) from government, already. Please don't add to it. It's only too easy to do, especially incrementally.

Porkepix commented 11 months ago

You can add https://nearcyan.com/you-probably-dont-need-recaptcha/ to your resources; it focuses on the most-used one. Also consider the fact that that one, and every similar service, is illegal in Europe thanks to the GDPR: they collect data beyond just the security feature, for completely other reasons, and that's problematic.

Lee-Carre commented 11 months ago

For comic relief (because sanity-preservation demands it); if the goal is to vex users plenty, then one could always:

Lee-Carre commented 11 months ago

@Vrihub

> we should just set up a new community-managed station database

I've been thinking very similarly, especially (as I mentioned) since Radio? Sure! has died.

I have a bunch of notes on the matter. I was thinking along lines that it should be equivalent to CDDB (which I'm using as a general catch-all term to also include FreeDB, and now GnuDB).

I tire of all the hurdles (to find/acquire stream metadata), and the wheel-reinventing (project/DB proliferation), as each player tries to become the central monopoly. Even Radio Browser seems to have removed the possibility of downloading the dataset/database for offline use (based on a now-defunct link in the post through which I learned of Radio Browser), and prefers to focus on collecting usage metadata instead.

There are disparate lists, all over, but no unified DB which encapsulates them all.

Unfortunately, I'm not in a position to start such a project, for the foreseeable future (life's complicated). However, do you know of anywhere else that such an idea is being discussed? I care more about the principles behind it (community, libre), rather than who runs it, so would be willing to contribute ideas. I also have a somewhat-outdated (again, life's complicated) but non-ancient dataset from Radio? Sure! on storage which I possess, if that'd help to seed a new project.

Honestly/bluntly, this should be a solved problem, by now. So, it's long overdue.

Lee-Carre commented 11 months ago

@wolterhv

Firstly, I agree with the sentiment of your overall post

> OSM is a far larger project and has a collaborative model. I use it quite often and I have yet to see vandalous edits.

However, as a fellow OSM editor (on-foot surveyor; the only one in my area; when circumstances permit (life's complicated)), I have a few remarks.

I've seen occasional (minor) vandalism. Or, at least, what seemed to be vandalism. I suppose it could've been a gross error by a newbie, who didn't grok the implications of what he was doing.

That leads into my more general point; while malicious changesets are indeed a problem to be concerned about, the far more common case is dubious changes made non-maliciously, especially by newbies who are unfamiliar (with either OSM, or the concepts underpinning it, such as why they shouldn't tag for the renderer and why the data and renderings are separate (akin to HTML versus CSS)).

So, I'd argue that the whole ‘identify the bad editors’ is folly, and futile. Everyone makes mistakes (typos), so a more generalised system, by consensus, which focuses on accepting quality changes (and rejecting dubious ones), covers a multitude of cases (regardless of how they happen).

Instead of a DB host having to re-invent the wheel (ie how to deal with each of these problems), offloading that to a system which has already had to tackle them, and has mechanisms for reviewing changes, and so on, seems wise. Compare using a library of someone else's code; that's then abstracted away for you, and in a sense not your problem (fixes should be applied upstream).

The current hold-up is that the dataset is in a binary database, instead of primarily textual (from which a DB could be generated/updated). Text enables all sorts of benefits. See my comment in #95.


Much more generally, now;

> In this model, radio-browser.info would be one of the databases a user can pick, and radio-browser.info clients can let the user select which database they want their radio stations from.

Either that, or (based on comments here) radio-browser being a front-end to a DB hosted elsewhere (because conflict of interest, otherwise).

I'd argue that the design/architecture of the dataset (what metadata is held, in what format, etc.) should be done openly, from scratch. Compare MusicBrainz to CDDB. The former is a whole lot more capable than the latter, likely due to lessons learned from the limitations of CDDB. Plus, the general principle of future extensibility (since future needs are as yet unknown).

Moreover, my ultimate point here is that I'm the type who doesn't want to use an ‘app’ at all. I want the dataset/database itself. This is why I was delighted to discover Radio? Sure!, and sad when it shut down. RadioSure was a basic version (CDDB level, rather than full-blown MusicBrainz; but it worked) of what we're discussing here;

So, I had scripts to fetch & query the dataset, locally, offline. I then used a local media player to actually play the streams of my choosing. Nothing more to it.

So, while I prefer a non-app approach myself, if others prefer apps then that's up to them. The dataset should be libre for anyone to use. My point is simply that such uses come as a result of it being readily available (eg for my use-case). Folks are then free to use it as they wish; in an app, or however else.

Compare OSM; you can download world.osm and host it yourself, if you so choose. You don't have to interact with OSM at all (other than to fetch the dataset). You can even host your own instance of overpass by fetching the minutely diffs. Querying OSM APIs is a possibility in addition to this, not instead of it.

One of the problems with API-only access is that as demand increases, so must the capacity of the hosting. Hence why, for growing datasets, offering downloads is often more efficient for both parties.

So, for me, unless Radio Browser is gonna become libre in any meaningful sense (publishing the DB/dataset in a machine-readable format, without proprietary encumbrances, instead of conducting pervasive surveillance on users), then the DB/dataset must be hosted elsewhere (and regenerated, if really need be), leaving Radio Browser to be a front-end (Website, app).

Ultimately, the dataset is the crucial part (hence why that's the part which hosts are most hoarding/miserly over).

The problem, if Radio Browser is centralised upon in its current state, is the same as for RadioSure: what happens if it gets into difficulty later, or its lone developer doesn't have the time (like now, when he's off developing his app, while neglecting the dataset which the app relies on)? If the dataset is locked away behind some API, then it could easily be lost should the site ever go offline (I'm thinking not only of RadioSure here, but also FreeDB more recently).

The model which RadioSure seemed to use was that their app was the commercial focus. It required a dataset of stations to drive it, of course, and I suspect that the thinking was to exploit the community to do the maintenance work for them (hence having that part be open). This didn't pan out for them. As soon as the money stopped rolling in, it all disappeared; dataset and all. At least FreeDB made the effort to ensure that archive copies of the last dataset revision were published for others. Sadly, GnuDB seems to not allow downloading of its dataset(!)

My concern, generally, is that unless such fundamentals are fixed, then we'll keep going in circles, re-inventing the wheel.

segler-alex commented 11 months ago

> @Vrihub
>
> > we should just set up a new community-managed station database
>
> I've been thinking very similarly, especially (as I mentioned) since Radio? Sure! has died.
>
> I have a bunch of notes on the matter. I was thinking along lines that it should be equivalent to CDDB (which I'm using as a general catch-all term to also include FreeDB, and now GnuDB).
>
> I tire of all the hurdles (to find/acquire stream metadata), and wheel-reinventing (project/DB proliferation), as each player tries to become the central monopoly. Even Radio Browser seems to have removed the possibility of downloading the dataset/database for offline use (based on (a now-defunct link) the post in which I learned of Radio Browser), and prefers to focus on collecting usage metadata instead.
>
> There are disparate lists, all over, but no unified DB which encapsulates them all.
>
> Unfortunately, I'm not in a position to start such a project, for the foreseeable future (life's complicated). However, do you know of anywhere else that such an idea is being discussed? I care more about the principles behind it (community, libre), rather than who runs it, so would be willing to contribute ideas. I also have a somewhat-outdated (again, life's complicated) but non-ancient dataset from Radio? Sure! on storage which I possess, if that'd help to seed a new project.
>
> Honestly/bluntly, this should be a solved problem, by now. So, it's long overdue.

I want to clarify that I did NOT remove the possibility to download the entire database.

There is NO hidden data. NO data collection. This is a hobby project for me. It does not generate money in any way; I lose money with it. I do it because I like programming, and every hobby costs money. If you want to download the current database in any format, feel free to do it, and use it to kickstart your own database. That was my initial idea when I started this project: to have something completely open and open-source, as is the software with all the libraries I created for it. See https://www.radio-browser.info/faq -> project components. The project structure even tries to mirror the decentralized server approach of email, so anybody can run the software and connect to the network, or write something that is compatible with the JSON format. I hope I could clarify things.
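Concretely, the full station list is one request to any public mirror (a sketch; `de1` is just one example hostname, and the project suggests resolving `all.api.radio-browser.info` and picking a random server from the result):

```shell
# Download the entire station list as JSON; an identifying User-Agent
# (name is a placeholder here) is polite for a bulk request like this.
curl -s -H "User-Agent: my-station-mirror/0.1" \
  "https://de1.api.radio-browser.info/json/stations" -o stations.json
```

From there the data can be converted, mirrored, or used to seed a separate database.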

I would say the easiest way to start a new database with all features is to put some XML files in a GitLab project and allow people to open merge requests, then let people download the files from there. Add GitLab Pages to the mix to publish the XML files on every merge, and you already have a completely open, text-file-based way of sharing and managing radio lists.
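A station file in such a repo might look like this (entirely hypothetical path and field names; nothing here is a fixed schema, and the URLs are placeholders — only the UUID is the Kulturradio one mentioned earlier in this thread):

```xml
<!-- stations/de/kulturradio.xml (hypothetical layout) -->
<stations>
  <station uuid="960c1c0e-0601-11e8-ae97-52543be04c81">
    <name>Kulturradio</name>
    <stream url="https://example.org/kulturradio/stream.mp3" codec="MP3" bitrate="128"/>
    <homepage>https://example.org/kulturradio/</homepage>
    <favicon>https://example.org/kulturradio/favicon.png</favicon>
    <tags>culture,classical,talk</tags>
    <country>DE</country>
    <language>german</language>
  </station>
</stations>
```

Merge requests then give review, history, and blame for free, and GitLab Pages (or raw file URLs) serve the merged files directly to clients.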

Lee-Carre commented 11 months ago

> I did NOT remove the pos[s]ibility to download the entire database

Ah! Excellent. In that case, I withdraw my assertions to the contrary.

Might I suggest

* an HTTP redirect from the old download URL, to perhaps a landing page giving info about from where (and basics of how) to download data (either partial, or the complete dataset).
* ~Perhaps also a somewhat more prominent pointer, to such info, from the main `www.radio-browser.info` site.~ Nevermind; should've checked the [FAQ](//www.radio-browser.info/faq) 🤦‍♂️.

Radio-Browser is now immediately much more interesting+promising, to me (in place of the former Radio? Sure!), so long as this trend continues. 👍


> There is […] NO data collection.

From API documentation, under HowTo § use it directly § [№ 3] Remember the following things § [2nd bullet point]:

> Send /json/url requests for every click the user makes, this helps to mark stations as popular and makes the database more usefull to other people.

Sounds exactly like collection of user (behavioural) data, to me. Covert (non-voluntary) collection, at that.


the project structure even tries to mirror the decentralized server approach of email, so anybody can run the software and connect to the network

Interesting. Good.

I'd be curious about hosting a mirror/node, in future, in that case.

segler-alex commented 11 months ago

> > I did NOT remove the pos[s]ibility to download the entire database
>
> Ah! Excellent. In that case, I withdraw my assertions to the contrary.
>
> Might I suggest
>
> * an HTTP redirect from the old download URL, to perhaps a landing page giving info about from where (and basics of how) to download data (either partial, or the complete dataset).

Sorry, you have me at a loss, please explain what you mean with "the old download url". I do not remember any other ones as the ones I gave you.

> * ~Perhaps also a somewhat more prominent pointer, to such info, from the main `www.radio-browser.info` site.~ Nevermind; should've checked the [FAQ](//www.radio-browser.info/faq) 🤦‍♂️.
>
> Radio-Browser is now immediately much more interesting+promising, to me (in place of the former Radio? Sure!), so long as this trend continues. 👍
>
> > There is […] NO data collection.
>
> From API documentation, under HowTo § use it directly § [№ 3] Remember the following things § [2nd bullet point]:
>
> > Send /json/url requests for every click the user makes, this helps to mark stations as popular and makes the database more usefull to other people.
>
> Sounds exactly like collection of user (behavioural) data, to me. Covert (non-voluntary) collection, at that.
>
> > the project structure even tries to mirror the decentralized server approach of email, so anybody can run the software and connect to the network
>
> Interesting. Good.
>
> I'd be curious about hosting a mirror/node, in future, in that case.

The API endpoint you are referring to is optional; you do not have to use it if you do not want to. It also does not save per-user data; it just adds to the click count of the station, so that all people know which stations are well liked. I never thought this could be seen as collecting behavioural data in a negative way, but I will think about adding a feature to the Android app to allow the user to opt out of sending this. For your own API uses, of course, you can already decide not to send it. I created a ticket for it here: https://github.com/segler-alex/RadioDroid/issues/1176
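For context, the call in question is a single fire-and-forget GET per station click, and opting out simply means never making it (a sketch; the mirror hostname is an example, and the UUID is the Kulturradio one from earlier in this thread):

```shell
# Optional "station clicked" counter; clients that object just omit this call.
curl -s "https://de1.api.radio-browser.info/json/url/960c1c0e-0601-11e8-ae97-52543be04c81"
```

Since the call is made by the client, any third-party player or script can decide per-user, or globally, not to send it.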

Thank you, that was the intention of programming it this way: so that anybody can add to the network, maybe also only for themselves and their own computer-network mirroring, but of course also for other people, if you want to.

Lee-Carre commented 11 months ago

> > Might I suggest
> >
> > * an HTTP redirect from the old download URL, to perhaps a landing page giving info about from where (and basics of how) to download data (either partial, or the complete dataset).
>
> Sorry, you have me at a loss, please explain what you mean with "the old download url". I do not remember any other ones as the ones I gave you.

At 2023-09-23T19:50Z segler-alex posted (quoting me (LHC)):

> […] I tire of all the hurdles (to find/acquire stream metadata), and wheel-reinventing (project/DB proliferation), as each player tries to become the central monopoly. Even Radio Browser seems to have removed the possibility of downloading the dataset/database for offline use (based on (a now-defunct link) the post in which I learned of Radio Browser), and prefers to focus on collecting usage metadata instead. […]

In which I supplied a link using the now-defunct download URL, and the source from whence I got said URL in the first place (in case that info is erroneous).


> > There is […] NO data collection.
> >
> > From API documentation, under HowTo § use it directly § [№ 3] Remember the following things § [2nd bullet point]:
> >
> > > Send /json/url requests for every click the user makes, this helps to mark stations as popular and makes the database more usefull to other people.
> >
> > Sounds exactly like collection of user (behavioural) data, to me. Covert (non-voluntary) collection, at that.
>
> The API endpoint you are refer[r]ing to is optional, you do not have to use it if you do not want to. It also does not save per user data, it just adds to the clicks of the station so that all people know which stations are well liked. I never thought this could be seen as collecting behavioural data in a negative way. But I will think about adding a feature to the Android app to allow the user to opt-out of sending this. For your own api uses of course you already can decide to not send it. I created a ticket for it here: segler-alex/RadioDroid#1176

Multiple points, here;

> > the project structure even tries to mirror the decentralized server approach of email, so anybody can run the software and connect to the network
> >
> > Interesting. Good. I'd be curious about hosting a mirror/node, in future, in that case.
>
> Thank you, that was the intention of programming it this way, so that anybody can add to the network, maybe also only for themselves and their own computer network mirroring. but of course also for other people if you want to.

Indeed. I entirely relate. Reminds me of Searx (the hackable metasearch engine).

I'm inclined to host a public instance. When circumstances permit (life's complicated), I intend to host a whole variety of libre public-interest services.

In my locality (a small island), there's no meaningful tech-culture (I don't count techno-peasant ‘consumer’ culture of tablets, surveillance-boxes, and other Big Tech trash as tech-culture; I mean more like hacker-spaces, projects like this, and so on). For example, there are only a handful of OSM mappers here (and I'm the only (recurring) surveyor; the other contributors are armchair-mappers); we're even severely lacking GNSS (GPS, etc.) traces, and Mapillary imagery (which I'll begin addressing, in the near future). It wasn't that long ago, that even basic network services, like (local) public NTP servers, weren't available (and now only one ISP hosts some, but doesn't advertise it). The closest thing to a maker-space (other than private workshops) is one room in the back of the library, with a few machines; all with lots of advertising by the corporate sponsor, and government-run (which I feel misses the point, somewhat). I could go on; you get the idea.

Ironically, though, we have some of the world's best Internet connectivity (if you're willing to pay for the higher tariffs). Fibre-to-the-(home|building|premises|office) (or your part of it, if shared/multi-occupant), with the actual ONT in your site/building, and tariffs up to 1Gbps ingress (100Mbps egress). Even on the minimum domestic/residential (non-business) tariff, although daytime/on-peak (08:00–24:00) usage has a quota of (last I checked) 10GiB (which can be increased, for a price), overnight/off-peak (00:00–08:00) is truly unlimited; I gather to encourage folks like me to do their bulk transfers at times when most people aren't needing low latency for Web browsing and other interactive tasks.

I made the most of it; in the past, in one month, I transferred nearly a whole TiB of data (I was hosting multiple (sometimes popular) services on a 50+10 Mbps WAN link). I think the default/minimum tariff has increased again since then. If you wanna pay about double the price of residential, then you can have a business tariff which is always unlimited (with a lower contention ratio, too). Sadly, most of these sit idle most of the time; not mine, which was kept busy. For those who're curious, I can give pointers to the ISP's AS-number and its commercial/retail Webshite [sic].

In the past, I was the only instance of a (local) node for an important service/network (which is anonymity-related, so I won't give specifics; but you can probably guess).

So, yes, I have a whole bunch of things on my hosting wish-list. RadioBrowser is now an entry on that list. Thank you for actually addressing my concerns, and changing my mind about RadioBrowser 👍😀.

I discovered that, oddly, there's a Google cluster, and an Akamai cluster, here. Yet, no hosting of anything important. Besides having networking as one of my specialties anyway, I feel obliged to give libre projects a chance against the proprietary players of Silicon Valley.

conradfr commented 11 months ago

What does this novel have to do with "Allow station edits somehow again."?

Why all the fuss about the optional listening reporting, when to listen to a radio station you need to ... make a network connection anyway? A massive number of stream URLs are not even HTTPS.

vdbhb59 commented 11 months ago

> What does this novel have to do with "Allow station edits somehow again."?
>
> Why all the fuss about the optional listening reporting when to listen to a radio you need to ... make a network connection anyway. A massive amount of stream urls are not even https.

Probably nothing, but maybe a lot. Honestly, at times the big rants about privacy get on one's nerves, even when they are sensible. The logic of this rant is sensible, but it's not exactly on-topic for this issue.

So yes, better to continue this in a separate discussion under https://github.com/segler-alex/RadioDroid/issues/1176.