Open mpgirro opened 3 years ago
Here are some first results.
The feeds are collected from querying the Fyyd/Gpodder/Panoptikum directories (just because I already had some old Java code I could adapt) and a local txt file I used for testing something else a few years ago.
Feeds processed successfully: 1362
Feeds loading failed: 235
Feeds parsing failed: 73
Analysing the successfully loaded and parsed XMLs leads to the following distribution of namespaces and their elements/attributes (every element/attribute is counted once per feed):
* com-wordpress:feed-additions:1 (in 63 feeds)
- post-id (Element, in 62 feeds)
- site (Element, in 62 feeds)
* http://a9.com/-/spec/opensearchrss/1.0/ (in 16 feeds)
- itemsPerPage (Element, in 16 feeds)
- startIndex (Element, in 16 feeds)
- totalResults (Element, in 16 feeds)
* http://backend.userland.com/blogChannelModule (in 1 feeds)
* http://backend.userland.com/creativeCommonsRssModule (in 38 feeds)
- license (Element, in 35 feeds)
* http://bbc.co.uk/2009/01/ppgRss (in 38 feeds)
- canonical (Element, in 37 feeds)
- enclosureLegacy (Element, in 37 feeds)
- enclosureSecure (Element, in 37 feeds)
- network (Element, in 38 feeds)
- seriesDetails (Element, in 38 feeds)
- systemRef (Element, in 38 feeds)
* http://bitlove.org (in 18 feeds)
- guid (Attribute, in 16 feeds)
* http://developer.longtailvideo.com/ (in 7 feeds)
- talkId (Element, in 6 feeds)
* http://fireside.fm/modules/rss/fireside (in 5 feeds)
- genDate (Element, in 5 feeds)
- hostname (Element, in 5 feeds)
- playerEmbedCode (Element, in 5 feeds)
- playerURL (Element, in 5 feeds)
* http://madskills.com/public/xml/rss/module/trackback/ (in 1 feeds)
* http://ogp.me/ns# (in 1 feeds)
* http://pipes.yahoo.com (in 1 feeds)
- meta (Element, in 1 feeds)
* http://podcastaddict.com (in 2 feeds)
* http://podlove.org/simple-chapters (in 250 feeds)
- chapter (Element, in 176 feeds)
- chapters (Element, in 176 feeds)
* http://purl.org/dc/elements/1.1/ (in 401 feeds)
- creator (Element, in 265 feeds)
- date (Element, in 15 feeds)
- identifier (Element, in 2 feeds)
- language (Element, in 13 feeds)
- rights (Element, in 11 feeds)
* http://purl.org/dc/terms/ (in 6 feeds)
- created (Element, in 6 feeds)
- modified (Element, in 6 feeds)
* http://purl.org/rss/1.0/modules/content (in 1 feeds)
* http://purl.org/rss/1.0/modules/content/ (in 1000 feeds)
- encoded (Element, in 781 feeds)
* http://purl.org/rss/1.0/modules/slash/ (in 250 feeds)
- comments (Element, in 212 feeds)
* http://purl.org/rss/1.0/modules/syndication/ (in 274 feeds)
- updateBase (Element, in 1 feeds)
- updateFrequency (Element, in 219 feeds)
- updatePeriod (Element, in 219 feeds)
* http://purl.org/rss/1.0/modules/taxonomy/ (in 10 feeds)
* http://purl.org/syndication/history/1.0 (in 231 feeds)
* http://purl.org/syndication/thread/1.0 (in 12 feeds)
- total (Element, in 12 feeds)
* http://radiofrance.fr/Lancelot/Podcast# (in 2 feeds)
- businessReference (Element, in 1 feeds)
- magnetothequeID (Element, in 1 feeds)
- originStation (Element, in 2 feeds)
* http://rdfs.org/sioc/ns# (in 1 feeds)
* http://rdfs.org/sioc/types# (in 1 feeds)
* http://rssnamespace.org/feedburner/ext/1.0 (in 195 feeds)
- browserFriendly (Element, in 13 feeds)
- emailServiceId (Element, in 13 feeds)
- feedFlare (Element, in 48 feeds)
- feedburnerHostname (Element, in 13 feeds)
- info (Element, in 191 feeds)
- origEnclosureLink (Element, in 102 feeds)
- origLink (Element, in 126 feeds)
* http://schema.org/ (in 1 feeds)
* http://schemas.google.com/blogger/2008 (in 4 feeds)
* http://schemas.google.com/g/2005 (in 4 feeds)
* http://search.yahoo.com/mrss (in 1 feeds)
- restriction (Element, in 1 feeds)
* http://search.yahoo.com/mrss/ (in 597 feeds)
- category (Element, in 200 feeds)
- content (Element, in 266 feeds)
- copyright (Element, in 137 feeds)
- credit (Element, in 199 feeds)
- description (Element, in 194 feeds)
- group (Element, in 1 feeds)
- keywords (Element, in 180 feeds)
- player (Element, in 28 feeds)
- rating (Element, in 226 feeds)
- restriction (Element, in 6 feeds)
- rights (Element, in 12 feeds)
- thumbnail (Element, in 187 feeds)
- title (Element, in 35 feeds)
* http://vemedio.com/dtds/atom/related-1.0.dtd (in 1 feeds)
- apple-itunes-app (Attribute, in 1 feeds)
* http://web.resource.org/cc/ (in 83 feeds)
* http://webns.net/mvcb/ (in 28 feeds)
- errorReportsTo (Element, in 1 feeds)
- generatorAgent (Element, in 1 feeds)
* http://wellformedweb.org/CommentAPI/ (in 255 feeds)
- comment (Element, in 2 feeds)
- commentRss (Element, in 170 feeds)
* http://www.adobe.com/amp/1.0 (in 2 feeds)
- background (Element, in 1 feeds)
- banner (Element, in 2 feeds)
- logo (Element, in 2 feeds)
- networkBackground (Element, in 1 feeds)
- networkHalfBanner (Element, in 2 feeds)
- networkLogo (Element, in 2 feeds)
- networkSmallLogo (Element, in 2 feeds)
- networkWebsite (Element, in 2 feeds)
* http://www.apple.com/iweb (in 2 feeds)
* http://www.ard.de/ardNamespace (in 12 feeds)
- sendereihe (Element, in 12 feeds)
- visibility (Element, in 12 feeds)
- visibleFrom (Element, in 12 feeds)
- visibleUntil (Element, in 12 feeds)
* http://www.freie-radios.net/namespaces/frn (in 2 feeds)
- art (Element, in 2 feeds)
- id (Element, in 2 feeds)
- laenge (Element, in 2 feeds)
- language (Element, in 2 feeds)
- last_update (Element, in 2 feeds)
- licence (Element, in 2 feeds)
- radio (Element, in 2 feeds)
- serie (Element, in 2 feeds)
- title (Element, in 2 feeds)
* http://www.georss.org/georss (in 98 feeds)
- box (Element, in 3 feeds)
- featurename (Element, in 3 feeds)
- point (Element, in 7 feeds)
* http://www.google.com/schemas/play-podcasts/1.0 (in 427 feeds)
- author (Element, in 41 feeds)
- block (Element, in 14 feeds)
- category (Element, in 104 feeds)
- description (Element, in 53 feeds)
- email (Element, in 41 feeds)
- explicit (Element, in 52 feeds)
- image (Element, in 26 feeds)
- summary (Element, in 1 feeds)
* http://www.google.com/schemas/play-podcasts/1.0/ (in 15 feeds)
* http://www.google.com/schemas/play-podcasts/1.0/play-podcasts.xsd (in 1 feeds)
* http://www.itunes.com/DTDs/Podcast-1.0.dtd (in 12 feeds)
- author (Element, in 11 feeds)
- category (Element, in 11 feeds)
- duration (Element, in 11 feeds)
- email (Element, in 11 feeds)
- explicit (Element, in 9 feeds)
- image (Element, in 12 feeds)
- keywords (Element, in 11 feeds)
- link (Element, in 10 feeds)
- name (Element, in 10 feeds)
- new-feed-url (Element, in 4 feeds)
- owner (Element, in 9 feeds)
- subtitle (Element, in 11 feeds)
- summary (Element, in 11 feeds)
* http://www.itunes.com/dtds/podcast-1.0.dtd (in 1322 feeds)
- author (Element, in 1317 feeds)
- block (Element, in 419 feeds)
- category (Element, in 1287 feeds)
- complete (Element, in 10 feeds)
- copyright (Element, in 2 feeds)
- duration (Element, in 1224 feeds)
- email (Element, in 1296 feeds)
- episode (Element, in 496 feeds)
- episodeType (Element, in 690 feeds)
- explicit (Element, in 1270 feeds)
- image (Element, in 1294 feeds)
- isClosedCaptioned (Element, in 2 feeds)
- keywords (Element, in 665 feeds)
- link (Element, in 6 feeds)
- name (Element, in 1269 feeds)
- new-feed-url (Element, in 245 feeds)
- order (Element, in 6 feeds)
- owner (Element, in 1302 feeds)
- season (Element, in 129 feeds)
- subitle (Element, in 1 feeds)
- subtitle (Element, in 1218 feeds)
- summary (Element, in 1264 feeds)
- title (Element, in 536 feeds)
- type (Element, in 702 feeds)
* http://www.itunesu.com/feed (in 1 feeds)
- category (Element, in 1 feeds)
* http://www.rawvoice.com/rawvoiceRssModule/ (in 127 feeds)
- donate (Element, in 17 feeds)
- embed (Element, in 2 feeds)
- frequency (Element, in 54 feeds)
- isHD (Element, in 1 feeds)
- isHd (Element, in 1 feeds)
- location (Element, in 51 feeds)
- poster (Element, in 3 feeds)
- rating (Element, in 38 feeds)
- subscribe (Element, in 85 feeds)
* http://www.rssboard.org/media-rss (in 15 feeds)
- category (Element, in 1 feeds)
- content (Element, in 12 feeds)
- copyright (Element, in 1 feeds)
- credit (Element, in 2 feeds)
- description (Element, in 2 feeds)
- keywords (Element, in 1 feeds)
- rating (Element, in 2 feeds)
- thumbnail (Element, in 1 feeds)
- title (Element, in 6 feeds)
* http://www.spotify.com/ns/rss (in 17 feeds)
- countryOfOrigin (Element, in 3 feeds)
* http://www.w3.org/1999/02/22-rdf-syntax-ns# (in 121 feeds)
- resource (Attribute, in 1 feeds)
* http://www.w3.org/1999/xhtml (in 4 feeds)
- body (Element, in 1 feeds)
- meta (Element, in 3 feeds)
* http://www.w3.org/2000/01/rdf-schema# (in 1 feeds)
* http://www.w3.org/2000/xmlns/ (in 1358 feeds)
- Atom (Attribute, in 1 feeds)
- acast (Attribute, in 58 feeds)
- admin (Attribute, in 28 feeds)
- amp (Attribute, in 2 feeds)
- anchor (Attribute, in 6 feeds)
- ard (Attribute, in 12 feeds)
- art19 (Attribute, in 15 feeds)
- atom (Attribute, in 1203 feeds)
- atom10 (Attribute, in 193 feeds)
- audioboom (Attribute, in 12 feeds)
- bitlove (Attribute, in 18 feeds)
- blogChannel (Attribute, in 1 feeds)
- blogger (Attribute, in 4 feeds)
- cba (Attribute, in 2 feeds)
- cc (Attribute, in 83 feeds)
- content (Attribute, in 1006 feeds)
- creativeCommons (Attribute, in 38 feeds)
- dc (Attribute, in 401 feeds)
- dcterms (Attribute, in 6 feeds)
- feedburner (Attribute, in 195 feeds)
- feedpress (Attribute, in 49 feeds)
- fh (Attribute, in 231 feeds)
- fireside (Attribute, in 5 feeds)
- foaf (Attribute, in 1 feeds)
- frn (Attribute, in 2 feeds)
- fyyd (Attribute, in 64 feeds)
- gd (Attribute, in 4 feeds)
- geo (Attribute, in 95 feeds)
- georss (Attribute, in 98 feeds)
- googleplay (Attribute, in 444 feeds)
- itunes (Attribute, in 1337 feeds)
- itunesu (Attribute, in 1 feeds)
- iweb (Attribute, in 2 feeds)
- jwplayer (Attribute, in 7 feeds)
- media (Attribute, in 610 feeds)
- npr (Attribute, in 6 feeds)
- nprml (Attribute, in 6 feeds)
- og (Attribute, in 1 feeds)
- omny (Attribute, in 26 feeds)
- openSearch (Attribute, in 16 feeds)
- pa (Attribute, in 2 feeds)
- pingback (Attribute, in 20 feeds)
- podaccess (Attribute, in 24 feeds)
- podcast (Attribute, in 301 feeds)
- podcastRF (Attribute, in 2 feeds)
- ppg (Attribute, in 38 feeds)
- psc (Attribute, in 276 feeds)
- rawvoice (Attribute, in 127 feeds)
- rdf (Attribute, in 121 feeds)
- rdfs (Attribute, in 1 feeds)
- related (Attribute, in 1 feeds)
- sc (Attribute, in 4 feeds)
- schema (Attribute, in 1 feeds)
- sioc (Attribute, in 1 feeds)
- sioct (Attribute, in 1 feeds)
- skos (Attribute, in 1 feeds)
- slash (Attribute, in 250 feeds)
- spotify (Attribute, in 18 feeds)
- sy (Attribute, in 274 feeds)
- taxo (Attribute, in 10 feeds)
- thr (Attribute, in 12 feeds)
- trackback (Attribute, in 1 feeds)
- wfw (Attribute, in 255 feeds)
- xhtml (Attribute, in 4 feeds)
- xmlns (Attribute, in 87 feeds)
- xsd (Attribute, in 1 feeds)
- xsi (Attribute, in 4 feeds)
* http://www.w3.org/2001/XMLSchema# (in 1 feeds)
* http://www.w3.org/2001/XMLSchema-instance (in 4 feeds)
* http://www.w3.org/2003/01/geo/wgs84_pos# (in 95 feeds)
- lat (Element, in 13 feeds)
- long (Element, in 13 feeds)
* http://www.w3.org/2004/02/skos/core# (in 1 feeds)
* http://www.w3.org/2005/Atom (in 1205 feeds)
- contributor (Element, in 145 feeds)
- email (Element, in 2 feeds)
- facebook (Element, in 1 feeds)
- id (Element, in 4 feeds)
- link (Element, in 1166 feeds)
- name (Element, in 145 feeds)
- updated (Element, in 4 feeds)
- uri (Element, in 70 feeds)
* http://www.w3.org/2005/Atom/ (in 25 feeds)
- link (Element, in 8 feeds)
* http://www.w3.org/XML/1998/namespace (in 37 feeds)
- base (Attribute, in 31 feeds)
- lang (Attribute, in 6 feeds)
* http://xmlns.com/foaf/0.1/ (in 1 feeds)
* https://access.acast.com/schema/1.0/ (in 4 feeds)
* https://anchor.fm/xmlns (in 6 feeds)
- station (Element, in 1 feeds)
- support (Element, in 1 feeds)
* https://api.npr.org/nprml (in 6 feeds)
* https://art19.com/xmlns/rss-extensions/1.0 (in 15 feeds)
* https://audioboom.com/rss/1.0 (in 12 feeds)
- banner-image (Element, in 7 feeds)
* https://cba.fro.at/help#feeds (in 2 feeds)
- attachmentID (Element, in 2 feeds)
- broadcastDate (Element, in 2 feeds)
- containsCopyright (Element, in 2 feeds)
- duration (Element, in 2 feeds)
- productionDate (Element, in 2 feeds)
- teaser (Element, in 2 feeds)
* https://feed.press/xmlns (in 49 feeds)
- locale (Element, in 49 feeds)
- newsletterId (Element, in 3 feeds)
- podcastId (Element, in 18 feeds)
* https://fyyd.de/fyyd-ns/ (in 64 feeds)
- verify (Element, in 64 feeds)
* https://github.com/Podcastindex-org/podcast-namespace/blob/main/docs/1.0.md (in 72 feeds)
- chapters (Element, in 4 feeds)
- funding (Element, in 14 feeds)
- location (Element, in 19 feeds)
- locked (Element, in 7 feeds)
- person (Element, in 5 feeds)
- transcript (Element, in 1 feeds)
* https://omny.fm/rss-extensions (in 26 feeds)
- clipId (Element, in 26 feeds)
- stitcherId (Element, in 12 feeds)
* https://podcastindex.org/namespace/1.0 (in 229 feeds)
- funding (Element, in 13 feeds)
- license (Element, in 3 feeds)
- location (Element, in 3 feeds)
- locked (Element, in 6 feeds)
- person (Element, in 95 feeds)
- transcript (Element, in 9 feeds)
- value (Element, in 3 feeds)
- valueRecipient (Element, in 3 feeds)
* https://podlove.de/simple-chapters (in 2 feeds)
- chapter (Element, in 2 feeds)
- chapters (Element, in 2 feeds)
* https://podlove.org/simple-chapters (in 4 feeds)
- chapter (Element, in 4 feeds)
- chapters (Element, in 4 feeds)
* https://podlove.org/simple-chapters/ (in 26 feeds)
- chapter (Element, in 5 feeds)
- chapters (Element, in 5 feeds)
* https://podping.info/specification/1 (in 20 feeds)
- receiver (Element, in 20 feeds)
* https://purl.org/rss/1.0/modules/content/ (in 3 feeds)
- encoded (Element, in 3 feeds)
* https://schema-access.acast.com/1.0/ (in 20 feeds)
* https://schema.acast.com/1.0/ (in 58 feeds)
- episodeId (Element, in 24 feeds)
- episodeUrl (Element, in 24 feeds)
- importedFeed (Element, in 4 feeds)
- network (Element, in 20 feeds)
- settings (Element, in 32 feeds)
- showId (Element, in 24 feeds)
- showUrl (Element, in 22 feeds)
- signature (Element, in 24 feeds)
* https://www.google.com/schemas/play-podcasts/1.0 (in 1 feeds)
* https://www.itunes.com/dtds/podcast-1.0.dtd (in 5 feeds)
- author (Element, in 5 feeds)
- category (Element, in 3 feeds)
- duration (Element, in 2 feeds)
- email (Element, in 3 feeds)
- explicit (Element, in 3 feeds)
- image (Element, in 5 feeds)
- keywords (Element, in 2 feeds)
- name (Element, in 3 feeds)
- owner (Element, in 3 feeds)
- subtitle (Element, in 5 feeds)
- summary (Element, in 5 feeds)
* https://www.npr.org/rss/ (in 6 feeds)
* https://www.rssboard.org/rss-specification (in 1362 feeds)
- a (Element, in 5 feeds)
- active (Attribute, in 1 feeds)
- address (Attribute, in 3 feeds)
- algorithm (Attribute, in 24 feeds)
- amazon (Attribute, in 2 feeds)
- android (Attribute, in 2 feeds)
- audioId (Element, in 5 feeds)
- author (Element, in 186 feeds)
- bitrate (Attribute, in 1 feeds)
- blockquote (Element, in 1 feeds)
- blubrry (Attribute, in 10 feeds)
- body (Element, in 2 feeds)
- br (Element, in 3 feeds)
- broadcastlimit (Element, in 6 feeds)
- category (Element, in 460 feeds)
- cbcListenUrl (Element, in 1 feeds)
- channel (Element, in 1360 feeds)
- channelExportDir (Element, in 5 feeds)
- cloud (Element, in 7 feeds)
- code (Attribute, in 1 feeds)
- comments (Element, in 221 feeds)
- content (Attribute, in 5 feeds)
- contentLink (Element, in 1 feeds)
- copyright (Element, in 921 feeds)
- day (Element, in 1 feeds)
- daysLive (Attribute, in 38 feeds)
- deezer (Attribute, in 4 feeds)
- description (Element, in 1350 feeds)
- docs (Element, in 188 feeds)
- domain (Attribute, in 22 feeds)
- domain (Element, in 9 feeds)
- duration (Attribute, in 51 feeds)
- em (Element, in 2 feeds)
- email (Attribute, in 7 feeds)
- enclosure (Element, in 1340 feeds)
- encoding (Attribute, in 5 feeds)
- episode_mp3 (Element, in 1 feeds)
- expression (Attribute, in 39 feeds)
- fee (Attribute, in 3 feeds)
- feed (Attribute, in 85 feeds)
- ffmpeg (Element, in 6 feeds)
- fileSize (Attribute, in 198 feeds)
- frequency (Attribute, in 38 feeds)
- generator (Element, in 898 feeds)
- geo (Attribute, in 3 feeds)
- googleplay (Attribute, in 1 feeds)
- guid (Element, in 1349 feeds)
- guid (Attribute, in 1 feeds)
- head (Element, in 2 feeds)
- height (Element, in 173 feeds)
- height (Attribute, in 48 feeds)
- hour (Element, in 2 feeds)
- href (Attribute, in 1336 feeds)
- html (Attribute, in 25 feeds)
- html (Element, in 2 feeds)
- http-equiv (Attribute, in 2 feeds)
- id (Attribute, in 58 feeds)
- iheart (Attribute, in 1 feeds)
- ilink (Element, in 5 feeds)
- image (Element, in 1119 feeds)
- image (Attribute, in 2 feeds)
- img (Attribute, in 95 feeds)
- isDefault (Attribute, in 12 feeds)
- isPermaLink (Attribute, in 1193 feeds)
- isPermalink (Attribute, in 17 feeds)
- item (Element, in 1351 feeds)
- itunes (Attribute, in 80 feeds)
- ituneslink (Element, in 1 feeds)
- itunesowner (Element, in 2 feeds)
- key (Attribute, in 62 feeds)
- label (Attribute, in 21 feeds)
- lame (Element, in 6 feeds)
- lang (Attribute, in 12 feeds)
- language (Element, in 1353 feeds)
- lastBuildDate (Element, in 989 feeds)
- latitude (Element, in 1 feeds)
- launchDate (Attribute, in 1 feeds)
- length (Attribute, in 1329 feeds)
- li (Element, in 1 feeds)
- link (Element, in 1359 feeds)
- liveItems (Attribute, in 1 feeds)
- logo (Element, in 1 feeds)
- longitude (Element, in 1 feeds)
- managingEditor (Element, in 364 feeds)
- managingeditor (Element, in 5 feeds)
- medium (Attribute, in 96 feeds)
- meta (Element, in 2 feeds)
- method (Attribute, in 3 feeds)
- name (Attribute, in 44 feeds)
- owner (Attribute, in 8 feeds)
- p (Element, in 3 feeds)
- pandora (Attribute, in 1 feeds)
- path (Attribute, in 7 feeds)
- port (Attribute, in 7 feeds)
- position (Element, in 1 feeds)
- pre (Element, in 1 feeds)
- protocol (Attribute, in 7 feeds)
- pubDate (Element, in 1355 feeds)
- public (Attribute, in 1 feeds)
- region (Attribute, in 1 feeds)
- registerProcedure (Attribute, in 7 feeds)
- rel (Attribute, in 1189 feeds)
- relationship (Attribute, in 7 feeds)
- role (Attribute, in 247 feeds)
- rss (Element, in 1360 feeds)
- scheme (Attribute, in 242 feeds)
- size (Attribute, in 1 feeds)
- skipDays (Element, in 1 feeds)
- skipHours (Element, in 2 feeds)
- slug (Attribute, in 1 feeds)
- source (Element, in 12 feeds)
- split (Attribute, in 3 feeds)
- spotify (Attribute, in 8 feeds)
- src (Attribute, in 48 feeds)
- start (Attribute, in 187 feeds)
- status (Attribute, in 12 feeds)
- stitcher (Attribute, in 9 feeds)
- strike (Element, in 1 feeds)
- strong (Element, in 1 feeds)
- suggested (Attribute, in 3 feeds)
- systemId (Attribute, in 38 feeds)
- text (Attribute, in 1301 feeds)
- title (Element, in 1360 feeds)
- title (Attribute, in 332 feeds)
- toPubDate (Element, in 5 feeds)
- ttl (Element, in 249 feeds)
- tunein (Attribute, in 9 feeds)
- tv (Attribute, in 21 feeds)
- type (Attribute, in 1357 feeds)
- typicalDuration (Attribute, in 1 feeds)
- ul (Element, in 1 feeds)
- uri (Attribute, in 191 feeds)
- url (Element, in 1115 feeds)
- url (Attribute, in 1344 feeds)
- version (Attribute, in 1360 feeds)
- webMaster (Element, in 201 feeds)
- webmaster (Element, in 14 feeds)
- width (Element, in 172 feeds)
- width (Attribute, in 48 feeds)
* https://www.spotify.com/ns/rss (in 1 feeds)
- countryOfOrigin (Element, in 1 feeds)
* https://www.w3.org/2005/Atom (in 3 feeds)
- link (Element, in 3 feeds)
* https://www.w3.org/TR/REC-xml/#syntax (in 2 feeds)
(Usage numbers for the https://www.rssboard.org/rss-specification namespace have to be taken with a grain of salt here, because failures are also recorded as "using" this namespace for now. Every element/attribute without a namespace is assigned to this NS.)
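For anyone who wants to reproduce the "counted once per feed" rule, here is a minimal sketch in Python. It is not the scraper used for the numbers above; `RSS_NS`, `names_in_feed` and `count_usage` are made-up names, and Python's ElementTree does not expose `xmlns` declarations as attributes, so the http://www.w3.org/2000/xmlns/ entries would not show up with this approach.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Placeholder namespace for names without a namespace (assumption mirroring the note above).
RSS_NS = "https://www.rssboard.org/rss-specification"


def split_name(qualified):
    """Split ElementTree's '{uri}local' notation into (uri, local)."""
    if qualified.startswith("{"):
        uri, local = qualified[1:].split("}", 1)
        return uri, local
    return RSS_NS, qualified


def names_in_feed(xml_text):
    """Return the set of (namespace, local name, kind) triples seen in one feed."""
    seen = set()
    for element in ET.fromstring(xml_text).iter():
        seen.add(split_name(element.tag) + ("Element",))
        for attribute in element.attrib:
            seen.add(split_name(attribute) + ("Attribute",))
    return seen


def count_usage(feeds):
    """Count each name once per feed, no matter how often it occurs inside it."""
    usage = Counter()
    for xml_text in feeds:
        usage.update(names_in_feed(xml_text))
    return usage
```

Collecting a set per feed first and only then updating the counter is what keeps a feed with hundreds of items from inflating the numbers.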
Also interesting are the prefixes that are declared for the additional namespaces:
Atom http://www.w3.org/2005/Atom
acast https://schema.acast.com/1.0/
admin http://webns.net/mvcb/
amp http://www.adobe.com/amp/1.0
anchor https://anchor.fm/xmlns
ard http://www.ard.de/ardNamespace
art19 https://art19.com/xmlns/rss-extensions/1.0
atom http://www.w3.org/2005/Atom
atom http://www.w3.org/2005/Atom/
atom https://www.w3.org/2005/Atom
atom10 http://www.w3.org/2005/Atom
audioboom https://audioboom.com/rss/1.0
bitlove http://bitlove.org
blogChannel http://backend.userland.com/blogChannelModule
blogger http://schemas.google.com/blogger/2008
cba https://cba.fro.at/help#feeds
cc http://web.resource.org/cc/
content http://purl.org/rss/1.0/modules/content/
content https://www.w3.org/TR/REC-xml/#syntax
content https://purl.org/rss/1.0/modules/content/
content http://purl.org/rss/1.0/modules/content
creativeCommons http://backend.userland.com/creativeCommonsRssModule
dc http://purl.org/dc/elements/1.1/
dcterms http://purl.org/dc/terms/
feedburner http://rssnamespace.org/feedburner/ext/1.0
feedpress https://feed.press/xmlns
fh http://purl.org/syndication/history/1.0
fireside http://fireside.fm/modules/rss/fireside
foaf http://xmlns.com/foaf/0.1/
frn http://www.freie-radios.net/namespaces/frn
fyyd https://fyyd.de/fyyd-ns/
gd http://schemas.google.com/g/2005
geo http://www.w3.org/2003/01/geo/wgs84_pos#
georss http://www.georss.org/georss
googleplay http://www.google.com/schemas/play-podcasts/1.0
googleplay http://www.google.com/schemas/play-podcasts/1.0/
googleplay https://www.google.com/schemas/play-podcasts/1.0
googleplay http://www.google.com/schemas/play-podcasts/1.0/play-podcasts.xsd
itunes http://www.itunes.com/dtds/podcast-1.0.dtd
itunes http://www.itunes.com/DTDs/Podcast-1.0.dtd
itunes https://www.itunes.com/dtds/podcast-1.0.dtd
itunesu http://www.itunesu.com/feed
iweb http://www.apple.com/iweb
jwplayer http://developer.longtailvideo.com/
media http://search.yahoo.com/mrss/
media http://www.rssboard.org/media-rss
media http://search.yahoo.com/mrss
npr https://www.npr.org/rss/
nprml https://api.npr.org/nprml
og http://ogp.me/ns#
omny https://omny.fm/rss-extensions
openSearch http://a9.com/-/spec/opensearchrss/1.0/
pa http://podcastaddict.com
pingback https://podping.info/specification/1
podaccess https://schema-access.acast.com/1.0/
podaccess https://access.acast.com/schema/1.0/
podcast https://podcastindex.org/namespace/1.0
podcast https://github.com/Podcastindex-org/podcast-namespace/blob/main/docs/1.0.md
podcastRF http://radiofrance.fr/Lancelot/Podcast#
ppg http://bbc.co.uk/2009/01/ppgRss
psc http://podlove.org/simple-chapters
psc https://podlove.org/simple-chapters/
psc https://podlove.org/simple-chapters
rawvoice http://www.rawvoice.com/rawvoiceRssModule/
rdf http://www.w3.org/1999/02/22-rdf-syntax-ns#
rdfs http://www.w3.org/2000/01/rdf-schema#
related http://vemedio.com/dtds/atom/related-1.0.dtd
sc http://podlove.org/simple-chapters
schema http://schema.org/
sioc http://rdfs.org/sioc/ns#
sioct http://rdfs.org/sioc/types#
skos http://www.w3.org/2004/02/skos/core#
slash http://purl.org/rss/1.0/modules/slash/
spotify http://www.spotify.com/ns/rss
spotify https://www.spotify.com/ns/rss
sy http://purl.org/rss/1.0/modules/syndication/
taxo http://purl.org/rss/1.0/modules/taxonomy/
thr http://purl.org/syndication/thread/1.0
trackback http://madskills.com/public/xml/rss/module/trackback/
wfw http://wellformedweb.org/CommentAPI/
xhtml http://www.w3.org/1999/xhtml
xmlns http://www.w3.org/2005/Atom
xmlns com-wordpress:feed-additions:1
xmlns http://pipes.yahoo.com
xmlns https://podlove.de/simple-chapters
xsd http://www.w3.org/2001/XMLSchema#
xsi http://www.w3.org/2001/XMLSchema-instance
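A prefix-to-URI table like the one above could be collected roughly like this. This is only a sketch with Python's standard library, not the code that produced the list; `declared_prefixes` and `prefixes_to_uris` are hypothetical names, and the text is assumed to be UTF-8 compatible.

```python
import io
import xml.etree.ElementTree as ET
from collections import defaultdict


def declared_prefixes(xml_text):
    """Return the set of (prefix, uri) pairs declared anywhere in one feed."""
    source = io.BytesIO(xml_text.encode("utf-8"))
    # "start-ns" events carry a (prefix, uri) tuple; the default namespace has prefix "".
    events = ET.iterparse(source, events=("start-ns",))
    return {(prefix, uri) for _event, (prefix, uri) in events}


def prefixes_to_uris(feeds):
    """Map every prefix to all namespace URIs it was bound to across the feeds."""
    mapping = defaultdict(set)
    for xml_text in feeds:
        for prefix, uri in declared_prefixes(xml_text):
            mapping[prefix].add(uri)
    return mapping
```

Any prefix that ends up with more than one URI in the mapping is a candidate for a misspelled declaration.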
It's sad to see that major namespaces are used with a wrong URI (Atom, iTunes, Google Play, RSS 1.0 Content, Podlove Simple Chapters, Media RSS).
I'm wondering whether we should do something about this. We already have the capability to recognise several URIs for a namespace, so technically we could add the wrong ones and Stalla could parse these elements as well. On write, we'd then use the correct namespace. This would "fix" broken feeds, but of course mess with our transparent parse/write policy, making it a bad idea I guess.
Alternatively this could be a new feature like #46 ModelValidator, but for feeds (FeedValidator)?
Adding to this, I have also found someone else who did a similar analysis a few years back:
https://github.com/mdewilde/podcast-parser/blob/master/corpus-stats
That Java lib also supports a bunch of namespaces we don't support yet, so it may be worth taking a look at what they deemed worth supporting in terms of DC:
Full list of their supported NS/attributes here
I would like to maintain the transparency by default. Maybe we could have a special version of parse which takes in some options, including things like "attempt to repair namespaces"? Maybe even taking in a parsing pre-processor, which can manipulate the feed DOM before it's parsed. This way it could be relatively easy to inspect and fix namespaces.
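To make the pre-processor idea a bit more concrete, here is a rough sketch. It is written in Python with ElementTree standing in for the feed DOM and is not Stalla's API: `parse`, `preprocessors`, `repair_namespaces` and `CANONICAL` are all hypothetical names, and the mapping only contains a few of the variants found in the results above.

```python
import xml.etree.ElementTree as ET

# Wrong variants seen in the results, mapped to the URI this thread treats as correct.
CANONICAL = {
    "http://www.itunes.com/DTDs/Podcast-1.0.dtd": "http://www.itunes.com/dtds/podcast-1.0.dtd",
    "https://www.itunes.com/dtds/podcast-1.0.dtd": "http://www.itunes.com/dtds/podcast-1.0.dtd",
    "http://www.w3.org/2005/Atom/": "http://www.w3.org/2005/Atom",
    "https://podlove.org/simple-chapters/": "http://podlove.org/simple-chapters",
    "http://www.google.com/schemas/play-podcasts/1.0/": "http://www.google.com/schemas/play-podcasts/1.0",
    "http://www.rssboard.org/media-rss": "http://search.yahoo.com/mrss/",
}


def repair_name(qualified):
    """Swap a known wrong namespace URI in a '{uri}local' name for the canonical one."""
    if qualified.startswith("{"):
        uri, local = qualified[1:].split("}", 1)
        return "{" + CANONICAL.get(uri, uri) + "}" + local
    return qualified


def repair_namespaces(root):
    """Rewrite known wrong namespace URIs in place and return the same tree."""
    for element in root.iter():
        element.tag = repair_name(element.tag)
        repaired = {repair_name(name): value for name, value in element.attrib.items()}
        element.attrib.clear()
        element.attrib.update(repaired)
    return root


def parse(xml_text, preprocessors=()):
    """Hypothetical entry point: transparent by default, repaired only on request."""
    root = ET.fromstring(xml_text)
    for preprocess in preprocessors:
        root = preprocess(root)
    return root
```

Calling `parse(xml_text)` leaves the tree untouched, while `parse(xml_text, [repair_namespaces])` opts in to the repair, which would keep the default behaviour transparent.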
I like this idea
Interesting. They've found some namespaces I haven't encountered yet, but their data set is also much larger. Will try to get more out of the queried directories in the future.
It was just a lucky coincidence I was looking at this as you opened this issue ahahah
By the way, the author of that lib seems to have a few interesting repos we may look at. Mostly, for this issue, a Java application for finding podcast feed URLs: https://github.com/mdewilde/podcastfinder
Oh boy, there is a problem in the recording I think. All attributes are either assigned to the http://www.w3.org/2000/xmlns/ or the https://www.rssboard.org/rss-specification namespace. Damn it...
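For context, a namespace-aware DOM typically behaves like this: `xmlns`/`xmlns:prefix` declarations live in http://www.w3.org/2000/xmlns/, and unprefixed attributes have no namespace at all (the default namespace does not apply to attributes). Whether that is what the recording tripped over is only a guess. The sketch below shows the distinction with Python's `xml.dom.minidom`; `classify_attributes` is a made-up helper.

```python
from xml.dom import minidom

XMLNS_NS = "http://www.w3.org/2000/xmlns/"


def classify_attributes(xml_text):
    """Yield (owner element, attribute name, kind) for every attribute in a feed."""
    document = minidom.parseString(xml_text)
    for element in document.getElementsByTagName("*"):
        attributes = element.attributes
        for index in range(attributes.length):
            attribute = attributes.item(index)
            if attribute.namespaceURI == XMLNS_NS:
                kind = "declaration"  # xmlns / xmlns:prefix, not actual feed data
            elif attribute.namespaceURI is None:
                kind = "unqualified"  # only meaningful together with its owner element
            else:
                kind = attribute.namespaceURI  # explicitly prefixed attribute
            yield element.tagName, attribute.name, kind
```

Recording unqualified attributes together with their owner element (e.g. `enclosure` + `url`) rather than under a single catch-all namespace might make the output easier to read.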
Will add channel/item distinction as well.
Reworked the scraper a bit and here are some new results. This time I used this gist with ~64k unique feeds scraped from iTunes. I pre-filtered them yesterday and only used the ones that are actually reachable (~28k). The results are too large to post here as text, so I'm attaching the various output formats as an archive: 20210415_014430.zip
XHTML is now ignored if declared correctly to improve readability, but tons of feeds are just kaput, making the result rather hard to read for a large input set.
Still need to give the podcastfinder tool a try.
Used the podcastfinder and added the produced feeds to the previous list I had. New results are based on 45550 successfully processed feeds. Full results are here: 20210420_024820.zip (including more and improved output formats).
I'll post more detailed observations in the respective issues of the namespaces in the next few days.
For now, here is a list of namespaces that are declared in at least 0,5% of all processed feeds:
96,6% http://www.itunes.com/dtds/podcast-1.0.dtd
85,6% http://www.w3.org/2005/Atom
54,5% http://purl.org/rss/1.0/modules/content/
39,8% http://search.yahoo.com/mrss/
37,8% http://purl.org/dc/elements/1.1/
27,8% http://www.google.com/schemas/play-podcasts/1.0
24,6% http://wellformedweb.org/CommentAPI/
20,4% http://rssnamespace.org/feedburner/ext/1.0
12,9% http://purl.org/rss/1.0/modules/syndication/
12,0% http://purl.org/rss/1.0/modules/slash/
9,0% http://purl.org/dc/terms/
8,8% https://podcastindex.org/namespace/1.0
7,7% http://www.w3.org/1999/02/22-rdf-syntax-ns#
7,1% http://web.resource.org/cc/
7,0% https://anchor.fm/xmlns
6,1% http://www.rawvoice.com/rawvoiceRssModule/
5,5% http://www.georss.org/georss
5,1% http://www.w3.org/2003/01/geo/wgs84_pos#
4,3% http://backend.userland.com/creativeCommonsRssModule
4,2% http://a9.com/-/spec/opensearchrss/1.0/
3,9% http://purl.org/syndication/thread/1.0
3,4% http://www.spotify.com/ns/rss
3,3% https://github.com/Podcastindex-org/podcast-namespace/blob/main/docs/1.0.md
3,0% https://schema.acast.com/1.0/
2,3% http://www.w3.org/XML/1998/namespace
1,9% com-wordpress:feed-additions:1
1,7% http://www.itunes.com/DTDs/Podcast-1.0.dtd
1,6% https://podlove.org/simple-chapters/
1,5% http://schemas.google.com/blogger/2008
1,5% http://schemas.google.com/g/2005
1,5% https://omny.fm/rss-extensions
1,4% http://podlove.org/simple-chapters
1,2% https://podping.info/specification/1
1,2% https://schema-access.acast.com/1.0/
1,2% http://purl.org/syndication/history/1.0
0,9% http://bbc.co.uk/2009/01/ppgRss
0,7% https://art19.com/xmlns/rss-extensions/1.0
0,7% http://www.google.com/schemas/play-podcasts/1.0/
0,5% http://www.rssboard.org/media-rss
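A list like the one above boils down to "feeds declaring the namespace, divided by all processed feeds, with a cut-off". As a small sketch (not the scraper's actual code; `declaration_shares` and `format_share` are hypothetical names, and the input is assumed to be one collection of declared URIs per feed):

```python
from collections import Counter


def declaration_shares(declared_per_feed, minimum=0.005):
    """Return (share, uri) pairs for namespaces declared in at least `minimum` of all feeds."""
    total = len(declared_per_feed)
    if total == 0:
        return []
    counts = Counter()
    for uris in declared_per_feed:
        counts.update(set(uris))  # one vote per feed, even if declared repeatedly
    shares = [(count / total, uri) for uri, count in counts.items() if count / total >= minimum]
    return sorted(shares, key=lambda pair: pair[0], reverse=True)


def format_share(share):
    """Format a fraction as a percentage with a decimal comma, as in the list above."""
    return f"{share * 100:.1f}%".replace(".", ",")
```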
Namespaces we do not support yet and that have at least one element/attribute declared in the <channel> or an <item> of at least one feed (ordered by namespace frequency):
http://search.yahoo.com/mrss/
http://purl.org/dc/elements/1.1/
http://wellformedweb.org/CommentAPI/
http://rssnamespace.org/feedburner/ext/1.0
http://purl.org/rss/1.0/modules/syndication/
http://purl.org/rss/1.0/modules/slash/
http://purl.org/dc/terms/
http://www.w3.org/1999/02/22-rdf-syntax-ns#
https://anchor.fm/xmlns
http://www.rawvoice.com/rawvoiceRssModule/
http://www.georss.org/georss
http://www.w3.org/2003/01/geo/wgs84_pos#
http://backend.userland.com/creativeCommonsRssModule
http://a9.com/-/spec/opensearchrss/1.0/
http://purl.org/syndication/thread/1.0
http://www.spotify.com/ns/rss
https://schema.acast.com/1.0/
http://www.w3.org/XML/1998/namespace
com-wordpress:feed-additions:1
http://schemas.google.com/blogger/2008
http://schemas.google.com/g/2005
https://omny.fm/rss-extensions
https://podping.info/specification/1
https://schema-access.acast.com/1.0/
http://bbc.co.uk/2009/01/ppgRss
But the actual element/attribute usage of these namespaces is <0,1%:
http://www.w3.org/1999/02/22-rdf-syntax-ns#
http://schemas.google.com/blogger/2008
http://schemas.google.com/g/2005
https://schema-access.acast.com/1.0/
Some general observations:
7,1% of feeds declare http://web.resource.org/cc/ (for license info) but not a single feed uses it in any element/attribute.
1,7% of feeds use the wrong declaration http://www.itunes.com/DTDs/Podcast-1.0.dtd for iTunes.
1,4% of feeds use the correct Podlove Simple Chapters namespace declaration http://podlove.org/simple-chapters, while 1,6% use the wrong https://podlove.org/simple-chapters/. This is pretty terrible.
0,7% of feeds use the wrong declaration http://www.google.com/schemas/play-podcasts/1.0/ for Google Play.
0,5% of feeds use the wrong declaration http://www.rssboard.org/media-rss for Media RSS.
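Declared-but-unused namespaces (like the cc one) could be detected per feed roughly as follows. Again just a sketch under assumptions: `declared_but_unused` is a made-up name, and namespaces that are only referenced inside attribute values (e.g. QNames in `xsi:type`) would wrongly be reported as unused.

```python
import io
import xml.etree.ElementTree as ET


def declared_but_unused(xml_text):
    """Return namespace URIs a feed declares but never uses in a tag or attribute name."""
    declared = set()
    used = set()
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, payload in ET.iterparse(source, events=("start-ns", "start")):
        if event == "start-ns":
            _prefix, uri = payload
            declared.add(uri)
        else:
            # payload is the element; check its tag and all attribute names
            for name in (payload.tag, *payload.attrib):
                if name.startswith("{"):
                    used.add(name[1:].split("}", 1)[0])
    return declared - used
```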
Yes, the Spotify tags also surprised me.
Note, however, that these results are now extremely centered on US/English feeds, because podcastfinder has these settings hardcoded...
I've noticed that for smaller datasets (based on "large" Fyyd/Panoptikum results that have way more German content), the namespace frequency looks quite different for namespaces that are in the 1,X% range in these results (e.g. Podlove Simple Chapters and Feed History are very high because the Podlove Publisher CMS is extremely popular in the German-speaking area).
At some point I'll integrate the internal API of podcastfinder into the scraper to have better backing data. For now I think it's also worth paying some attention to the namespaces that are further down in our results, and checking if there are some useful specifications hidden in there (e.g. #84).
What's still left to do on this one? Just the podcastfinder API integration?
Right now there is on the table:
I'm also not fully done with studying the result data yet, and unfortunately I won't be able to make much time for another 2-3 weeks.
Do you wanna have access to the repo @rock3r? In case you want to pick this up, or have something additional to add.
Sure, although I'll be focussing on getting 1.1.0 out the door first :)
I've added you to the repo :)
As discussed in #28, statistical information about namespace (element/attribute) usage in feeds will help us determine what we should support in the future.
This issue is for result posting and discussion.