Open mpgirro opened 3 years ago
How do we collect data about which tags are actually used and which ones we need to support? There are 15 in the spec, that's a lot :)
For starters, we can stick to the gPodder recommendation: https://github.com/gpodder/podcast-feed-best-practice/blob/master/podcast-feed-best-practice.md
The question of which tags are actually used is a good one, though. Some time ago I had the idea of writing a little tool that just reads a lot of feeds and compiles statistics on the namespaces used and the tags used per namespace. This could be pretty useful, also for other namespaces we might not have considered yet.
I was thinking about a little scraper as well. We could maybe point it at some podcast charts and collect the feeds listed there. If you're up for that, having an idea of which tags are actually used may give us a priority list for the implementation.
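A minimal sketch of the counting part of such a tool, assuming feeds are already downloaded as XML strings. The helper names `split_tag` and `count_tags` and the sample feed are made up for illustration; only the Python standard library is used.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Namespace URI of the Dublin Core 1.1 element set.
DC = "http://purl.org/dc/elements/1.1/"


def split_tag(tag):
    """Split ElementTree's '{namespace}local' tag form into (namespace, local)."""
    if tag.startswith("{"):
        namespace, _, local = tag[1:].partition("}")
        return namespace, local
    return "", tag


def count_tags(feed_xml):
    """Count every element of one feed, keyed by (namespace URI, local name)."""
    root = ET.fromstring(feed_xml)
    return Counter(split_tag(element.tag) for element in root.iter())


# Hypothetical sample feed, only here to exercise the functions.
SAMPLE = """<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Example</title>
    <dc:language>en</dc:language>
    <item><title>Ep 1</title><dc:creator>Alice</dc:creator></item>
    <item><title>Ep 2</title><dc:creator>Bob</dc:creator></item>
  </channel>
</rss>"""

counts = count_tags(SAMPLE)
for (namespace, local), n in sorted(counts.items()):
    print(f"{namespace or '(no namespace)'} {local}: {n}")
```

Keying on the namespace URI rather than the prefix means feeds that bind Dublin Core to a prefix other than `dc` are still counted together, and namespaces nobody has considered yet show up in the output without extra work.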
According to the gPodder document, the DC 1.1 elements we should support are:
`<channel>` | `<item>` |
---|---|
 | |
The tags for `<item>`s are a subset of the ones applied to `<channel>` (you don't have `dc:language` and `dc:date`).
On top of that, there may be a few DC Terms elements that we may want to support, but I don't really know which ones are even used. The scraper can probably help us there.
Thanks for doing the analysis of the gPodder document. I agree that the scraper would be really useful regarding the terms. I hope I can find some time in the next few days to give it a try.
Update on the scraper: I have rigged up a prototype. It should be easy to extend from this point on to get all the info we want. Expect some first results in the next few days.
Great to know! Thanks for tackling this :)
The analysis shows the following occurrences of DC elements:
`<channel>` | `<item>` | `<image>` |
---|---|---|
creator (0,3%) | creator (27,7%) | creator (1 feed) |
date (0,2%) | date (0,3%) | date (1 feed) |
rights (0,1%) | rights (< 0,1%) | |
language (0,1%) | language (< 0,1%) | language (1 feed) |
title (< 0,1%) | title (< 0,1%) | title (1 feed) |
subject (< 0,1%) | subject (0,1%) | subject (1 feed) |
description (< 0,1%) | description (< 0,1%) | description (1 feed) |
contributor (< 0,1%) | contributor (< 0,1%) | |
publisher (< 0,1%) | publisher (< 0,1%) | publisher (1 feed) |
coverage (< 0,1%) | | |
type (< 0,1%) | type (< 0,1%) | |
format (< 0,1%) | format (1 feed) | |
date.Taken (< 0,1%) | | |
modified (< 0,1%) | | |
identifier (< 0,1%) | | |
contributor (< 0,1%) | | |
source (< 0,1%) | | |
I guess we can ignore the one `<image>` occurrence 😄
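The thread does not say how the percentages in the table were computed. One plausible reading, assuming each figure is the share of scraped feeds in which an element appears at least once under a given parent, could be sketched as follows. The functions `dc_usage` and `usage_share`, the `feed` helper, and the four sample feeds are hypothetical and not taken from the actual scraper.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# ElementTree prefixes tags with their namespace URI in curly braces.
DC_NS = "{http://purl.org/dc/elements/1.1/}"


def dc_usage(feed_xml):
    """Return the set of (parent tag, DC element) pairs occurring in one feed."""
    root = ET.fromstring(feed_xml)
    used = set()
    for parent in root.iter():
        for child in parent:
            if child.tag.startswith(DC_NS):
                used.add((parent.tag, child.tag[len(DC_NS):]))
    return used


def usage_share(feeds):
    """Percentage of feeds in which each (parent, element) pair occurs at least once."""
    totals = Counter()
    for feed_xml in feeds:
        totals.update(dc_usage(feed_xml))
    return {pair: 100.0 * n / len(feeds) for pair, n in totals.items()}


def feed(channel_extra="", item_extra=""):
    """Build a tiny made-up RSS feed for demonstration purposes."""
    return (
        '<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">'
        f"<channel><title>t</title>{channel_extra}"
        f"<item><title>e</title>{item_extra}</item></channel></rss>"
    )


FEEDS = [
    feed(item_extra="<dc:creator>A</dc:creator>"),
    feed(channel_extra="<dc:language>en</dc:language>",
         item_extra="<dc:creator>B</dc:creator>"),
    feed(),
    feed(),
]

shares = usage_share(FEEDS)
for (parent, element), percent in sorted(shares.items()):
    print(f"<{parent}> dc:{element}: {percent:.1f}%")
```

Counting each pair once per feed keeps a single feed with thousands of items from dominating the result, which matters when the goal is a priority list of elements to implement.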
`dc:creator` seems to be the most used element, from what I can remember. More research needed.