Closed MaienM closed 2 years ago
I like the idea of putting the old behaviours into the defaults for this flag. This would even allow people to exclude the current defaults if they so desire (with a simple !*
), which is pretty neat, and I'm always for more flexibility. There is one minor problem with this though: the descriptionText
field in the current episode metadata comes from the contentSnippet
field from the feed. All other fields in the metadata match the names from the feed itself, but that one does not.
We could fix this by adding some remapping functionality, so that a rule like contentSnippet as descriptionText
or something similar would include the field under a different name. I wouldn't mind adding this, but I worry about the added complexity. Wouldn't want to turn this into something noone uses! Then again, being able to get exactly what you want in the structure you want it in does sound quite appealing. And it'd be an optional thing, so it wouldn't complicate the "normal" rules at all.
Aside from that, how do you envision the options working? Just adding some default rules wouldn't take away the need for the checks, since you could still add rules without enabling the core include-meta feature. Would you want to merge the options? Have specifying globs implicitly turn on the include-meta feature? Something else?
There is one minor problem with this though: the
descriptionText
field in the current episode metadata comes from thecontentSnippet
field from the feed. All other fields in the metadata match the names from the feed itself, but that one does not.
Oh! Good call. I'm totally up for switching it over to contentSnippet
. I'll just make sure the mark the release correctly β I think it's worth it.
Aside from that, how do you envision the options working? Just adding some default rules wouldn't take away the need for the checks, since you could still add rules without enabling the core include-meta feature. Would you want to merge the options? Have specifying globs implicitly turn on the include-meta feature? Something else?
Similar to --archive
, we can have --include-meta
and --include-episode-meta
optionally include arguments. If passed in without an argument, use the default selectors for the fields as before. And if passed in with field selectors, use those for grabbing the meta fields. What do you think? π
Oh! Good call. I'm totally up for switching it over to
contentSnippet
. I'll just make sure the mark the release correctly β I think it's worth it.
:+1:
Similar to
--archive
, we can have--include-meta
and--include-episode-meta
optionally include arguments. If passed in without an argument, use the default selectors for the fields as before. And if passed in with field selectors, use those for grabbing the meta fields. What do you think? smile
Sure, that works. If you pass a field selector, would you want to add to the defaults or override them? That is, should --include-episode-meta foo
result in ['title', 'contentSnippet', 'pubDate', 'creator', 'foo']
or ['foo']
? (Mind that if we go for the former you could still use --include-episode-meta '!*' --include-episode-meta foo
to only get foo
if that is what you want.)
I think option 2 makes the most sense to me: --include-episode-meta foo
resulting in ['foo']
.
Getting more and more excited to have this feature added. πͺ
Should be functional now! No documentation yet, but I'll let you play around with it and see if you like it/manage to break it before we worry about that.
While preparing examples for the readme I noticed that the module used to parse the feed doesn't actually include all fields. Furthermore, for some of the fields that it does include it does so with a transformation of some kind (e.g. itunes:category
results in itunes.categories
and itunes.categoriesWithSubs
).
This means that more work is needed to actually include all metadata. It also means that if we were to merge this as it currently is we're exposing these transformations the underlying module makes, which would make changing to another way of acquiring the full metadata a breaking change.
I have some ideas for potential solutions that I'll look into. I'll change this back to a draft for now until we have a solution of some sort.
I did some more work on this. I hooked into the parser to get access to the raw data from the feed while still allowing us to be mostly ignorant about the type of feed (Atom/RSS 0.9/RSS 1.0/RSS 2.0). This gets us access to the full data, but this is unfortunately not without issues.
The first problem is that the mapping from XML to JSON is... icky. It's an unfortunate but unavoidable artifact of mapping XML to JSON; XML allows a single tag to have attributes, child tags and contents, plus it allows repeating a tag (meaning anything is potentially an array), whereas in JSON each entry in an object can only be a single thing, and arrays are explicit even when they only have one entry. To capture all this complexity in JSON results in a rather verbose structure:
To aleviate this I've added an option to output to XML instead of JSON. This uses the same library that the RSS parsing library uses, so it round trips properly and it's not really a new dependency.
Another issue is that this exposes details of the underlying feed type to the metadata. This is unavoidable once you get into the full details, but it is a shame to lose out on the simple and uniform metadata we had previously.
The way I've handled this is by storing the raw data under the raw
key in the feed/episode metadata. This way you can get access to it if you want to/need something that isn't in the normalized metadata without losing out on the benefits of a uniform format.
These changes add a bit more complexity when compared to the last version, but I've done my best to minimize the complexity while still offering a large degree of flexibility. Overall I'm relatively happy with the result. What do you think?
I think a lot of people are in the same camp as you and won't be interested in raw
. It's not part of the default filters and it wouldn't even be included by a *
rule, but we could certainly put it behind a flag.
We could even go a bit further and split this off into a separate metadata file. This could also somewhat alleviate the JSON mapping issue; we could stick to JSON for the normal metadata and use XML for the raw export. This would allow people like me who would like to have a dump of the raw metadata without burdening the JSON output. We could just keep this as a simple all-or-nothing flag without rules support.
Separately I think the customFields
thing is worth exploring. I had seen this option but had disregarded it thus far since it didn't fit my use-case of wanting everything, but it would be good as an option for those that want specific extra bits of metadata. I'll create a basic implementation of this and then we can see how we like it and further refine it that way.
One concern I have which is worsened by the addition of customFields
is that this is turning into a (potentially) unwieldy amount of CLI arguments. It might be worth considering an option to specify these rules/mappings in a separate file that can be referred to. This should probably be a separate PR, but I am curious what you think. Something similar to mocha.opts
? A config file in JSON format?
Sorry for the long delay β I was AFK for quite a bit!
I think in an effort to not add too much complexity to the project, adding support for XML and additional parsing might best be supported as an additional script someone could run separately. A separate script that fetches the RSS feed and parses out the metadata one might want per episode would be much simpler and easier to adapt over time. I plan on adding a --include-feed
param that will save the raw file into the output directly which then can be targeted by additional scripts.
When archiving with
--include-meta
and--include-episode-meta
some of the metadata of the podcast and its episodes are saved. The set that is currently saved is only a subset of all the available metadata though, and I'd like to increase this.It would be trivial to just include all metadata (probably excluding the items listing for the podcast metadata), and I'd totally be happy with that as a solution, but I wanted to try and make it a bit more flexible.
This draft adds two new flags,
--include-meta-fields
and--include-episode-meta-fields
. As their names suggest these can be used to indicate which extra fields should be included. They can be used multiple times, have simple globbing support (foo*
), and support negation (!foobar
).This does not handle filtering on nested keys yet. I didn't realize that properties like
itunes:subtitle
resulted in a nested structure when making my original design. I can fix this, but I wanted to get some feedback on this before investing more time into it.A few questions from my side:
If so, do you have any feelings regarding how to handle nested keys? I'm leaning towards the following:
itunes: { subtitle: "foo" }
would result in a key ofitunes.subtitle
which would then be checked against the filter.*
to match anything but.
.**
that doest match anything, including.
.This way one could easily include everything by passing
**
, or only non-object properties by passing*
, or perhaps all itunes properties by passingitunes.**
, etc.