manolomartinez / greg

A command-line podcast aggregator
GNU General Public License v3.0

sync with filter doesn't download anything... #37

Closed: mfeif closed this issue 8 years ago

mfeif commented 8 years ago

I'm assuming that "sync" is supposed to download new entries, and that manual downloading isn't necessary (but the README isn't clear to me).

I want to follow an audio podcast that publishes each full episode and then, unfortunately, each of its segments as a separate entry. Fortunately, the titles of the full episodes always include "Episode", so I've been able to filter on that.

But running greg sync tnyrh doesn't download anything, even though greg list shows new material and greg download 0-9 does grab the new entries with "Episode" in the title.

What am I doing wrong, or is there maybe a bug in how sync and filter interact?

manolomartinez commented 8 years ago

Hello, could you please provide the sequence of commands you have used, including the url of the feed in question?

mfeif commented 8 years ago

Sure,

The url is http://feeds.wnyc.org/newyorkerradiohour, which I have named tnyrh

I did the initial sync a few days ago, and it downloaded the 3-4 back episodes. I had a cron job set to run yesterday, which is the day of the week they publish new shows. So my crontab says to run:

greg sync tnyrh

There aren't any failure messages in syslog (or any output from greg at all, though cron may not capture non-error messages). Cron DID mail me a simple message that says:

Checking tnyrh...
Done

Here is the relevant part of the conf:

[tnyrh]
# The New Yorker Radio Hour puts EVERYTHING in their feed...
Tag = no
filter = "Episode" in "{title}"
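The filter line is meant to keep only entries whose title contains "Episode". As a rough illustration of that condition (not greg's own code), the sketch below applies it directly to the titles from the `greg check` listing in this comment. It assumes greg fills `{title}` with each entry's title before evaluating the expression; `passes_filter` is a hypothetical name.

```python
# Titles as listed by `greg check --feed tnyrh` in this thread (dates omitted).
titles = [
    "Robert Glasper's Jazz Heresy",
    "Stand Clear of the Closing Doors",
    "A Wave of Sexual Violence on TV",
    "The Missing Boater",
    "Episode 8: The Missing Boater, and Robert Glasper",
    "The Mayor and the Mormon Church",
    "Roger Angell on Writing and Love",
    'Two Picks for the Week: "Please Like Me" and Elena Ferrante',
    "Let's Get Drinks",
    "A High School Mock Election Keeps it Real",
]


def passes_filter(title: str) -> bool:
    """Plain-Python version of: filter = "Episode" in "{title}".

    Assumption: greg substitutes the entry title for {title} and evaluates
    the result; here the same test is written out directly.
    """
    return "Episode" in title


kept = [t for t in titles if passes_filter(t)]
print(kept)  # ['Episode 8: The Missing Boater, and Robert Glasper']
```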

If I run greg check --feed tnyrh I get:

0: Robert Glasper's Jazz Heresy (Fri, 11 Dec 2015 16:44:06 -0500)
1: Stand Clear of the Closing Doors (Fri, 11 Dec 2015 16:44:05 -0500)
2: A Wave of Sexual Violence on TV (Fri, 11 Dec 2015 16:42:19 -0500)
3: The Missing Boater (Fri, 11 Dec 2015 16:41:53 -0500)
4: Episode 8: The Missing Boater, and Robert Glasper (Fri, 11 Dec 2015 00:00:00 -0500)
5: The Mayor and the Mormon Church (Fri, 04 Dec 2015 15:38:52 -0500)
6: Roger Angell on Writing and Love (Fri, 04 Dec 2015 15:37:51 -0500)
7: Two Picks for the Week: "Please Like Me" and Elena Ferrante (Fri, 04 Dec 2015 15:37:44 -0500)
8: Let's Get Drinks (Fri, 04 Dec 2015 04:00:00 -0500)
9: A High School Mock Election Keeps it Real (Fri, 04 Dec 2015 00:00:00 -0500)
  (etc)

I did go in and manually force the download with greg download 0-9 and it worked.

manolomartinez commented 8 years ago

Hi, Matt, thanks for the info.

> I did the initial sync a few days ago, and it downloaded the 3-4 back episodes. I had a cron job set to run yesterday, which is the day of the week they publish new shows. So my crontab says to run:

In that first sync, did the filter work correctly?

> greg sync tnyrh
>
> There aren't any failure messages in syslog (or any output from greg at all, though cron may not capture non-error messages). Cron DID mail me a simple message that says:
>
> Checking tnyrh...
> Done

That is what happens when greg has nothing to download. I've just replicated your setup, set the --downloadfrom date to 2015-12-07, and greg sync downloaded Episode 8 and nothing else, as expected. Is it at all possible that the cron job ran before that episode was posted?
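The reproduction above amounts to two conditions on each entry: published after the --downloadfrom date, and passing the title filter. A minimal sketch under stated assumptions (the cutoff is taken as midnight UTC on 2015-12-07, and the two conditions are simply combined with "and"); this is an illustration, not greg's implementation, and `to_download` is a hypothetical name.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# (title, pubDate) pairs from the `greg check --feed tnyrh` listing in this thread.
entries = [
    ("Robert Glasper's Jazz Heresy", "Fri, 11 Dec 2015 16:44:06 -0500"),
    ("Stand Clear of the Closing Doors", "Fri, 11 Dec 2015 16:44:05 -0500"),
    ("A Wave of Sexual Violence on TV", "Fri, 11 Dec 2015 16:42:19 -0500"),
    ("The Missing Boater", "Fri, 11 Dec 2015 16:41:53 -0500"),
    ("Episode 8: The Missing Boater, and Robert Glasper", "Fri, 11 Dec 2015 00:00:00 -0500"),
    ("The Mayor and the Mormon Church", "Fri, 04 Dec 2015 15:38:52 -0500"),
    ("Roger Angell on Writing and Love", "Fri, 04 Dec 2015 15:37:51 -0500"),
    ('Two Picks for the Week: "Please Like Me" and Elena Ferrante', "Fri, 04 Dec 2015 15:37:44 -0500"),
    ("Let's Get Drinks", "Fri, 04 Dec 2015 04:00:00 -0500"),
    ("A High School Mock Election Keeps it Real", "Fri, 04 Dec 2015 00:00:00 -0500"),
]

# Assumed equivalent of --downloadfrom 2015-12-07 (midnight UTC).
cutoff = datetime(2015, 12, 7, tzinfo=timezone.utc)


def to_download(entries, cutoff):
    """Keep titles published after `cutoff` that also pass the "Episode" filter."""
    return [
        title
        for title, pub in entries
        # parsedate_to_datetime returns a timezone-aware datetime for "-0500" offsets
        if parsedate_to_datetime(pub) > cutoff and "Episode" in title
    ]


print(to_download(entries, cutoff))  # ['Episode 8: The Missing Boater, and Robert Glasper']
```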

Manolo

mfeif commented 8 years ago

In the first sync, the filter DID work correctly.

It's certainly possible that the cron job ran before the episode was posted, but when I later checked whether the script had worked and saw that it hadn't, running greg sync tnyrh at that point should have picked it up, right?

manolomartinez commented 8 years ago

> In the first sync, the filter DID work correctly.

Good :)

> It's certainly possible that the cron job ran before the episode was posted, but when I later checked whether the script had worked and saw that it hadn't, running greg sync tnyrh at that point should have picked it up, right?

Yeah, it should, if it had already been posted by then. I assume you have run syncs since then and they still have not picked it up?

Anyway, please post the file ~/.local/share/greg/tnyrh. It should be there.

mfeif commented 8 years ago

Interesting! There is no file in ~/.local/share/greg/data corresponding to tnyrh, although there are files for the other podcasts I've subscribed to.

mfeif commented 8 years ago

There's a pickle file called feeddump in there. Unpickling it gives me a list with two items; it appears to be a parsed representation of the feed for tnyrh. No other feeds have data in there.
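For anyone who wants to repeat that inspection, here is a minimal sketch that loads a pickle file and reports its top-level type and length. The path is the one described in this thread. `summarize_pickle` is a hypothetical helper. Unpickling objects produced by feedparser may require feedparser to be importable. Only unpickle files you trust, since `pickle.load` can execute arbitrary code.

```python
import os
import pickle


def summarize_pickle(path):
    """Load a pickle file and return (type name, length or None)."""
    with open(path, "rb") as fh:
        obj = pickle.load(fh)  # trusted local file only
    length = len(obj) if hasattr(obj, "__len__") else None
    return type(obj).__name__, length


if __name__ == "__main__":
    # Location as described in this thread; adjust if your data directory differs.
    dump = os.path.expanduser("~/.local/share/greg/data/feeddump")
    if os.path.exists(dump):
        print(summarize_pickle(dump))
```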

manolomartinez commented 8 years ago

> Interesting! There is no file in ~/.local/share/greg/data corresponding to tnyrh, although there are files for the other podcasts I've subscribed to.

Yeah, that's weird. You didn't remove the feed and add it back again later for some reason, did you? I know I do this kind of thing...

Anyway, have you kept on syncing this feed? How's it doing?

> There's a pickle file called feeddump in there. Unpickling it gives me a list with two items; it appears to be a parsed representation of the feed for tnyrh. No other feeds have data in there.

That comes from greg check. I leave it there in case you want to keep running greg download based on it. Unfortunately, it's irrelevant to the problem at hand.

mfeif commented 8 years ago

I don't recall removing the feed, but maybe in the course of testing something I did a manual rm.

Anyway, the show is weekly, so a new one hasn't come out yet ;-)

I'll know tomorrow evening. Thanks!

manolomartinez commented 8 years ago

One final check before tomorrow evening, then :) What does greg info tnyrh say?

mfeif commented 8 years ago

greg info tnyrh

tnyrh
-----
    url: http://feeds.wnyc.org/newyorkerradiohour

manolomartinez commented 8 years ago

Right, greg has forgotten everything about this feed and is treating it like a totally new one. Perhaps there is a bug in the interaction between a filter and the firstsync option in greg.conf (whose default is one entry). I'll look into it, thanks.
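The diagnosis rests on greg keeping per-feed state on disk. With no state file for tnyrh, the feed looks brand new on every sync, and `greg info` shows only its URL. Below is a minimal sketch of such a check. It assumes (hypothetically) that the state file is named after the feed and sits directly in ~/.local/share/greg/data, as the directory listing earlier in the thread suggests. `is_first_sync` is an illustrative name, not a greg function.

```python
import os


def is_first_sync(data_dir: str, feed_name: str) -> bool:
    """Hypothetical check: a feed counts as new when it has no state file.

    Assumes the per-feed file is <data_dir>/<feed_name>; greg's real
    layout and logic may differ.
    """
    return not os.path.isfile(os.path.join(data_dir, feed_name))


if __name__ == "__main__":
    data_dir = os.path.expanduser("~/.local/share/greg/data")
    print("tnyrh treated as new?", is_first_sync(data_dir, "tnyrh"))
```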

mfeif commented 8 years ago

OK, it synced again this evening... no change. There is still no file in .local for tnyrh. Sync says "no new episodes" and nothing has been downloaded.

greg check --feed tnyrh

shows new stuff:

0: Sofia Coppola Gives Bill Murray for Christmas (Fri, 18 Dec 2015 17:00:23 -0500)
1: Claudia Rankine's Poetry Reveals the Harm in Microaggressions (Fri, 18 Dec 2015 17:00:17 -0500)
2: Mark Singer on Donald Trump's Comeback (Fri, 18 Dec 2015 17:00:05 -0500)
3: My Living Will (Fri, 18 Dec 2015 16:59:56 -0500)
4: The Drone Under the Tree (Fri, 18 Dec 2015 16:59:55 -0500)
5: Episode 9: Christmas Skies Full of Drones, and Donald Trump's Ultimate Luxury (Fri, 18 Dec 2015 00:00:00 -0500)
6: Robert Glasper's Jazz Heresy (Fri, 11 Dec 2015 16:44:06 -0500)
7: Stand Clear of the Closing Doors (Fri, 11 Dec 2015 16:44:05 -0500)
8: A Wave of Sexual Violence on TV (Fri, 11 Dec 2015 16:42:19 -0500)
9: The Missing Boater (Fri, 11 Dec 2015 16:41:53 -0500)

I'm not going to manually download anything, in case this "broken" state can be leveraged to help track down the bug.

manolomartinez commented 8 years ago

OK, I now see what's going on: on a first sync, greg's default is to download just one entry. That is the intention, but what it actually does is put just one entry in the download queue, which is then filtered.

So, in the unlucky situation where you are downloading for the first time (which, for reasons we don't know, was the state of your feed), you are filtering, and the first entry encountered does not pass the filter, greg downloads nothing. This is the wrong behavior.
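The ordering problem can be shown with two small functions: one caps the queue at the firstsync count before filtering (the behavior described above), the other filters first and then takes the first matches. This is a hypothetical sketch, not greg's source or necessarily the shape of the actual fix. The function names and the `firstsync` parameter, which stands in for greg.conf's firstsync option, are illustrative. It assumes entries are considered in the order `greg check` lists them, using the 18 Dec titles from this thread.

```python
from typing import Callable, List

# Titles from the 18 Dec 2015 `greg check --feed tnyrh` listing in this thread.
TITLES = [
    "Sofia Coppola Gives Bill Murray for Christmas",
    "Claudia Rankine's Poetry Reveals the Harm in Microaggressions",
    "Mark Singer on Donald Trump's Comeback",
    "My Living Will",
    "The Drone Under the Tree",
    "Episode 9: Christmas Skies Full of Drones, and Donald Trump's Ultimate Luxury",
    "Robert Glasper's Jazz Heresy",
    "Stand Clear of the Closing Doors",
    "A Wave of Sexual Violence on TV",
    "The Missing Boater",
]


def title_filter(title: str) -> bool:
    """Same condition as the tnyrh filter line in greg.conf."""
    return "Episode" in title


def limit_then_filter(titles: List[str], keep: Callable[[str], bool], firstsync: int = 1) -> List[str]:
    """Buggy order: cap the queue at `firstsync` entries, then filter it."""
    queue = titles[:firstsync]
    return [t for t in queue if keep(t)]


def filter_then_limit(titles: List[str], keep: Callable[[str], bool], firstsync: int = 1) -> List[str]:
    """Intended order: filter every entry, then keep the first `firstsync` matches."""
    matches = [t for t in titles if keep(t)]
    return matches[:firstsync]


print(limit_then_filter(TITLES, title_filter))  # []
print(filter_then_limit(TITLES, title_filter))  # ["Episode 9: Christmas Skies Full of Drones, and Donald Trump's Ultimate Luxury"]
```

With the first-listed title failing the filter, the first version returns nothing, which matches the empty syncs reported above.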

I will try to push a fix later today. Thanks for helping me notice this.

Manolo

manolomartinez commented 8 years ago

I think this is now fixed. Please update greg, and let's try again?

Manolo

mfeif commented 8 years ago

It does appear to pick up those old missing episodes. Thanks!

mfeif commented 8 years ago

New bug, though... :-(

https://github.com/manolomartinez/greg/issues/39