manolomartinez / greg

A command-line podcast aggregator
GNU General Public License v3.0
299 stars 37 forks source link

More robust last updated date check, more strict sanitize function, two new placeholders. #14

Closed FilipBB closed 11 years ago

FilipBB commented 11 years ago

1.add check for entries[0].published_parsed and set sync_by_date() True if found 2.in sanitize(): convert unicode to ascii to prevent accented letters in filename causing problems 3.added 'entrysummary' and 'subtitle' to placeholders. These are descriptions of the entry and the feed, useful for adding to mp3 comment fields to describe the podcast

manolomartinez commented 11 years ago

Received, thanks. Will review it shortly. One thing that we will need to do is turn every non-standard-library module into an optional dependency (like we do with Stagger). Anyway, as I say, in a day or so I'll give more detailed comments.

FilipBB commented 11 years ago

Sure, it should be easy to do. The only two places BeautifulSoup is used is in the sanitize() function where I left your original code commented out, and in the Placeholders class where I used it to sanitize the subtitle field. They could probably just be consolidated into the sanitize function and a check for BeautifulSoup availability could decide which route the sanitize function should take. I'll try to look at it tonight.

FilipBB commented 11 years ago

I added a new function htmltotext() which checks if the beautifulsoup module is available and uses it to convert html to text if it is. If not, it just passes the html through unchanged. I also added it as an optional dependency in the git pkgbuild file, let me know how that works for you,

Filip

manolomartinez commented 11 years ago

This works. In particular, the more robust updated-time check is a clear, obvious improvement. Thanks a lot for taking the time to improve the code. I've changed a couple of things here and there -- check the commits after yours.

FilipBB commented 11 years ago

Great,

I just found something else which might be a bug:

when using 'greg download' after 'greg check' it was saving all the files using localtime. I noticed because I use the {date} placeholder to save filenames. sync() uses the published date, so I looked into the download() function and it looks like it just sets entry.linkdate to the localtime without checking for entry.published_parsed or entry.updated_parsed. sync() does check for those, so I basically just copied the snippet from sync() to download() and it seems to be setting {date} correctly now. If there was another reason for setting entry.linkdate to localtime that I'm overlooking, please let me know, thanks,

-Filip

manolomartinez commented 11 years ago

That seems indeed to be a bug. Would you mind opening a new issue, so that I can keep track of that? The ideal thing to do would be to take the snippet out of sync(), into its own function, and then call it both from sync() and download()