Closed FilipBB closed 11 years ago
Received, thanks. Will review it shortly. One thing that we will need to do is turn every non-standard-library module into an optional dependency (like we do with Stagger). Anyway, as I say, in a day or so I'll give more detailed comments.
Sure, it should be easy to do. The only two places BeautifulSoup is used is in the sanitize() function where I left your original code commented out, and in the Placeholders class where I used it to sanitize the subtitle field. They could probably just be consolidated into the sanitize function and a check for BeautifulSoup availability could decide which route the sanitize function should take. I'll try to look at it tonight.
I added a new function htmltotext() which checks if the beautifulsoup module is available and uses it to convert html to text if it is. If not, it just passes the html through unchanged. I also added it as an optional dependency in the git pkgbuild file, let me know how that works for you,
Filip
This works. In particular, the more robust updated-time check is a clear, obvious improvement. Thanks a lot for taking the time to improve the code. I've changed a couple of things here and there -- check the commits after yours.
Great,
I just found something else which might be a bug:
when using 'greg download' after 'greg check' it was saving all the files using localtime. I noticed because I use the {date} placeholder to save filenames. sync() uses the published date, so I looked into the download() function and it looks like it just sets entry.linkdate to the localtime without checking for entry.published_parsed or entry.updated_parsed. sync() does check for those, so I basically just copied the snippet from sync() to download() and it seems to be setting {date} correctly now. If there was another reason for setting entry.linkdate to localtime that I'm overlooking, please let me know, thanks,
-Filip
That seems indeed to be a bug. Would you mind opening a new issue, so that I can keep track of that? The ideal thing to do would be to take the snippet out of sync(), into its own function, and then call it both from sync() and download()
1.add check for entries[0].published_parsed and set sync_by_date() True if found 2.in sanitize(): convert unicode to ascii to prevent accented letters in filename causing problems 3.added 'entrysummary' and 'subtitle' to placeholders. These are descriptions of the entry and the feed, useful for adding to mp3 comment fields to describe the podcast