miaowware / ctyparser

A CTY.DAT parser for modern amateur radio programs
https://ctyparser.miaow.io/
MIT License

Extracting the download link from the feed #10

Open 0x5c opened 4 years ago

0x5c commented 4 years ago

So that the download URL isn't hard-coded anymore

classabbyamp commented 4 years ago

not sure how to not hard-code this. the download link isn't included in the RSS feed.

<item>
    <title>Big CTY &#x2013; 20 November 2019</title>
    <link>
        http://www.country-files.com/big-cty-20-november-2019/
    </link>
    <pubDate>Wed, 20 Nov 2019 15:35:19 +0000</pubDate>
    <creator>
        <![CDATA[ AD1C ]]>
    </creator>
    <category>
        <![CDATA[ Big CTY ]]>
    </category>
    <guid isPermaLink="false">https://www.country-files.com/?p=1359</guid>
    <description>
        <![CDATA[
        Version entity is East Malaysia, 9M6 [download] Added/changed Entities/Prefixes/Callsigns: 3DA0BP/J is Kingdom of eSwatini, 3DA A41HA/ND is Oman, A4 B7/BA7CK, B7/BA7NQ, B7/BD1TX, B7CRA, BA7CK, BA7IA, BD1TX and BD7HC are all China, BY in CQ zone 26 VP8HAL is Antarctica, CE9 &#8230; <a href="http://www.country-files.com/big-cty-20-november-2019/">Continue reading <span class="meta-nav">&#8594;</span></a>
        ]]>
    </description>
</item>

mbridak commented 1 year ago

You can extract the link via:

"""Get URL to new bigcty file"""

import feedparser
import requests
from lxml import html

DEFAULT_FEED = "http://www.country-files.com/category/big-cty/feed/"

# The newest feed entry links to the release post, not to the file itself.
feed = requests.get(DEFAULT_FEED, timeout=15)
feed.raise_for_status()
parsed_feed = feedparser.parse(feed.content)
update_url = parsed_feed.entries[0]["link"]

# The release post is where the actual zip download link lives.
page = requests.get(update_url, timeout=15)
page.raise_for_status()
tree = html.fromstring(page.content)
link = tree.xpath("//a[contains(@href,'zip')]/@href")[0]

print(link)

which today spits out: https://www.country-files.com/bigcty/download/2023/bigcty-20230526.zip
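
If pulling in lxml just for this is undesirable, the link-extraction step could probably be done with the standard library as well. The following is only a rough sketch, not something taken from ctyparser: `ZipLinkFinder` and `find_zip_link` are hypothetical names, and it assumes (as the snippet above does) that the first `<a>` whose `href` contains "zip" is the download link.

```python
"""Sketch: find the first zip link in a release page using only the stdlib."""

from html.parser import HTMLParser
from urllib.parse import urljoin


class ZipLinkFinder(HTMLParser):
    """Collect href values of <a> tags whose target contains 'zip'."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and "zip" in href:
            # Resolve relative links against the page URL.
            self.links.append(urljoin(self.base_url, href))


def find_zip_link(page_html, base_url):
    """Return the first matching link, or None if the page has none."""
    finder = ZipLinkFinder(base_url)
    finder.feed(page_html)
    return finder.links[0] if finder.links else None
```

With the snippet above, this would be called as `find_zip_link(page.text, update_url)` in place of the `html.fromstring` and `xpath` lines. Returning `None` instead of raising an `IndexError` leaves it to the caller to decide what to do when the page layout changes.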

classabbyamp commented 1 year ago

nice! if you'd like to make a PR for this, feel free :)