jagalile / ivoox-scraping

Scrapes ivoox to download podcasts
GNU General Public License v3.0

Download unknown episode name/all episodes #5

Open adocampo opened 1 year ago

adocampo commented 1 year ago

I'm testing your software to download those "ivoox originals" that podcast clients cannot download completely (damned ivoox forces you to visit their website to download the complete mp3).

So far it is working quite well. I have just one "but": I need to know the episode name, so I still have to go to ivoox and copy-paste the name to pass it to the script.

I was wondering whether the roadmap includes an option to download just the latest episode or all episodes. That way we could use a cron job to download the latest one, whatever its name is, or download all the episodes the first time without copy-pasting their names one by one.

jagalile commented 1 year ago

Thank you for your interest in this repository.

The functionality was limited to my personal use, which is why it was necessary to pass the name of the episode.

I had set the development aside, but seeing that someone else besides me uses it, I will implement the two features you suggest.

adocampo commented 1 year ago

Glad to hear that! :D

Thank you very much!

jagalile commented 1 year ago

The latest release adds the option to download the latest podcast episode.

adocampo commented 1 year ago

-latest works nicely. Now I'm able to script it, grab the latest episodes from my favorite Ivoox Originals podcasts, and listen to them on my podcast client!! Awesome!!! :D

The other function works fine, but it only returns the latest match (i.e., if you enter a search term with more than one result, it will download just the newest one). It would be nice to be able to choose which one to download, or to download every match.
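To make the request concrete, here is a minimal sketch of what "choose one or take every match" could look like. It is not the repository's code: `find_matching_episodes`, `choose_episodes`, and the `selection` values are hypothetical names, and the titles are assumed to arrive newest first, as they appear on the listing page.

```python
def find_matching_episodes(titles, search_term):
    """Return every title containing search_term, ignoring case.

    titles is assumed to be ordered newest first.
    """
    needle = search_term.casefold()
    return [title for title in titles if needle in title.casefold()]


def choose_episodes(matches, selection="latest"):
    """Pick which matches to download.

    selection is hypothetical: "latest" keeps only the newest match,
    "all" keeps every match, and an integer keeps the match at that index.
    """
    if not matches:
        return []
    if selection == "all":
        return list(matches)
    if selection == "latest":
        return [matches[0]]
    return [matches[selection]]


# Placeholder titles, newest first.
titles = ["Show T01E03: Part two", "Show T01E02: Part one", "Other show E10"]
matches = find_matching_episodes(titles, "show t01")
```

With this shape, the current behaviour corresponds to `choose_episodes(matches, "latest")`, and the two requested behaviours are `choose_episodes(matches, "all")` and `choose_episodes(matches, 1)`.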

Also, I would find an option to grab all episodes extra useful: if we find an interesting podcast whose RSS feed returns just the latest N episodes while the ivoox platform has all of them, we would be able to download them in a breeze.
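As a rough illustration of the "grab all episodes" idea, the loop below walks the listing pages until one comes back empty or contains nothing new. `fetch_page` is a hypothetical callable that returns the episode titles on page n (1-based); how those titles are scraped from ivoox is left out on purpose.

```python
def iter_all_episodes(fetch_page, max_pages=500):
    """Yield every episode title once, page by page.

    fetch_page(n) is a hypothetical callable returning the list of titles
    shown on listing page n. Iteration stops at the first page that is
    empty or that only repeats titles already seen, so a site that keeps
    serving the last page for out-of-range numbers does not loop forever.
    """
    seen = set()
    for page_number in range(1, max_pages + 1):
        new_titles = [t for t in fetch_page(page_number) if t not in seen]
        if not new_titles:
            return
        seen.update(new_titles)
        yield from new_titles


# Fake listing standing in for the real site: two pages, then nothing.
fake_pages = {1: ["Episode 3", "Episode 2"], 2: ["Episode 1"]}
all_titles = list(iter_all_episodes(lambda n: fake_pages.get(n, [])))
```

Each yielded title could then be handed to whatever already downloads a single episode by name.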

What do you think?

adocampo commented 1 year ago

Oh, I just noticed that this only downloads from the very first page of results. If I try to download anything from the second page onwards, the result is always:

Searching episode...
Traceback (most recent call last):
  File "/home/malevolent/development/python/ivoox-scraping/src/download_podcast.py", line 50, in search_episode
    episode_element_in_podcast_page = self.web_scraping.find_element_by_partial_text(
  File "/home/malevolent/development/python/ivoox-scraping/src/web_scraper.py", line 58, in find_element_by_partial_text
    return self.driver.find_element(By.PARTIAL_LINK_TEXT, chapter_search_name)
  File "/home/malevolent/development/python/ivoox-scraping/env/lib/python3.10/site-packages/selenium/webdriver/remote/webdriver.py", line 857, in find_element
    return self.execute(Command.FIND_ELEMENT, {
  File "/home/malevolent/development/python/ivoox-scraping/env/lib/python3.10/site-packages/selenium/webdriver/remote/webdriver.py", line 435, in execute
    self.error_handler.check_response(response)
  File "/home/malevolent/development/python/ivoox-scraping/env/lib/python3.10/site-packages/selenium/webdriver/remote/errorhandler.py", line 247, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"partial link text","selector":"La Hora Oscura T06E17: "Chikatilo:el demonio de Rostov""}
  (Session info: headless chrome=110.0.5481.96)
Stacktrace:
#0 0x55e2be8cc303 <unknown>
#1 0x55e2be6a0d37 <unknown>
#2 0x55e2be6dd5b2 <unknown>
#3 0x55e2be6dd6c1 <unknown>
#4 0x55e2be717b34 <unknown>
#5 0x55e2be6fd9ad <unknown>
#6 0x55e2be71588c <unknown>
#7 0x55e2be6fd753 <unknown>
#8 0x55e2be6d0a14 <unknown>
#9 0x55e2be6d1b7e <unknown>
#10 0x55e2be91b32e <unknown>
#11 0x55e2be91ec0e <unknown>
#12 0x55e2be901610 <unknown>
#13 0x55e2be91fc23 <unknown>
#14 0x55e2be8f3545 <unknown>
#15 0x55e2be9406a8 <unknown>
#16 0x55e2be940836 <unknown>
#17 0x55e2be95bd13 <unknown>
#18 0x7f630e073bb5 <unknown>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/malevolent/development/python/ivoox-scraping/main.py", line 43, in <module>
    main(args.p, args.e, args.latest)
  File "/home/malevolent/development/python/ivoox-scraping/main.py", line 41, in main
    dp.download_episode()
  File "/home/malevolent/development/python/ivoox-scraping/src/download_podcast.py", line 30, in download_episode
    episode_element_in_podcast_page = self.search_episode()
  File "/home/malevolent/development/python/ivoox-scraping/src/download_podcast.py", line 57, in search_episode
    next_page = self.web_scraping.find_element_by_xpath(
  File "/home/malevolent/development/python/ivoox-scraping/src/web_scraper.py", line 61, in find_element_by_xpath
    return self.driver.find_element(By.XPATH, xpath)
  File "/home/malevolent/development/python/ivoox-scraping/env/lib/python3.10/site-packages/selenium/webdriver/remote/webdriver.py", line 857, in find_element
    return self.execute(Command.FIND_ELEMENT, {
  File "/home/malevolent/development/python/ivoox-scraping/env/lib/python3.10/site-packages/selenium/webdriver/remote/webdriver.py", line 435, in execute
    self.error_handler.check_response(response)
  File "/home/malevolent/development/python/ivoox-scraping/env/lib/python3.10/site-packages/selenium/webdriver/remote/errorhandler.py", line 247, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"//*[@id="main"]/div/div[4]/div/nav/ul/li[12]/a"}
  (Session info: headless chrome=110.0.5481.96)
Stacktrace:
#0 0x55e2be8cc303 <unknown>
#1 0x55e2be6a0d37 <unknown>
#2 0x55e2be6dd5b2 <unknown>
#3 0x55e2be6dd6c1 <unknown>
#4 0x55e2be717b34 <unknown>
#5 0x55e2be6fd9ad <unknown>
#6 0x55e2be71588c <unknown>
#7 0x55e2be6fd753 <unknown>
#8 0x55e2be6d0a14 <unknown>
#9 0x55e2be6d1b7e <unknown>
#10 0x55e2be91b32e <unknown>
#11 0x55e2be91ec0e <unknown>
#12 0x55e2be901610 <unknown>
#13 0x55e2be91fc23 <unknown>
#14 0x55e2be8f3545 <unknown>
#15 0x55e2be9406a8 <unknown>
#16 0x55e2be940836 <unknown>
#17 0x55e2be95bd13 <unknown>
#18 0x7f630e073bb5 <unknown>

So perhaps my last comment is even harder to implement.
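For what it's worth, the second exception in the traceback comes from looking up the "next page" link with `find_element`, which raises `NoSuchElementException` when the XPath does not match. A possible way around that, sketched below under assumptions, is to use Selenium's `find_elements` (plural), which returns an empty list instead of raising. The function name `search_episode_across_pages` and the `next_page_xpath` parameter are hypothetical, and the right locator for the next-page link has to be taken from the real page.

```python
# Locator strategy strings; these are the values of Selenium's
# By.PARTIAL_LINK_TEXT and By.XPATH constants, spelled out so the sketch
# has no import of its own.
PARTIAL_LINK_TEXT = "partial link text"
XPATH = "xpath"


def search_episode_across_pages(driver, episode_name, next_page_xpath, max_pages=50):
    """Return the first link whose text contains episode_name, or None.

    driver is assumed to be a Selenium WebDriver already showing the first
    page of the podcast's episode list. next_page_xpath is a hypothetical
    locator for the "next page" link.
    """
    for _ in range(max_pages):
        # find_elements returns [] when nothing matches, so no exception
        # handling is needed to detect "not on this page".
        matches = driver.find_elements(PARTIAL_LINK_TEXT, episode_name)
        if matches:
            return matches[0]
        next_links = driver.find_elements(XPATH, next_page_xpath)
        if not next_links:
            return None  # last page reached without finding the episode
        # A real page may need an explicit wait after this click.
        next_links[0].click()
    return None
```

Returning `None` lets the caller print a plain "episode not found" message instead of a double traceback.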