sergiotapia / magnetissimo

Web application that indexes all popular torrent sites and saves the results to a local database.
MIT License
2.99k stars, 187 forks

Error crawling isohunt #19

Closed 6 years ago. Reported by tiagom101.

tiagom101 commented 8 years ago

Hi,

I'm getting the following error with the latest version of Magnetissimo:

Crawling: https://isohunt.tohttp://www.bitlord.com/share/?re=IsoHunt.to&ba=0E3B6B&co=fff&sh=HEYZO-1213-美癡女-淫亂熟女誘惑-甲斐美晴-無碼中文字幕&ur=https://isohunt.to//torrent_details/17047072/HEYZO-1213-%E7%BE%8E%E7%99%A1%E5%A5%B3-%E6%B7%AB%E4%BA%82%E7%86%9F%E5%A5%B3%E8%AA%98%E6%83%91-%E7%94%B2%E6%96%90%E7%BE%8E%E6%99%B4-%E7%84%A1%E7%A2%BC%E4%B8%AD%E6%96%87%E5%AD%97%E5%B9%95
Error: https://isohunt.tohttp://www.bitlord.com/share/?re=IsoHunt.to&ba=0E3B6B&co=fff&sh=HEYZO-1213-美癡女-淫亂熟女誘惑-甲斐美晴-無碼中文字幕&ur=https://isohunt.to//torrent_details/17047072/HEYZO-1213-%E7%BE%8E%E7%99%A1%E5%A5%B3-%E6%B7%AB%E4%BA%82%E7%86%9F%E5%A5%B3%E8%AA%98%E6%83%91-%E7%94%B2%E6%96%90%E7%BE%8E%E6%99%B4-%E7%84%A1%E7%A2%BC%E4%B8%AD%E6%96%87%E5%AD%97%E5%B9%95 just ain't workin.
19:58:25.599 [error] Process #PID<0.17204.10> raised an exception
** (FunctionClauseError) no function clause matching in Floki.Finder.traverse/4
    lib/floki/finder.ex:49: Floki.Finder.traverse(nil, [], %Floki.Selector{attributes: [], classes: ["torrent-header"], combinator: nil, id: nil, type: "h1"}, [])
    lib/floki/finder.ex:61: Floki.Finder.traverse/4
    lib/floki/finder.ex:35: Floki.Finder.find_selectors/2
    (magnetissimo) lib/parsers/isohunt.ex:46: Magnetissimo.Parsers.Isohunt.scrape_torrent_information/1
    (magnetissimo) lib/download_worker.ex:138: Magnetissimo.DownloadWorker.perform/3
    (exq) lib/exq/worker/server.ex:119: anonymous fn/3 in Exq.Worker.Server.dispatch_work/2

sergiotapia commented 8 years ago

Hi @Gank, that looks like normal output: some pages are just malformed and the scraper chokes on them. That's fine, and scraping shouldn't stop.

Are you seeing it stop after this error message? It should just carry on scraping.

tiagom101 commented 8 years ago

@sergiotapia It continues crawling normally, but the URL looks strange:

https://isohunt.to http://www.bitlord.com /share/?r...

Is this expected?

sergiotapia commented 8 years ago

That's not expected; it might be IsoHunt's VERY aggressive ads. I'll need to tweak the parser to ignore these kinds of links.
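The crawled URL in the log looks like the base `https://isohunt.to` was string-concatenated onto an href that was already absolute (an ad/share link to bitlord.com). A minimal sketch of the kind of parser tweak described above might resolve each href with Elixir's standard `URI.merge/2` and skip anything that leaves the site. The module `LinkFilter` and its `resolve/1` function are hypothetical illustrations, not part of Magnetissimo, and the hard-coded base URL is an assumption:

```elixir
defmodule LinkFilter do
  @moduledoc """
  Hypothetical helper (not part of Magnetissimo): resolves an href scraped
  from an IsoHunt listing page against the site's base URL and drops links
  that point to another host, such as ad/share links.
  """

  # Assumed base URL of the crawled site.
  @base "https://isohunt.to"

  @doc """
  Returns `{:ok, absolute_url}` for links that stay on isohunt.to,
  or `:skip` for links to any other host.
  """
  def resolve(href) when is_binary(href) do
    # URI.merge/2 follows reference resolution: a relative path is joined
    # onto the base, while an already-absolute href is returned as-is
    # instead of being glued onto the base string.
    url = @base |> URI.merge(href) |> URI.to_string()

    case URI.parse(url) do
      %URI{host: "isohunt.to"} -> {:ok, url}
      _other_host -> :skip
    end
  end
end
```

With this approach, `LinkFilter.resolve("http://www.bitlord.com/share/?re=IsoHunt.to")` returns `:skip` rather than producing `https://isohunt.tohttp://www.bitlord.com/...`, while `LinkFilter.resolve("/torrent_details/17047072/foo")` yields a normal absolute IsoHunt URL.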

tchoutri commented 6 years ago

We're not crawling IsoHunt anymore, and their current website doesn't look finished. Thanks for the issue, though :)