tiagom101 closed this issue 6 years ago
Hi @Gank, that seems like normal output. Some pages are just malformed and the scraper chokes on them. That's fine, and scraping shouldn't stop.
Are you seeing it stop after this error message? It should just carry on scraping.
@sergiotapia it carries on crawling normally, but the URL seems strange:
https://isohunt.to http://www.bitlord.com /share/?r...
Is this expected?
That's not expected; it might be IsoHunt's VERY aggressive ads. I'll need to tweak the parser to ignore these types of links.
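For anyone hitting the same thing: the strange URL looks like the site's base URL glued in front of an absolute ad link. Below is a minimal sketch of one way to ignore such links, assuming the parser builds URLs from a base and a scraped `href`. `LinkFilter` and `resolve/2` are hypothetical names for illustration, not Magnetissimo's actual code; only the standard library `URI` module is used.

```elixir
defmodule LinkFilter do
  @moduledoc """
  Hypothetical helper (not part of Magnetissimo): resolves a scraped href
  against the site's base URL and drops links that point at another host.
  """

  # Returns {:ok, absolute_url} for links on the base host, :skip otherwise.
  def resolve(base, href) when is_binary(base) and is_binary(href) do
    # URI.merge/2 joins relative hrefs onto the base and leaves
    # absolute hrefs untouched, instead of concatenating strings.
    merged = URI.merge(base, String.trim(href))

    if merged.host == URI.parse(base).host do
      {:ok, URI.to_string(merged)}
    else
      # Different host (e.g. an ad or share link), or no host at all.
      :skip
    end
  end
end

# A relative link is joined onto the base URL.
IO.inspect(LinkFilter.resolve("https://isohunt.to", "/torrent_details/17047072/x"))

# An absolute link to another host is ignored.
IO.inspect(LinkFilter.resolve("https://isohunt.to", "http://www.bitlord.com/share/?re=IsoHunt.to"))
```

Comparing hosts after merging means `mailto:` or `javascript:` links are skipped too, since they have no host.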
We're not crawling IsoHunt anymore and their current website doesn't look finished. Thanks for the issue though :)
Hi,
I'm getting the following error using the latest version of Magnetissimo:
Crawling: https://isohunt.tohttp://www.bitlord.com/share/?re=IsoHunt.to&ba=0E3B6B&co=fff&sh=HEYZO-1213-美癡女-淫亂熟女誘惑-甲斐美晴-無碼中文字幕&ur=https://isohunt.to//torrent_details/17047072/HEYZO-1213-%E7%BE%8E%E7%99%A1%E5%A5%B3-%E6%B7%AB%E4%BA%82%E7%86%9F%E5%A5%B3%E8%AA%98%E6%83%91-%E7%94%B2%E6%96%90%E7%BE%8E%E6%99%B4-%E7%84%A1%E7%A2%BC%E4%B8%AD%E6%96%87%E5%AD%97%E5%B9%95
Error: https://isohunt.tohttp://www.bitlord.com/share/?re=IsoHunt.to&ba=0E3B6B&co=fff&sh=HEYZO-1213-美癡女-淫亂熟女誘惑-甲斐美晴-無碼中文字幕&ur=https://isohunt.to//torrent_details/17047072/HEYZO-1213-%E7%BE%8E%E7%99%A1%E5%A5%B3-%E6%B7%AB%E4%BA%82%E7%86%9F%E5%A5%B3%E8%AA%98%E6%83%91-%E7%94%B2%E6%96%90%E7%BE%8E%E6%99%B4-%E7%84%A1%E7%A2%BC%E4%B8%AD%E6%96%87%E5%AD%97%E5%B9%95 just ain't workin.
19:58:25.599 [error] Process #PID<0.17204.10> raised an exception
** (FunctionClauseError) no function clause matching in Floki.Finder.traverse/4
    lib/floki/finder.ex:49: Floki.Finder.traverse(nil, [], %Floki.Selector{attributes: [], classes: ["torrent-header"], combinator: nil, id: nil, type: "h1"}, [])
    lib/floki/finder.ex:61: Floki.Finder.traverse/4
    lib/floki/finder.ex:35: Floki.Finder.find_selectors/2
    (magnetissimo) lib/parsers/isohunt.ex:46: Magnetissimo.Parsers.Isohunt.scrape_torrent_information/1
    (magnetissimo) lib/download_worker.ex:138: Magnetissimo.DownloadWorker.perform/3
    (exq) lib/exq/worker/server.ex:119: anonymous fn/3 in Exq.Worker.Server.dispatch_work/2
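
The trace shows `Floki.Finder.traverse/4` being called with `nil` as its first argument, which suggests there was no parsed page to search when the `h1.torrent-header` selector ran. A rough sketch of a guard that would turn this into an error tuple rather than an exception follows. `SafeScrape` and `find/3` are hypothetical names, and `finder` stands in for a two-argument function such as `&Floki.find/2`; this is not how Magnetissimo handles it.

```elixir
defmodule SafeScrape do
  # Hypothetical wrapper, not Magnetissimo's code.
  # `finder` is any two-argument function taking (html, selector),
  # for example &Floki.find/2 in a project that depends on Floki.

  # No document to search: report it instead of raising.
  def find(nil, _selector, _finder), do: {:error, :no_document}
  def find("", _selector, _finder), do: {:error, :no_document}

  # A document is present: run the selector as usual.
  def find(html, selector, finder) when is_function(finder, 2) do
    {:ok, finder.(html, selector)}
  end
end

# Stand-in finder so the sketch runs without extra dependencies.
fake_finder = fn _html, _selector -> ["title node"] end

IO.inspect(SafeScrape.find(nil, "h1.torrent-header", fake_finder))
IO.inspect(SafeScrape.find("<h1 class=\"torrent-header\">x</h1>", "h1.torrent-header", fake_finder))
```

In a project with Floki installed, the call would look like `SafeScrape.find(body, "h1.torrent-header", &Floki.find/2)`.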