kba / rssscrpr

Scrape web content to RSS feeds
https://rssscrpr.herokuapp.com/
MIT License

Can we handle html redirection? #16

Closed by zuphilip 8 years ago

zuphilip commented 8 years ago

For example, this RSS feed works when using the URL http://edoc.hu-berlin.de/browsing/series/index.php?l[2]=Einrichtungen&l[3]=+Institut+f%C3%BCr+Bibliotheks-+und+Informationswissenschaft&c[3][corp_id]=1005140&l[4]=Berliner+Handreichungen+zur+Bibliotheks-+und+Informationswissenschaft+-&c[4][series_id]=29309&_=a2eab3eda1ae2dfb5a515aa93f7ef9e3. But if we instead use a shorter URL, http://0cn.de/tbux, which redirects to the actual page, the following error occurs:

Failed to retrieve 'http://0cn.de/tbux': {
    "url": "http:\/\/0cn.de\/tbux",
    "content_type": "text\/html; charset=UTF-8",
    "http_code": 302,
    "header_size": 528,
    "request_size": 303,
    "filetime": -1,
    "ssl_verify_result": 0,
    "redirect_count": 0,
    "total_time": 1.052315,
    "namelookup_time": 0.252984,
    "connect_time": 0.342316,
    "pretransfer_time": 0.342762,
    "size_upload": 0,
    "size_download": 0,
    "speed_download": 0,
    "speed_upload": 0,
    "download_content_length": 0,
    "upload_content_length": 0,
    "starttransfer_time": 1.051267,
    "redirect_time": 0,
    "redirect_url": "http:\/\/edoc.hu-berlin.de\/browsing\/series\/index.php?l[2]=Einrichtungen&l[3]=+Institut+f\u00fcr+Bibliotheks-+und+Informationswissenschaft&c[3][corp_id]=1005140&l[4]=Berliner+Handreichungen+zur+Bibliotheks-+und+Informationswissenschaft+-&c[4][series_id]=29309&_=a2eab3eda1ae2dfb5a515aa93f7ef9e3",
    "primary_ip": "5.199.142.96",
    "certinfo": [],
    "primary_port": 80,
    "local_ip": "172.19.7.218",
    "local_port": 41420
}

kba commented 8 years ago

Should work now:

zuphilip commented 8 years ago

Thank you! Yes, it works!
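
For readers hitting the same error: the dump above shows `http_code` 302 with `redirect_count` 0 and a populated `redirect_url`, which suggests the client received the redirect but did not follow it. The thread does not show the actual change, so the following is only a minimal Python sketch of the general technique (follow `Location` headers up to a hop limit), not the project's code. The names `fetch_once`, `follow_redirects`, and `max_redirects` are hypothetical, and the result keys merely mirror the field names in the dump.

```python
import urllib.error
import urllib.request
from urllib.parse import urljoin

REDIRECT_CODES = {301, 302, 303, 307, 308}


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop urllib from following redirects so each hop stays visible."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # returning None makes urllib raise HTTPError for 3xx


def fetch_once(url, timeout=10):
    """Fetch one URL without following redirects (hypothetical helper).

    Returns (status_code, location_header_or_None, body_bytes).
    """
    opener = urllib.request.build_opener(_NoRedirect)
    try:
        with opener.open(url, timeout=timeout) as resp:
            return resp.status, resp.headers.get("Location"), resp.read()
    except urllib.error.HTTPError as err:
        # With redirects disabled, 3xx responses surface here.
        return err.code, err.headers.get("Location"), err.read()


def follow_redirects(url, fetch=fetch_once, max_redirects=5):
    """Follow Location headers until a non-redirect response arrives."""
    hops = 0
    while True:
        status, location, body = fetch(url)
        if status not in REDIRECT_CODES or not location:
            return {
                "url": url,
                "http_code": status,
                "redirect_count": hops,
                "body": body,
            }
        if hops >= max_redirects:
            raise RuntimeError(f"more than {max_redirects} redirects from {url}")
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, location)
        hops += 1
```

Calling `follow_redirects("http://0cn.de/tbux")` would then attempt to retrieve the target page instead of stopping at the 302. The hop limit guards against redirect loops. In a cURL-based client, the equivalent is usually a matter of enabling the `CURLOPT_FOLLOWLOCATION` option rather than looping by hand.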