faximan closed this issue 3 weeks ago.
I worked around the archive.is-specific issue with the trick from https://stackoverflow.com/questions/11680709/file-get-contents-give-me-403-forbidden, which fetches the page with cURL and a browser User-Agent instead of file_get_contents():
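```php
if (file_exists($autoload))
{
    require $autoload;

    // Fetch the page with cURL instead of file_get_contents() so a browser-like
    // User-Agent is sent; archive.is appears to reject the default request with 403 Forbidden.
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body as a string instead of printing it
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
    $html = curl_exec($ch); // false if the request fails
    curl_close($ch);
}
```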
Then remove the second `$html = file_get_contents($url);` further below, since it would overwrite the `$html` fetched above (this seems like a bug?).
Now I get the full-text content.
I proposed some changes in https://github.com/thefranke/rss-librarian/pull/2.
Hey @faximan, saw your posts just now! Thank you very much, I'll take a look at it shortly!
I did a quick test and it seems to work fine. Thank you for the fixes.
I plan to add one more step when creating a first feed so that users do not easily create and abandon feeds. Let me know if you have any additional ideas as well.
Hey there, I stumbled upon this project while looking for a way to self-host an RSS feed of random articles that I want to read in NetNewsWire. Love it!
It works great. The only minor problem is that I haven't found a way to deal with articles like https://www.outsideonline.com/culture/essays-culture/instagram-travel-influencers-yosemite/. When extracting this URL, you only get the beginning of the article (the same happens with built-in browser "reader modes"), not the full content.
Previously, I solved this by going through a service such as archive.is to get a URL I could paste into Instapaper (https://archive.is/IRYN9), but this URL fails completely in rss-librarian. I get `[unable to retrieve full-text content]`. Maybe this is a FiveFilters issue? Anyway, I was curious whether you have any strategy for dealing with such URLs. Another one (properly paywalled) would be https://www.nytimes.com/2024/09/13/technology/elon-musk-security.html.