oscarotero / Embed

Get info from any web service or page
MIT License

Twitter extractor will retrieve "/home" instead of a tweet URL #520

Open Divi opened 1 year ago

Divi commented 1 year ago

Twitter now requires a login: you cannot view a tweet without an account.
So the extractor requests twitter.com/xxx/status/xxx, follows the redirect to /home (the login screen), and then calls the oEmbed API with the /home URI instead of the tweet URL.

The only fix I have found is to turn off cURL's "follow redirects" behavior:

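use Embed\Embed;
use Embed\Http\Crawler;
use Embed\Http\CurlClient;

$client = new CurlClient();
$client->setSettings([
    'follow_location' => false, // do not follow the redirect to /home
]);

$embed = new Embed(new Crawler($client));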

We could also use the client's cookie settings to inject the auth_token cookie, but I'm not sure the token would stay valid for more than a few hours or days.

This may impact other embeds, so if you have a better solution, please let me know!

stevecoug commented 1 year ago

Thank you for the fix, that worked for me as well. I only use that for twitter.com URLs.
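
In case it helps anyone else, here is a rough sketch of limiting the workaround to Twitter URLs so that other providers keep following redirects. The helpers isTwitterUrl(), clientSettingsFor() and embedFor() are hypothetical names of my own, not part of Embed, and matching subdomains such as mobile.twitter.com is an assumption. The Embed classes and setSettings() call are the ones from the snippet above.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: true when the URL's host is twitter.com
 * or one of its subdomains (the subdomain rule is an assumption).
 */
function isTwitterUrl(string $url): bool
{
    $host = parse_url($url, PHP_URL_HOST);

    // parse_url() returns null or false when no host can be extracted.
    if (!is_string($host)) {
        return false;
    }

    return preg_match('/(^|\.)twitter\.com$/i', $host) === 1;
}

/**
 * Hypothetical helper: client settings for one URL.
 * Redirects are disabled only for Twitter, as in the workaround above.
 */
function clientSettingsFor(string $url): array
{
    return ['follow_location' => !isTwitterUrl($url)];
}

/**
 * Hypothetical wiring: builds an Embed instance configured for one URL.
 * Requires the embed/embed package; the classes are only loaded when
 * this function is actually called.
 */
function embedFor(string $url): \Embed\Embed
{
    $client = new \Embed\Http\CurlClient();
    $client->setSettings(clientSettingsFor($url));

    return new \Embed\Embed(new \Embed\Http\Crawler($client));
}
```

With this, something like embedFor($url)->get($url) would use a redirect-free client only for tweets. Building a new client per URL is the simplest option here; reusing two preconfigured clients would probably be better if you process many URLs.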

helmo commented 5 months ago

Thanks, the 'follow_location' setting also helped here.

Here's the patch I used to add it. It is used from the Drupal url_embed module: https://www.drupal.org/project/url_embed/issues/3435840

--- src/Http/Crawler.php.orig   2024-03-27 13:33:31.547671482 +0100
+++ src/Http/Crawler.php        2024-03-27 13:34:14.180154682 +0100
@@ -23,6 +23,9 @@
     public function __construct(ClientInterface $client = null, RequestFactoryInterface $requestFactory = null, UriFactoryInterface $uriFactory = null)
     {
         $this->client = $client ?: new CurlClient();
+        $this->client->setSettings([
+                'follow_location' => false
+        ]);
         $this->requestFactory = $requestFactory ?: FactoryDiscovery::getRequestFactory();
         $this->uriFactory = $uriFactory ?: FactoryDiscovery::getUriFactory();
     }