j0k3r / graby

Graby helps you extract article content from web pages
MIT License

Grabbing audio tags #294

Closed: tpeacock19 closed this issue 1 year ago

tpeacock19 commented 2 years ago

I'm having difficulty working out how to grab audio tags from certain URLs. For example, https://www.economist.com/britain/2022/07/06/the-toxicity-of-boris-johnson.

The page has a section like this:

    <div>
      <figure>
        <div>
          <figcaption>Listen to this story.</figcaption>
          <p>
            <span
              >Enjoy more audio and podcasts on
              <a
                id="audio-ios-cta"
                href="https://economist-app.onelink.me/d2eC/bed1b25"
                target="_blank"
                rel="noreferrer"
                >iOS</a
              >
              or
              <a
                id="audio-android-cta"
                href="https://economist-app.onelink.me/d2eC/7f3c199"
                target="_blank"
                rel="noreferrer"
                >Android</a
              >.</span
            >
          </p>
        </div>
        <audio
          controls
          id="audio-player"
          preload="none"
          src="https://www.economist.com/media-assets/audio/057%20Britain%20-%20Bagehot-ed07ffe6dd1c5867ac96363bfbb41106.mp3"
          title="The toxicity of Boris Johnson"
          controlslist="nodownload"
        >
          <p>Your browser does not support the &lt;audio&gt; element.</p>
        </audio>
        <div>...</div>
      </figure>
    </div>

and Graby returns the following, with the `<audio>` element removed and only its fallback paragraph left:

    <div class="css-drxevm e1xre6611">
      <figure>
        <p>
          <figcaption>Listen to this story.</figcaption>
          Enjoy more audio and podcasts on
          <a
            id="audio-ios-cta"
            href="https://economist-app.onelink.me/d2eC/bed1b25"
            target="_blank"
            rel="noreferrer"
            >iOS</a
          >
          or
          <a
            id="audio-android-cta"
            href="https://economist-app.onelink.me/d2eC/7f3c199"
            target="_blank"
            rel="noreferrer"
            >Android</a
          >.
        </p>
        <p>Your browser does not support the &lt;audio&gt; element.</p>
      </figure>
    </div>

Is there something I can do to affect this behavior? I'm just using the standard example:

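    <?php

    // Load Composer's autoloader so the Graby classes are available.
    require 'vendor/autoload.php';

    use Graby\Graby;

    $url = 'https://www.economist.com/britain/2022/07/06/the-toxicity-of-boris-johnson';

    // fetchContent() downloads the page itself, so a separate
    // file_get_contents($url) call is not needed here.
    $graby = new Graby();
    $result = $graby->fetchContent($url);

    var_dump($result);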
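
One possible workaround, sketched here under assumptions rather than taken from Graby itself: if the `<audio>` element is present in the raw page HTML, its `src` can be read separately with PHP's DOM extension and kept alongside the extracted article. The helper name `extractAudioSources` is hypothetical and not part of Graby's API, and this does not change what `fetchContent()` returns.

```php
<?php

/**
 * Hypothetical helper (not part of Graby): collect the src URLs of
 * <audio> elements and of any <source> elements nested inside them.
 *
 * @return string[] unique URLs in document order
 */
function extractAudioSources(string $html): array
{
    // DOMDocument::loadHTML() rejects an empty string, so bail out early.
    if (trim($html) === '') {
        return [];
    }

    $doc = new DOMDocument();

    // Real-world pages are rarely strict HTML; collect parser warnings
    // internally instead of emitting them, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $sources = [];

    // Match both <audio src="..."> and <audio><source src="..."></audio>.
    $query = '//audio[@src]/@src | //audio//source[@src]/@src';
    foreach ($xpath->query($query) as $attribute) {
        $sources[] = $attribute->nodeValue;
    }

    return array_values(array_unique($sources));
}
```

The helper could be run on the raw page HTML (for example the string returned by `file_get_contents($url)`), and the resulting URLs stored next to the Graby result. It only works if the server sends the `<audio>` tag in the initial HTML rather than injecting it with JavaScript.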