j0k3r / graby

Graby helps you extract article content from web pages
MIT License
363 stars, 73 forks

extract more content #219

Closed: enmohsinali closed this issue 4 years ago

enmohsinali commented 4 years ago

Hi, first of all, thank you for providing such a brilliant grabber. I want to get the og meta tags and include them in the returned array, the same way you have added authors. How can I do that?

Thank you

j0k3r commented 4 years ago

It's already done: https://github.com/j0k3r/graby/blob/master/src/Extractor/ContentExtractor.php#L1175-L1246

Do you have an example where this doesn't work?

enmohsinali commented 4 years ago

Thank you for your response.

How can I include them in the returned array?

j0k3r commented 4 years ago

It’s already done. You don’t need to add them again to the returned array.

enmohsinali commented 4 years ago

For every link, I only get these indexes:

array (
  'status' => 200,
  'html' => "Fetched and readable content",
  'title' => "Ben E King: R&B legend dies at 76",
  'language' => "en",
  'date' => "2015-05-01T16:24:37+01:00",
  'authors' => array (
    "BBC News",
  ),
  'url' => "http://www.bbc.com/news/entertainment-arts-32547474",
  'image' => "https://ichef-1.bbci.co.uk/news/720/media/images/82709000/jpg/_82709878_146366806.jpg",
  'summary' => "Ben E King received an award from the Songwriters Hall of Fame in …",
  'native_ad' => false,
  'headers' => array (
    'server' => 'Apache',
    'content-type' => 'text/html; charset=utf-8',
    'x-news-data-centre' => 'cwwtf',
    'content-language' => 'en',
    'x-pal-host' => 'pal074.back.live.cwwtf.local:80',
    'x-news-cache-id' => '13648',
    'content-length' => '157341',
    'date' => 'Sat, 29 Apr 2017 07:35:39 GMT',
    'connection' => 'keep-alive',
    'cache-control' => 'private, max-age=60, stale-while-revalidate',
    'x-cache-action' => 'MISS',
    'x-cache-age' => '0',
    'x-lb-nocache' => 'true',
    'vary' => 'X-CDN,X-BBC-Edge-Cache,Accept-Encoding',
  ),
)

I want an index called og_tags containing all the extracted og tags.

enmohsinali commented 4 years ago

My point is to include the og tags in this array: https://github.com/j0k3r/graby/blob/master/src/Graby.php#L435

tcitworld commented 4 years ago

The content of the og tags is already present in the information Graby provides. If you only want the og tags, use something like PHP's get_meta_tags function or a library such as https://github.com/fusonic/opengraph
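A minimal sketch of the do-it-yourself route, assuming you have the raw page HTML. Graby's 'html' field appears to hold the extracted readable content, so the og tags from the page head are not expected there. extractOgTags() is a hypothetical helper, not part of Graby or fusonic/opengraph. It uses PHP's DOM extension instead of get_meta_tags, because get_meta_tags only parses meta tags that carry a name attribute, while Open Graph tags normally use property="og:...".

```php
<?php

/**
 * Hypothetical helper (not part of Graby): return every Open Graph
 * <meta property="og:..." content="..."> tag found in $html,
 * keyed by property name. If a property repeats (e.g. several
 * og:image tags), this simple version keeps only the last value.
 */
function extractOgTags(string $html): array
{
    // DOMDocument::loadHTML() rejects an empty string, so bail out early.
    if ($html === '') {
        return [];
    }

    $doc = new DOMDocument();
    // Real-world HTML is rarely valid: collect libxml warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $tags = [];
    // Select each <meta> whose property attribute starts with "og:".
    foreach ($xpath->query('//meta[starts-with(@property, "og:")]') as $meta) {
        $tags[$meta->getAttribute('property')] = $meta->getAttribute('content');
    }

    return $tags;
}

// Usage with an inline HTML sample; the name="description" tag is ignored.
$sample = '<html><head>'
    . '<meta property="og:title" content="Ben E King: R&amp;B legend dies at 76">'
    . '<meta property="og:type" content="article">'
    . '<meta name="description" content="Not an og tag">'
    . '</head><body></body></html>';

var_export(extractOgTags($sample));
```

To get the og_tags index asked for above, one option (an assumption, not a Graby feature) is to download the raw page yourself, for example with file_get_contents($url), and set $result['og_tags'] = extractOgTags($rawHtml) on the array Graby returns. This fetches the page a second time, and loadHTML() may misread non-ASCII text when the page does not declare its charset.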