j0k3r / graby

Graby helps you extract article content from web pages
MIT License

Can it extract product specifications from a webpage? #146

Closed: ajay01994 closed this issue 6 years ago

ajay01994 commented 6 years ago

Hi there, nice code for article extraction. I have a question: can it extract product specifications from a web page, like those described in this article: http://lunadong.com/publication/dexter_vldb.pdf? Or could you share some thoughts on building it?

Thanks

j0k3r commented 6 years ago

Graby extracts text from a URL. If that URL points to a PDF file, it uses an external library to extract the text. It's hard to pull structured data out of that text, because the structure is gone (text extracted from a PDF has no markup).

I'm not sure how you could proceed. You'd be better off looking into PDF extraction tools rather than Graby to achieve what you want.
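To make the point about lost markup concrete, here is a minimal Python sketch that guesses attribute-value pairs from plain text. It is not part of Graby and not the approach from the linked paper; it simply assumes each specification sits on its own `Name: value` or tab-separated line. The function name `extract_specs` and the sample text are hypothetical.

```python
import re

# Hypothetical heuristic (not part of Graby): a line shaped like "Name: value"
# or "Name<TAB>value" is treated as one attribute-value pair.
SPEC_LINE = re.compile(r"^\s*([A-Za-z][\w ()/.-]{0,40}?)\s*(?::|\t)\s*(.+?)\s*$")


def extract_specs(text: str) -> dict[str, str]:
    """Return attribute-value pairs found on 'key: value' style lines."""
    specs: dict[str, str] = {}
    for line in text.splitlines():
        match = SPEC_LINE.match(line)
        if match:
            key, value = match.group(1).strip(), match.group(2)
            specs.setdefault(key, value)  # keep the first occurrence of each key
    return specs


# Plain text as it might come out of a PDF: no tables, no tags.
sample = """Acme X200 Laptop
Display: 13.3 inch
Weight: 1.2 kg
Battery\t10 hours
The X200 ships with a charger and a sleeve."""

print(extract_specs(sample))
# {'Display': '13.3 inch', 'Weight': '1.2 kg', 'Battery': '10 hours'}
```

The heuristic only works while the text happens to keep a consistent layout. Any prose line containing a colon, such as `Note: see the manual`, would also be picked up as a "specification", which is exactly the kind of ambiguity that markup (tables, lists) would normally resolve.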