Need to extract only html without dom object.

mgufrone / pdf-to-html

PDF to HTML PHP Class using Poppler-Utils

MIT License

Need to extract only html without dom object. #8

Closed: kartikkapatel closed this issue 8 years ago

kartikkapatel commented 8 years ago

First of all, thank you for such a wonderful library.
It was very helpful to me. I just need only the HTML from the object returned by $html = $pdf->html();

mgufrone commented 8 years ago

Do you mean it should return an HTML string instead of a DOM object?

kartikkapatel commented 8 years ago

No, it returns a DOM object, but I need only the content of the page. Do you have any function that returns the string directly? If the library has a wiki, please share it with us. Thank you.

fengkaijia commented 8 years ago

So, is there any way to return the HTML string? My main project uses DomCrawler from Symfony, so it would be nice if I could just get the HTML string and pass it to my current parser.

mgufrone commented 8 years ago

Sorry for the late response. I have been unable to work on this project because of my own work. I will continue to improve this package as soon as possible. Thanks for the suggestion. 👍

mgufrone commented 8 years ago

$pdf->html() will now return an HTML string. If you want the DOM object, just use $pdf->getDom();

Please test it on your own. I have already tested it, but I would like you to test it too.
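
For readers unsure about the difference between a DOM object and an HTML string, here is a minimal sketch. It does not use this library at all: it assumes only PHP's built-in DOM extension, and the sample markup is made up for illustration. A `DOMDocument` stands in for the kind of object getDom() is described as returning, and `saveHTML()` turns it into a plain string that any string-based parser can take.

```php
<?php
// Hypothetical sample markup; in practice this would come from the converted PDF.
$source = '<html><body><p>Page 1 text</p></body></html>';

// A DOM object: a parsed tree you query with DOM methods.
$dom = new DOMDocument();
$dom->loadHTML($source);

// An HTML string: the same tree serialised back to text.
$html = $dom->saveHTML();

// A string can be handed to any other parser. Here we simply re-parse it
// with a second DOMDocument to show that nothing was lost on the way.
$reparsed = new DOMDocument();
$reparsed->loadHTML($html);

echo $reparsed->getElementsByTagName('p')->item(0)->textContent, PHP_EOL;
```

With the change described above, the equivalent in this package would presumably be to call $pdf->html() for the string and $pdf->getDom() for the object.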

fengkaijia commented 8 years ago

@mgufrone Tested, and it's working now without any issues 👏, although the html() method still calls getDom() to extract the HTML, which might cost some performance when dealing with a large PDF.

mgufrone commented 8 years ago

Great. I'm still working on it so that it returns the HTML directly.