vaites / php-apache-tika

Apache Tika bindings for PHP: extract text and metadata from documents, images and other formats
MIT License

Are there any methods for formatted text? (Apache Tika, PHP 7.2) #12

Closed Jagadhissh closed 5 years ago

Jagadhissh commented 5 years ago

Hi sir, I am unable to get the converted Word file as formatted text that looks like our Word document. Please tell me how to do this, or please develop text formatting methods.

vaites commented 5 years ago

@Jagadhissh, I do not understand what you mean. Are you trying to get HTML from a Word document? I mean, with bold, italics, links...?

Jagadhissh commented 5 years ago

Thanks for the immediate response. I am trying to get the Word document without disturbing the MS Word file format.

vaites commented 5 years ago

I don't think I can do this. I just ask Tika for the text extraction and return it; I can't do anything else.

You can test it yourself: just download tika-app-1.20.jar, open it, and then drag the Word file onto it. You will see what Apache Tika extracts. I can't format it according to the original format because PHP doesn't have access to it...
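The same check can also be done from the command line instead of the GUI. Below is a minimal Python sketch, assuming `java` is on the PATH and the jar is in the current directory. The helper names `tika_command` and `run_tika` are made up for illustration and are not part of php-apache-tika; `--text` and `--html` are the tika-app output flags.

```python
import subprocess

# Output modes mapped to tika-app command-line flags.
TIKA_FLAGS = {"text": "--text", "html": "--html"}


def tika_command(jar_path, document, mode="text"):
    """Build the argument list for running tika-app on one document."""
    if mode not in TIKA_FLAGS:
        raise ValueError(f"unsupported mode: {mode!r}")
    return ["java", "-jar", jar_path, TIKA_FLAGS[mode], document]


def run_tika(jar_path, document, mode="text"):
    """Run tika-app and return whatever it prints to stdout.

    Assumes Java is installed and jar_path points to a real tika-app jar.
    """
    result = subprocess.run(
        tika_command(jar_path, document, mode),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

For example, `run_tika("tika-app-1.20.jar", "report.docx", mode="html")` (with `report.docx` as a placeholder file name) should return the HTML that Tika produces, which can be compared with the `mode="text"` output to see how much of the original formatting survives extraction.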

If I understand your request correctly, I think the best way to do this is to use a document conversion tool to convert (not extract) to HTML format. You can do this with a headless OpenOffice installation...
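A rough sketch of such a conversion in Python, assuming a LibreOffice-style `soffice` binary on the PATH that accepts `--headless`, `--convert-to` and `--outdir` (older OpenOffice builds may need a different invocation). The helper names `convert_command` and `convert_to_html` and the file names are hypothetical.

```python
import subprocess


def convert_command(document, outdir, target="html"):
    """Build the soffice argument list for a headless conversion."""
    return [
        "soffice",
        "--headless",            # run without opening a window
        "--convert-to", target,  # output format, e.g. "html"
        "--outdir", outdir,      # directory for the converted file
        document,
    ]


def convert_to_html(document, outdir):
    """Convert one document to HTML; raises CalledProcessError on failure.

    Assumes soffice is installed and supports the flags used above.
    """
    subprocess.run(convert_command(document, outdir), check=True)
```

Calling `convert_to_html("report.docx", "out")` would then be expected to write the converted HTML file into the `out` directory, where PHP can read it like any other file.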

Jagadhissh commented 5 years ago

OK, thank you sir.