Jagadhissh closed this issue 5 years ago.
@Jagadhissh, I'm not sure I understand what you mean. Are you trying to get HTML from a Word document, with bold, italics, links and so on?
Thanks for the quick response. I am trying to get the content of the Word document without losing its MS Word formatting.
I don't think I can do that. I just ask Tika to extract the text and return it; I can't do anything more than that.
You can test it yourself: download tika-app-1.20.jar, open it, and drag the Word file onto its window. You will see exactly what Apache Tika extracts. I can't reproduce the original formatting because PHP doesn't have access to it.
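If you'd rather use the command line than the GUI, here is a rough Python sketch that runs the same jar and returns what Tika extracts. It assumes Java is on the PATH and the jar is in the working directory; `build_tika_command` and `extract` are hypothetical helper names, not part of any library. tika-app's `--text` option prints plain text and `--html` prints its HTML output, so comparing the two shows how much structure Tika actually keeps.

```python
import subprocess
from typing import List


def build_tika_command(jar: str, document: str, mode: str = "--text") -> List[str]:
    """Build a tika-app command line (hypothetical helper).

    mode: "--text" for plain text, "--html" for tika-app's HTML output.
    """
    if mode not in ("--text", "--html"):
        raise ValueError("mode must be '--text' or '--html'")
    return ["java", "-jar", jar, mode, document]


def extract(jar: str, document: str, mode: str = "--text") -> str:
    """Run tika-app on the document and return whatever it prints to stdout."""
    result = subprocess.run(
        build_tika_command(jar, document, mode),
        check=True,            # raise if Java/Tika exits with an error
        capture_output=True,   # collect stdout/stderr instead of printing them
        text=True,             # decode output as str
    )
    return result.stdout
```

For example, `print(extract("tika-app-1.20.jar", "letter.docx", "--html"))` would print the HTML for a hypothetical `letter.docx`.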
If I understand your request correctly, the best way to do this is with a document conversion tool that converts (rather than extracts) the file to HTML. You can do this with a headless OpenOffice installation.
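A minimal sketch of that conversion, driven from Python. It swaps in LibreOffice, whose `soffice` binary supports `--headless --convert-to html`, and assumes `soffice` is on the PATH; `build_convert_command` and `convert_to_html` are hypothetical helper names.

```python
import subprocess
from pathlib import Path
from typing import List


def build_convert_command(source: str, outdir: str, soffice: str = "soffice") -> List[str]:
    """Build a headless LibreOffice command that converts `source` to HTML in `outdir`."""
    return [soffice, "--headless", "--convert-to", "html", "--outdir", outdir, source]


def convert_to_html(source: str, outdir: str, soffice: str = "soffice") -> Path:
    """Run the conversion and return the expected path of the generated HTML file."""
    subprocess.run(build_convert_command(source, outdir, soffice), check=True)
    # LibreOffice names the output after the input file's stem, e.g. report.docx -> report.html
    return Path(outdir) / (Path(source).stem + ".html")
```

Unlike text extraction, this keeps whatever formatting the office suite can express in HTML (bold, italics, links, tables), so the result is much closer to the original Word layout.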
OK, thank you, sir.
Hi sir, I am unable to view the converted Word file as formatted text that looks like our original Word document. Please tell me how to do this, or please develop text formatting methods.