unoconv / unoserver

MIT License

Fail to convert some doc files to pdf files #2

Closed by kartikeyporwal 2 years ago

kartikeyporwal commented 3 years ago

Hi, thanks for developing and sharing this project.

I recently came across an issue while converting a .doc file to a .pdf file (unfortunately, I can't share the .doc file for IP reasons).

The error was: RuntimeError: The input document is of an unknown document type. This is probably a bug.

Though I'm not familiar with the OpenOffice APIs, when I checked document.SupportedServiceNames for that particular file, the result was ('com.sun.star.document.OfficeDocument', 'com.sun.star.text.GenericTextDocument', 'com.sun.star.text.WebDocument'). None of these is listed in DocTypes. When I added com.sun.star.document.OfficeDocument to DocTypes, the error changed to RuntimeError: Could not find an export filter from com.sun.star.document.OfficeDocument to pdf_Portable_Document_Format.
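The lookup that fails here can be sketched roughly as follows. This is a minimal standalone illustration, not the project's actual code: `DOC_TYPES` and `find_doc_type` are hypothetical names, the list of recognised services is an assumption, and the tuple of services is the one reported above for the failing file.

```python
# Hypothetical list of recognised document services (an assumption for
# illustration; the project's real list may differ).
DOC_TYPES = (
    "com.sun.star.sheet.SpreadsheetDocument",
    "com.sun.star.text.TextDocument",
    "com.sun.star.presentation.PresentationDocument",
    "com.sun.star.drawing.DrawingDocument",
)


def find_doc_type(supported_services, doc_types=DOC_TYPES):
    """Return the first supported service that is a recognised document type."""
    for service in supported_services:
        if service in doc_types:
            return service
    raise RuntimeError("The input document is of an unknown document type.")


# Services reported by document.SupportedServiceNames for the failing file.
reported = (
    "com.sun.star.document.OfficeDocument",
    "com.sun.star.text.GenericTextDocument",
    "com.sun.star.text.WebDocument",
)

try:
    find_doc_type(reported)
except RuntimeError as exc:
    print(exc)

# Adding WebDocument to the recognised types lets the lookup succeed.
print(find_doc_type(reported, DOC_TYPES + ("com.sun.star.text.WebDocument",)))
```

With the assumed list, none of the three reported services matches, which would explain the "unknown document type" error.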

When I then checked the DocumentService entries of export_filter, there was none for com.sun.star.document.OfficeDocument or com.sun.star.text.GenericTextDocument, but there was one for com.sun.star.text.WebDocument. So I replaced com.sun.star.document.OfficeDocument in DocTypes with com.sun.star.text.WebDocument, only to find out later that com.sun.star.text.WebDocument is deprecated.

Fortunately, it worked.

But my concern is: can this project be extended to support converting any kind of document (.doc, .docx, .odt, .rtf, etc.) to PDF? The changes I made worked for the documents I have, but I am worried because:

  1. com.sun.star.text.WebDocument is deprecated
  2. Some other kind of .doc file might come along that fails during PDF conversion.

I guess some change to the export_filters query could make it cover the DocumentService of other DocTypes.
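One way to picture such a change is to try every service the document reports, rather than only the first recognised one, and pick the first that has a matching export filter. The sketch below is an assumption-laden illustration: `EXPORT_FILTERS` is a hand-written stand-in for the real filter list (its entries and filter names are illustrative), and `find_export_filter` is a hypothetical helper.

```python
# Hand-written stand-in for the export filter list (illustrative entries;
# the real list would come from the office installation).
EXPORT_FILTERS = [
    {"Name": "writer_pdf_Export",
     "DocumentService": "com.sun.star.text.TextDocument",
     "Type": "pdf_Portable_Document_Format"},
    {"Name": "writer_web_pdf_Export",
     "DocumentService": "com.sun.star.text.WebDocument",
     "Type": "pdf_Portable_Document_Format"},
    {"Name": "calc_pdf_Export",
     "DocumentService": "com.sun.star.sheet.SpreadsheetDocument",
     "Type": "pdf_Portable_Document_Format"},
]


def find_export_filter(supported_services, filter_type, filters=EXPORT_FILTERS):
    """Return the name of the first filter matching any supported service."""
    for service in supported_services:
        for flt in filters:
            if flt["DocumentService"] == service and flt["Type"] == filter_type:
                return flt["Name"]
    raise RuntimeError(f"Could not find an export filter to {filter_type}.")


services = (
    "com.sun.star.document.OfficeDocument",
    "com.sun.star.text.GenericTextDocument",
    "com.sun.star.text.WebDocument",
)
print(find_export_filter(services, "pdf_Portable_Document_Format"))
```

With this approach no service has to be added to DocTypes by hand; the first two services simply find no filter and the lookup falls through to the third.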

Thanks!

regebro commented 2 years ago

That's strange. Why is that doc file not supported as a TextDocument? Is it not a Word document?

kartikeyporwal commented 2 years ago

Yes, I later found out that it was not in the original Word format; it was a com.sun.star.text.WebDocument, an HTML-type format with a .doc extension.
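A file like this can usually be spotted before conversion by looking at its first bytes instead of its extension. The sketch below is only a rough heuristic under stated assumptions: `sniff_doc` is a hypothetical helper, binary Word files are assumed to be OLE2 compound files (which start with the signature D0 CF 11 E0 A1 B1 1A E1), and HTML is assumed to begin with a doctype or an `<html` tag.

```python
# Signature at the start of OLE2 compound files, the container used by
# binary .doc files.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")
UTF8_BOM = b"\xef\xbb\xbf"


def sniff_doc(data: bytes) -> str:
    """Guess the real format of a .doc file from its leading bytes."""
    if data.startswith(OLE2_MAGIC):
        return "word-binary"
    head = data[:1024]
    if head.startswith(UTF8_BOM):
        head = head[len(UTF8_BOM):]  # skip a UTF-8 byte order mark
    head = head.lstrip().lower()     # ignore leading whitespace and case
    if head.startswith((b"<!doctype html", b"<html")):
        return "html"
    return "unknown"


print(sniff_doc(OLE2_MAGIC + b"\x00" * 8))
print(sniff_doc(b"<html><body>Report</body></html>"))
```

To check a real file, read its first kilobyte in binary mode and pass the bytes to `sniff_doc`.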

regebro commented 2 years ago

I see. I'll see if I can find an example of that and test it. It's possible that adding com.sun.star.text.WebDocument isn't a problem.

regebro commented 2 years ago

I can't find any examples of this document format. However, the export filter list does list a "com.sun.star.text.WebDocument" to "pdf_Portable_Document_Format" output filter, so it should be possible to convert it.

regebro commented 2 years ago

I think the changes made in the next release will solve this issue.

regebro commented 2 years ago

1.2 released