failure to convert from odt

Automatically exported from code.google.com/p/pandoc

GNU General Public License v2.0

failure to convert from odt #112

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago

What steps will reproduce the problem?
1. Create a simple odt document.
2. Try to convert it from odt to html:
 pandoc -o foo.html foo.odt

What is the expected output? What do you see instead?
CPU usage maxes out and the conversion never finishes.

What version of the product are you using? On what operating system?
pandoc 1.1 -citeproc -highlighting
Debian Linux
GHC 6.10
OpenOffice 3

Please provide any additional information below.

Original issue reported on code.google.com by eeg...@gmail.com on 1 Dec 2008 at 5:09

GoogleCodeExporter commented 8 years ago

As it says in the documentation, pandoc can read markdown, HTML,
reStructuredText, and LaTeX.  odt is an output format but not (yet) an input
format.  If you had explicitly specified '--from odt', you would have gotten
an error.  Since you didn't explicitly specify a reader, and '.odt' doesn't
correspond to any of pandoc's input formats, pandoc treated the file as
markdown.  This didn't lead to good results, because .odt isn't even a text
format.
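
The fallback described above can be sketched roughly as follows. This is an illustration in Python, not pandoc's actual code (pandoc is written in Haskell); the function name `infer_reader` and the extension table are hypothetical, and only the default-to-markdown behavior comes from the explanation above.

```python
import os

# Hypothetical extension table for illustration; pandoc's real mapping may differ.
READERS_BY_EXTENSION = {
    ".html": "html",
    ".htm": "html",
    ".rst": "rst",
    ".tex": "latex",
    ".md": "markdown",
    ".txt": "markdown",
}


def infer_reader(filename, default="markdown"):
    """Guess a reader name from the file extension, falling back to `default`."""
    extension = os.path.splitext(filename)[1].lower()
    # Unknown extensions (such as ".odt") silently get the default reader.
    return READERS_BY_EXTENSION.get(extension, default)


print(infer_reader("foo.html"))  # html
print(infer_reader("foo.odt"))   # markdown
```

Under this kind of scheme an unrecognized extension is never an error in itself, which is why the .odt file ended up in the markdown parser.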

Bottom line:  this doesn't seem to be a bug.

Original comment by fiddloso...@gmail.com on 1 Dec 2008 at 6:40

Changed state: Invalid

GoogleCodeExporter commented 8 years ago

Setting this to invalid is stating that using 100% of CPU indefinitely is a
valid response to the input conditions.  So it would be good to keep this as a
valid bug, even if it is a low priority.

Original comment by eeg...@gmail.com on 3 Dec 2008 at 5:19

GoogleCodeExporter commented 8 years ago

You're right.  I'll reclassify this as a low-priority bug.
I suspect that the problem is that the odt input isn't broken up by blank
lines, which normally form an endpoint at which a search can stop.  (This is a
backtracking parser.)  So, for example, if there's a '*' character early in
the input, pandoc may have to search all the way to the end of the document
looking for a match.  If that's what's happening, I can't think of an easy
solution.  (One possible fix would be just to give an error if .odt, .doc,
.pdf, or other binary files are given as inputs.)
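
A minimal sketch of what such a guard might look like, written in Python for illustration rather than as pandoc code. The names `looks_binary` and `check_input` are hypothetical. The extension list is the one mentioned above; the extra NUL-byte check is an added assumption, a common heuristic for spotting binary data that is not part of the suggestion itself.

```python
import os

# Extensions named in the comment above; illustrative, not exhaustive.
BINARY_EXTENSIONS = {".odt", ".doc", ".pdf"}


def looks_binary(filename, data):
    """Return True if the file is probably not a text format."""
    extension = os.path.splitext(filename)[1].lower()
    if extension in BINARY_EXTENSIONS:
        return True
    # Assumption: text input never contains NUL bytes, so one in the
    # first few kilobytes suggests a binary file.
    return b"\x00" in data[:8192]


def check_input(filename, data):
    """Reject binary input up front instead of handing it to the parser."""
    if looks_binary(filename, data):
        raise ValueError(f"{filename}: binary input is not supported")
    return data.decode("utf-8")
```

Failing early like this would not fix the underlying backtracking cost, but it would turn the runaway conversion reported here into an immediate error message.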

Original comment by fiddloso...@gmail.com on 3 Dec 2008 at 8:41

Changed state: Accepted
Added labels: Priority-Low
Removed labels: Priority-Medium