Closed magthe closed 10 years ago
Can you share your custom CSL file, so I can try to reproduce the problem?
+++ Magnus Therning [Sep 10 14 00:06 ]:
After upgrading to 0.5 I've observed a very strange issue. Processing of several of my files resulted in an error like this:
pandoc -t latex --filter pandoc-citeproc --template template.latex --csl=style.csl -o lfs_system_utp.pdf lfs_system_utp_t.mkd
pandoc-citeproc: error while parsing the XML string
pandoc: Error running filter pandoc-citeproc
I simply stopped using the (slightly custom) CSL file I want to use and instead fell back on the default one that comes with pandoc-citeproc. That worked, and was all right for the moment.
After a few days I saw a message on a Haskell-related mailing list for the Arch Linux distro regarding this. That mail described a work-around: just replace the default CSL file with the one you want to use. Indeed, that works:
cp style.csl /usr/share/x86_64-linux-ghc-7.8.3/pandoc-citeproc-0.5/chicago-author-date.csl
pandoc -t latex --filter pandoc-citeproc --template template.latex -o lfs_system_utp.pdf lfs_system_utp_t.mkd
Clearly there is something going on here that is really surprising to a mere user.
Reply to this email directly or view it on GitHub: https://github.com/jgm/pandoc-citeproc/issues/81
It's here: https://gist.github.com/magthe/4c45ed79f245f6712755
Just to be clear though, copying the default CSL to the local directory, and then using the --csl argument to pandoc also results in the error message from above. So I'd be very surprised if it really is an XML parsing problem.
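One way to back up the suspicion that the style file itself is fine is to parse it with an unrelated XML parser. The following is a minimal sketch using Python's standard library, not anything pandoc-citeproc does internally; the function name `csl_is_well_formed` and the tiny `SAMPLE` style are made up for illustration.

```python
import xml.etree.ElementTree as ET


def csl_is_well_formed(data):
    """Return (True, None) if data parses as XML, else (False, error message).

    This only checks XML well-formedness. It does not validate the
    document against the CSL schema.
    """
    try:
        ET.fromstring(data)
        return True, None
    except ET.ParseError as err:
        return False, str(err)


# Hypothetical miniature stand-in for a real CSL style file.
SAMPLE = (
    '<style xmlns="http://purl.org/net/xbiblio/csl" version="1.0">'
    "<info><title>Example</title></info>"
    "</style>"
)

ok, message = csl_is_well_formed(SAMPLE)
print("well-formed" if ok else "broken: " + message)
```

To check a real file, pass its bytes, for example `csl_is_well_formed(open("style.csl", "rb").read())`. If an independent parser accepts both the custom and the default style, the fault more likely lies in how pandoc-citeproc was built than in the files.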
Oh, thanks. That's a good clue.
I can't reproduce this. Did you install using cabal, or in some other way? If via cabal, can you send the output of ghc-pkg list?
I install using the distro package manager. Since I also maintain the packages involved I know that the output of ghc-pkg list reflects the build environment used.
Pandoc and pandoc-citeproc are built with the following flags:
pandoc 1.13.0.1-3 (-make-pandoc-man-pages https -trypandoc -embed_data_files)
pandoc-citeproc 0.4.0.1-3 (-test_citeproc -unicode_collation -embed_data_files -hexpat bibutils small_base)
This is the output of ghc-pkg list after installing pandoc-citeproc on a clean system:
/usr/lib/ghc-7.8.3/package.conf.d:
Cabal-1.18.1.3
HTTP-4000.2.18
JuicyPixels-3.1.7.1
SHA-1.6.4.1
aeson-0.7.0.6
aeson-pretty-0.7.1
array-0.5.0.0
asn1-encoding-0.8.1.3
asn1-parse-0.8.1
asn1-types-0.2.3
attoparsec-0.11.3.4
base-4.7.0.1
base64-bytestring-1.0.0.1
bin-package-db-0.0.0.0
binary-0.7.1.0
blaze-builder-0.3.3.2
blaze-html-0.7.0.2
blaze-markup-0.6.1.0
rts-1.0
byteable-0.1.1
bytestring-0.10.4.0
case-insensitive-1.2.0.0
cereal-0.4.0.1
cipher-aes-0.2.8
cipher-des-0.0.6
cipher-rc4-0.1.4
cmdargs-0.10.9
conduit-1.2.0.2
connection-0.2.3
containers-0.5.5.1
cookie-0.4.1.3
cprng-aes-0.5.2
crypto-cipher-types-0.0.9
crypto-numbers-0.2.3
crypto-pubkey-0.2.4
crypto-pubkey-types-0.4.2.2
crypto-random-0.0.8
cryptohash-0.11.6
data-default-0.5.3
data-default-class-0.0.1
data-default-instances-base-0.0.1
data-default-instances-containers-0.0.1
data-default-instances-dlist-0.0.1
data-default-instances-old-locale-0.0.1
deepseq-1.3.0.2
deepseq-generics-0.1.1.1
digest-0.0.1.2
directory-1.2.1.0
dlist-0.7.1
exceptions-0.6.1
extensible-exceptions-0.1.1.4
filepath-1.3.0.2
(ghc-7.8.3)
ghc-prim-0.3.1.0
haddock-library-1.1.1
hashable-1.2.2.0
haskeline-0.7.1.2
(haskell2010-1.1.2.0)
(haskell98-2.0.0.3)
highlighting-kate-0.5.9
hoopl-3.10.0.1
hpc-0.6.0.1
hs-bibutils-5.0
hslua-0.3.13
http-client-0.3.8.2
http-client-tls-0.2.2
http-types-0.8.5
integer-gmp-0.5.1.0
lifted-base-0.2.3.0
mime-types-0.1.0.4
mmap-0.5.9
mmorph-1.0.4
monad-control-0.3.3.0
mtl-2.1.3.1
nats-0.2
network-2.5.0.0
old-locale-1.0.0.6
old-time-1.1.0.2
pandoc-1.13.1
pandoc-citeproc-0.5
pandoc-types-1.12.4.1
parsec-3.1.5
pem-0.2.2
pretty-1.1.1.1
primitive-0.5.3.0
process-1.2.0.0
publicsuffixlist-0.1
random-1.0.1.3
regex-base-0.93.2
regex-pcre-builtin-0.94.4.8.8.35
resourcet-1.1.2.3
rfc5051-0.1.0.3
scientific-0.3.3.0
securemem-0.1.3
semigroups-0.15.2
socks-0.5.4
split-0.2.2
stm-2.4.3
streaming-commons-0.1.4.2
syb-0.4.2
tagsoup-0.13.2
template-haskell-2.9.0.0
temporary-1.2.0.3
terminfo-0.4.0.0
texmath-0.8
text-1.1.1.3
time-1.4.2
tls-1.2.9
transformers-0.3.0.0
transformers-base-0.4.3
unix-2.7.0.1
unordered-containers-0.2.5.0
utf8-string-0.3.8
vector-0.10.11.0
void-0.6.1
x509-1.4.12
x509-store-1.4.4
x509-system-1.4.5
x509-validation-1.5.0
xhtml-3000.2.1
xml-1.3.13
yaml-0.8.9.1
zip-archive-0.2.3.4
zlib-0.5.4.1
The -hexpat stands out as a non-default flag that would be different from my setup. Is there a reason you don't use hexpat? It is much faster. It may be that the non-hexpat configuration is now broken.
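For readers unfamiliar with the flag notation: the sketch below assumes the listing follows cabal's convention, where a leading "-" means the flag is switched off and a bare name (or a leading "+") means it is switched on. That reading matches the comment above, but it is an assumption about the distro's package listing; the helper name `split_flags` is invented for illustration.

```python
def split_flags(flag_string):
    """Split a cabal-style flag list into (enabled, disabled) name lists.

    Assumes '-name' disables a flag and 'name' or '+name' enables it.
    """
    enabled, disabled = [], []
    for token in flag_string.split():
        if token.startswith("-"):
            disabled.append(token[1:])
        else:
            enabled.append(token.lstrip("+"))
    return enabled, disabled


# Flag string copied from the pandoc-citeproc line quoted in this thread.
FLAGS = "-test_citeproc -unicode_collation -embed_data_files -hexpat bibutils small_base"

enabled, disabled = split_flags(FLAGS)
print("enabled: ", ", ".join(enabled))
print("disabled:", ", ".join(disabled))
```

Under that assumption, hexpat ends up in the disabled list, so this build would have used the non-hexpat XML code path.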
Well, hexpat isn't in our repo and since the dependencies can be satisfied without it that's what happens. Anyway, I modified the flag and pulled in hexpat and now it works fine. So indeed, it seems the non-hexpat XML parsing is broken.
I've just replaced the old xml-light- and hexpat-based CSL parsers with a new, xml-conduit-based one (pure Haskell). It is about twice as fast as the old hexpat-based parser in my tests, and will be much easier to maintain and extend. This should solve this issue once it is released.
I still have this issue with pandoc-citeproc 0.5 on Fedora 22. Is there a fix for this situation? I suppose I'd have to build pandoc-citeproc myself to get the most recent version, or wait until Fedora puts it into their repository?
The workaround to replace the default .csl works, but it's obviously not a very practical solution.
@nylki, Jens Petersen maintains a Copr repository with a statically linked pandoc (https://copr.fedoraproject.org/coprs/petersen/pandoc/).
I have just asked him whether he could add the latest version of pandoc-citeproc.
@ousia Thanks! Have you got a response from Jens Petersen?
@nylki, you have a subpackage at https://copr.fedoraproject.org/coprs/petersen/pandoc/ (only for Fedora 22 or newer).