jgm / pandoc

Universal markup converter
https://pandoc.org

Release amd64 binary: Illegal hardware instruction #8947

Closed: freijon closed this issue 11 months ago

freijon commented 12 months ago

Downstream bug (Gentoo): https://bugs.gentoo.org/910183 (Gentoo's binary package uses the binary provided in the release tar.gz)

I installed pandoc on my VM. When I use the following command, I get the following error:

Command: pandoc --pdf-engine=lualatex -H <preamble-file.tex> <input-file.md> -o <output-file.pdf>

Error:

[1] 19126 illegal hardware instruction

Here is some additional information:

Output of resolve-march-native:

-march=nocona -madx -mbmi -mbmi2 -mclflushopt -mfsgsbase -mrdseed -msahf -mxsave -mxsavec -mxsaveopt -mxsaves --param=l1-cache-line-size=64 --param=l1-cache-size=32 --param=l2-cache-size=12288

Versions tried:

After some initial debugging with gdb, I found:

Starting program: /usr/bin/pandoc --pdf-engine=lualatex -H preamble.tex dokumentation_cockpit.md -o dokumentation_cockpit.pdf
[New LWP 13421]
[New LWP 13422]
[New LWP 13423]
[New LWP 13424]

Thread 1 "pandoc" received signal SIGILL, Illegal instruction.
0x0000000006413259 in ?? ()

(gdb) x/i $pc
=> 0x6407fdc: vpxor %xmm5,%xmm5,%xmm5

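This indicates that the binary uses AVX (vpxor is the VEX-encoded AVX form of the SSE2 pxor instruction), which isn't available on all 64-bit x86 CPUs. Consistent with this, the resolve-march-native output above lists no -mavx for this machine.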
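For readers who want to confirm this kind of diagnosis on their own hardware, here is a minimal C sketch of the usual runtime AVX test, assuming GCC or Clang on x86: CPUID leaf 1 must report both AVX (ECX bit 28) and OSXSAVE (ECX bit 27), and XCR0 must show that the OS saves SSE and AVX register state (bits 1 and 2). The function names avx_usable and this_cpu_has_avx are hypothetical and not part of pandoc or GHC.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID leaf 1, ECX feature bits (Intel SDM numbering). */
#define CPUID1_ECX_OSXSAVE (1u << 27) /* OS has enabled XSAVE/XGETBV */
#define CPUID1_ECX_AVX     (1u << 28) /* CPU implements AVX */

/* XCR0 state-component bits saved by the OS on context switch. */
#define XCR0_SSE_STATE (1ull << 1) /* XMM registers */
#define XCR0_AVX_STATE (1ull << 2) /* upper halves of YMM registers */

/* Pure decision logic: AVX is usable only if the CPU advertises it,
   the OS has enabled XSAVE, and the OS saves both XMM and YMM state. */
bool avx_usable(uint32_t cpuid1_ecx, uint64_t xcr0) {
    const uint32_t need_ecx = CPUID1_ECX_OSXSAVE | CPUID1_ECX_AVX;
    const uint64_t need_xcr0 = XCR0_SSE_STATE | XCR0_AVX_STATE;
    return (cpuid1_ecx & need_ecx) == need_ecx &&
           (xcr0 & need_xcr0) == need_xcr0;
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>

/* Probe the CPU this process is running on (GCC/Clang on x86 only). */
bool this_cpu_has_avx(void) {
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;                 /* CPUID leaf 1 not available */
    if (!(ecx & CPUID1_ECX_OSXSAVE))
        return false;                 /* XGETBV itself would fault (#UD) */
    uint32_t lo, hi;
    __asm__ volatile("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0)); /* read XCR0 */
    return avx_usable(ecx, ((uint64_t)hi << 32) | lo);
}
#endif
```

Splitting the bit logic from the probe keeps the decision testable on any machine. On a CPU (or VM) that, like the one above, does not expose AVX, this_cpu_has_avx() would be expected to return false.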

alerque commented 12 months ago

Color me surprised to learn that Gentoo of all distros is relying on prebuilt upstream binaries for their packaging.

freijon commented 12 months ago

I expected a comment like this :D It's the solution for lazy people. The main package is built from source, of course, but there is a binary package for people who don't want to compile and maintain 200+ Haskell packages just to run pandoc ;)

jgm commented 12 months ago

I'd like to be sure that this is coming from pandoc and not lualatex (which will be called given the command you've used). Can you reproduce this using a simpler command (not producing a PDF)? Also, could you try with this command, but with --verbose, which may give us a better indication of where this is occurring?

jgm commented 12 months ago

Possibly relevant: https://gitlab.haskell.org/ghc/ghc/-/merge_requests/1306

jgm commented 12 months ago

I don't know much about this, but it could be that ghc determines dynamically whether the processor it's running on supports AVX, and then uses these instructions if it does. (I'm guessing our build machine does.) I'm not (yet) seeing any way to tell it not to do this.

I haven't seen this reported before: is that because only fairly old machines don't support AVX at this point?

jgm commented 12 months ago

Actually, there is a flag for AVX (from the ghc 9.6 manual):

-mavx (x86 only): These SIMD instructions are currently not supported by the native code generator. Enabling this flag has no effect and is only present for future extensions.

The LLVM backend may use AVX if your processor supports it, but detects this automatically, so no flag is required.

My understanding is that ghc uses the native code generator by default.

freijon commented 12 months ago

> I'd like to be sure that this is coming from pandoc and not lualatex (which will be called given the command you've used). Can you reproduce this using a simpler command (not producing a PDF)? Also, could you try with this command, but with --verbose, which may give us a better indication of where this is occurring?

Here are the results:

EDIT:

jgm commented 12 months ago

OK, that's helpful. Does it matter what is in in_file.md? Can it be just one word, for example?

freijon commented 12 months ago

I just tried it with only "test" in in_file.md: same result.

jgm commented 12 months ago

@mpickering as a ghc dev I was hoping you might have insight into this?

AndreasPK commented 12 months ago

As far as I'm aware, ghc's native backend can't emit this instruction. So it was likely the result of misguided optimization, either in a library, in ghc's RTS, or through the LLVM backend.

For more insight we would need to know which ghc version and libraries were used to build this release. A likely culprit seems to be the text library, which recently started using SIMD via C bindings for some functionality.

jgm commented 12 months ago

@AndreasPK thanks for commenting here. I don't have the exact list for that build, but I triggered a new release build and made it emit a cabal freeze. These should be roughly the same versions of packages, as the last release was just last week. text is version 2.0.2. Another place to look is the whole new crypton ecosystem, I suppose, since that is new in the last pandoc release; if the problem lies there, it would explain why I haven't gotten other reports like this. (On the other hand, it could just be that people are using pandoc on relatively recent hardware.)

ghc version: ghc 9.6.2, from Docker image glcr.b-data.ch/ghc/ghc-musl:9.6.2

Wrote freeze file: /tmp/cirrus-ci-build/cabal.project.freeze
active-repositories: hackage.haskell.org:merge
constraints: any.Cabal ==3.10.1.0,
             any.Cabal-syntax ==3.10.1.0,
             any.Diff ==0.4.1,
             any.Glob ==0.10.2,
             any.HUnit ==1.6.2.0,
             any.JuicyPixels ==3.3.8,
             JuicyPixels -mmap,
             any.OneTuple ==0.4.1.1,
             any.Only ==0.1,
             any.QuickCheck ==2.14.3,
             QuickCheck -old-random +templatehaskell,
             any.SHA ==1.6.4.4,
             SHA -exe,
             any.StateVar ==1.2.2,
             any.aeson ==2.1.2.1,
             aeson -cffi +ordered-keymap,
             any.aeson-pretty ==0.8.10,
             aeson-pretty -lib-only,
             any.alex ==3.4.0.0,
             any.ansi-terminal ==1.0,
             ansi-terminal -example,
             any.ansi-terminal-types ==0.11.5,
             any.appar ==0.1.8,
             any.array ==0.5.5.0,
             any.asn1-encoding ==0.9.6,
             any.asn1-parse ==0.9.5,
             any.asn1-types ==0.3.4,
             any.assoc ==1.1,
             assoc +tagged,
             any.async ==2.2.4,
             async -bench,
             any.attoparsec ==0.14.4,
             attoparsec -developer,
             any.attoparsec-aeson ==2.1.0.0,
             any.attoparsec-iso8601 ==1.1.0.0,
             any.auto-update ==0.1.6,
             any.base ==4.18.0.0,
             any.base-compat ==0.13.0,
             any.base-compat-batteries ==0.13.0,
             any.base-orphans ==0.9.0,
             any.base-unicode-symbols ==0.2.4.2,
             base-unicode-symbols +base-4-8 -old-base,
             any.base16-bytestring ==1.0.2.0,
             any.base64 ==0.4.2.4,
             any.base64-bytestring ==1.2.1.0,
             any.basement ==0.0.16,
             any.bifunctors ==5.6.1,
             bifunctors +tagged,
             any.binary ==0.8.9.1,
             any.bitvec ==1.1.4.0,
             bitvec -libgmp,
             any.blaze-builder ==0.4.2.2,
             any.blaze-html ==0.9.1.2,
             any.blaze-markup ==0.8.2.8,
             any.boring ==0.2.1,
             boring +tagged,
             any.bsb-http-chunked ==0.0.0.4,
             any.byteorder ==1.0.4,
             any.bytestring ==0.11.4.0,
             any.cabal-doctest ==1.0.9,
             any.call-stack ==0.4.0,
             any.case-insensitive ==1.2.1.0,
             any.cassava ==0.5.3.0,
             cassava -bytestring--lt-0_10_4,
             any.cereal ==0.5.8.3,
             cereal -bytestring-builder,
             any.citeproc ==0.8.1,
             citeproc -executable -icu,
             any.cmdargs ==0.10.22,
             cmdargs +quotation -testprog,
             any.colour ==2.3.6,
             any.commonmark ==0.2.3,
             any.commonmark-extensions ==0.2.3.4,
             any.commonmark-pandoc ==0.2.1.3,
             any.comonad ==5.0.8,
             comonad +containers +distributive +indexed-traversable,
             any.conduit ==1.3.5,
             any.conduit-extra ==1.3.6,
             any.constraints ==0.13.4,
             any.containers ==0.6.7,
             any.contravariant ==1.5.5,
             contravariant +semigroups +statevar +tagged,
             any.cookie ==0.4.6,
             any.crypton ==0.33,
             crypton -check_alignment +integer-gmp -old_toolchain_inliner +support_aesni +support_deepseq +support_pclmuldq +support_rdrand -support_sse +use_target_attributes,
             any.crypton-connection ==0.3.1,
             any.crypton-x509 ==1.7.6,
             any.crypton-x509-store ==1.6.9,
             any.crypton-x509-system ==1.6.7,
             any.crypton-x509-validation ==1.6.12,
             any.cryptonite ==0.30,
             cryptonite -check_alignment +integer-gmp -old_toolchain_inliner +support_aesni +support_deepseq -support_pclmuldq +support_rdrand -support_sse +use_target_attributes,
             any.data-default ==0.7.1.1,
             any.data-default-class ==0.1.2.0,
             any.data-default-instances-containers ==0.0.1,
             any.data-default-instances-dlist ==0.0.1,
             any.data-default-instances-old-locale ==0.0.1,
             any.data-fix ==0.3.2,
             any.dec ==0.0.5,
             any.deepseq ==1.4.8.1,
             any.digest ==0.0.1.3,
             digest -bytestring-in-base,
             any.digits ==0.3.1,
             any.directory ==1.3.8.1,
             any.distributive ==0.6.2.1,
             distributive +semigroups +tagged,
             any.dlist ==1.0,
             dlist -werror,
             any.doclayout ==0.4.0.1,
             any.doctemplates ==0.11,
             any.easy-file ==0.2.5,
             any.emojis ==0.1.2,
             any.exceptions ==0.10.7,
             any.fast-logger ==3.2.2,
             any.file-embed ==0.0.15.0,
             any.filepath ==1.4.100.1,
             any.generically ==0.1.1,
             any.ghc-bignum ==1.3,
             any.ghc-boot-th ==9.6.2,
             any.ghc-prim ==0.10.0,
             any.gridtables ==0.1.0.0,
             any.haddock-library ==1.11.0,
             any.happy ==1.20.1.1,
             any.hashable ==1.4.2.0,
             hashable +integer-gmp -random-initial-seed,
             any.haskell-lexer ==1.1.1,
             any.hourglass ==0.2.12,
             any.hsc2hs ==0.68.9,
             hsc2hs -in-ghc-tree,
             any.hslua ==2.3.0,
             any.hslua-aeson ==2.3.0.1,
             any.hslua-classes ==2.3.0,
             any.hslua-cli ==1.4.1,
             hslua-cli -executable,
             any.hslua-core ==2.3.1,
             any.hslua-list ==1.1.1,
             any.hslua-marshalling ==2.3.0,
             any.hslua-module-doclayout ==1.1.0,
             any.hslua-module-path ==1.1.0,
             any.hslua-module-system ==1.1.0.1,
             any.hslua-module-text ==1.1.0.1,
             any.hslua-module-version ==1.1.0,
             any.hslua-module-zip ==1.1.0,
             any.hslua-objectorientation ==2.3.0,
             any.hslua-packaging ==2.3.0,
             any.hslua-repl ==0.1.1,
             hslua-repl -executable,
             any.hslua-typing ==0.1.0,
             any.http-api-data ==0.5.1,
             http-api-data -use-text-show,
             any.http-client ==0.7.13.1,
             http-client +network-uri,
             any.http-client-tls ==0.3.6.2,
             any.http-date ==0.0.11,
             any.http-media ==0.8.0.0,
             any.http-types ==0.12.3,
             any.http2 ==4.1.4,
             http2 -devel -h2spec,
             any.indexed-traversable ==0.1.2.1,
             any.indexed-traversable-instances ==0.1.1.2,
             any.integer-gmp ==1.1,
             any.integer-logarithms ==1.0.3.1,
             integer-logarithms -check-bounds +integer-gmp,
             any.iproute ==1.7.12,
             any.ipynb ==0.2,
             any.isocline ==1.0.9,
             any.jira-wiki-markup ==1.5.1,
             any.libyaml ==0.1.2,
             libyaml -no-unicode -system-libyaml,
             any.lpeg ==1.0.4,
             lpeg -rely-on-shared-lpeg-library,
             any.lua ==2.3.1,
             lua +allow-unsafe-gc -apicheck -cross-compile +export-dynamic -lua_32bits -pkg-config -system-lua,
             any.lua-arbitrary ==1.0.1.1,
             any.memory ==0.18.0,
             memory +support_bytestring +support_deepseq,
             any.mime-types ==0.1.1.0,
             any.mmorph ==1.2.0,
             any.monad-control ==1.0.3.1,
             any.mono-traversable ==1.0.15.3,
             any.mtl ==2.3.1,
             any.network ==3.1.4.0,
             network -devel,
             any.network-byte-order ==0.1.6,
             any.network-uri ==2.6.4.2,
             any.old-locale ==1.0.0.7,
             any.old-time ==1.1.0.3,
             any.optparse-applicative ==0.18.1.0,
             optparse-applicative +process,
             any.ordered-containers ==0.2.3,
             pandoc +embed_data_files,
             pandoc-cli +lua -nightly +server,
             any.pandoc-lua-marshal ==0.2.2,
             any.pandoc-types ==1.23.0.1,
             any.parsec ==3.1.16.1,
             any.pem ==0.2.4,
             any.pretty ==1.1.3.6,
             any.pretty-show ==1.10,
             any.prettyprinter ==1.7.1,
             prettyprinter -buildreadme +text,
             any.prettyprinter-ansi-terminal ==1.1.3,
             any.primitive ==0.8.0.0,
             any.process ==1.6.17.0,
             any.psqueues ==0.2.7.3,
             any.random ==1.2.1.1,
             any.recv ==0.1.0,
             any.regex-base ==0.94.0.2,
             any.regex-tdfa ==1.3.2.1,
             regex-tdfa -force-o2,
             any.resourcet ==1.3.0,
             any.rts ==1.0.2,
             any.safe ==0.3.19,
             any.safe-exceptions ==0.1.7.4,
             any.scientific ==0.3.7.0,
             scientific -bytestring-builder -integer-simple,
             any.semialign ==1.3,
             semialign +semigroupoids,
             any.semigroupoids ==6.0.0.1,
             semigroupoids +comonad +containers +contravariant +distributive +tagged +unordered-containers,
             any.servant ==0.20,
             any.servant-server ==0.20,
             any.simple-sendfile ==0.2.32,
             simple-sendfile +allow-bsd -fallback,
             any.singleton-bool ==0.1.7,
             any.skylighting ==0.13.4,
             skylighting -executable,
             any.skylighting-core ==0.13.4,
             skylighting-core -executable,
             any.skylighting-format-ansi ==0.1,
             any.skylighting-format-blaze-html ==0.1.1,
             any.skylighting-format-context ==0.1.0.2,
             any.skylighting-format-latex ==0.1,
             any.socks ==0.6.1,
             any.some ==1.0.5,
             some +newtype-unsafe,
             any.sop-core ==0.5.0.2,
             any.split ==0.2.3.5,
             any.splitmix ==0.1.0.4,
             splitmix -optimised-mixer,
             any.stm ==2.5.1.0,
             any.streaming-commons ==0.2.2.6,
             streaming-commons -use-bytestring-builder,
             any.strict ==0.5,
             any.string-conversions ==0.4.0.1,
             any.syb ==0.7.2.3,
             any.tagged ==0.8.7,
             tagged +deepseq +transformers,
             any.tagsoup ==0.14.8,
             any.tasty ==1.4.3,
             tasty +unix,
             any.tasty-bench ==0.3.4,
             tasty-bench -debug +tasty,
             any.tasty-golden ==2.3.5,
             tasty-golden -build-example,
             any.tasty-hunit ==0.10.0.3,
             any.tasty-lua ==1.1.0,
             any.tasty-quickcheck ==0.10.2,
             any.template-haskell ==2.20.0.0,
             any.temporary ==1.3,
             any.texmath ==0.12.8,
             texmath -executable -server,
             any.text ==2.0.2,
             any.text-conversions ==0.3.1.1,
             any.text-short ==0.1.5,
             text-short -asserts,
             any.th-abstraction ==0.5.0.0,
             any.th-compat ==0.1.4,
             any.th-lift ==0.8.3,
             any.th-lift-instances ==0.1.20,
             any.these ==1.2,
             any.time ==1.12.2,
             any.time-compat ==1.9.6.1,
             time-compat -old-locale,
             any.time-manager ==0.0.0,
             any.tls ==1.7.0,
             tls +compat -hans +network,
             any.toml-parser ==1.2.0.0,
             any.transformers ==0.6.1.0,
             any.transformers-base ==0.4.6,
             transformers-base +orphaninstances,
             any.transformers-compat ==0.7.2,
             transformers-compat -five +five-three -four +generic-deriving +mtl -three -two,
             any.type-equality ==1,
             any.typed-process ==0.2.11.0,
             any.typst ==0.3.0.0,
             typst -executable,
             any.typst-symbols ==0.1.2,
             any.unicode-collation ==0.1.3.4,
             unicode-collation -doctests -executable,
             any.unicode-data ==0.4.0.1,
             unicode-data -ucd2haskell,
             any.unicode-transforms ==0.4.0.1,
             unicode-transforms -bench-show -dev -has-icu -has-llvm -use-gauge,
             any.uniplate ==1.6.13,
             any.unix ==2.8.1.0,
             any.unix-compat ==0.7,
             unix-compat -old-time,
             any.unix-time ==0.4.10,
             any.unliftio ==0.2.25.0,
             any.unliftio-core ==0.2.1.0,
             any.unordered-containers ==0.2.19.1,
             unordered-containers -debug,
             any.utf8-string ==1.0.2,
             any.uuid-types ==1.0.5,
             any.vault ==0.3.1.5,
             vault +useghc,
             any.vector ==0.13.0.0,
             vector +boundschecks -internalchecks -unsafechecks -wall,
             any.vector-algorithms ==0.9.0.1,
             vector-algorithms +bench +boundschecks -internalchecks -llvm +properties -unsafechecks,
             any.vector-stream ==0.1.0.0,
             any.wai ==3.2.3,
             any.wai-app-static ==3.1.7.4,
             wai-app-static +cryptonite -print,
             any.wai-cors ==0.2.7,
             any.wai-extra ==3.1.13.0,
             wai-extra -build-example,
             any.wai-logger ==2.4.0,
             any.warp ==3.3.28,
             warp +allow-sendfilefd -network-bytestring -warp-debug +x509,
             any.witherable ==0.4.2,
             any.word8 ==0.1.3,
             any.xml ==1.3.14,
             any.xml-conduit ==1.9.1.3,
             any.xml-types ==0.3.8,
             any.yaml ==0.11.11.2,
             yaml +no-examples +no-exe,
             any.zip-archive ==0.4.3,
             zip-archive -executable,
             any.zlib ==0.6.3.0,

AndreasPK commented 12 months ago

It seems you depend on text >= 2.0, which comes with the new SIMD code.

One "easy" way to check whether it's text would be to disable SIMD for text in a build using the simdutf cabal flag, and see if the error still persists.

jgm commented 12 months ago

OK, I think I've built a version using the release build script with a constraint that forces text to use -simdutf. @freijon could you try downloading the build artifact from here and see if you still get the error on your system? https://cirrus-ci.com/task/4511237447352320

freijon commented 12 months ago

I gave it a try, but unfortunately I still get the same error. I also tried --version and noticed that pandoc outputs some text and then fails:

/tmp/pandoc/pandoc-3.1.5/bin/pandoc --version --verbose

pandoc 3.1.5
Features: +server +lua
[1] 3102 illegal hardware instruction /tmp/pandoc/pandoc-3.1.5/bin/pandoc --version --verbose

Thank you for your patience and your efforts so far, I appreciate it!

jgm commented 12 months ago

OK, that is helpful information. It suggests that the culprit is not +simdutf in text. @AndreasPK any other ideas?

jgm commented 11 months ago

Actually I think this is a good clue, that --version emits those lines then stops.

versionInfo :: IO ()        
versionInfo = do           
  progname <- getProgName
  defaultDatadir <- defaultUserDataDir
  scriptingEngine <- getEngine            
  putStr $ unlines                          
   [ progname ++ " " ++ showVersion pandocVersion ++ versionSuffix
   , flagSettings                
   , "Scripting engine: " ++ T.unpack (engineName scriptingEngine)
   , "User data directory: " ++ defaultDatadir
   , copyrightMessage
   ]                                      
  exitSuccess

That suggests that the error occurs in the "Scripting engine" part (so, getEngine). That may implicate the Lua subsystem, which obviously has pieces in C. Maybe the C is being compiled with these optimizations; we just need to figure out how to turn that off.

To test this hypothesis I'll try making a build without lua support, which you can try.

jgm commented 11 months ago

OK, the following build disables both the server and the lua flags (as well as simdutf for text): https://cirrus-ci.com/task/4556227045228544

@freijon It will be interesting to see if the problem can be reproduced with this binary.

freijon commented 11 months ago

Thanks!

pandoc --version now works! I see the complete version info. Some progress! Unfortunately, converting a .md to .html still fails with a SIGILL.

jgm commented 11 months ago

Does your .md have YAML metadata? I ask because the yaml library embeds a C library. Do you still get the problem when converting a minimal md file (one word)?

freijon commented 11 months ago

My test .md file did indeed contain some special things like bullet lists and headings. I did another test with only one word inside and still get a SIGILL.

jgm commented 11 months ago

Some notes:

We switched to ghc-musl 9.6.2 on June 26 (3.1.5 was built with this). And to ghc-musl 9.4.5 on April 20 (3.1.3 and 3.1.4 were built with this).

I'm pinging @benz0li who maintains the ghc-musl images and might know something else that could be relevant to this issue.

We switched to the crypton ecosystem for the 3.1.4 build (but this doesn't affect 3.1.3).

jgm commented 11 months ago

I'll note that both this and the related Windows issue point to ghc 9.4 as a possible culprit:

I guess there is an easy way to test this hypothesis. I can do a linux build using ghc 9.2, but otherwise the same as the last release.

jgm commented 11 months ago

Update: actually, it looks like ghc-musl-9.4.4 was used for release pandoc 3.1.2, and we switched to 9.4.5 for 3.1.3.

benz0li commented 11 months ago

> Actually there is a flag for avx (from ghc 9.6 manual):
>
> -mavx (x86 only) These SIMD instructions are currently not supported by the native code generator. Enabling this flag has no effect and is only present for future extensions.
>
> The LLVM backend may use AVX if your processor supports it, but detects this automatically, so no flag is required.
>
> My understanding is that ghc uses the native code generator by default.

ℹ️ glcr.b-data.ch/ghc/ghc-musl uses the LLVM backend.

benz0li commented 11 months ago

> I haven't seen this reported before: is that because only fairly old machines don't support AVX at this point?

Yes. Advanced Vector Extensions (AVX) were introduced 12 years ago.

https://en.wikipedia.org/wiki/Advanced_Vector_Extensions

jgm commented 11 months ago

ghc 9.4.5 bumps text to 2.0.2 in core libraries.

jgm commented 11 months ago

> ℹ️ glcr.b-data.ch/ghc/ghc-musl uses the LLVM backend.

Aha! That is something I didn't know. OK, so is there a way to prevent the llvm backend from using avx? And what is the reason for using the llvm backend in ghc-musl?

jgm commented 11 months ago

For testing purposes, here is a build of 3.1.5 that uses ghc-musl-9.4.4: https://cirrus-ci.com/task/5225158336577536

benz0li commented 11 months ago

> OK, so is there a way to prevent the llvm backend from using avx?

I don't know.

> And what is the reason for using the llvm backend in ghc-musl?

I try to build GHC (almost) the same way as the official Alpine Linux package.
ℹ️ https://gitlab.haskell.org/ghc/ghc/-/issues/23482#note_503004

freijon commented 11 months ago

> For testing purposes, here is a build of 3.1.5 that uses ghc-musl-9.4.4: https://cirrus-ci.com/task/5225158336577536

I tested this new binary and it behaves like the "normal" binary:

jgm commented 11 months ago

@benz0li

> ℹ️ glcr.b-data.ch/ghc/ghc-musl uses the LLVM backend.

Do you mean that this version of ghc was compiled using the llvm backend? (That shouldn't affect its behavior when run, should it?) Or that, when this ghc is used, it defaults to using the llvm backend rather than the native code generator?

jgm commented 11 months ago

We switched from ghc 9.2 to 9.4 before the 3.0 release. It would tell us something, then, if 3.x versions had this problem but 2.x versions did not.

We started building the linux binaries on cirrus (instead of GH actions) before the 3.1.2 release. So if there were a difference between 3.1.1 and 3.1.2, that would also tell us something.

benz0li commented 11 months ago

> Do you mean that this version of ghc was compiled using the llvm backend?

Yes.

> (That shouldn't affect its behavior when run, should it?) Or that, when this ghc is used, it defaults to using the llvm backend rather than the native code generator?

(No.) No, it is enabled via the -fllvm flag.
ℹ️ That is my understanding from reading the manual and the Opinion piece on GHC backends.

@AndreasPK Please confirm.

freijon commented 11 months ago

> if there were a difference between 3.1.1 and 3.1.2, that would also tell us something.

No difference that I can notice between 3.1.1 and 3.1.2

> if 3.x versions had this problem but 2.x versions did not.

Success! With the 2.19.2 binary everything works, even complex translations to PDF and LaTeX preamble!

jgm commented 11 months ago

I'm going to try to make a new version of 3.1.5 that uses ghc 9.2 and let's see if that changes anything.

benz0li commented 11 months ago

> I'm going to try to make a new version of 3.1.5 that uses ghc 9.2 and let's see if that changes anything.

@jgm If you are using tag 9.2, this will use GHC version 9.2.8 (source released: 2023-05-26; image built: 2023-05-27).
ℹ️ Pandoc v2.19.2 was built using image glcr.b-data.ch/ghc/ghc-musl:9.2.3 (source released: 2022-05-27, image re-built: 2022-07-29).

jgm commented 11 months ago

This one is built with ghc 9.2.5 (before I saw your message): https://cirrus-ci.com/task/4563975971536896

Note: in addition to using a different ghc version, this uses a different text version. The version of text that comes bundled with ghc is < 2 in ghc 9.2 and > 2 in ghc 9.4. So, if this version does not cause the problem, that could point to either something in ghc 9.4.4 or something in text 2. If this build is a success, I can try another build with ghc 9.2 and text > 2.

jgm commented 11 months ago

For completeness, here's a version built with ghc 9.2.5 and text 2.0.2: https://cirrus-ci.com/task/5450494399741952

freijon commented 11 months ago

I can confirm that the issue seems to be resolved with both binaries.

jgm commented 11 months ago

@AndreasPK this might be of interest. We only get the problem when compiling with ghc >= 9.4. With 9.2, it goes away. Same compiled code, using mostly the same dependent library versions. Here are the differences I noted:

--- libs-925    2023-07-17 22:19:24.000000000 -0700
+++ libs-944    2023-07-17 22:19:14.000000000 -0700
@@ -60,7 +60,6 @@
- - data-array-byte-0.1.0.1 (lib) (requires download & build)
- - digest-0.0.1.7 (lib) (requires download & build)
+ - digest-0.0.1.3 (lib) (requires download & build)
- - toml-parser-1.3.0.0 (lib) (requires download & build)
+ - toml-parser-1.2.1.0 (lib) (requires download & build)

Neither the toml-parser nor digest would be used in the basic commands that cause the error.

benz0li commented 11 months ago

> I can confirm that the issue seems to be resolved with both binaries

@jgm Thus, to maintain compatibility with older machines, pandoc should be built with glcr.b-data.ch/ghc/ghc-musl:9.2.x.

jgm commented 11 months ago

Yes, I can make that change for now, until we diagnose this properly.

benz0li commented 11 months ago

> We only get the problem when compiling with ghc >= 9.4. With 9.2, it goes away.

I migrated from the make-based to the Hadrian build system with GHC v9.4.1. ℹ️ I am still building GHC v9.2.x with the make-based build system, though.

@AndreasPK Do I somehow misconfigure the Hadrian build?

Cross reference: https://www.haskell.org/ghc/blog/20220805-make-to-hadrian.html

(I don't think this is causing the issue. Only mentioning it for the sake of completeness.)

jgm commented 11 months ago

@AndreasPK another data point: There was a similar issue on Windows (#8955). I switched from ghc 9.4 to ghc 9.2 and the problem went away. The build with 9.4 was using the binary downloaded by stack, and the build with 9.2 was using the binary from ghcup. So I don't think this has anything to do with the specific way in which ghc-musl was built.

benz0li commented 11 months ago

@freijon Does pandoc 3.1.6 work as expected on your old machine?

@AndreasPK Thank you for further insights from your side on

AndreasPK commented 11 months ago

Could anyone who can reproduce this try to run pandoc under gdb to get a backtrace?

Alternatively, if someone can give me step-by-step instructions to reproduce this, I might be able to do so myself, depending on the requirements.

AndreasPK commented 11 months ago

I downloaded the release in question and can see the instruction in it (although my machine does support it). However, the release is naturally stripped of all symbols, so that wasn't as informative as I had hoped.
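Without symbols, locating AVX code comes down to scanning the disassembly, which is roughly what AndreasPK does next. The helper below is a rough heuristic, assumed good enough for a first pass: it takes one instruction in AT&T syntax (the form gdb's x/i printed earlier in this thread, minus the address) and flags it if the mnemonic starts with v and an operand names an %xmm, %ymm or %zmm register. It misses AVX instructions without vector-register operands, such as vzeroupper, and the name insn_looks_like_avx is hypothetical.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Heuristic: does one disassembled instruction (AT&T syntax, e.g.
   "vpxor %xmm5,%xmm5,%xmm5") look like a VEX/EVEX-encoded vector op?
   Rule of thumb: mnemonic starts with 'v' AND an operand names a
   %xmm/%ymm/%zmm register (this excludes e.g. "verw" and VMX "vm*"). */
bool insn_looks_like_avx(const char *insn) {
    /* Skip leading whitespace to reach the mnemonic. */
    while (*insn && isspace((unsigned char)*insn))
        insn++;
    if (*insn != 'v')
        return false;
    /* Operands follow the mnemonic; only search from there on. */
    const char *ops = insn;
    while (*ops && !isspace((unsigned char)*ops))
        ops++;
    return strstr(ops, "%xmm") != NULL ||
           strstr(ops, "%ymm") != NULL ||
           strstr(ops, "%zmm") != NULL;
}
```

To apply it to objdump -d output, one would first strip the address and raw-byte columns from each line so that only the mnemonic and operands remain.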

AndreasPK commented 11 months ago

I built pandoc myself and just grepped for the instruction in the assembly. It seems to come from the function _hs_bytestring_long_long_uint_hex, which is part of bytestring.

The function has been there "forever" and doesn't explicitly use SIMD. Rather, it seems that auto-vectorization kicks in:

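// Hex digit table: assumed to match the one bytestring defines elsewhere
// in the same C file; added here so the excerpt compiles on its own.
static const char *digits = "0123456789abcdef";

// unsigned long ints (64 bit words)
// Returns a pointer just past the last digit; buf is not NUL-terminated.
char* _hs_bytestring_long_long_uint_hex (long long unsigned int x, char* buf) {
    // write hex representation in reverse order
    char c, *ptr = buf, *next_free;
    do {
        *ptr++ = digits[x & 0xf];
        x >>= 4;
    } while ( x );
    // invert written digits
    next_free = ptr--;
    while(buf < ptr) {
        c      = *ptr;
        *ptr-- = *buf;
        *buf++ = c;
    }
    return next_free;
}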

So it comes down to whatever C compiler flags were used to build the version of bytestring that pandoc is linked against.
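Because the VEX-encoded instructions come from how the C sources were compiled rather than from the Haskell code, one cheap safeguard a packager could add is a compile-time check of the target macros that GCC and Clang predefine (__SSE2__, __AVX__, __AVX2__). The sketch below is an assumption about how such a guard might look, not something bytestring or GHC ships; REQUIRE_BASELINE_X86_64 and simd_target_report are hypothetical names.

```c
/* If the compiler was allowed to target AVX for this translation unit,
   any loop in it (like the hex-reversal loop above) may be
   auto-vectorized with VEX-encoded instructions such as vpxor.
   REQUIRE_BASELINE_X86_64 is a hypothetical opt-in guard macro. */
#ifdef REQUIRE_BASELINE_X86_64
#  ifdef __AVX__
#    error "compiled with AVX enabled; code may SIGILL on pre-AVX CPUs"
#  endif
#endif

/* Report the highest x86 SIMD level the compiler targeted here. */
const char *simd_target_report(void) {
#if defined(__AVX2__)
    return "AVX2";
#elif defined(__AVX__)
    return "AVX";
#elif defined(__SSE2__)
    return "SSE2 baseline";
#else
    return "no x86 SIMD macros";
#endif
}
```

Compiling this file with the same C flags as the library under test, plus -DREQUIRE_BASELINE_X86_64, would stop the build if AVX had been enabled (for instance by -march=native).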

AndreasPK commented 11 months ago

https://gitlab.haskell.org/ghc/ghc/-/issues/23718

~I can confirm it's an upstream issue. The libraries shipping with ghc seem to have avx enabled.~

Edit: At the very least there are AVX instructions in the binary which, on my machine, get executed. However, I also have an AVX CPU, and there seem to be runtime checks, so that's not necessarily wrong.
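The situation described in this edit, AVX instructions present in the binary but guarded by runtime checks, is ordinary function multiversioning. A minimal C sketch of that pattern, assuming GCC or Clang on x86-64 and using a made-up summing function (sum_u32 and its helpers are hypothetical), looks like this:

```c
#include <stddef.h>
#include <stdint.h>

/* Portable baseline: safe on every x86-64 CPU (and on non-x86 targets). */
static uint64_t sum_u32_generic(const uint32_t *xs, size_t n) {
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += xs[i];
    return total;
}

#if defined(__GNUC__) && defined(__x86_64__)
/* Same loop, but only this function is compiled with AVX2 enabled, so
   the compiler may auto-vectorize it with VEX-encoded instructions. */
__attribute__((target("avx2")))
static uint64_t sum_u32_avx2(const uint32_t *xs, size_t n) {
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += xs[i];
    return total;
}
#endif

typedef uint64_t (*sum_fn)(const uint32_t *, size_t);

/* Pick an implementation based on the CPU we are actually running on. */
static sum_fn choose_sum(void) {
#if defined(__GNUC__) && defined(__x86_64__)
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2"))
        return sum_u32_avx2;
#endif
    return sum_u32_generic;
}

uint64_t sum_u32(const uint32_t *xs, size_t n) {
    static sum_fn impl = NULL;
    if (impl == NULL)
        impl = choose_sum(); /* not thread-safe; real code would use call_once */
    return impl(xs, n);
}
```

The AVX2 path can only execute after __builtin_cpu_supports has confirmed the CPU supports it. If the vpxor in _hs_bytestring_long_long_uint_hex has no such guard in front of it, that would be consistent with the SIGILL reported in this issue.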