grantjenks / py-tree-sitter-languages

Binary Python wheels for all tree sitter languages.

scanner.c: multiple definition of is_newline - trying to compile all tree-sitter grammars #55

Open milahu opened 9 months ago

milahu commented 9 months ago

i just tried to use this to compile all tree-sitter grammars, but the build fails with many `multiple definition` linker errors, so it seems like a bad idea to compile all grammars into one binary

tree-sitter-languages.nix

```nix
{ lib
, python3
, fetchFromGitHub
, tree-sitter-grammars
}:

/*
# debug: build faster
let
  old-tree-sitter-grammars = tree-sitter-grammars;
in
let
  tree-sitter-grammars = {
    tree-sitter-html = old-tree-sitter-grammars.tree-sitter-html;
  };
in
*/

python3.pkgs.buildPythonPackage rec {
  pname = "tree-sitter-languages";
  version = "1.10.2";
  pyproject = true;

  src = fetchFromGitHub {
    owner = "grantjenks";
    repo = "py-tree-sitter-languages";
    rev = "v${version}";
    hash = "sha256-AuPK15xtLiQx6N2OATVJFecsL8k3pOagrWu1GascbwM=";
  };

  buildInputs = [
    python3.pkgs.cython
  ];

  nativeBuildInputs = [
    python3.pkgs.setuptools
    python3.pkgs.wheel
  ];

  propagatedBuildInputs = [
    python3.pkgs.tree-sitter
  ];

  postUnpack = ''
    cd $sourceRoot
    mkdir vendor
    ${
      builtins.concatStringsSep "" (
        builtins.attrValues (
          builtins.mapAttrs
          (n: p:
            "ln -v -s ${p.src.outPath} vendor/${n}\n"
          )
          (lib.filterAttrs (k: v: v ? src) tree-sitter-grammars)
        )
      )
    }
    cd ..
  '';

  postBuild = ''
    echo creating $out/${python3.sitePackages}/tree_sitter_languages/languages.so
    repo_paths=(
    ${
      builtins.concatStringsSep "" (
        builtins.attrValues (
          builtins.mapAttrs
          (n: p:
            " 'vendor/${n}'\n"
          )
          (lib.filterAttrs (k: v: v ? src) tree-sitter-grammars)
        )
      )
    }
    )
    # get actual repo paths
    # fix: No such file or directory: 'vendor/tree-sitter-markdown/src/parser.c
    for idx in ''${!repo_paths[@]}; do
      dir=''${repo_paths[$idx]}
      [ -e $dir/src/parser.c ] && continue
      parser=$(find $dir -path '*/src/parser.c')
      dir=''${parser%/src/parser.c}
      repo_paths[$idx]=$dir
    done
    #mkdir -p $out/${python3.sitePackages}/tree_sitter_languages
    build_py=$(
      echo "import tree_sitter"
      echo "repo_paths = ["
      for dir in ''${repo_paths[@]}; do
        echo " '$dir',"
      done
      echo "]"
      echo "output_path = '$out/${python3.sitePackages}/tree_sitter_languages/languages.so'"
      echo "tree_sitter.Language.build_library(output_path, repo_paths)"
    )
    echo "$build_py" | grep -n "" # debug
    python3 -c "$build_py"
  '';

  pythonImportsCheck = [ "tree_sitter_languages" ];

  meta = with lib; {
    description = "Python module with all tree-sitter languages";
    homepage = "https://github.com/grantjenks/py-tree-sitter-languages";
    license = licenses.asl20;
    maintainers = with maintainers; [ ];
  };
}
```
build.py

```py
import tree_sitter

repo_paths = [
    'vendor/tree-sitter-bash',
    'vendor/tree-sitter-beancount',
    'vendor/tree-sitter-bibtex',
    'vendor/tree-sitter-bitbake',
    'vendor/tree-sitter-c',
    'vendor/tree-sitter-c-sharp',
    'vendor/tree-sitter-clojure',
    'vendor/tree-sitter-cmake',
    'vendor/tree-sitter-comment',
    'vendor/tree-sitter-commonlisp',
    'vendor/tree-sitter-cpp',
    'vendor/tree-sitter-css',
    'vendor/tree-sitter-cuda',
    'vendor/tree-sitter-cue',
    'vendor/tree-sitter-dart',
    'vendor/tree-sitter-devicetree',
    'vendor/tree-sitter-dockerfile',
    'vendor/tree-sitter-dot',
    'vendor/tree-sitter-eex',
    'vendor/tree-sitter-elisp',
    'vendor/tree-sitter-elixir',
    'vendor/tree-sitter-elm',
    'vendor/tree-sitter-embedded-template',
    'vendor/tree-sitter-erlang',
    'vendor/tree-sitter-fennel',
    'vendor/tree-sitter-fish',
    'vendor/tree-sitter-fortran',
    'vendor/tree-sitter-gdscript',
    'vendor/tree-sitter-glimmer',
    'vendor/tree-sitter-glsl',
    'vendor/tree-sitter-go',
    'vendor/tree-sitter-godot-resource',
    'vendor/tree-sitter-gomod',
    'vendor/tree-sitter-gowork',
    'vendor/tree-sitter-graphql',
    'vendor/tree-sitter-haskell',
    'vendor/tree-sitter-hcl',
    'vendor/tree-sitter-heex',
    'vendor/tree-sitter-hjson',
    'vendor/tree-sitter-html',
    'vendor/tree-sitter-http',
    'vendor/tree-sitter-janet-simple',
    'vendor/tree-sitter-java',
    'vendor/tree-sitter-javascript',
    'vendor/tree-sitter-jsdoc',
    'vendor/tree-sitter-json',
    'vendor/tree-sitter-json5',
    'vendor/tree-sitter-jsonnet',
    'vendor/tree-sitter-julia',
    'vendor/tree-sitter-just',
    'vendor/tree-sitter-kotlin',
    'vendor/tree-sitter-latex',
    'vendor/tree-sitter-ledger',
    'vendor/tree-sitter-llvm',
    'vendor/tree-sitter-lua',
    'vendor/tree-sitter-make',
    'vendor/tree-sitter-nickel',
    'vendor/tree-sitter-nix',
    'vendor/tree-sitter-norg',
    'vendor/tree-sitter-norg-meta',
    'vendor/tree-sitter-nu',
    'vendor/tree-sitter-org-nvim',
    'vendor/tree-sitter-perl',
    'vendor/tree-sitter-pgn',
    'vendor/tree-sitter-php',
    'vendor/tree-sitter-pioasm',
    'vendor/tree-sitter-prisma',
    'vendor/tree-sitter-proto',
    'vendor/tree-sitter-pug',
    'vendor/tree-sitter-python',
    'vendor/tree-sitter-ql',
    'vendor/tree-sitter-ql-dbscheme',
    'vendor/tree-sitter-query',
    'vendor/tree-sitter-r',
    'vendor/tree-sitter-regex',
    'vendor/tree-sitter-rego',
    'vendor/tree-sitter-rst',
    'vendor/tree-sitter-ruby',
    'vendor/tree-sitter-rust',
    'vendor/tree-sitter-scala',
    'vendor/tree-sitter-scheme',
    'vendor/tree-sitter-scss',
    'vendor/tree-sitter-smithy',
    'vendor/tree-sitter-solidity',
    'vendor/tree-sitter-sparql',
    'vendor/tree-sitter-sql',
    'vendor/tree-sitter-supercollider',
    'vendor/tree-sitter-surface',
    'vendor/tree-sitter-svelte',
    'vendor/tree-sitter-tiger',
    'vendor/tree-sitter-tlaplus',
    'vendor/tree-sitter-toml',
    'vendor/tree-sitter-tsq',
    'vendor/tree-sitter-turtle',
    'vendor/tree-sitter-typst',
    'vendor/tree-sitter-uiua',
    'vendor/tree-sitter-verilog',
    'vendor/tree-sitter-vim',
    'vendor/tree-sitter-vue',
    'vendor/tree-sitter-wgsl',
    'vendor/tree-sitter-yaml',
    'vendor/tree-sitter-yang',
    'vendor/tree-sitter-zig',
]

output_path = '/nix/store/87v1hw8y1y82010g66fm3b6qq7v4av3p-python3.11-tree-sitter-languages-1.10.2/lib/python3.11/site-packages/tree_sitter_languages/languages.so'

tree_sitter.Language.build_library(output_path, repo_paths)
```
build.log

```
/nix/store/xdqs45iclhp9dz8zz9pvn5zivjbhid1a-binutils-2.40/bin/ld: /build/tmpxyj0x25atree_sitter_language/vendor/tree-sitter-rst/src/scanner.o: in function `is_newline':
scanner.c:(.text+0x190): multiple definition of `is_newline'; /build/tmpxyj0x25atree_sitter_language/vendor/tree-sitter-comment/src/scanner.o:scanner.c:(.text+0x20): first defined here
/nix/store/xdqs45iclhp9dz8zz9pvn5zivjbhid1a-binutils-2.40/bin/ld: /build/tmpxyj0x25atree_sitter_language/vendor/tree-sitter-rst/src/scanner.o: in function `is_space':
scanner.c:(.text+0x1b0): multiple definition of `is_space'; /build/tmpxyj0x25atree_sitter_language/vendor/tree-sitter-comment/src/scanner.o:scanner.c:(.text+0x40): first defined here
/nix/store/xdqs45iclhp9dz8zz9pvn5zivjbhid1a-binutils-2.40/bin/ld: /build/tmpxyj0x25atree_sitter_language/vendor/tree-sitter-svelte/src/scanner.o: in function `can_contain':
scanner.c:(.text+0xbd0): multiple definition of `can_contain'; /build/tmpxyj0x25atree_sitter_language/vendor/tree-sitter-html/src/scanner.o:scanner.c:(.text+0x1f0): first defined here
/nix/store/xdqs45iclhp9dz8zz9pvn5zivjbhid1a-binutils-2.40/bin/ld: /build/tmpxyj0x25atree_sitter_language/vendor/tree-sitter-svelte/src/scanner.o: in function `serialize':
scanner.c:(.text+0x2150): multiple definition of `serialize'; /build/tmpxyj0x25atree_sitter_language/vendor/tree-sitter-org-nvim/src/scanner.o:scanner.c:(.text+0x0): first defined here
/nix/store/xdqs45iclhp9dz8zz9pvn5zivjbhid1a-binutils-2.40/bin/ld: /build/tmpxyj0x25atree_sitter_language/vendor/tree-sitter-svelte/src/scanner.o: in function `deserialize':
scanner.c:(.text+0x2250): multiple definition of `deserialize'; /build/tmpxyj0x25atree_sitter_language/vendor/tree-sitter-org-nvim/src/scanner.o:scanner.c:(.text+0x150): first defined here
/nix/store/xdqs45iclhp9dz8zz9pvn5zivjbhid1a-binutils-2.40/bin/ld: /build/tmpxyj0x25atree_sitter_language/vendor/tree-sitter-svelte/src/scanner.o: in function `scan':
scanner.c:(.text+0x2bc0): multiple definition of `scan'; /build/tmpxyj0x25atree_sitter_language/vendor/tree-sitter-org-nvim/src/scanner.o:scanner.c:(.text+0x700): first defined here
/nix/store/xdqs45iclhp9dz8zz9pvn5zivjbhid1a-binutils-2.40/bin/ld: /build/tmpxyj0x25atree_sitter_language/vendor/tree-sitter-vue/src/scanner.o: in function `tree_sitter_html_external_scanner_create':
scanner.cc:(.text+0xaf0): multiple definition of `tree_sitter_html_external_scanner_create'; /build/tmpxyj0x25atree_sitter_language/vendor/tree-sitter-html/src/scanner.o:scanner.c:(.text+0x5b0): first defined here
/nix/store/xdqs45iclhp9dz8zz9pvn5zivjbhid1a-binutils-2.40/bin/ld: /build/tmpxyj0x25atree_sitter_language/vendor/tree-sitter-vue/src/scanner.o: in function `tree_sitter_html_external_scanner_serialize':
scanner.cc:(.text+0xb20): multiple definition of `tree_sitter_html_external_scanner_serialize'; /build/tmpxyj0x25atree_sitter_language/vendor/tree-sitter-html/src/scanner.o:scanner.c:(.text+0xac0): first defined here
/nix/store/xdqs45iclhp9dz8zz9pvn5zivjbhid1a-binutils-2.40/bin/ld: /build/tmpxyj0x25atree_sitter_language/vendor/tree-sitter-vue/src/scanner.o: in function `tree_sitter_html_external_scanner_destroy.localalias':
scanner.cc:(.text+0xc40): multiple definition of `tree_sitter_html_external_scanner_destroy'; /build/tmpxyj0x25atree_sitter_language/vendor/tree-sitter-html/src/scanner.o:scanner.c:(.text+0xf00): first defined here
/nix/store/xdqs45iclhp9dz8zz9pvn5zivjbhid1a-binutils-2.40/bin/ld: /build/tmpxyj0x25atree_sitter_language/vendor/tree-sitter-vue/src/scanner.o: in function `tree_sitter_html_external_scanner_deserialize':
scanner.cc:(.text+0xe20): multiple definition of `tree_sitter_html_external_scanner_deserialize'; /build/tmpxyj0x25atree_sitter_language/vendor/tree-sitter-html/src/scanner.o:scanner.c:(.text+0xba0): first defined here
/nix/store/xdqs45iclhp9dz8zz9pvn5zivjbhid1a-binutils-2.40/bin/ld: /build/tmpxyj0x25atree_sitter_language/vendor/tree-sitter-vue/src/scanner.o: in function `tree_sitter_html_external_scanner_scan':
scanner.cc:(.text+0x1850): multiple definition of `tree_sitter_html_external_scanner_scan'; /build/tmpxyj0x25atree_sitter_language/vendor/tree-sitter-html/src/scanner.o:scanner.c:(.text+0x5c0): first defined here
collect2: error: ld returned 1 exit status
```
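
To see at a glance which grammars collide, the linker output can be grouped by grammar pair. This is only a minimal sketch: it assumes GNU ld messages in the format shown in build.log and sources laid out as `vendor/<grammar>/...`. The names `COLLISION` and `find_collisions` are made up for this example.

```python
import re
from collections import defaultdict

# Assumption: GNU ld wording as in build.log, with grammar sources
# under vendor/<grammar>/. Works whether ld prints each error on one
# line or wraps it after "in function `...':".
COLLISION = re.compile(
    r"vendor/(?P<dup>[^/\s]+)/\S+: in function `[^']+':\s+"
    r"\S+ multiple definition of `(?P<symbol>[^']+)';\s+"
    r"\S*vendor/(?P<first>[^/\s]+)/"
)


def find_collisions(log_text):
    """Map (first_grammar, duplicate_grammar) to the list of clashing symbols."""
    pairs = defaultdict(list)
    for match in COLLISION.finditer(log_text):
        key = (match["first"], match["dup"])
        pairs[key].append(match["symbol"])
    return dict(pairs)
```

Run against the log above, this should report four grammar pairs: comment/rst, html/svelte, org-nvim/svelte and html/vue.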
milahu commented 9 months ago

out of scope

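these are bugs in the parsers: the scanners define functions with external linkage under the same names (for example `is_newline` in both tree-sitter-comment and tree-sitter-rst), so the names clash when all scanners are linked into one library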

see also

- https://github.com/tree-sitter/tree-sitter-html/pull/64
- https://arduino.stackexchange.com/questions/45419/multiple-definition-of-local-variable-in-linking

milahu commented 9 months ago

> these are bugs in the parsers

but still, these errors can be avoided by compiling each parser to a separate binary. this also has the benefit that we can reuse parser binaries

see also https://github.com/milahu/nur-packages/commit/85dccc20ffb4b88978058d039dcf6b87bb118bbe
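
Building each parser into its own library could look roughly like this. It is a sketch, not what the linked commit does. It assumes a py-tree-sitter version that provides `Language.build_library` (the call used in build.py above). `build_separately`, its `build` parameter and the `<grammar>.so` file naming are hypothetical choices for illustration.

```python
from pathlib import Path


def build_separately(repo_paths, out_dir, build=None):
    """Build one shared library per grammar; return {grammar name: library path}."""
    if build is None:
        # Assumption: this py-tree-sitter version provides
        # Language.build_library(output_path, repo_paths).
        from tree_sitter import Language
        build = Language.build_library

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    libraries = {}
    for repo in repo_paths:
        name = Path(repo).name  # e.g. 'tree-sitter-html'
        lib_path = out_dir / f"{name}.so"
        # One grammar per link step, so scanner symbols from different
        # grammars never meet in the same library.
        build(str(lib_path), [str(repo)])
        libraries[name] = lib_path
    return libraries
```

Each link step then only sees one `scanner.o`, so `is_newline` from tree-sitter-comment and tree-sitter-rst end up in different files. The `build` parameter exists only so the loop can be exercised without a compiler, for example by passing a function that records its arguments.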