lvjr / tabularray

Typeset tabulars and arrays with LaTeX3
https://ctan.org/pkg/tabularray

Avoid using l3regex to make huge tables work #305

Open lvjr opened 2 years ago

lvjr commented 2 years ago

Discussed in https://github.com/lvjr/tabularray/discussions/149

Originally posted by **pddzaic** November 8, 2021

Before filing a bug report, I would like to understand what the error message means. I have a huge table: 10 columns, 1000 rows. Compilation (lualatex) stops with the following error message:

```
[! Bad register code (65536).
\__regex_query_set_aux:nN ...__regex_curr_pos_int {#1}\__kernel_intarray_gse...
l.842 \end {longtblr}
```

This happens after 500 rows. Here is the minimal working example:

```
\documentclass{standalone}
\usepackage{tabularray}
\newcommand\id[1]{\lstinline{#1}}
\begin{document}
\input{test.tex}
\end{document}
```

I have attached `test.tex`: [test.zip](https://github.com/lvjr/tabularray/files/7497026/test.zip)

lvjr commented 2 years ago

From a comment by Phelype Oleinik:

@pddzaic There's not really a bug to fix. l3regex uses toks registers to do its job, but there are only 65535 of those available in LuaTeX (half that amount in other engines) so you are facing a limitation of the engine.

So the LaTeX team doesn't consider this a bug in l3regex, and we need to rewrite some code in tabularray to make huge tables work.
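
For readers who want to see the engine limit in isolation, here is a minimal sketch that is not taken from the thread. It assumes compilation with lualatex, where toks registers are numbered 0 to 65535 as the quoted comment states, and it touches the registers directly rather than going through l3regex. The register contents (`ok`, `fail`) are arbitrary placeholders.

```latex
% Compile with lualatex. Illustration only: l3regex is not involved here.
\documentclass{article}
\begin{document}
% 65535 is the highest toks register index LuaTeX accepts.
\toks65535={ok}
% One past the limit; this is expected to stop with
% "! Bad register code (65536)." as in the original report.
\toks65536={fail}
\end{document}
```

Under engines with half as many registers, the same experiment with 32768 should trigger the corresponding error.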

lvjr commented 2 years ago

From the code we see

\cs_new_protected:Npn \__tblr_split_table_to_lines:NN #1 #2
  {
    \__tblr_insert_braces:N #1
    \seq_set_split:NnV \l_tmpa_seq { \\ } #1
    ...
  }

I assume \seq_set_split:NnV doesn't depend on l3regex, so the real problem is \__tblr_insert_braces:N:

\regex_const:Nn \c__tblr_insert_braces_regex
  {
    \c{begin} \cB\{ (\c[^BE].*) \cE\} (.*?) \c{end} \cB\{ (\c[^BE].*) \cE\}
  }
\tl_const:Nn \c__tblr_insert_braces_tl
  {
    \c{begin} \cB\{ \cB\{ \1 \cE\} \2 \c{end} \cE\} \cB\{ \3 \cE\}
  }
\cs_new_protected:Npn \__tblr_insert_braces:N #1
  {
    \regex_replace_all:NVN \c__tblr_insert_braces_regex \c__tblr_insert_braces_tl #1
    \regex_replace_all:NVN \c__tblr_insert_braces_regex \c__tblr_insert_braces_tl #1
  }

The code dates back to the first commit of tabularray.sty: we need to protect nested subtables before splitting a table into rows at \\ .
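
To illustrate why the extra braces matter, here is a small sketch that is not tabularray's own code. It assumes a recent LaTeX kernel with expl3 available, uses the scratch variables `\l_tmpa_tl` and `\l_tmpa_seq`, and a made-up table body in which the nested environment has already been wrapped in the `\begin{ ... \end}` form that the replacement text above produces. Since `\seq_set_split:NnV` only splits at delimiters outside brace groups, the `\\` inside the nested table should not start a new row.

```latex
\documentclass{article}
\usepackage{expl3} % not needed on recent kernels, harmless otherwise
\ExplSyntaxOn
% Hypothetical table body: the second row holds an already-protected subtable.
\tl_set:Nn \l_tmpa_tl
  { a & b \\ \begin { {tblr} {} c \\ d \end } {tblr} & e \\ f & g }
% Split at top-level \\ only; the braced \\ stays inside its row.
\seq_set_split:NnV \l_tmpa_seq { \\ } \l_tmpa_tl
% Expected to report 3 rows rather than 4.
\iow_term:x { Rows:~ \seq_count:N \l_tmpa_seq }
% Print each row to the terminal for inspection.
\seq_map_inline:Nn \l_tmpa_seq { \iow_term:n { [#1] } }
\ExplSyntaxOff
\begin{document}
\end{document}
```

Removing the added brace pair from the sample body should make the split produce four items, which is the breakage the brace insertion is meant to prevent.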

user202729 commented 1 year ago

(this just popped up in a recent TeX.SE question https://tex.stackexchange.com/questions/686028/how-to-solve-longtblr-error-bad-register-code-32768 )

Because l3regex is powerful, it's also slow (made even worse by TeX). For this task, assuming you only need to replace \begin/\end at top level with \begin{ and \end} respectively, I think the easiest way (which also takes linear time) is to

lvjr commented 1 year ago

I am considering solving this problem and the \\ [abc] problem (see https://github.com/lvjr/tabularray/discussions/321) at the same time.