r-lib / nanoparquet

R package to read and write Parquet files
https://nanoparquet.r-lib.org/

segfault when using `write_parquet()` #72

Closed PMassicotte closed 4 months ago

PMassicotte commented 4 months ago

Launching R with `R -d lldb` gives me:

library(nanoparquet)
library(nycflights13)

tf <- tempfile(fileext = ".parquet")

write_parquet(flights, tf, compression = "zstd")
Process 47144 stopped
* thread #1, name = 'R', stop reason = signal SIGFPE: integer divide by zero
    frame #0: 0x00007fffea2eda8a nanoparquet.so`nanoparquet::ParquetOutFile::rle_encode(ByteBuffer&, unsigned int, ByteBuffer&, unsigned char, bool, bool, unsigned int) + 266
nanoparquet.so`nanoparquet::ParquetOutFile::rle_encode:
->  0x7fffea2eda8a <+266>: idivl  %r15d
    0x7fffea2eda8d <+269>: movl   %eax, %ecx
    0x7fffea2eda8f <+271>: xorl   %eax, %eax
    0x7fffea2eda91 <+273>: testl  %r13d, %r13d

It crashes regardless of which compression method I use.
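
For readers of the trace above: `idivl %r15d` is a signed 32-bit integer division, and on x86-64 a zero divisor makes that instruction trap, which Linux reports as SIGFPE. That matches the stop reason shown. The trace does not reveal what `%r15d` holds, so the following is only a minimal sketch of the general pattern, not nanoparquet's `rle_encode`. The helper name `values_per_chunk` and the idea that the divisor is a bit width are assumptions made purely for illustration.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical helper, NOT taken from nanoparquet: how many values of
// `bit_width` bits fit into `chunk_bits` bits.
// An unguarded `chunk_bits / bit_width` is undefined behaviour in C++ when
// bit_width == 0, and on x86-64 it typically traps with SIGFPE.
std::optional<std::uint32_t> values_per_chunk(std::uint32_t chunk_bits,
                                              std::uint32_t bit_width) {
  if (bit_width == 0) {
    // Degenerate input: report "no answer" instead of dividing.
    return std::nullopt;
  }
  return chunk_bits / bit_width;
}
```

With this guard, `values_per_chunk(64, 8)` yields a value and `values_per_chunk(64, 0)` yields an empty optional, so the caller has to decide explicitly how to handle a zero divisor instead of crashing the R process.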

Session info:

R version 4.4.1 (2024-06-14)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 24.04 LTS

Matrix products: default
BLAS/LAPACK: /usr/lib/x86_64-linux-gnu/libmkl_gf_lp64.so;  LAPACK version 3.8.0

locale:
 [1] LC_CTYPE=en_CA.UTF-8      
 [2] LC_NUMERIC=C              
 [3] LC_TIME=en_CA.UTF-8       
 [4] LC_COLLATE=en_CA.UTF-8    
 [5] LC_MONETARY=en_CA.UTF-8   
 [6] LC_MESSAGES=en_CA.UTF-8   
 [7] LC_PAPER=en_CA.UTF-8      
 [8] LC_NAME=C                 
 [9] LC_ADDRESS=C              
[10] LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_CA.UTF-8
[12] LC_IDENTIFICATION=C       

time zone: America/Toronto
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils    
[5] datasets  methods   base     

other attached packages:
[1] nycflights13_1.0.2 nanoparquet_0.3.0 
[3] nvimcom_0.9.42    

loaded via a namespace (and not attached):
 [1] utf8_1.2.4        httpgd_2.0.2     
 [3] magrittr_2.0.3    glue_1.7.0       
 [5] tibble_3.2.1      pkgconfig_2.0.3  
 [7] lifecycle_1.0.4   cli_3.6.2        
 [9] fansi_1.0.6       unigd_0.1.2      
[11] vctrs_0.6.5       systemfonts_1.1.0
[13] compiler_4.4.1    tools_4.4.1      
[15] pillar_1.9.0      rlang_1.1.4

gaborcsardi commented 4 months ago

How did you install nanoparquet? If you compiled it, which compilation flags did you use? For example, can you show the installation output?

PMassicotte commented 4 months ago

From RSPM binary:

'https://packagemanager.posit.co/cran/__linux__/noble/latest/src/contrib/nanoparquet_0.3.0.tar.gz'

PMassicotte commented 4 months ago

Let me try installing from source.

PMassicotte commented 4 months ago

Same error. Here is the installation output from building from source:

[nav] r$> install.packages('nanoparquet', repos = "https://cran.rstudio.com")
Installing package into ‘/home/filoche/R/x86_64-pc-linux-gnu-library/4.4’
(as ‘lib’ is unspecified)
trying URL 'https://cran.rstudio.com/src/contrib/nanoparquet_0.3.0.tar.gz'
Content type 'application/x-gzip' length 1025569 bytes (1001 KB)
==================================================
downloaded 1001 KB

* installing *source* package ‘nanoparquet’ ...
** package ‘nanoparquet’ successfully unpacked and MD5 sums checked
** using staged installation
** libs
using C++ compiler: ‘g++ (Ubuntu 13.2.0-23ubuntu4) 13.2.0’
g++ -std=gnu++17 -I"/usr/bin/R/include" -DNDEBUG -Ithrift -I. -Izstd/include  -I/usr/local/include   -DR_NO_REMAP -fpic  -g -O2   -c rwrapper.cpp -o rwrapper.o
g++ -std=gnu++17 -I"/usr/bin/R/include" -DNDEBUG -Ithrift -I. -Izstd/include  -I/usr/local/include   -DR_NO_REMAP -fpic  -g -O2   -c protect.cpp -o protect.o
g++ -std=gnu++17 -I"/usr/bin/R/include" -DNDEBUG -Ithrift -I. -Izstd/include  -I/usr/local/include   -DR_NO_REMAP -fpic  -g -O2   -c read.cpp -o read.o
g++ -std=gnu++17 -I"/usr/bin/R/include" -DNDEBUG -Ithrift -I. -Izstd/include  -I/usr/local/include   -DR_NO_REMAP -fpic  -g -O2   -c write.cpp -o write.o
g++ -std=gnu++17 -I"/usr/bin/R/include" -DNDEBUG -Ithrift -I. -Izstd/include  -I/usr/local/include   -DR_NO_REMAP -fpic  -g -O2   -c read-metadata.cpp -o read-metadata.o
g++ -std=gnu++17 -I"/usr/bin/R/include" -DNDEBUG -Ithrift -I. -Izstd/include  -I/usr/local/include   -DR_NO_REMAP -fpic  -g -O2   -c read-pages.cpp -o read-pages.o
g++ -std=gnu++17 -I"/usr/bin/R/include" -DNDEBUG -Ithrift -I. -Izstd/include  -I/usr/local/include   -DR_NO_REMAP -fpic  -g -O2   -c arrow-schema.cpp -o arrow-schema.o
g++ -std=gnu++17 -I"/usr/bin/R/include" -DNDEBUG -Ithrift -I. -Izstd/include  -I/usr/local/include   -DR_NO_REMAP -fpic  -g -O2   -c base64.cpp -o base64.o
g++ -std=gnu++17 -I"/usr/bin/R/include" -DNDEBUG -Ithrift -I. -Izstd/include  -I/usr/local/include   -DR_NO_REMAP -fpic  -g -O2   -c r-base64.cpp -o r-base64.o
g++ -std=gnu++17 -I"/usr/bin/R/include" -DNDEBUG -Ithrift -I. -Izstd/include  -I/usr/local/include   -DR_NO_REMAP -fpic  -g -O2   -c snappy.cpp -o snappy.o
g++ -std=gnu++17 -I"/usr/bin/R/include" -DNDEBUG -Ithrift -I. -Izstd/include  -I/usr/local/include   -DR_NO_REMAP -fpic  -g -O2   -c encodings.cpp -o encodings.o
g++ -std=gnu++17 -I"/usr/bin/R/include" -DNDEBUG -Ithrift -I. -Izstd/include  -I/usr/local/include   -DR_NO_REMAP -fpic  -g -O2   -c dictionary-encoding.cpp -o dictionary-encoding.o
g++ -std=gnu++17 -I"/usr/bin/R/include" -DNDEBUG -Ithrift -I. -Izstd/include  -I/usr/local/include   -DR_NO_REMAP -fpic  -g -O2   -c test.cpp -o test.o
g++ -std=gnu++17 -I"/usr/bin/R/include" -DNDEBUG -Ithrift -I. -Izstd/include  -I/usr/local/include   -DR_NO_REMAP -fpic  -g -O2   -c lib/ParquetFile.cpp -o lib/ParquetFile.o
g++ -std=gnu++17 -I"/usr/bin/R/include" -DNDEBUG -Ithrift -I. -Izstd/include  -I/usr/local/include   -DR_NO_REMAP -fpic  -g -O2   -c lib/ParquetOutFile.cpp -o lib/ParquetOutFile.o
g++ -std=gnu++17 -I"/usr/bin/R/include" -DNDEBUG -Ithrift -I. -Izstd/include  -I/usr/local/include   -DR_NO_REMAP -fpic  -g -O2   -c lib/RleBpDecoder.cpp -o lib/RleBpDecoder.o
g++ -std=gnu++17 -I"/usr/bin/R/include" -DNDEBUG -Ithrift -I. -Izstd/include  -I/usr/local/include   -DR_NO_REMAP -fpic  -g -O2   -c parquet/parquet_types.cpp -o parquet/parquet_types.o
g++ -std=gnu++17 -I"/usr/bin/R/include" -DNDEBUG -Ithrift -I. -Izstd/include  -I/usr/local/include   -DR_NO_REMAP -fpic  -g -O2   -c thrift/protocol/TProtocol.cpp -o thrift/protocol/TProtocol.o
g++ -std=gnu++17 -I"/usr/bin/R/include" -DNDEBUG -Ithrift -I. -Izstd/include  -I/usr/local/include   -DR_NO_REMAP -fpic  -g -O2   -c thrift/transport/TTransportException.cpp -o thrift/transport/TTransportException.o
g++ -std=gnu++17 -I"/usr/bin/R/include" -DNDEBUG -Ithrift -I. -Izstd/include  -I/usr/local/include   -DR_NO_REMAP -fpic  -g -O2   -c thrift/transport/TBufferTransports.cpp -o thrift/transport/TBufferTransports.o
g++ -std=gnu++17 -I"/usr/bin/R/include" -DNDEBUG -Ithrift -I. -Izstd/include  -I/usr/local/include   -DR_NO_REMAP -fpic  -g -O2   -c fastpforlib/bitpacking.cpp -o fastpforlib/bitpacking.o
g++ -std=gnu++17 -I"/usr/bin/R/include" -DNDEBUG -Ithrift -I. -Izstd/include  -I/usr/local/include   -DR_NO_REMAP -fpic  -g -O2   -c snappy/snappy.cc -o snappy/snappy.o
g++ -std=gnu++17 -I"/usr/bin/R/include" -DNDEBUG -Ithrift -I. -Izstd/include  -I/usr/local/include   -DR_NO_REMAP -fpic  -g -O2   -c snappy/snappy-sinksource.cc -o snappy/snappy-sinksource.o
g++ -std=gnu++17 -I"/usr/bin/R/include" -DNDEBUG -Ithrift -I. -Izstd/include  -I/usr/local/include   -DR_NO_REMAP -fpic  -g -O2   -c miniz/miniz.cpp -o miniz/miniz.o
g++ -std=gnu++17 -I"/usr/bin/R/include" -DNDEBUG -Ithrift -I. -Izstd/include  -I/usr/local/include   -DR_NO_REMAP -fpic  -g -O2   -c zstd/common/entropy_common.cpp -o zstd/common/entropy_common.o
g++ -std=gnu++17 -I"/usr/bin/R/include" -DNDEBUG -Ithrift -I. -Izstd/include  -I/usr/local/include   -DR_NO_REMAP -fpic  -g -O2   -c zstd/common/error_private.cpp -o zstd/common/error_private.o
g++ -std=gnu++17 -I"/usr/bin/R/include" -DNDEBUG -Ithrift -I. -Izstd/include  -I/usr/local/include   -DR_NO_REMAP -fpic  -g -O2   -c zstd/common/fse_decompress.cpp -o zstd/common/fse_decompress.o
g++ -std=gnu++17 -I"/usr/bin/R/include" -DNDEBUG -Ithrift -I. -Izstd/include  -I/usr/local/include   -DR_NO_REMAP -fpic  -g -O2   -c zstd/common/xxhash.cpp -o zstd/common/xxhash.o
g++ -std=gnu++17 -I"/usr/bin/R/include" -DNDEBUG -Ithrift -I. -Izstd/include  -I/usr/local/include   -DR_NO_REMAP -fpic  -g -O2   -c zstd/common/zstd_common.cpp -o zstd/common/zstd_common.o
g++ -std=gnu++17 -I"/usr/bin/R/include" -DNDEBUG -Ithrift -I. -Izstd/include  -I/usr/local/include   -DR_NO_REMAP -fpic  -g -O2   -c zstd/decompress/huf_decompress.cpp -o zstd/decompress/huf_decompress.o
g++ -std=gnu++17 -I"/usr/bin/R/include" -DNDEBUG -Ithrift -I. -Izstd/include  -I/usr/local/include   -DR_NO_REMAP -fpic  -g -O2   -c zstd/decompress/zstd_ddict.cpp -o zstd/decompress/zstd_ddict.o
g++ -std=gnu++17 -I"/usr/bin/R/include" -DNDEBUG -Ithrift -I. -Izstd/include  -I/usr/local/include   -DR_NO_REMAP -fpic  -g -O2   -c zstd/decompress/zstd_decompress.cpp -o zstd/decompress/zstd_decompress.o
g++ -std=gnu++17 -I"/usr/bin/R/include" -DNDEBUG -Ithrift -I. -Izstd/include  -I/usr/local/include   -DR_NO_REMAP -fpic  -g -O2   -c zstd/decompress/zstd_decompress_block.cpp -o zstd/decompress/zstd_decompress_block.o
g++ -std=gnu++17 -I"/usr/bin/R/include" -DNDEBUG -Ithrift -I. -Izstd/include  -I/usr/local/include   -DR_NO_REMAP -fpic  -g -O2   -c zstd/compress/fse_compress.cpp -o zstd/compress/fse_compress.o
g++ -std=gnu++17 -I"/usr/bin/R/include" -DNDEBUG -Ithrift -I. -Izstd/include  -I/usr/local/include   -DR_NO_REMAP -fpic  -g -O2   -c zstd/compress/hist.cpp -o zstd/compress/hist.o
g++ -std=gnu++17 -I"/usr/bin/R/include" -DNDEBUG -Ithrift -I. -Izstd/include  -I/usr/local/include   -DR_NO_REMAP -fpic  -g -O2   -c zstd/compress/huf_compress.cpp -o zstd/compress/huf_compress.o
g++ -std=gnu++17 -I"/usr/bin/R/include" -DNDEBUG -Ithrift -I. -Izstd/include  -I/usr/local/include   -DR_NO_REMAP -fpic  -g -O2   -c zstd/compress/zstd_compress.cpp -o zstd/compress/zstd_compress.o
g++ -std=gnu++17 -I"/usr/bin/R/include" -DNDEBUG -Ithrift -I. -Izstd/include  -I/usr/local/include   -DR_NO_REMAP -fpic  -g -O2   -c zstd/compress/zstd_compress_literals.cpp -o zstd/compress/zstd_compress_literals.o
g++ -std=gnu++17 -I"/usr/bin/R/include" -DNDEBUG -Ithrift -I. -Izstd/include  -I/usr/local/include   -DR_NO_REMAP -fpic  -g -O2   -c zstd/compress/zstd_compress_sequences.cpp -o zstd/compress/zstd_compress_sequences.o
g++ -std=gnu++17 -I"/usr/bin/R/include" -DNDEBUG -Ithrift -I. -Izstd/include  -I/usr/local/include   -DR_NO_REMAP -fpic  -g -O2   -c zstd/compress/zstd_compress_superblock.cpp -o zstd/compress/zstd_compress_superblock.o
g++ -std=gnu++17 -I"/usr/bin/R/include" -DNDEBUG -Ithrift -I. -Izstd/include  -I/usr/local/include   -DR_NO_REMAP -fpic  -g -O2   -c zstd/compress/zstd_double_fast.cpp -o zstd/compress/zstd_double_fast.o
g++ -std=gnu++17 -I"/usr/bin/R/include" -DNDEBUG -Ithrift -I. -Izstd/include  -I/usr/local/include   -DR_NO_REMAP -fpic  -g -O2   -c zstd/compress/zstd_fast.cpp -o zstd/compress/zstd_fast.o
g++ -std=gnu++17 -I"/usr/bin/R/include" -DNDEBUG -Ithrift -I. -Izstd/include  -I/usr/local/include   -DR_NO_REMAP -fpic  -g -O2   -c zstd/compress/zstd_lazy.cpp -o zstd/compress/zstd_lazy.o
g++ -std=gnu++17 -I"/usr/bin/R/include" -DNDEBUG -Ithrift -I. -Izstd/include  -I/usr/local/include   -DR_NO_REMAP -fpic  -g -O2   -c zstd/compress/zstd_ldm.cpp -o zstd/compress/zstd_ldm.o
g++ -std=gnu++17 -I"/usr/bin/R/include" -DNDEBUG -Ithrift -I. -Izstd/include  -I/usr/local/include   -DR_NO_REMAP -fpic  -g -O2   -c zstd/compress/zstd_opt.cpp -o zstd/compress/zstd_opt.o
g++ -std=gnu++17 -shared -L/usr/bin/R/lib -L/usr/local/lib -o nanoparquet.so rwrapper.o protect.o read.o write.o read-metadata.o read-pages.o arrow-schema.o base64.o r-base64.o snappy.o encodings.o dictionary-encoding.o test.o lib/ParquetFile.o lib/ParquetOutFile.o lib/RleBpDecoder.o parquet/parquet_types.o thrift/protocol/TProtocol.o thrift/transport/TTransportException.o thrift/transport/TBufferTransports.o fastpforlib/bitpacking.o snappy/snappy.o snappy/snappy-sinksource.o miniz/miniz.o zstd/common/entropy_common.o zstd/common/error_private.o zstd/common/fse_decompress.o zstd/common/xxhash.o zstd/common/zstd_common.o zstd/decompress/huf_decompress.o zstd/decompress/zstd_ddict.o zstd/decompress/zstd_decompress.o zstd/decompress/zstd_decompress_block.o zstd/compress/fse_compress.o zstd/compress/hist.o zstd/compress/huf_compress.o zstd/compress/zstd_compress.o zstd/compress/zstd_compress_literals.o zstd/compress/zstd_compress_sequences.o zstd/compress/zstd_compress_superblock.o zstd/compress/zstd_double_fast.o zstd/compress/zstd_fast.o zstd/compress/zstd_lazy.o zstd/compress/zstd_ldm.o zstd/compress/zstd_opt.o -L/usr/bin/R/lib -lR
installing to /home/filoche/R/x86_64-pc-linux-gnu-library/4.4/00LOCK-nanoparquet/00new/nanoparquet/libs
** R
** inst
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded from temporary location
** checking absolute paths in shared objects and dynamic libraries
** testing if installed package can be loaded from final location
** testing if installed package keeps a record of temporary installation path
* DONE (nanoparquet)

gaborcsardi commented 4 months ago

Please install from GitHub for now; we'll have another release soon. I have no clue why this only happens on Ubuntu Noble...