refresh-bio / FaStore

FaStore - high-performance FASTQ files compressor
GNU General Public License v3.0

Frequent segmentation fault #4

Open yuansliu opened 6 years ago

yuansliu commented 6 years ago

Hi @lrog

I keep getting segmentation faults. Since I always get a segmentation fault on real datasets, I ran the test script to reproduce it, as follows:

./get_fastq.sh 
[ scripts]$ head -9600 test_1.fq > test_1_part.fq
[ scripts]$ sh test_se.sh test_1_part.fq 8
--------------------------------
testing: lossless
--------------------------------
fastore_compress.sh: line 227: 13004 Segmentation fault      $FASTORE_REBIN e "-i$TMP_BIN" "-o$TMP_REBIN-2" "-t$TH_REBIN" $PAR_REBIN_C1 $PAR_PE -p2
[ scripts]$ head -9596 test_1.fq > test_1_part.fq
[ scripts]$ sh test_se.sh test_1_part.fq 8
--------------------------------
testing: lossless
--------------------------------
--------------------------------
testing: reduced
--------------------------------
--------------------------------
testing: lossy
--------------------------------
--------------------------------
testing: max
--------------------------------
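
The two runs above differ only in the prefix length passed to head (9600 vs. 9596 lines). A minimal sketch of how such a boundary could be narrowed down from a larger failing file is shown below. It assumes unwrapped FASTQ with exactly four lines per record, and it assumes the failure is monotonic in the number of reads, which may not hold for this bug. The function names and the `crashes` callback are hypothetical and not part of FaStore.

```python
from typing import Callable


def reads_in_prefix(num_lines: int) -> int:
    """Complete FASTQ records in the first num_lines lines (assumes 4 lines per record)."""
    return num_lines // 4


def smallest_failing_reads(passing: int, failing: int,
                           crashes: Callable[[int], bool]) -> int:
    """Bisect between a read count known to pass and one known to fail.

    Assumption: every prefix at least as long as the smallest failing one also fails.
    `crashes(n)` is a caller-supplied check that runs the tool on the first n reads.
    """
    while failing - passing > 1:
        mid = (passing + failing) // 2
        if crashes(mid):
            failing = mid   # the boundary is at or below mid
        else:
            passing = mid   # the boundary is above mid
    return failing


print(reads_in_prefix(9600), reads_in_prefix(9596))
```

In practice `crashes(n)` would write the first `4 * n` lines to a temporary file, run the test script on it, and report whether a segmentation fault appeared in its output.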

The first test has one more read than the second. Some information about the compiler:

[ scripts]$ g++ -v
Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/opt/rh/devtoolset-4/root/usr/libexec/gcc/x86_64-redhat-linux/5.3.1/lto-wrapper
Target: x86_64-redhat-linux
Configured with: ../configure --enable-bootstrap --enable-languages=c,c++,fortran,lto --prefix=/opt/rh/devtoolset-4/root/usr --mandir=/opt/rh/devtoolset-4/root/usr/share/man --infodir=/opt/rh/devtoolset-4/root/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-shared --enable-threads=posix --enable-checking=release --enable-multilib --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --enable-plugin --with-linker-hash-style=gnu --enable-initfini-array --disable-libgcj --with-default-libstdcxx-abi=gcc4-compatible --with-isl=/builddir/build/BUILD/gcc-5.3.1-20160406/obj-x86_64-redhat-linux/isl-install --enable-libmpx --enable-gnu-indirect-function --with-tune=generic --with-arch_32=i686 --build=x86_64-redhat-linux
Thread model: posix
gcc version 5.3.1 20160406 (Red Hat 5.3.1-6) (GCC) 

I also tested with gcc version 6.2.1 20160916 (Red Hat 6.2.1-3) (GCC). It does not work either.

Are there any other special requirements to run your tool?

Thank you very much.


And it works well on paired-end reads.

lrog commented 6 years ago

Hi @yuansliu, thanks for your report. Just to mention in this thread: it seems that the problem is related to the old, original implementation of the PPMd codec we are using. With modern versions of GCC (>= 5.0) and optimisations turned on (>= -O2), the codec does not seem to work properly on some datasets. It is pretty old legacy code that we have already been thinking about updating (if possible) or replacing.

This issue may also be related to issue #2.

lrog commented 5 years ago

Sorry @yuansliu for the long delay with the possible fix. I've just merged the fix to master. Apart from potential issues with the outdated PPMd, there was a bug in packing the bins. Could you check whether the fix solves your issue?

yuansliu commented 5 years ago

Hi @lrog ,

I cloned the latest version and ran it on the test data obtained with sh get_fastq.sh. It seems that my compiled version still does not work on my computer.

[user] git show
commit 1565968608b6da189d8ba2324cb8c946fb1b6f53
Merge: e16dfa9 37226e8
Author: Lucas <lrog@users.noreply.github.com>
Date:   Thu Nov 22 18:47:44 2018 +0000

    Merge pull request #5 from refresh-bio/fix-empty-bins

    Fix for error when trying to pack empty bins
sh fastore_compress.sh --max --in test_1.fq --out COMP --threads 24
fastore_compress.sh: line 227: 117378 Segmentation fault      $FASTORE_REBIN e "-i$TMP_BIN" "-o$TMP_REBIN-2" "-t$TH_REBIN" $PAR_REBIN_C1 $PAR_PE -p2

The Makefile also does not seem to be robust on my computer. Some problems occurred when compiling FaStore, and I needed to modify the Makefile. Could this be caused by a different version of Make?

When running make in the folder 'fastore':

cd fastore/fastore_bin && make fastore_bin
make[1]: Entering directory `/data/yuansliu/FaStore/fastore/fastore_bin'
.
.
.
echo "#include \"version.h\"" > version.cpp
echo "std::string GetCompilationTime() {return std::string(__DATE__) + \" -- \" + std::string(__TIME__);}" >> version.cpp
echo "std::string GetAppVersion() {return \"0.8.0\";}" >> version.cpp
g++ -m64 -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -std=c++11 -pthread -DNDEBUG -O2  -static -Wl,--whole-archive -lpthread -Wl,--no-whole-archive -o fastore_bin main.cpp version.cpp BinModule.o BinOperator.o BinFile.o FastqPacker.o FastqCategorizer.o FastqParser.o FastqStream.o FileStream.o Stats.o /data/yuansliu/FaStore/fastore/fastore_pack/codebook.o /data/yuansliu/FaStore/fastore/fastore_pack/distortion.o /data/yuansliu/FaStore/fastore/fastore_pack/pmf.o /data/yuansliu/FaStore/fastore/fastore_pack/quantizer.o /data/yuansliu/FaStore/fastore/fastore_pack/util.o /data/yuansliu/FaStore/fastore/fastore_pack/well.o /data/yuansliu/FaStore/fastore/fastore_bin/QVZ.o -lz
/opt/rh/devtoolset-4/root/usr/libexec/gcc/x86_64-redhat-linux/5.3.1/ld: cannot find -lpthread
/opt/rh/devtoolset-4/root/usr/libexec/gcc/x86_64-redhat-linux/5.3.1/ld: cannot find -lz
/opt/rh/devtoolset-4/root/usr/libexec/gcc/x86_64-redhat-linux/5.3.1/ld: cannot find -lm
/opt/rh/devtoolset-4/root/usr/libexec/gcc/x86_64-redhat-linux/5.3.1/ld: cannot find -lpthread
/opt/rh/devtoolset-4/root/usr/libexec/gcc/x86_64-redhat-linux/5.3.1/ld: cannot find -lc
collect2: error: ld returned 1 exit status
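
The link line above passes -static, so ld looks for static archives (libpthread.a, libz.a, libm.a, libc.a) rather than shared objects. "cannot find -lc" under -static usually means those archives are not installed. On Red Hat-style systems they are often packaged separately (for example glibc-static and zlib-static), which is an assumption about this machine and not something the log confirms. A small sketch for pulling the distinct missing library names out of such a log follows. The function name is hypothetical.

```python
import re
from typing import List


def missing_libraries(build_log: str) -> List[str]:
    """Library names from ld 'cannot find -l<name>' errors, first-seen order, no duplicates."""
    seen: List[str] = []
    for name in re.findall(r"cannot find -l(\S+)", build_log):
        if name not in seen:
            seen.append(name)
    return seen


# Error lines shortened from the log above (the path to ld is omitted).
log = """\
ld: cannot find -lpthread
ld: cannot find -lz
ld: cannot find -lm
ld: cannot find -lpthread
ld: cannot find -lc
"""
print(missing_libraries(log))
```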

Then I compiled them in their own subfolder:

[yuansliu@vulcan3 fastore_bin]$ make
echo "#include \"version.h\"" > version.cpp
echo "std::string GetCompilationTime() {return std::string(__DATE__) + \" -- \" + std::string(__TIME__);}" >> version.cpp
echo "std::string GetAppVersion() {return \"0.8.0\";}" >> version.cpp
g++ -m64 -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -std=c++11 -pthread -DNDEBUG -O2 -flto -fwhole-program  -o fastore_bin main.cpp version.cpp BinModule.o BinOperator.o BinFile.o FastqPacker.o FastqCategorizer.o FastqParser.o FastqStream.o FileStream.o Stats.o  -lz
BinFile.o: In function `BinFileWriter::BinFileWriter()':
BinFile.cpp:(.text+0x682): undefined reference to `QvzCodebook::QvzCodebook()'
BinFile.cpp:(.text+0x77d): undefined reference to `QvzCodebook::~QvzCodebook()'
...
...
...
Stats.cpp:(.text+0x4e0): undefined reference to `pmf_increment(pmf_t*, unsigned int)'
collect2: error: ld returned 1 exit status
[user] make --version
GNU Make 3.82
Built for x86_64-redhat-linux-gnu
Copyright (C) 2010  Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
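
Comparing the two g++ link lines in the logs above suggests why the subfolder build fails. The top-level build links the fastore_pack objects and QVZ.o, while the subfolder build does not, which is consistent with the undefined references to QvzCodebook and pmf_increment. The sketch below lists the objects present on one link line and absent from the other. The helper names are hypothetical, and the command strings keep only the object lists from the log, with the other flags omitted.

```python
import os
from typing import List


def object_files(link_command: str) -> List[str]:
    """Base names of every .o argument on a compiler/linker command line."""
    return [os.path.basename(tok) for tok in link_command.split() if tok.endswith(".o")]


def objects_only_in_first(first: str, second: str) -> List[str]:
    """Objects linked by `first` but absent from `second`, in command-line order."""
    present = set(object_files(second))
    return [obj for obj in object_files(first) if obj not in present]


COMMON = ("main.cpp version.cpp BinModule.o BinOperator.o BinFile.o FastqPacker.o "
          "FastqCategorizer.o FastqParser.o FastqStream.o FileStream.o Stats.o")
PACK = "/data/yuansliu/FaStore/fastore/fastore_pack"

top_level = (
    f"g++ -o fastore_bin {COMMON} "
    f"{PACK}/codebook.o {PACK}/distortion.o {PACK}/pmf.o {PACK}/quantizer.o "
    f"{PACK}/util.o {PACK}/well.o /data/yuansliu/FaStore/fastore/fastore_bin/QVZ.o -lz"
)
subfolder = f"g++ -o fastore_bin {COMMON} -lz"

print(objects_only_in_first(top_level, subfolder))
```

If the missing objects are the cause, running make from the top-level folder (once the static-library errors are resolved) would be the likelier path than building each subfolder separately.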