wrpearson / fasta36

Git repository for FASTA36 sequence comparison software
Apache License 2.0

*** stack smashing detected ***: terminated on the ENA PRO division #31

Status: Open. olivierfriard opened this issue on July 1, 2021.

olivierfriard commented on July 1, 2021

Dear Prof. Pearson,

I downloaded the PRO division from the ENA database (files STD_PRO_*.dat.gz, in EMBL format), gunzipped them, and wrote the following STD_PRO.dtb file:

STD_PROD_1.dat 3
STD_PROD_2.dat 3
STD_PROD_3.dat 3

When I run the following command on Ubuntu 20.04 (32 GB RAM):

fasta36 orf8.seq @STD_PRO.dtb > orf8_pro.fasta36

I get this error message:

*** stack smashing detected ***: terminated
Aborted (core dumped)

The FASTA version is 36.3.8h.

I get the same error with the HUM division; the VRL division works.

Thank you for your support.

Olivier Friard
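For reference, a .dtb library list like the one above holds one library file per line, followed by its format code (3 is the EMBL code used in the report). A minimal bash sketch for writing such a list is below; make_dtb is a hypothetical helper, not part of FASTA, and the STD_PRO_*.dat glob is an assumption about your file names, so adjust it as needed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a FASTA library list (.dtb) with one
# "<file> <format>" line per library file. Format 3 is the EMBL code
# used in the report; the glob is an assumption about your file names.
make_dtb() {
  local pattern=$1 fmt=$2 out=$3 f
  : > "$out"                      # truncate or create the list file
  for f in $pattern; do           # left unquoted so the shell expands the glob
    [ -e "$f" ] || continue       # no match: skip the literal pattern
    printf '%s %s\n' "$f" "$fmt" >> "$out"
  done
}

# usage:
#   make_dtb 'STD_PRO_*.dat' 3 STD_PRO.dtb
#   fasta36 orf8.seq @STD_PRO.dtb > orf8_pro.fasta36
```

Because the pattern is expanded unquoted, file names containing spaces are not supported by this sketch.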

wrpearson commented 3 years ago

Thanks for the bug report.

It has been a very long time since I have tested against the EMBL format database.

Can you tell me a bit more about the system you are running fasta36 on? The "stack smashing" message is not coming from the fasta code, so I am wondering what operating system/compiler combination is generating it.

Thanks,

Bill Pearson

olivierfriard commented 3 years ago

Hello,

I confirm that the issue is due to the EMBL format: if the EMBL database is converted to FASTA format, the problem disappears.
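The workaround above can be sketched with awk. This is a minimal converter, not an official FASTA or ENA tool: it assumes the standard EMBL line codes (AC, DE, SQ, //), uses the first accession on the first AC line as the FASTA identifier, and keeps only the first DE line as the description. The name embl_to_fasta is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: minimal EMBL-to-FASTA conversion with awk.
# Assumes standard EMBL line codes: AC (accessions), DE (description),
# SQ (start of sequence data) and // (end of entry).
embl_to_fasta() {
  awk '
    /^AC   / && acc == ""  { acc = $2; sub(/;$/, "", acc) }  # first accession only
    /^DE   / && desc == "" { desc = substr($0, 6) }          # first DE line only
    /^SQ   / { print ">" acc " " desc; inseq = 1; next }     # FASTA header
    /^\/\//  { inseq = 0; acc = ""; desc = ""; next }        # reset per entry
    inseq    { line = $0; gsub(/[ 0-9]/, "", line); print line }  # drop spaces and counts
  ' "$@"
}

# usage: embl_to_fasta STD_PRO_1.dat > STD_PRO_1.fa
```

The output is plain FASTA, so it can be searched without a format code, since FASTA is the default library format for fasta36.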

A colleague obtained a more detailed error message:

AJ786203; Sulfolobus solfataricus small RNA ( 38) [r] 44 18.2 7.4e+07
AJ786177; Sulfolobus solfataricus non-codin ( 17) [r] 41 17.4 1.3e+08
More scores? [0]
Display alignments also? (y/n) [n] y
number of alignments [45791]?
*** ERROR [compacc2e.c:1015] - cannot allocate space[-883441658] for sequence encoding
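One possible reading of the negative size, offered as a hypothesis rather than a diagnosis (the thread does not establish the cause): a byte count that no longer fits in a signed 32-bit integer wraps around. The bash arithmetic below, which is 64-bit, shows that the smallest positive size that would print as -883441658 is 3,411,525,638 bytes (about 3.4 GB); any value larger by a multiple of 2^32 wraps to the same number. wrap_int32 is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothesis check only: how would a byte count look if it were
# truncated to a signed 32-bit int? bash arithmetic is 64-bit, so the
# two's-complement wrap is computed explicitly.
wrap_int32() {  # hypothetical helper; expects a non-negative value
  local v=$1
  echo $(( ((v + (1 << 31)) % (1 << 32)) - (1 << 31) ))
}

reported=-883441658
echo $(( reported + (1 << 32) ))   # prints 3411525638, the smallest positive candidate
wrap_int32 3411525638              # prints -883441658
```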

Here is some information about my system:

uname -a

Linux opti-9010 5.4.0-77-generic #86-Ubuntu SMP Thu Jun 17 02:35:03 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

cat /proc/cpuinfo

processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model       : 58
model name  : Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
stepping    : 9
microcode   : 0x21
cpu MHz     : 1995.064
cache size  : 8192 KB
physical id : 0
siblings    : 8
core id     : 0
cpu cores   : 4
apicid      : 0
initial apicid  : 0
fpu     : yes
fpu_exception   : yes
cpuid level : 13
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm cpuid_fault epb pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm ida arat pln pts md_clear flush_l1d
bugs        : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit srbds
bogomips    : 6784.51
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

gcc -v

Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/9/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none:hsa
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 9.3.0-17ubuntu1~20.04' --with-bugurl=file:///usr/share/doc/gcc-9/README.Bugs --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,gm2 --prefix=/usr --with-gcc-major-version-only --program-suffix=-9 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-plugin --enable-default-pie --with-system-zlib --with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none=/build/gcc-9-HskZEa/gcc-9-9.3.0/debian/tmp-nvptx/usr,hsa --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 9.3.0 (Ubuntu 9.3.0-17ubuntu1~20.04) 

Thank you for your support

Olivier Friard

wrpearson commented on July 6, 2021

I have just tested a search against:

rel_std_pro_01_r143.dat 3
rel_std_pro_02_r143.dat 3
rel_std_pro_03_r143.dat 3
rel_std_pro_04_r143.dat 3
rel_std_pro_05_r143.dat 3
rel_std_pro_06_r143.dat 3
rel_std_pro_07_r143.dat 3
rel_std_pro_08_r143.dat 3
rel_std_pro_09_r143.dat 3
rel_std_pro_10_r143.dat 3
rel_std_pro_11_r143.dat 3
rel_std_pro_12_r143.dat 3
rel_std_pro_13_r143.dat 3
rel_std_pro_14_r143.dat 3
rel_std_pro_15_r143.dat 3
rel_std_pro_16_r143.dat 3
rel_std_pro_17_r143.dat 3
rel_std_pro_18_r143.dat 3
rel_std_pro_19_r143.dat 3
rel_std_pro_20_r143.dat 3
rel_std_pro_21_r143.dat 3
rel_std_pro_22_r143.dat 3
rel_std_pro_23_r143.dat 3
rel_std_pro_24_r143.dat 3
rel_std_pro_25_r143.dat 3
rel_std_pro_26_r143.dat 3
rel_std_pro_27_r143.dat 3
rel_std_pro_28_r143.dat 3
rel_std_pro_29_r143.dat 3

and it completed without problems.

I'm afraid I need a bit more information -- perhaps the query sequence?

Bill Pearson

olivierfriard commented 3 years ago

Thank you. You tested against the old version of the database. EBI recently reorganized the EMBL database, which is now called ENA. The sequence format is still EMBL, but the files are bigger (see ftp://ftp.ebi.ac.uk/pub/databases/ena/). I got the error with fasta using all the files of the STD_PRO division.

Olivier Friard

--
University of Torino
Dept of Life Sciences and Systems Biology
Via dell'Accademia Albertina, 13
10123 TORINO (Italy)
tel: +39 011 6704542
http://penelope.unito.it/friard/
http://orcid.org/0000-0002-0374-9872

wrpearson commented 3 years ago

When I go to the EBI ENA FTP site, I see exactly the same files that I used.

Do you have files from a release after r143, dated after March 2020?

Could you give me the complete FTP link to those files?

I got my files from:

ftp.ebi.ac.uk/pub/databases/ena/sequence/std/rel_std_pro_*_r143.dat.gz

which includes some very large files:

40804201143 Mar 8 2020 rel_std_pro_05_r143.dat.gz

But I have not been able to find more recent files on the FTP site.

Bill Pearson

olivierfriard commented 3 years ago

Sorry for the delay; I was out of the office.

The new ENA files are available at:

ftp://ftp.ebi.ac.uk/pub/databases/ena/sequence/snapshot_latest/std

I tested with STD_PRO_1.dat.gz (after gunzipping it) and got the error.

Olivier Friard

wrpearson commented on July 12, 2021

I still have not been able to reproduce the error with that file, using either tfastx36 with seq/mgstm1.aa as the query or fasta36 with seq/mgstm1.nt as the query.

Could you tell me something about your query (DNA/protein), and how long it is? Are you using more than one query?

Even better would be if you could send it.

Thanks,

Bill Pearson

olivierfriard commented on July 12, 2021

I get the same error using seq/mgstm1.nt as the query.

The command is:

fasta36 mgstm1.nt "STD_PRO_1.dat 3"

fasta36 version 36.3.8h May, 2020

The STD_PRO_1.dat file is 242 Gb and contains 1,000,000 sequences; the longest is 16,040,666 nt. MD5SUM: bd1d4eadb5ed5d5204e1278b87d29868

The process uses about 7.8 GB of the 32 GB available and then crashes.
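Statistics like the entry count and longest sequence above could be reproduced with a single awk pass over the file. This sketch assumes each EMBL entry has exactly one "SQ   Sequence <length> BP; ..." line, so the length is field 3; embl_stats is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count entries and find the longest sequence in an
# EMBL flat file, reading "SQ   Sequence <length> BP; ..." lines (field 3).
embl_stats() {
  awk '/^SQ   / { n++; len = $3 + 0; if (len > max) max = len }
       END { printf "%d sequences, longest %d nt\n", n, max }' "$@"
}

# usage: embl_stats STD_PRO_1.dat; md5sum STD_PRO_1.dat
```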

Thank you

Olivier Friard

wrpearson commented 3 years ago

Did you compile (from the src/ directory) with "make -f ../make/Makefile.linux64_sse2", or did you use some other makefile?

If you can give me an account on your machine, I can take a look at it.

Alternatively, you could try running with the -R result.file option, which should list each sequence examined, with its score, during the search; then let me know where the crash occurs.

Until I can reproduce the crash, there is not much I can do.

Bill
