mychem / mychem-code

Mychem is an extension for MySQL that makes it possible to use cheminformatics functions within SQL queries.
GNU General Public License v2.0

long strings #14

Open hmpelta opened 8 years ago

hmpelta commented 8 years ago

Hi, I installed Mychem and it works normally as long as the input string isn't very long. For example, select MOLECULE_TO_CANONICAL_SMILES(SMILES_TO_MOLECULE("[CH4]")); works fine, but when the input SMILES is the one in the attached smiles.pdf, Mychem returns:
ERROR 2013 (HY000): Lost connection to MySQL server during query
I checked this SMILES with Open Babel and it returned the correct value. Your comments and suggestions are welcome. Thanks in advance.

fredrikw commented 8 years ago

Hi,

Is that long SMILES string copied exactly? I don't think it's a valid SMILES. A few numbers sit outside the brackets, which means they should be ring closures. As such they need to come in matching pairs, and in your example there is a single 1 and a single 2.

Kind regards, Fredrik
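The pairing rule described above can be checked mechanically. The following is a minimal sketch, not part of Mychem or Open Babel: `count_unclosed_rings` is a hypothetical helper that skips everything inside square brackets (where digits are isotopes, hydrogen counts or charges) and toggles each ring-closure label it meets outside them. It assumes only single-digit labels and two-digit `%nn` labels, and it does not validate the rest of the SMILES.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Count ring-closure labels that are opened but never closed in a SMILES
 * string. Digits inside [...] are not ring closures and are skipped.
 * Only single-digit labels and two-digit %nn labels are recognised. */
int count_unclosed_rings(const char *smiles)
{
    unsigned char open[100] = {0};  /* open[n] is 1 while label n awaits its partner */
    int in_bracket = 0;
    int unclosed = 0;
    size_t i;

    assert(smiles != NULL);

    for (i = 0; smiles[i] != '\0'; i++) {
        unsigned char c = (unsigned char)smiles[i];
        int label = -1;

        if (c == '[') { in_bracket = 1; continue; }
        if (c == ']') { in_bracket = 0; continue; }
        if (in_bracket) continue;

        if (isdigit(c)) {
            label = c - '0';
        } else if (c == '%' && isdigit((unsigned char)smiles[i + 1])
                            && isdigit((unsigned char)smiles[i + 2])) {
            label = (smiles[i + 1] - '0') * 10 + (smiles[i + 2] - '0');
            i += 2;  /* consume the two digits after '%' */
        }

        if (label >= 0)
            open[label] ^= 1;  /* first sighting opens, second closes */
    }

    for (i = 0; i < 100; i++)
        unclosed += open[i];
    return unclosed;
}
```

For example, `count_unclosed_rings("C1CCCCC1")` gives 0 and `count_unclosed_rings("C1CCCC")` gives 1. Applied to the shortened string that arrived in the email notification, it should report two unmatched labels (1 and 2), which is what prompted the question.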

On 5 Apr 2016, at 15:46, hmpelta notifications@github.com wrote:

Hi, I installed Mychem and it works normally if input string is n`t very long. for example : select MOLECULE_TO_CANONICAL_SMILES(SMILES_TO_MOLECULE("[CH4]")); works fine but if input smiles become: "[CH3]CH[NH]C(=O)[CH]1[CH2][CH2][CH2]CH[NH]C(=O)CH[NH]C(=O)[CH]2[CH2][CH2][CH2]CH[NH]C(=O)CH[NH]C(=O)CH[NH]C(=O)CH[NH]C(=O)CH[NH]C(=O)CHO[CH]3CHCHOCH[CH]3[NH]C(=O)[CH3]" . mychem returns :

ERROR 2013 (HY000): Lost connection to MySQL server during query I checked openbabel with this smiles and it returned true value . Your comments and suggestions are welcome. thanks in advance;


hmpelta commented 8 years ago

Thanks for your answer. Yes, I am sure it is correct. The common name of this molecule is: three disaccharide linked murein units (pentapeptide crosslinked tetrapeptide (A2pm->D-ala) tetrapeptide crosslinked tetrapeptide (A2pm->D-ala)) (middle of chain). I canonicalized its SMILES with Open Babel, which returned this SMILES. I also tested Mychem with another long SMILES from the PubMed site and the result was similar. The BioCyc page for this molecule: http://biocyc.org/META/new-image?object=CPD0-2296

fredrikw commented 8 years ago

Hi, I'm sorry you are correct, I got lost in the long string... However, trying it myself it works without problems. (After removing the whitespace that was present when copying from the PDF) SELECT MOLECULE_TO_CANONICAL_SMILES(SMILES_TO_MOLECULE("[CH3][CH](C(=O)[NH][CH](C(=O)[OH])[CH3])[NH]C(=O)[CH]1[CH2][CH2][CH2][CH](C(=O)[OH])[NH]C(=O)[CH]([CH3])[NH]C(=O)[CH]2[CH2][CH2][CH2][CH](C(=O)[OH])[NH]C(=O)[CH]([CH3])[NH]C(=O)[CH]([CH2][CH2][CH2][CH](C(=O)[OH])[NH2])[NH]C(=O)[CH]([CH2][CH2]C(=O)[OH])[NH]C(=O)[CH]([CH3])[NH]C(=O)[CH]([CH3])O[CH]3[CH](O[CH]4[CH]([NH]C(=O)[CH3])[CH]([OH])[CH]([OH])[CH]([CH2][OH])O4)[CH]([CH2][OH])O[CH](O[CH]4[CH]([CH2][OH])O[CH](O[CH]5[CH]([CH2][OH])O[CH](O[CH]6[CH]([CH2][OH])O[CH](O[CH]7[CH]([CH2][OH])O[CH]=C([NH]C(=O)[CH3])[CH]7O[CH](C(=O)[NH][CH](C(=O)[NH][CH](C(=O)[NH]1)[CH2][CH2]C(=O)[OH])[CH3])[CH3])[CH]([NH]C(=O)[CH3])[CH]6[OH])[CH]([NH]C(=O)[CH3])[CH]5O[CH](C(=O)[NH][CH](C(=O)[NH][CH](C(=O)[NH]2)[CH2][CH2]C(=O)[OH])[CH3])[CH3])[CH]([NH]C(=O)[CH3])[CH]4[OH])[CH]3[NH]C(=O)[CH3]")); results in OCC1OC2OC3C(CO)OC(C(C3O)NC(=O)C)OC3C(CO)OC=C(C3OC(C)C(=O)NC(C)C(=O)NC(CCC(=O)O)C(=O)NC(CCCC(NC(=O)C(NC(=O)C3NC(=O)C(NC(=O)C(NC(=O)C(OC(C1OC1OC(CO)C(C(C1NC(=O)C)O)OC1OC(CO)C(C(C1NC(=O)C)OC(C)C(=O)NC(C)C(=O)NC(CCC(=O)O)C(=O)NC(CCCC(C(=O)O)N)C(=O)NC(C(=O)NC(CCC3)C(=O)O)C)OC1OC(CO)C(C(C1NC(=O)C)O)O)C2NC(=O)C)C)C)CCC(=O)O)C)C(=O)O)C(=O)NC(C(=O)NC(C(=O)O)C)C)NC(=O)C I'm also running on a Mac, with MariaDB 10.0.15 and an OpenBabel compiled from trunk last november. My suggestion is that you look in the server log for MySQL and see if you can find any hints as to why it is crashing.

hmpelta commented 8 years ago

Hi, thanks for taking the time to answer my question. I am using a Mac (10.10.5) and MySQL 5.7.11. Could you send me your my.ini file, or the output of show variables like "%timeout%"; and show variables like "%max%"; ? Sincerely,

Pansanel commented 4 years ago

Hi,

The query takes less than 0.05 s, so it is not a timeout issue. With Mychem, timeout issues mostly appear when the argument passed to a Mychem function is incomplete or when there is a problem running the Open Babel library. Can you enable the Mychem test suite and run it? See the "Testing the installation" section at: http://mychem.sourceforge.net/doc/apas03.html

Jerome

paulo-maia commented 3 years ago

Hi @Pansanel @fredrikw @hmpelta , any update on this? I'm getting exactly the same problem. A large molecule such as O=C(NC1C(O)C(OC2OC(C)C(OC3OC(C(O)C3O)C(OC4OC(C(O)COC5OC(C(O)C5O)C(OC6OC(C(O)C6O)C(OC7OC(COC8OC(COC9OC(COC%10OC(COC%11OC(COC%12OC(COC%13OC(COC%14OC(COC%15OC(COC%16OC(COC%17OC(COC%18OC(COC%19OC(COC%20OC(COC%21OC(COC%22OC(COC%23OC(COC%24OC(CO)C(O)C%24O)C(O)C%23O)C(O)C%22O)C(O)C%21O)C(O)C%20O)C(O)C%19O)C(O)C%18O)C(O)C%17O)C(O)C%16O)C(O)C%15O)C(O)C%14O)C(O)C%13O)C(O)C%12O)C(O)C%11O)C(O)C%10O)C(O)C9O)C(O)C8O)C(O)C7O)COC%25OC(C(O)C%25O)C(OC%26OC(C(O)C%26O)C(OC%27OC(COC%28OC(COC%29OC(COC%30OC(COC%31OC(COC%32OC(COC%33OC(COC%34OC(COC%35OC(COC%36OC(COC%37OC(COC%38OC(COC%39OC(COC%40OC(COC%41OC(COC%42OC(COC%43OC(COC%44OC(CO)C(O)C%44O)C(O)C%43O)C(O)C%42O)C(O)C%41O)C(O)C%40O)C(O)C%39O)C(O)C%38O)C(O)C%37O)C(O)C%36O)C(O)C%35O)C(O)C%34O)C(O)C%33O)C(O)C%32O)C(O)C%31O)C(O)C%30O)C(O)C%29O)C(O)C%28O)C(O)C%27O)COC%45OC(C(O)C%45O)C(OC%46OC(C(O)C%46O)C(OC%47OC(COC%48OC(COC%49OC(COC%50OC(COC%51OC(COC%52OC(COC%53OC(COC%54OC(COC%55OC(COC%56OC(COC%57OC(COC%58OC(COC%59OC(COC%60OC(COC%61OC(COC%62OC(COC%63OC(COC%64OC(CO)C(O)C%64O)C(O)C%63O)C(O)C%62O)C(O)C%61O)C(O)C%60O)C(O)C%59O)C(O)C%58O)C(O)C%57O)C(O)C%56O)C(O)C%55O)C(O)C%54O)C(O)C%53O)C(O)C%52O)C(O)C%51O)C(O)C%50O)C(O)C%49O)C(O)C%48O)C(O)C%47O)COC%65OC(C(O)C%65O)C(OC%66OC(C(O)COC%67OC(C(O)C%67O)C(OC=%68OC(C(O)C%68[O-])C(O)COC%69OC(C(O)C%69O)C(OC%70OC(C(O)COC%71OC(C(O)C%71O)C(OC%72OC(C(O)COC%73OC(C(O)C%73O)C(OC%74OC(C(O)COC%75OC(C(O)C%75O)C(OC%76OC(C(O)COC%77OC(C(O)C%77O)C(OC%78OC(C(O)COC%79OC(C(O)C%79O)C(OC%80OC(C(O)COC%81OC(C(O)C%81O)C(OC%82OC(C(O)COC%83OC(C(O)C%83O)C(OC%84OC(C(O)COC%85OC(C(O)C%85O)C(OC%86OC(C(O)CO)C(O)C%86O)CO)C(O)C%84O)CO)C(O)C%82O)CO)C(O)C%80O)CO)C(O)C%78O)CO)C(O)C%76O)CO)C(O)C%74O)CO)C(O)C%72O)CO)C(O)C%70O)CO)CO)C(O)C%66O)CO)CO)CO)CO)C(O)C4O)CO)C(O)C2O)C(OC1OP(=O)([O-])OP(=O)([O-])OCC(=CCCC(=CCCC(=CCCC(=CCCC(=CCCC(=CCCC(=CCCC(=CCCC(=CCCC(=CC)C)C)C)C)C)C)C)C)C)C)CO)C

will not work with SMILES_TO_MOLECULE, however i tried running this directly in openbabel with: obabel -:"O=C(NC1C(O)C(OC2OC(C)C(OC3OC(C(O)C3O)C(OC4OC(C(O)COC5OC(C(O)C5O)C(OC6OC(C(O)C6O)C(OC7OC(COC8OC(COC9OC(COC%10OC(COC%11OC(COC%12OC(COC%13OC(COC%14OC(COC%15OC(COC%16OC(COC%17OC(COC%18OC(COC%19OC(COC%20OC(COC%21OC(COC%22OC(COC%23OC(COC%24OC(CO)C(O)C%24O)C(O)C%23O)C(O)C%22O)C(O)C%21O)C(O)C%20O)C(O)C%19O)C(O)C%18O)C(O)C%17O)C(O)C%16O)C(O)C%15O)C(O)C%14O)C(O)C%13O)C(O)C%12O)C(O)C%11O)C(O)C%10O)C(O)C9O)C(O)C8O)C(O)C7O)COC%25OC(C(O)C%25O)C(OC%26OC(C(O)C%26O)C(OC%27OC(COC%28OC(COC%29OC(COC%30OC(COC%31OC(COC%32OC(COC%33OC(COC%34OC(COC%35OC(COC%36OC(COC%37OC(COC%38OC(COC%39OC(COC%40OC(COC%41OC(COC%42OC(COC%43OC(COC%44OC(CO)C(O)C%44O)C(O)C%43O)C(O)C%42O)C(O)C%41O)C(O)C%40O)C(O)C%39O)C(O)C%38O)C(O)C%37O)C(O)C%36O)C(O)C%35O)C(O)C%34O)C(O)C%33O)C(O)C%32O)C(O)C%31O)C(O)C%30O)C(O)C%29O)C(O)C%28O)C(O)C%27O)COC%45OC(C(O)C%45O)C(OC%46OC(C(O)C%46O)C(OC%47OC(COC%48OC(COC%49OC(COC%50OC(COC%51OC(COC%52OC(COC%53OC(COC%54OC(COC%55OC(COC%56OC(COC%57OC(COC%58OC(COC%59OC(COC%60OC(COC%61OC(COC%62OC(COC%63OC(COC%64OC(CO)C(O)C%64O)C(O)C%63O)C(O)C%62O)C(O)C%61O)C(O)C%60O)C(O)C%59O)C(O)C%58O)C(O)C%57O)C(O)C%56O)C(O)C%55O)C(O)C%54O)C(O)C%53O)C(O)C%52O)C(O)C%51O)C(O)C%50O)C(O)C%49O)C(O)C%48O)C(O)C%47O)COC%65OC(C(O)C%65O)C(OC%66OC(C(O)COC%67OC(C(O)C%67O)C(OC=%68OC(C(O)C%68[O-])C(O)COC%69OC(C(O)C%69O)C(OC%70OC(C(O)COC%71OC(C(O)C%71O)C(OC%72OC(C(O)COC%73OC(C(O)C%73O)C(OC%74OC(C(O)COC%75OC(C(O)C%75O)C(OC%76OC(C(O)COC%77OC(C(O)C%77O)C(OC%78OC(C(O)COC%79OC(C(O)C%79O)C(OC%80OC(C(O)COC%81OC(C(O)C%81O)C(OC%82OC(C(O)COC%83OC(C(O)C%83O)C(OC%84OC(C(O)COC%85OC(C(O)C%85O)C(OC%86OC(C(O)CO)C(O)C%86O)CO)C(O)C%84O)CO)C(O)C%82O)CO)C(O)C%80O)CO)C(O)C%78O)CO)C(O)C%76O)CO)C(O)C%74O)CO)C(O)C%72O)CO)C(O)C%70O)CO)CO)C(O)C%66O)CO)CO)CO)CO)C(O)C4O)CO)C(O)C2O)C(OC1OP(=O)([O-])OP(=O)([O-])OCC(=CCCC(=CCCC(=CCCC(=CCCC(=CCCC(=CCCC(=CCCC(=CCCC(=CCCC(=CC)C)C)C)C)C)C)C)C)C)C)CO)C" -omol and indeed it works. 
I'm using Open Babel 2.3.2 and MySQL 5.7.34.

SMILES strings shorter than about 1000 characters work fine, but past that limit (it's not exact, but close) MySQL simply crashes with this error log:

Version: '5.7.34-0ubuntu0.18.04.1'  socket: '/var/run/mysqld/mysqld.sock'  port: 3306  (Ubuntu)
==============================
*** Open Babel Warning  in WriteMolecule
  No 2D or 3D coordinates exist. Stereochemical information will be stored using an Open Babel extension. To generate 2D or 3D coordinates instead use --gen2D or --gen3D.
double free or corruption (!prev)
16:42:37 UTC - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
Attempting to collect some information that could help diagnose the problem.
As this is a crash and something is definitely wrong, the information
collection process might fail.

key_buffer_size=524288000
read_buffer_size=131072
max_used_connections=1
max_threads=151
thread_count=1
connection_count=1
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 572005 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x7f9d1c000d40
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7f9d9c0b2e50 thread_stack 0x30000
/usr/sbin/mysqld(my_print_stacktrace+0x3b)[0xeb0beb]
/usr/sbin/mysqld(handle_fatal_signal+0x377)[0x775d57]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x12980)[0x7f9da8f78980]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0xc7)[0x7f9da8274fb7]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x141)[0x7f9da8276921]
/lib/x86_64-linux-gnu/libc.so.6(+0x89967)[0x7f9da82bf967]
/lib/x86_64-linux-gnu/libc.so.6(+0x909da)[0x7f9da82c69da]
/lib/x86_64-linux-gnu/libc.so.6(cfree+0x54c)[0x7f9da82cdf7c]
/usr/sbin/mysqld(_ZN11udf_handler7cleanupEv+0x39)[0x840509]
/usr/sbin/mysqld(_ZN13Item_udf_func7cleanupEv+0x18)[0x8405d8]
/usr/sbin/mysqld(_ZN11Query_arena10free_itemsEv+0x2d)[0xc1de4d]
/usr/sbin/mysqld(_ZN3THD19cleanup_after_queryEv+0x65)[0xc1ded5]
/usr/sbin/mysqld(_Z11mysql_parseP3THDP12Parser_state+0x1c0)[0xc66f80]
/usr/sbin/mysqld(_Z16dispatch_commandP3THDPK8COM_DATA19enum_server_command+0xb26)[0xc67d66]
/usr/sbin/mysqld(_Z10do_commandP3THD+0x220)[0xc69850]
/usr/sbin/mysqld(handle_connection+0x298)[0xd2f328]
/usr/sbin/mysqld(pfs_spawn_thread+0x154)[0x1216324]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x76db)[0x7f9da8f6d6db]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7f9da835771f]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (7f9d1c0058c0): SELECT smiles_to_molecule('O=C(NC1C(O)C(OC2OC(C)C(OC3OC(C(O)C3O)C(OC4OC(C(O)COC5OC(C(O)C5O)C(OC6OC(C(O)C6O)C(OC7OC(COC8OC(COC9OC(COC%10OC(COC%11OC(COC%12OC(COC%13OC(COC%14OC(COC%15OC(COC%16OC(COC%17OC(COC%18OC(COC%19OC(COC%20OC(COC%21OC(COC%22OC(COC%23OC(COC%24OC(CO)C(O)C%24O)C(O)C%23O)C(O)C%22O)C(O)C%21O)C(O)C%20O)C(O)C%19O)C(O)C%18O)C(O)C%17O)C(O)C%16O)C(O)C%15O)C(O)C%14O)C(O)C%13O)C(O)C%12O)C(O)C%11O)C(O)C%10O)C(O)C9O)C(O)C8O)C(O)C7O)COC%25OC(C(O)C%25O)C(OC%26OC(C(O)C%26O)C(OC%27OC(COC%28OC(COC%29OC(COC%30OC(COC%31OC(COC%32OC(COC%33OC(COC%34OC(COC%35OC(COC%36OC(COC%37OC(COC%38OC(COC%39OC(COC%40OC(COC%41OC(COC%42OC(COC%43OC(COC%44OC(CO)C(O)C%44O)C(O)C%43O)C(O)C%42O)C(O)C%41O)C(O)C%40O)C(O)C%39O)C(O)C%38O)C(O)C%37O)C(O)C%36O)C(O)C%35O)C(O)C%34O)C(O)C%33O)C(O)C%32O)C(O)C%31O)C(O)C%30O)C(O)C%29O)C(O)C%28O)C(O)C%27O)COC%45OC(C(O)C%45O)C(OC%46OC(C(O)C%46O)C(OC%47OC(COC%48OC(COC%49OC(COC%50OC(COC%51OC(COC%52OC(COC%53OC(COC%54OC(COC%55OC(COC%56OC(COC%57OC(COC%58OC(COC%59OC(COC%60OC(COC%61OC(COC%62OC(COC%63OC(COC%6
Connection ID (thread ID): 2
Status: NOT_KILLED

The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.

Please let me know if you have any information on this problem. It looks like a limitation in MySQL.

paulo-maia commented 3 years ago

The problem persists with MariaDB 10.1.48. In this case the crash is not immediate: the server tries to run the operation indefinitely and has to be killed.

paulo-maia commented 3 years ago

@Pansanel The problem can be solved by editing include/config.h.cmake and increasing the value of MAX_VALUE_LENGTH:

/* Max length for a text or a blob */
/* #define MAX_VALUE_LENGTH 65536  */
/* Increasing to 10x the original length- check what makes sense here */
#define MAX_VALUE_LENGTH 655360

Pansanel commented 3 years ago

@paulo-maia Thank you for reporting this issue and helping to solve it! The value of MAX_VALUE_LENGTH was set several years ago with an older version of MySQL. We had encountered issues with bigger values. It seems to be solved now. I have to check whether there is a recommended limit for the initid->max_length parameter.
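The backtrace earlier in the thread ends in a "double free or corruption" abort during UDF cleanup, and raising MAX_VALUE_LENGTH made it go away. One plausible reading, not verified against the Mychem sources, is that a result longer than the cap was written into a buffer sized by it. The sketch below shows the kind of bounds check that would turn such a case into a clean failure. `copy_result_checked` is a hypothetical helper, and the idea that the caller passes MAX_VALUE_LENGTH as `dst_cap` is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not taken from the Mychem sources: copy a result of
 * src_len bytes into a buffer of dst_cap bytes. Returns 0 on success and
 * stores the length in *out_len. Returns -1 and writes nothing if the
 * result does not fit. */
int copy_result_checked(char *dst, size_t dst_cap,
                        const char *src, size_t src_len,
                        unsigned long *out_len)
{
    assert(dst != NULL && src != NULL && out_len != NULL);

    if (src_len > dst_cap)
        return -1;  /* the caller can report an error or SQL NULL instead */

    memcpy(dst, src, src_len);
    *out_len = (unsigned long)src_len;
    return 0;
}
```

With a check like this, an oversized molecule would produce an error or a NULL result rather than corrupting the heap, whatever value the cap is set to.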

Pansanel commented 3 years ago

I found it:

"Maximum length of the result. For integers, the default is 21. For strings, the length of the longest argument. For reals, the default is 13 plus the number of decimals indicated by initid->decimals. The length includes any signs or decimal points. Can also be set to 65KB or 16MB in order to return a BLOB. The memory remains unallocated, but this is used to decide on the data type to use if the data needs to be temporarily stored."