mattb112885 / clusterDbAnalysis

ITEP - Integrated Toolkit for Exploration of microbial Pan-genomes
SQLite3 error: database or disk is full and no such table: main.blastnres_selfbit #76

Closed cvn001 closed 8 years ago

cvn001 commented 8 years ago

Hi ITEP devs,

I am currently running setup_step1.sh. Early this morning I received the following error messages:

Error: near line 163: database or disk is full
Error: near line 174: no such table: main.blastnres_selfbit
Error: near line 175: no such table: main.blastnres_selfbit

However, the program is still running and the DATABASE.sqlite file keeps growing. No further errors have appeared.
I read the Known-issues page in the ITEP wiki, so I know the cause could be that the tmp partition on my server is full. However, I am still unsure about the cause and the severity of the last two errors. Do you think I should interrupt the program and change some configuration, or just wait until the process ends?

Thanks

mattb112885 commented 8 years ago

Hi cvn,

My guess is that the tmp partition became full when the BLASTN results were being loaded (which explains why the table does not exist). If your tmp partition is filling on that step it could also have problems when performing other setup steps (especially step 3, if you choose to load the genome sequences into the database).
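As a rough way to check this before rerunning a setup step, the sketch below prints the free space in the directories SQLite may use for temporary files. The search order (the SQLITE_TMPDIR and TMPDIR environment variables, then /var/tmp, /usr/tmp and /tmp) follows SQLite's documented behaviour on Unix as far as I know, so treat it as an assumption; the function names are purely illustrative and not part of ITEP.

```python
import os
import shutil

# Assumed Unix fallback order, tried after SQLITE_TMPDIR and TMPDIR.
FALLBACK_DIRS = ["/var/tmp", "/usr/tmp", "/tmp"]


def temp_dir_candidates(environ):
    """List directories SQLite may try for temporary files, in order."""
    from_env = [environ[name] for name in ("SQLITE_TMPDIR", "TMPDIR")
                if environ.get(name)]
    return from_env + FALLBACK_DIRS


def free_gib(path):
    """Return free space on the filesystem containing `path`, in GiB."""
    return shutil.disk_usage(path).free / 2**30


# Report free space for every candidate directory that actually exists.
for directory in temp_dir_candidates(os.environ):
    if os.path.isdir(directory):
        print(f"{directory}: {free_gib(directory):.1f} GiB free")
```

If the first directory listed is short on space, pointing SQLITE_TMPDIR at a larger partition before running the setup scripts may be an alternative to resizing the tmp partition.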

Best,

Matt

cvn001 commented 8 years ago

Hi Matt,

I enlarged the tmp partition. After rerunning setup_step1.sh, these errors no longer appeared. Then I ran setup_step2.sh and setup_step3.sh separately.

However, I ran into some other problems:

Running MCL on organisms in the groups file...
Importing cluster information into database...
db/flat_clusters:455552: expected 3 columns but found 5 - extras ignored
db/flat_clusters:524649: expected 3 columns but found 1 - filling the rest with NULL
Making a pre-built presencebsence table in the database...
No user-specified genes are loaded in the database (if you have user-specified genes use setup_step5.sh to load them
Traceback (most recent call last):
  File "/home/bioinfo/lxc/program/ITEP/src/internal/db_loadPresenceAbsence.py", line 96, in <module>
    if '\t' in myannote:
TypeError: argument of type 'NoneType' is not iterable

I cannot tell whether these messages are warnings or errors.

Please help me.

Thank you,

Xiangchen Li

mattb112885 commented 8 years ago

These are indeed errors (the script should really stop execution when something like this happens). You still have cluster results loaded, but the results for at least a couple of the clusters will be invalid, and you won't have a presence/absence table loaded into the database. Once we fix this you should be able to run setup_step2.sh again and it will put the correct data in.

It looks like some formatting got messed up in one of your clustering results files. I have no idea how this happened, so I'll need some help from you. Let's start with the flat_clusters file. Could you run the following two commands and let me know what is printed? (They will print lines 455552 and 524649 and the surrounding lines.)

$ cat -n db/flat_clusters | grep -A 1 -B 1 "455552"
$ cat -n db/flat_clusters | grep -A 1 -B 1 "524649"
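To find every malformed row rather than only the two that the importer reported, a small script can count the fields on each line. This is a minimal sketch, assuming db/flat_clusters is tab-delimited with three columns (the importer's message only confirms the column count, not the delimiter); `find_malformed_rows` and the sample rows are made up for illustration.

```python
def find_malformed_rows(lines, expected_columns=3, delimiter="\t"):
    """Yield (line_number, column_count, text) for rows with the wrong field count."""
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        count = len(text.split(delimiter))
        if count != expected_columns:
            yield number, count, text


# Made-up sample: one good row, one row with two records fused, one truncated row.
sample = [
    "run1\t1\tfig|1.1.peg.1\n",
    "run1\t2\tfig|1.1.peg.2\trun1\t3\n",
    "run1\n",
]

for number, count, text in find_malformed_rows(sample):
    print(f"line {number}: found {count} columns: {text!r}")
```

To run it on the real file, pass an open file handle instead of the sample list, for example `find_malformed_rows(open("db/flat_clusters"))`.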

The error at the end was related to whatever happened with that flat cluster file -- there is at least one cluster with no genes in it, so the loading function got confused. I will fix that first.
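For context, the TypeError in the traceback is what Python raises when the `in` operator is applied to None instead of a string. The sketch below shows one way to guard against that; it is not the actual change made in ITEP, `contains_tab` is a hypothetical helper, and the assumption is that `myannote` is None when no annotation could be found for a cluster.

```python
def contains_tab(annotation):
    """Return True only for a string containing a tab; None means 'no annotation'."""
    return annotation is not None and "\t" in annotation


# The unguarded test fails on None, which is the failure seen in the traceback.
try:
    "\t" in None
except TypeError as error:
    print(f"unguarded check failed: {error}")

print(contains_tab(None))
print(contains_tab("gene\tannotation"))
```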

Thanks and best,

Matt

cvn001 commented 8 years ago

Hi Matt

Thank you for your help.

line 455552:
455552 MRS_I_2.0_c_0.4_m_maxbit 5413 fig|68ig|68287.32.peg.MRS_I_2.0_c_0.4_m_maxbit 5413 fig|68287.27.peg.1516

line 524649:
524649 MRS_I_2.0_c_0.4_m_maxbfig|68287.11.peg87.61.peg.1399

Both lines look strange.

I also used this command to extract all homologous families:

cat organisms | db_findClustersByOrganismList.py -y all_I_2.0_c_0.4_m_maxbit|db_getClusterGeneInformation.py|grep -F -f organisms|getClusterFastas.py core_gene

However, some organisms are missing from every gene family, so I guess something went wrong.

Best,

Xiangchen

mattb112885 commented 8 years ago

Xiangchen:

It looks like something happened during the clustering and it didn't finish completely? Is there an incomplete gene name like fig|68 at the end of one of the files in your clusters/ directory?

Can you run these commands as well? This will let me know if the problem occurred before or during clustering.

This line will look for the weirdly formatted gene name in the gene info table. If all went well there shouldn't be a gene with this name in the file:

$ cat raw/68287.32.txt | grep 'fig|68ig'

This will tell us if one of the BLAST results files got corrupted:

$ cat db/blastres_cat | grep 'fig|68ig'

This will tell us if the cluster results file was corrupted or if it was only corrupted when converting to the flat format:

$ cat clusters/all_I_2.0_c_0.4_m_maxbit | grep 'fig|68ig'

Hopefully I'll be able to give some better advice once I know the results of these tests.
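The grep commands search for one specific corrupted string. A broader check is to flag every gene ID that does not look like a well-formed one. This is a minimal sketch, assuming all gene IDs follow the pattern seen in this thread (for example fig|68287.27.peg.1516) and that the gene ID is the third tab-delimited column; both are assumptions, and the helper names are hypothetical.

```python
import re

# Assumed gene ID shape: fig|<number>.<number>.peg.<number>
GENE_ID = re.compile(r"fig\|\d+\.\d+\.peg\.\d+")


def is_valid_gene_id(text):
    """Return True if the whole string matches the assumed gene ID pattern."""
    return GENE_ID.fullmatch(text) is not None


def find_bad_gene_ids(lines, column=2, delimiter="\t"):
    """Yield (line_number, gene_id) for rows whose gene ID is missing or malformed."""
    for number, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split(delimiter)
        gene_id = fields[column] if len(fields) > column else ""
        if not is_valid_gene_id(gene_id):
            yield number, gene_id


# Made-up rows: a good ID, an ID garbled like the one in this thread, a truncated row.
sample_rows = [
    "run1\t1\tfig|68287.27.peg.1516\n",
    "run1\t2\tfig|68ig|68287.32.peg.x\n",
    "run1\n",
]

for number, gene_id in find_bad_gene_ids(sample_rows):
    print(f"line {number}: suspicious gene ID {gene_id!r}")
```

Running the same check on the raw, BLAST and cluster files in turn would show at which stage the first malformed ID appears, provided the column index is adjusted to each file's layout.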

Thanks and best

Matt

cvn001 commented 8 years ago

Hi Matt

I have found the cause of the errors on my machine. My OS is Ubuntu 14.04 and I had installed SQLite3 with apt-get. That SQLite version is 3.8.

Yesterday I downloaded the latest version of SQLite from the SQLite Home Page, which is 3.13.0, and then reran setup_step1.sh. I am glad to tell you that everything is OK now. No errors occurred again, and I finally obtained the pan-genome of my genomes using ITEP.

So I guess SQLite 3.8 has some bugs that make it unstable and unable to handle my large dataset. Your code is OK.
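A quick way to see which SQLite version is in use is sketched below. Two caveats: Python's sqlite3 module reports the library that Python itself is linked against, which may differ from the sqlite3 command-line tool that shell scripts call (`sqlite3 --version` shows that one), and 3.13.0 is simply the version that worked here, not a documented minimum for ITEP. The helper names are illustrative.

```python
import sqlite3


def parse_version(text):
    """Turn a dotted version string such as '3.13.0' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def is_at_least(version, minimum):
    """Compare versions numerically, so that 3.8 sorts before 3.13."""
    return parse_version(version) >= parse_version(minimum)


# Version of the SQLite library linked into this Python interpreter.
linked = sqlite3.sqlite_version
print(f"SQLite library version: {linked}")
print(f"at least 3.13.0: {is_at_least(linked, '3.13.0')}")
```

Comparing tuples of integers rather than strings matters here, because a plain string comparison would rank "3.8" above "3.13".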

Thank you

Best

Xiangchen

mattb112885 commented 8 years ago

Hi Xiangchen,

Thank you for letting me know. That is frustrating, that was apparently a regression from the SQLite 3.7 branch... I have updated the known issues page accordingly. Feel free to open another issue if you run into other problems.

Thanks and Best,

Matt