mattb112885 / clusterDbAnalysis

ITEP - Integrated Toolkit for Exploration of microbial Pan-genomes

Contig names are incorrect for genbank files with multiple contigs [Regression] #66

Closed: mattb112885 closed this 10 years ago

mattb112885 commented 10 years ago

Unfortunately, if you were getting a lot of "BAD" messages when processing GenBank files for input into ITEP, it may have been caused by a bug I introduced while trying to fix another bug. This bug makes the contig associations for genes in the database incorrect. Here's how to fix it:
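The bug only affects GenBank files that contain more than one contig, so it can help to check which of your input files have multiple records. Below is a minimal sketch that assumes standard GenBank flat-file formatting, where each record starts with a LOCUS line whose second field is the record name. The function name list_contig_names is hypothetical and not part of ITEP. It also does not reproduce how convertGenbankToTable.py itself assigns contig names.

```python
def list_contig_names(genbank_path):
    """Return the LOCUS name of every record in a GenBank file, in file order.

    Hypothetical helper (not part of ITEP). It assumes each record begins
    with a line such as "LOCUS       contig1      5000 bp    DNA ...".
    """
    names = []
    with open(genbank_path) as handle:
        for line in handle:
            # A new GenBank record starts at every LOCUS line.
            if line.startswith("LOCUS"):
                fields = line.split()
                if len(fields) > 1:
                    names.append(fields[1])
    return names
```

A file for which this returns more than one name is one of the multi-contig files that should be re-processed.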

1: Get the latest version of ITEP (using git pull origin master), which contains the fix.
2: Re-run convertGenbankToTable.py on the original GenBank files, using --replace to overwrite the tab-delimited files with corrected versions. Make sure you use the same version numbers (-v) that you originally used to load the GenBank files.
2a: You might want to back up your database file (db/DATABASE.sqlite) before continuing, in case something goes wrong.
3: Re-run setup_step1.sh [note: it will not need to re-run BLAST; all it has to do is reload the database]. It will still take some time, so I suggest running it overnight.
4: Re-run the other setup scripts (setup_step2, setup_step3 and setup_step4). They will not need to re-run the clustering or RPSBlast, but you need to re-run them to re-establish the correct links between tables in the database.
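For step 2a, one way to make the backup is a short Python sketch using the standard library's sqlite3 online backup API, which produces a consistent copy even if the database is open elsewhere. The default path db/DATABASE.sqlite comes from the steps above. The function name backup_database and the timestamped file-naming scheme are hypothetical choices, not part of ITEP.

```python
import sqlite3
from datetime import datetime
from pathlib import Path


def backup_database(db_path="db/DATABASE.sqlite"):
    """Copy an SQLite database to a timestamped file in the same directory.

    Hypothetical helper (not part of ITEP). Returns the path of the backup.
    """
    src_path = Path(db_path)
    # sqlite3.connect() would silently create a missing file, so check first.
    if not src_path.is_file():
        raise FileNotFoundError(f"No database found at {src_path}")

    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    # e.g. DATABASE.sqlite -> DATABASE.20240101-120000.backup.sqlite
    dest_path = src_path.with_name(f"{src_path.stem}.{stamp}.backup{src_path.suffix}")

    src = sqlite3.connect(str(src_path))
    dest = sqlite3.connect(str(dest_path))
    try:
        # Connection.backup copies the whole database page by page.
        with dest:
            src.backup(dest)
    finally:
        dest.close()
        src.close()
    return dest_path
```

Calling backup_database() from the top-level ITEP directory before step 3 leaves the original database untouched, and the returned path tells you where the copy went.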

Sorry about the inconvenience...