mtisza1 / Cenote-Taker2

Cenote-Taker2: Discover and Annotate Divergent Viral Contigs (Please use Cenote-Taker 3 instead)
MIT License

Core dump #13

Closed KailunZM closed 2 years ago

KailunZM commented 3 years ago

Hello!

Thank you for sharing this powerful tool.

We just got Cenote-Taker2 installed on our system. I tried to run it on the test dataset "testcontigs_DNA_ct2.fasta", but every run reported that no circular viral sequences or DTRs were detected, and a "core" file was generated instead. I was running it in the terminal. Here is the run script:

time ${CENOTE_BASE}/run_cenote-taker2.py \
  -c ${indir}/testcontigs_DNA_ct2.fasta \
  -r ${outdir} \
  -p True \
  -m 30 \
  -t ${SLURM_CPUS_PER_TASK}

In the slurm file it says "File with .fasta extension detected, attempting to keep contigs over 1000 nt and find circular sequences with apc.pl" followed by "No circular contigs detected."

I'm not sure if anyone else also had this problem. Could you help me figure it out? Any suggestions would be greatly appreciated!

Best, Kailun

mtisza1 commented 3 years ago

Kailun,

Thanks for opening this issue, and I'm sorry you are having trouble running/installing the tool. I have a few ideas about what might be going wrong. However, if what I'm suggesting doesn't fix the problem, please attach a log file (or copy/paste the terminal output) from a run that fails in this way. A log file of the installation could be helpful as well.

1) Before running Cenote-Taker 2, did you activate the conda environment? i.e. conda activate cenote-taker2_env

2) bioawk or perl may not have installed correctly. What happens when you type:

conda activate cenote-taker2_env
which bioawk
which perl

3) Is your variable "${indir}" the same as "${CENOTE_BASE}"? The test contigs should be in the same directory as the run_cenote-taker2.py script.
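The three checks above can be bundled into a small pre-flight script. This is only a sketch, not something shipped with Cenote-Taker 2: the function names `check_tools` and `check_test_contigs` are hypothetical, and it assumes the conda environment has already been activated in the same shell.

```shell
# Hypothetical pre-flight helpers; not part of Cenote-Taker 2.

# Print the resolved path of each named tool, or MISSING if it is not on PATH.
# Returns non-zero if at least one tool is missing.
check_tools() {
    rc=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            printf '%s\t%s\n' "$tool" "$(command -v "$tool")"
        else
            printf '%s\tMISSING\n' "$tool"
            rc=1
        fi
    done
    return "$rc"
}

# Return 0 if the test contigs file sits in the given directory.
check_test_contigs() {
    [ -f "$1/testcontigs_DNA_ct2.fasta" ]
}

# Usage (after: conda activate cenote-taker2_env):
#   check_tools bioawk perl
#   check_test_contigs "${CENOTE_BASE}" || echo "test contigs not found in CENOTE_BASE"
```

Running `check_tools bioawk perl` inside the batch job itself, rather than on a login node, shows what the job actually sees on its PATH.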

Hopefully we can get this resolved for you.

Best,

Mike

KailunZM commented 3 years ago

Hi Mike,

I really appreciate your quick reply. Here are my answers; please see the attached log file. I think the installation log file is too big to attach here.

  1. the script is activating the environment with conda activate /opt/apps/labs/gdlab/envs/cenote-taker2/2.1.3/cenote-taker2_env

  2. Here's the output:
     /opt/apps/labs/gdlab/envs/cenote-taker2/2.1.3/cenote-taker2_env/bin/bioawk
     /opt/apps/labs/gdlab/envs/cenote-taker2/2.1.3/cenote-taker2_env/bin/perl

  3. yes

Best, Kailun

output.log

KailunZM commented 3 years ago

In addition, I previously used 12 CPUs. I'm not sure whether it matters much, but I have now tried 16 CPUs, and the run can identify DTRs, though with fewer hallmark genes detected. Looking at the genome map, there is no hypothetical protein sequence information.

[Image: test_DNA_ct1]

The log file of this run is attached below.

output.log

mtisza1 commented 3 years ago

Kailun,

Thanks for following up. I am unsure why you had the "No circular contigs detected" problem when using 12 CPUs, but not when you used 16 CPUs. The first output.log file seems to just end without the script finishing, so I'm struggling to understand the issue. Is this still happening?

Regarding your other problem with hypothetical proteins not being annotated, it looks like your HHsuite tool was not installed/compiled correctly.

You can confirm that hh-suite is not working by doing this (if it is working, you'll get a standard hhsearch help message):

cd Cenote-Taker2
hh-suite/build/src/hhsearch -h
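When the binary crashes rather than printing help, the failure is easier to spot from its exit status. The following is a rough sketch, not part of Cenote-Taker 2 or hh-suite: `probe_binary` is a hypothetical helper, and it relies on the usual shell convention that a process killed by a signal reports an exit status above 128 (128 plus the signal number). It makes no assumption about which exit status hhsearch returns after printing its help text.

```shell
# Hypothetical helper: classify what happens when a binary is asked for its help text.
probe_binary() {
    bin="$1"
    if [ ! -x "$bin" ]; then
        echo "not-executable"
        return 1
    fi
    rc=0
    "$bin" -h >/dev/null 2>&1 || rc=$?
    if [ "$rc" -ge 128 ]; then
        # Shell convention: status 128+N means the process was killed by signal N.
        echo "crashed (signal $((rc - 128)))"
        return 1
    fi
    echo "ran (exit status $rc)"
}

# Usage, from inside the Cenote-Taker2 directory:
#   probe_binary hh-suite/build/src/hhsearch
```

A result such as "crashed (signal 4)" (illegal instruction) or "crashed (signal 11)" (segmentation fault) would point to a problem with the compiled binary rather than with the input data.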

Start an interactive job with 4 or more CPUs, if necessary. Check if hh-suite repo was cloned properly during the initial install of Cenote-Taker 2:

cd Cenote-Taker2
git clone https://github.com/soedinglab/hh-suite.git

If you get a message like "hh-suite already exists", then do:

cd hh-suite
git pull
mkdir -p build && cd build
cmake -DCMAKE_INSTALL_PREFIX=. ..
make -j 4 && make install
export PATH="$(pwd)/bin:$(pwd)/scripts:$PATH"

Please let me know if this fixes the problem!

Mike

KailunZM commented 3 years ago

Hello Mike.

Thank you for your advice. It looks like hh-suite is working: when I run hhsearch I get the standard help message. I also realized that the missing hypothetical protein annotation only happens in circular contigs, like the test DNA contig1. Linear sequences are annotated normally. So I want to ask whether you think this is caused by this specific sequence. What are the differences between annotating circular and linear contigs?

By the way, we currently think the core dump is probably due to differences between the nodes we used to install Cenote-Taker2 and the nodes we used to run it, though we are not sure.
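One way to test the node-mismatch idea is to compare the CPU instruction-set flags of the install node and the run node, since a binary compiled for a newer CPU can crash on an older one. The thread does not confirm this as the cause, so treat the following as a sketch under that assumption; `missing_flags` is a hypothetical helper and the flag lists are assumed to come from the "flags" line of /proc/cpuinfo on Linux.

```shell
# Hypothetical helper: list CPU flags present on the build node but absent on the run node.
# Each argument is a whitespace-separated flag list.
missing_flags() {
    # Unquoted echo collapses tabs and newlines into single spaces.
    run_flags=" $(echo $2) "
    for flag in $1; do
        case "$run_flags" in
            *" $flag "*) ;;          # flag is also available on the run node
            *) echo "$flag" ;;       # flag exists only on the build node
        esac
    done
}

# Usage: capture the flags on each node, then compare.
#   grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2 > build_node_flags.txt   # on the install node
#   grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2 > run_node_flags.txt     # on the run node
#   missing_flags "$(cat build_node_flags.txt)" "$(cat run_node_flags.txt)"
```

If the output includes flags such as avx2, rebuilding the compiled tools on the node type used for running would be a reasonable next step.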

Thank you again for your help!

Best, Kailun