justincbagley / piranha

Scripts for file processing and analysis in phylogenetics and phylogeography

Issue running trimSeqs function #6

Closed biologistico closed 4 years ago

biologistico commented 4 years ago

I'm having trouble getting trimSeqs to work. Here's the error I get:

(base) biologistico@DESKTOP-3PSOLC1:~/data_clean/phylip_clean$ piranha -f trimSeqs -h

piranha v1.1.4, July 2020  (main script for PIrANHA v0.4a3, update Jul 31 14:40:56 CDT 2020)
Copyright (c) 2019-2020 Justin C. Bagley. All rights reserved.
----------------------------------------------------------------------------------------------------------
INFO      | Fri Aug  7 12:54:27 CDT 2020 | Function: trimSeqs
INFO      | Fri Aug  7 12:54:27 CDT 2020 | Function arguments: -h
INFO      | Fri Aug  7 12:54:27 CDT 2020 | Checking machine type...
INFO      | Fri Aug  7 12:54:27 CDT 2020 | Found machine type Linux.
INFO      | Fri Aug  7 12:54:27 CDT 2020 | Checking file limits on Linux...
INFO      | Fri Aug  7 12:54:27 CDT 2020 |    ulimit: 1024
INFO      | Fri Aug  7 12:54:27 CDT 2020 | Execution path: /home/linuxbrew/.linuxbrew/Cellar/piranha/0.4a3/bin/trimSeqs
INFO      | Fri Aug  7 12:54:27 CDT 2020 | Executing function with -a flag arguments...
/home/linuxbrew/.linuxbrew/Cellar/piranha/0.4a3/bin/trimSeqs: 24: Bad substitution
/home/linuxbrew/.linuxbrew/Cellar/piranha/0.4a3/bin/trimSeqs: 35: [[: not found
Please find the file util.sh and add a reference to it in this script. Exiting...

12:54:27 PM [emergency] Exit trapped. In function: 'trapCleanup
piranha
main' Exiting.

I tried updating the paths to utils.sh, sharedFunctions.sh, and sharedVariables.sh files in the trimSeqs.sh file but it still gives me the same error. Is it because I also have to specify something on the SCRIPT_PATH line of the trimSeqs.sh script? Here's what I modified in that script so far shown in bold (2 asterisks):

# Provide a variable with the location of this script.
SCRIPT_PATH="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"

# Source Scripting Utilities
# -----------------------------------
# These shared utilities provide many functions which are needed to provide
# the functionality in this boilerplate. This script will fail if they can
# not be found.
# -----------------------------------

UTILS_LOCATION="**/home/linuxbrew/.linuxbrew/Cellar/piranha/0.4a3/lib/utils.sh**" # Update this path to find the utilities.

if [[ -f "${UTILS_LOCATION}" ]]; then
  source "${UTILS_LOCATION}"
else
  echo "Please find the file util.sh and add a reference to it in this script. Exiting..."
  exit 1
fi

# Source shared functions and variables
# -----------------------------------

FUNCS_LOCATION="**/home/linuxbrew/.linuxbrew/Cellar/piranha/0.4a3/lib/sharedFunctions.sh**" # Update this path to find the shared functions.
VARS_LOCATION="**/home/linuxbrew/.linuxbrew/Cellar/piranha/0.4a3/lib/sharedVariables.sh**" # Update this path to find the shared variables.

I'm sure it's something minor but I can't seem to figure it out.

justincbagley commented 4 years ago

Hola Juan,

First, in terms of generalities, I have not been able to replicate your issue on my MacBook Pro, but I noticed that you are running Linux and have installed PIrANHA through linuxbrew, so this is likely a Linux- or Linux distribution-specific issue, which means that other Linux users could be experiencing something similar. However, Mac users should be fine.

The main problem is the "Bad substitution" syntax error, which means the wrong shell is being called. You should not have to manually change SCRIPT_PATH, UTILS_LOCATION, etc. Notice that trimSeqs is a shell script and thus uses the shell shebang (#!/bin/sh). On your machine, sh is evidently not resolving to a bash-compatible shell; the "[[: not found" message suggests that your trimSeqs function is being run by dash instead.

If you are running Ubuntu, the default system shell (/bin/sh) used to be, and probably still is, dash. Check your login shell with

echo $SHELL

Check available shells on your machine with:

cat /etc/shells
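
Note that $SHELL reports your login shell, which is not necessarily the interpreter behind /bin/sh. Here is a minimal sketch for checking the latter, assuming a system where `readlink -f` is available (e.g. GNU coreutils); the function name `sh_target` is made up for illustration:

```shell
# Hypothetical helper: print the real file that a path such as /bin/sh points to.
# Assumes `readlink -f` is available (GNU coreutils).
sh_target() {
  readlink -f "${1:-/bin/sh}"
}

# Shows the interpreter that scripts with a #!/bin/sh shebang actually use.
sh_target /bin/sh
```

If the printed path ends in "dash" rather than "bash", scripts that use bash-only syntax such as `[[ ... ]]` or `${BASH_SOURCE[0]}` would be expected to fail under #!/bin/sh.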

So the best solution would be to reconfigure your machine to use bash as the default shell instead (e.g. see solutions here and here).

Another solution: you may also try changing the trimSeqs shebang from #!/bin/sh to #!/bin/bash, but not sure this will work.
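
One way to make that shebang change is sketched below. This is only an illustration, assuming a sed that supports in-place editing with a backup suffix; the function name `use_bash_shebang` is hypothetical, and the original file is kept as a .bak copy in case the change needs to be reverted:

```shell
# Hypothetical helper: switch a script's first-line shebang from /bin/sh to /bin/bash.
# Assumes sed supports -i with a backup suffix; the original is saved as <file>.bak.
use_bash_shebang() {
  sed -i.bak '1s|^#!/bin/sh[[:space:]]*$|#!/bin/bash|' "$1"
}

# Example call (path taken from the "Execution path" line in the output above):
# use_bash_shebang /home/linuxbrew/.linuxbrew/Cellar/piranha/0.4a3/bin/trimSeqs
```

Only line 1 is touched, and only if it is exactly the /bin/sh shebang, so the rest of the script is left unchanged.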

Let me know if the above resolves your issue, or if you have any further questions.

Best, ~J

biologistico commented 4 years ago

Thanks Justin! My default shell was indeed bash, but I was able to solve the problem by changing #!/bin/sh to #!/bin/bash as you suggested. I'm running trimSeqs on my PHYLIP alignments now.

Will keep you posted,

Juan

justincbagley commented 4 years ago

Hi, The problem with

$ `piranha -f trimSeqs -m1 -o phylip -c 0.6 -g 0.1 -k 1

is the "`" mark at the beginning, after the shell prompt, which was accidentally copied in from the example code. In text or Markdown this mark signifies that what follows is code, so it should not actually be entered at the command line. Removing that symbol should fix your problem.

biologistico commented 4 years ago

I just figured it out a little while ago but now ran into another issue apparently with the trapCleanup function:

(base) biologistico@DESKTOP-3PSOLC1:~/data_clean/phylip_clean$ piranha -f trimSeqs -m1 -o phylip -c 0.6 -g 0.1 -k 1

piranha v1.1.4, July 2020  (main script for PIrANHA v0.4a3, update Jul 31 14:40:56 CDT 2020)
Copyright (c) 2019-2020 Justin C. Bagley. All rights reserved.
----------------------------------------------------------------------------------------------------------
INFO      | Fri Aug  7 20:51:28 CDT 2020 | Function: trimSeqs
INFO      | Fri Aug  7 20:51:28 CDT 2020 | Function arguments: -m1 -o phylip -c 0.6 -g 0.1 -k 1
INFO      | Fri Aug  7 20:51:28 CDT 2020 | Checking machine type...
INFO      | Fri Aug  7 20:51:29 CDT 2020 | Found machine type Linux.
INFO      | Fri Aug  7 20:51:29 CDT 2020 | Checking file limits on Linux...
INFO      | Fri Aug  7 20:51:29 CDT 2020 |    ulimit: 1024
INFO      | Fri Aug  7 20:51:29 CDT 2020 | Execution path: /home/linuxbrew/.linuxbrew/Cellar/piranha/0.4a3/bin/trimSeqs
INFO      | Fri Aug  7 20:51:29 CDT 2020 | Executing function with -a flag arguments...
INFO      | Fri Aug  7 20:51:31 CDT 2020 |----------------------------------------------------------------
INFO      | Fri Aug  7 20:51:31 CDT 2020 | trimSeqs, v1.0.0 July 2020
INFO      | Fri Aug  7 20:51:31 CDT 2020 | Copyright (c) 2020 Justin C. Bagley. All rights reserved.
INFO      | Fri Aug  7 20:51:31 CDT 2020 |----------------------------------------------------------------
INFO      | Fri Aug  7 20:51:31 CDT 2020 | Starting trimSeqs analysis...
INFO      | Fri Aug  7 20:51:31 CDT 2020 | -----------------------------------
INFO      | Fri Aug  7 20:51:31 CDT 2020 | Step #1: Set up workspace, check machine type, and determine output file settings.
INFO      | Fri Aug  7 20:51:31 CDT 2020 | -----------------------------------
INFO      | Fri Aug  7 20:51:31 CDT 2020 | Starting input directory (using current dir):
INFO      | Fri Aug  7 20:51:31 CDT 2020 | /home/biologistico/data_clean/phylip_clean
INFO      | Fri Aug  7 20:51:31 CDT 2020 | Output format setting:                     phylip (def: fasta)
INFO      | Fri Aug  7 20:51:31 CDT 2020 | Avg. seq. identity threshold (st) setting: 0.98 (def: 0.98)
INFO      | Fri Aug  7 20:51:31 CDT 2020 | Log file:                                  trimSeqs_log.txt (def: trimSeqs_log.txt)
INFO      | Fri Aug  7 20:51:31 CDT 2020 | Verbose mode:                              1 (1, on; 0, off)
INFO      | Fri Aug  7 20:51:31 CDT 2020 | Debug mode:                                0 (0, off; 1, on)
INFO      | Fri Aug  7 20:51:31 CDT 2020 | -----------------------------------
INFO      | Fri Aug  7 20:51:32 CDT 2020 | Step #2: Run main program, trim sequences in one or multiple alignments using trimAl...
INFO      | Fri Aug  7 20:51:32 CDT 2020 | -----------------------------------
INFO      | Fri Aug  7 20:51:32 CDT 2020 | Looping through input PHYLIP alignments, and trimming sequences...
INFO      | Fri Aug  7 20:51:32 CDT 2020 | ----------------- Trimming file (1 / 561): Gene000000000003_cleaned_050.aln.phy

08:51:32 PM [emergency] Exit trapped. In function: 'trapCleanup
trimSeqs
main' Exiting.

08:51:32 PM [emergency] Exit trapped. In function: 'trapCleanup
piranha
main' Exiting.

justincbagley commented 4 years ago

Hola Juan,

I need more information. Please give me the output of all of the following:

cat /etc/os-release
# 
echo $SHELL
#
cat /etc/shells
#
trimal -h
#
pwd
#
ls -1t | head -25

biologistico commented 4 years ago

Thanks Justin! Here are the outputs

#1
(base) biologistico@DESKTOP-3PSOLC1:/$ cat etc/os-release
NAME="Ubuntu"
VERSION="20.04 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal
(base) biologistico@DESKTOP-3PSOLC1:/$

#2 
(base) biologistico@DESKTOP-3PSOLC1:~$ echo $SHELL
/bin/bash

#3 
(base) biologistico@DESKTOP-3PSOLC1:/$ cat etc/shells
# /etc/shells: valid login shells
/bin/sh
/bin/bash
/usr/bin/bash
/bin/rbash
/usr/bin/rbash
/bin/dash
/usr/bin/dash
/usr/bin/tmux
/usr/bin/screen
(base) biologistico@DESKTOP-3PSOLC1:/$

#4 
(base) biologistico@DESKTOP-3PSOLC1:~$ trimal -h
trimal: command not found

#5 
(base) biologistico@DESKTOP-3PSOLC1:~/data_clean/phylip_clean$ pwd
/home/biologistico/data_clean/phylip_clean

#6
(base) biologistico@DESKTOP-3PSOLC1:~/data_clean/phylip_clean$ ls -1t |head -25
Gene000000000003_cleaned_050.aln_trimal_log.txt
iqtree.sh
align3_iqtree.sh
Gene000002159647_cleaned_050.aln.phy
GeneAT2G28790_cleaned_050.aln.phy
Gene000000329881_cleaned_050.aln.phy
Gene000002231356_cleaned_050.aln.phy
Gene000000491233_cleaned_050.aln.phy
Gene000002288653_cleaned_050.aln.phy
Gene000002288317_cleaned_050.aln.phy
Gene000002014756_cleaned_050.aln.phy
Gene000000372933_cleaned_050.aln.phy
Gene000002340104_cleaned_050.aln.phy
Gene000002341256_cleaned_050.aln.phy
Gene000002287082_cleaned_050.aln.phy
Gene000002286943_cleaned_050.aln.phy
Gene000000370726_cleaned_050.aln.phy
Gene000000364661_cleaned_050.aln.phy
Gene000000245137_cleaned_050.aln.phy
Gene000000124721_cleaned_050.aln.phy
Gene000000373687_cleaned_050.aln.phy
Gene000000000095_cleaned_050.aln.phy
GeneAT3G56460_cleaned_050.aln.phy
GeneAT3G25660_cleaned_050.aln.phy
GeneAT1G15510_cleaned_050.aln.phy

I hope this helps.

Juan

justincbagley commented 4 years ago

From

#4
(base) biologistico@DESKTOP-3PSOLC1:~$ trimal -h
trimal: command not found

we can see that, at present, the issue is that trimAl is not installed and available from the command line as "trimal", as required by the trimSeqs function of piranha. Please install trimAl, or alias its executable as "trimal" (i.e. point "trimal" to the executable using alias) in your "~/.bash_profile" or "~/.bashrc" file. Then source that file, restart your terminal, and try running again in the working directory.
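
A quick way to confirm whether "trimal" is callable is sketched below; the helper name `have_cmd` is made up for illustration, and it relies only on the standard `command -v` builtin:

```shell
# Hypothetical helper: report whether a command name is callable from the current shell.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

if have_cmd trimal; then
  echo "trimal found at: $(command -v trimal)"
else
  echo "trimal not found on PATH"
fi
```

Keep in mind that shell aliases are generally not visible to scripts that run as separate processes, so placing the directory that contains the trimal executable on your PATH may be the more reliable of the two options.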

Best, ~J

biologistico commented 4 years ago

I installed trimAl but that didn't solve the issue, and I got the same output, now with an additional grep error:

(base) biologistico@DESKTOP-3PSOLC1:~/data_clean/phylip_clean$ piranha -f trimSeqs -m1 -o phylip -c 0.6 -g 0.1 -k 1

piranha v1.1.4, July 2020 (main script for PIrANHA v0.4a3, update Jul 31 14:40:56 CDT 2020)
Copyright (c) 2019-2020 Justin C. Bagley. All rights reserved.

INFO | Mon Aug 10 14:47:42 CDT 2020 | Function: trimSeqs
INFO | Mon Aug 10 14:47:42 CDT 2020 | Function arguments: -m1 -o phylip -c 0.6 -g 0.1 -k 1
INFO | Mon Aug 10 14:47:42 CDT 2020 | Checking machine type...
INFO | Mon Aug 10 14:47:42 CDT 2020 | Found machine type Linux.
INFO | Mon Aug 10 14:47:42 CDT 2020 | Checking file limits on Linux...
INFO | Mon Aug 10 14:47:42 CDT 2020 | ulimit: 1024
INFO | Mon Aug 10 14:47:42 CDT 2020 | Execution path: /home/linuxbrew/.linuxbrew/Cellar/piranha/0.4a3/bin/trimSeqs
INFO | Mon Aug 10 14:47:42 CDT 2020 | Executing function with -a flag arguments...
INFO | Mon Aug 10 14:47:45 CDT 2020 |----------------------------------------------------------------
INFO | Mon Aug 10 14:47:45 CDT 2020 | trimSeqs, v1.0.0 July 2020
INFO | Mon Aug 10 14:47:45 CDT 2020 | Copyright (c) 2020 Justin C. Bagley. All rights reserved.
INFO | Mon Aug 10 14:47:45 CDT 2020 |----------------------------------------------------------------
INFO | Mon Aug 10 14:47:45 CDT 2020 | Starting trimSeqs analysis...
INFO | Mon Aug 10 14:47:45 CDT 2020 | -----------------------------------
INFO | Mon Aug 10 14:47:45 CDT 2020 | Step #1: Set up workspace, check machine type, and determine output file settings.
INFO | Mon Aug 10 14:47:45 CDT 2020 | -----------------------------------
INFO | Mon Aug 10 14:47:45 CDT 2020 | Starting input directory (using current dir):
INFO | Mon Aug 10 14:47:45 CDT 2020 | /home/biologistico/data_clean/phylip_clean
INFO | Mon Aug 10 14:47:45 CDT 2020 | Output format setting: phylip (def: fasta)
INFO | Mon Aug 10 14:47:45 CDT 2020 | Avg. seq. identity threshold (st) setting: 0.98 (def: 0.98)
INFO | Mon Aug 10 14:47:45 CDT 2020 | Log file: trimSeqs_log.txt (def: trimSeqs_log.txt)
INFO | Mon Aug 10 14:47:45 CDT 2020 | Verbose mode: 1 (1, on; 0, off)
INFO | Mon Aug 10 14:47:45 CDT 2020 | Debug mode: 0 (0, off; 1, on)
INFO | Mon Aug 10 14:47:46 CDT 2020 | -----------------------------------
INFO | Mon Aug 10 14:47:46 CDT 2020 | Step #2: Run main program, trim sequences in one or multiple alignments using trimAl...
INFO | Mon Aug 10 14:47:46 CDT 2020 | -----------------------------------
INFO | Mon Aug 10 14:47:46 CDT 2020 | Looping through input PHYLIP alignments, and trimming sequences...
INFO | Mon Aug 10 14:47:46 CDT 2020 | ----------------- Trimming file (1 / 561): Gene000000000003_cleaned_050.aln.phy
grep: character class syntax is [[:space:]], not [:space:]

02:47:46 PM [emergency] Exit trapped. In function: 'trapCleanup trimSeqs main' Exiting.

02:47:46 PM [emergency] Exit trapped. In function: 'trapCleanup piranha main' Exiting.
(base) biologistico@DESKTOP-3PSOLC1:~/data_clean/phylip_clean$

justincbagley commented 4 years ago

I just updated trimSeqs but I haven't created a new release so you can't automatically update it with brew update or a piranha upgrade. On Linux, go to the directory where the trimSeqs function is on your machine, which is listed in the "Execution path" in the program output, and for you is /home/linuxbrew/.linuxbrew/Cellar/piranha/0.4a3/bin/. Then manually edit the function. Do this as follows:

cd /home/linuxbrew/.linuxbrew/Cellar/piranha/0.4a3/bin/
sed -i 's/AverageIdentity\[\:space\:\]/AverageIdentity\[\[\:space\:\]\]/g' ./trimSeqs

This should fix your problem. In the meantime I will try to find time to do a new release containing this bug fix. Thanks. Best, ~J
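
For context, the grep message in the output above appears when a POSIX character class is written without its enclosing bracket expression. A small sketch of the difference follows; the sample line is made up, since the actual text that trimSeqs searches is not shown in this thread:

```shell
# Made-up sample line; the real text that trimSeqs searches is not shown in this thread.
sample='AverageIdentity 0.98'

# Correct: the class [:space:] must sit inside a bracket expression, i.e. [[:space:]].
printf '%s\n' "$sample" | grep -c 'AverageIdentity[[:space:]]'   # prints 1

# Incorrect form, kept as a comment: GNU grep rejects it with the
# "character class syntax is [[:space:]], not [:space:]" error quoted above.
# printf '%s\n' "$sample" | grep -c 'AverageIdentity[:space:]'
```

The sed command above simply rewrites the second form into the first inside the trimSeqs script.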

biologistico commented 4 years ago

Hmm, I did exactly that but it didn't work either. Could there be a typo somewhere?

biologistico@DESKTOP-3PSOLC1:~/data_clean/phylip_clean$ cd /home/linuxbrew/.linuxbrew/Cellar/piranha/0.4a3/bin/
biologistico@DESKTOP-3PSOLC1:/home/linuxbrew/.linuxbrew/Cellar/piranha/0.4a3/bin$ sed -i 's/AverageIdentity[\:space\:]/AverageIdentity[[\:space\:]]/g' ./trimSeqs
biologistico@DESKTOP-3PSOLC1:/home/linuxbrew/.linuxbrew/Cellar/piranha/0.4a3/bin$ cd /home/biologistico/data_clean/phylip_clean/
biologistico@DESKTOP-3PSOLC1:~/data_clean/phylip_clean$ piranha -f trimSeqs -m1 -o phylip -c 0.6 -g 0.1 -k 1

piranha v1.1.4, July 2020 (main script for PIrANHA v0.4a3, update Jul 31 14:40:56 CDT 2020)
Copyright (c) 2019-2020 Justin C. Bagley. All rights reserved.

INFO | Wed Aug 12 10:31:09 CDT 2020 | Function: trimSeqs
INFO | Wed Aug 12 10:31:09 CDT 2020 | Function arguments: -m1 -o phylip -c 0.6 -g 0.1 -k 1
INFO | Wed Aug 12 10:31:09 CDT 2020 | Checking machine type...
INFO | Wed Aug 12 10:31:09 CDT 2020 | Found machine type Linux.
INFO | Wed Aug 12 10:31:09 CDT 2020 | Checking file limits on Linux...
INFO | Wed Aug 12 10:31:09 CDT 2020 | ulimit: 1024
INFO | Wed Aug 12 10:31:09 CDT 2020 | Execution path: /home/linuxbrew/.linuxbrew/Cellar/piranha/0.4a3/bin/trimSeqs
INFO | Wed Aug 12 10:31:09 CDT 2020 | Executing function with -a flag arguments...
INFO | Wed Aug 12 10:31:11 CDT 2020 |----------------------------------------------------------------
INFO | Wed Aug 12 10:31:11 CDT 2020 | trimSeqs, v1.0.0 July 2020
INFO | Wed Aug 12 10:31:11 CDT 2020 | Copyright (c) 2020 Justin C. Bagley. All rights reserved.
INFO | Wed Aug 12 10:31:11 CDT 2020 |----------------------------------------------------------------
INFO | Wed Aug 12 10:31:11 CDT 2020 | Starting trimSeqs analysis...
INFO | Wed Aug 12 10:31:11 CDT 2020 | -----------------------------------
INFO | Wed Aug 12 10:31:11 CDT 2020 | Step #1: Set up workspace, check machine type, and determine output file settings.
INFO | Wed Aug 12 10:31:11 CDT 2020 | -----------------------------------
INFO | Wed Aug 12 10:31:12 CDT 2020 | Starting input directory (using current dir):
INFO | Wed Aug 12 10:31:12 CDT 2020 | /home/biologistico/data_clean/phylip_clean
INFO | Wed Aug 12 10:31:12 CDT 2020 | Output format setting: phylip (def: fasta)
INFO | Wed Aug 12 10:31:12 CDT 2020 | Avg. seq. identity threshold (st) setting: 0.98 (def: 0.98)
INFO | Wed Aug 12 10:31:12 CDT 2020 | Log file: trimSeqs_log.txt (def: trimSeqs_log.txt)
INFO | Wed Aug 12 10:31:12 CDT 2020 | Verbose mode: 1 (1, on; 0, off)
INFO | Wed Aug 12 10:31:12 CDT 2020 | Debug mode: 0 (0, off; 1, on)
INFO | Wed Aug 12 10:31:12 CDT 2020 | -----------------------------------
INFO | Wed Aug 12 10:31:12 CDT 2020 | Step #2: Run main program, trim sequences in one or multiple alignments using trimAl...
INFO | Wed Aug 12 10:31:12 CDT 2020 | -----------------------------------
INFO | Wed Aug 12 10:31:12 CDT 2020 | Looping through input PHYLIP alignments, and trimming sequences...
INFO | Wed Aug 12 10:31:12 CDT 2020 | ----------------- Trimming file (1 / 561): Gene000000000003_cleaned_050.aln.phy

10:31:12 AM [emergency] Exit trapped. In function: 'trapCleanup trimSeqs main' Exiting.

10:31:12 AM [emergency] Exit trapped. In function: 'trapCleanup piranha main' Exiting.

justincbagley commented 4 years ago

Hi Juan, I trimmed your sequences using -c 0.6 -g 0.1 and the default 98% identity threshold in trimSeqs. I then added the folder from my trimSeqs run on your data to our Burmeistera-RG-Hyb-Seq GitHub repository (see the new Cleaned_PHYLIP_trimSeqs_Juan/ subfolder). We can discuss and I can redo this with other options if you like.

While this solves your immediate need to get results, it doesn't really solve the bug issue above that you experienced, which is clearly specific to Ubuntu since trimSeqs is working on macOS and CentOS Linux. I will continue working with you here on the bug and we can discuss your files in email now.

Best, ~J

biologistico commented 4 years ago

Thank you Justin! I'll download them now and move forward. I'll get back to you if I run into another issue.

biologistico commented 4 years ago

I finally discovered what the issue with this function was, at least on my computer. It only works with trimAl v1.4, available on GitHub (https://github.com/scapella/trimal), but I had installed trimAl v1.2 from the very outdated trimAl website (http://trimal.cgenomics.org/trimal).

justincbagley commented 4 years ago

Thanks, Juan. I am closing this issue and I will add a warning in the trimSeqs function / documentation that this function only works with trimAl v1.4+.
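
Until that warning is in place, a version check along the following lines might help. This is only a sketch: `version_ge` is a hypothetical helper, it assumes a `sort` that supports `-V` (version sort, e.g. GNU coreutils), and the version string is typed in by hand because this thread does not cover how to query it from trimAl:

```shell
# Hypothetical helper: succeed if version $1 is at least version $2.
# Assumes a sort that supports -V (version sort), e.g. GNU coreutils.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# The installed version is entered by hand here (an assumption for illustration).
installed="1.2"
if version_ge "$installed" "1.4"; then
  echo "trimAl $installed should be new enough for trimSeqs"
else
  echo "trimAl $installed is older than 1.4; trimSeqs may fail"
fi
```

Version sort is used rather than a plain string comparison so that a release such as 1.10 would be treated as newer than 1.4.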