FMAlign2 is a novel multiple sequence alignment algorithm based on FMAlign. It is designed to align ultra-long nucleotide sequences quickly and accurately.
The program is supported on both Linux and Windows (Linux is strongly recommended for its convenience and better performance). Please make sure your computer meets the following requirements:
To compile the executable program for the entire project, please ensure that you have the make command available.
Check make availability: Open your command-line interface and type make --version to check if the make command is installed on your system. If it is installed, you will see the version information. If not, you need to install it before proceeding.
Install make on Windows: If you are using Windows, you may need to install the appropriate tool to enable make functionality. One popular option is GNU Make for Windows, which provides a Windows-compatible version of make. You can download it from the official website and follow the installation instructions.
To compile the project, you need to have the g++ compiler available on your system. Here are the steps to ensure g++ support:
Check g++ availability: Open your command-line interface and type g++ --version to check if the g++ command is installed. If it is installed, you will see the version information. If not, you need to install it before proceeding.
Install g++ on Windows: If you are using Windows, you can install g++ by using a compiler suite such as MinGW or Cygwin. These packages provide a Windows-compatible version of g++ along with other essential tools. You can download MinGW from the official website (https://mingw-w64.org/) or Cygwin from their official website (https://www.cygwin.com/). Follow the installation instructions provided by the respective package to set up g++ on your system.
Install g++ on Linux: On most Linux distributions, the g++ compiler is included as part of the GNU Compiler Collection (GCC). To install g++ on a Debian- or Ubuntu-based system, open your terminal and run the following command:
sudo apt-get install g++
This will install g++ and its dependencies on your system.
Verify g++ installation: After installation, run g++ --version again to verify that g++ is installed correctly and accessible from the command line. Please note that if you are a Windows user, make sure that the installed version (>4.2) of g++ supports OpenMP. On Windows systems, we utilize OpenMP for parallel computing.
If you have ensured that your system meets the requirements mentioned above, you can proceed with the following steps to compile the executable file. However, you also have the option to directly use the pre-compiled executable file available in the Release.
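The availability checks described above can also be scripted. The following is a minimal sketch, not part of FMAlign2: it only looks up make and g++ on the PATH with Python's standard library, and the function name find_build_tools is hypothetical. It does not check the compiler version or OpenMP support.

```python
import shutil


def find_build_tools(tools=("make", "g++")):
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {tool: shutil.which(tool) for tool in tools}


if __name__ == "__main__":
    for tool, path in find_build_tools().items():
        status = path if path else "NOT FOUND (install it before building)"
        print(f"{tool}: {status}")
```

A tool reported as NOT FOUND needs to be installed, or added to the PATH, before running the build.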
Download
git clone https://github.com/metaphysicser/FMAlign2.git
cd FMAlign2
# for Linux
chmod 777 ./ext/mafft/linux/usr/libexec/mafft/disttbfast
Build
cd FMAlign2 && make [M64=1]
Switch to the FMAlign2 directory in your terminal and execute the above command to build the project. We provide two compilation modes: 32-bit and 64-bit. The 32-bit mode is sufficient for most data. However, if the concatenated length of all sequences exceeds the range of uint32_t (4294967295), you should add the M64=1 parameter when compiling the program to generate a 64-bit executable.
Use the make command for 32-bit mode and the make M64=1 command for 64-bit mode.
During the compilation process, please be patient as the time required depends on the size and complexity of the project.
Once the compilation is complete, you will find the generated executable file in the project directory.
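To decide between the two build modes before compiling, you can estimate the concatenated sequence length of your input. The sketch below is an illustration under assumptions: the helper names are hypothetical, it counts only sequence characters in a FASTA file, and it does not know whether FMAlign2 adds separators or terminators to the concatenated text. If the total is close to the limit, building with M64=1 is the safer choice.

```python
UINT32_MAX = 4294967295  # upper bound of uint32_t quoted in the build notes


def total_sequence_length(fasta_path):
    """Sum the lengths of all sequences in a FASTA file, skipping header lines."""
    total = 0
    with open(fasta_path, "r") as handle:
        for line in handle:
            line = line.strip()
            if line and not line.startswith(">"):
                total += len(line)
    return total


def needs_m64(total_length, limit=UINT32_MAX):
    """Return True when the concatenated length exceeds the 32-bit range."""
    return total_length > limit


if __name__ == "__main__":
    import sys

    length = total_sequence_length(sys.argv[1])
    mode = "make M64=1" if needs_m64(length) else "make"
    print(f"total length = {length}, suggested build command: {mode}")
```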
Note: If you want to remove all the generated .o files, you can execute the following command:
make clean
This command will clean up the intermediate object files and leave only the source code and executable file in the project directory. Use this command when you want to start a fresh build or clean up unnecessary files to save disk space.
Please note that if you choose halign2 or halign3 as your multiple sequence alignment method, make sure you have a Java environment installed. To check the version of Java installed on your system, you can open a command prompt or terminal and execute the following command:
java -version
This will display the installed Java version information.
If you don't have Java installed or if the installed version is not compatible, you can follow these steps to install Java:
To install Java on Windows:
After installing, run java -version to verify that Java is installed and the correct version is displayed.
To install Java on Linux:
Run sudo apt update to update the package lists on your system.
Run sudo apt install default-jdk to install the default version of OpenJDK.
Run java -version to verify that Java is installed and the correct version is displayed.
Once you have Java installed and verified the version, you should be able to use halign2 and halign3 for multiple sequence alignment.
Reminder: Please ensure that all external files (such as MAFFT, HALIGN, etc.) are properly copied to their corresponding directories. Pay close attention to the relative paths between FMAlign2 and the ext folder to avoid issues during execution.
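One way to catch a misplaced ext folder before running an alignment is a small pre-flight check. This is a sketch under assumptions: the only path it knows is the Linux disttbfast path from the chmod command in the download step, and the function name check_external_file is hypothetical. Other external files would need their own paths passed in.

```python
import os

# Relative path taken from the chmod command in the download step (Linux layout).
MAFFT_DISTTBFAST = os.path.join(
    "ext", "mafft", "linux", "usr", "libexec", "mafft", "disttbfast"
)


def check_external_file(project_dir, relative_path=MAFFT_DISTTBFAST):
    """Return 'missing', 'not executable', or 'ok' for a bundled external file."""
    full_path = os.path.join(project_dir, relative_path)
    if not os.path.isfile(full_path):
        return "missing"
    if not os.access(full_path, os.X_OK):
        return "not executable"
    return "ok"


if __name__ == "__main__":
    # Run from the FMAlign2 directory.
    print(f"{MAFFT_DISTTBFAST}: {check_external_file('.')}")
```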
If you are a Linux user:
./FMAlign2 -i /path/to/data [other options]
If you are a Windows user:
./FMAlign2.exe -i /path/to/data [other options]
If you want to show the parameter details:
./FMAlign2 -h
Parameter Details:
We will demonstrate with the example data mt1x.fasta, assuming you are running on a Linux system.
./FMAlign2 -i ./data/mt1x.fasta -l 20 -c 1 -p mafft -f global -o output.fmaligned2.fasta
This command specifies the following options:
The input file mt1x.fasta is located in the data folder.
The output file output.fmaligned2.fasta will be generated in the FMAlign2 directory.
After running this command, you will obtain the aligned output in the output.fmaligned2.fasta file.
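When aligning several datasets, the same command line can be assembled and launched from a script. The sketch below is not part of FMAlign2: build_command and run_alignment are hypothetical helpers, and the only flags used are the ones that appear in the example command. The flags are passed through as-is, without any interpretation of their meaning.

```python
import subprocess


def build_command(input_path, output_path, extra_flags=None, executable="./FMAlign2"):
    """Assemble the argument list for one FMAlign2 run."""
    cmd = [executable, "-i", input_path, "-o", output_path]
    for flag, value in (extra_flags or {}).items():
        cmd += [flag, str(value)]
    return cmd


def run_alignment(cmd):
    """Run the command and return its exit code (0 conventionally means success)."""
    return subprocess.run(cmd, check=False).returncode


if __name__ == "__main__":
    flags = {"-l": 20, "-c": 1, "-p": "mafft", "-f": "global"}
    command = build_command("./data/mt1x.fasta", "output.fmaligned2.fasta", flags)
    print("running:", " ".join(command))
    print("exit code:", run_alignment(command))
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.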
If you want to evaluate the generated alignment results, you can run the sp.py script (requires a Python environment) with the following parameters:
python sp.py --input output.fmaligned2.fasta --match 0 --mismatch 1 --gap1 2 --gap2 2
This command will calculate and print the SP (Sum-of-Pairs) score for the multiple sequence alignment results. The --input parameter specifies the input alignment file (output.fmaligned2.fasta in this case), and the --match, --mismatch, --gap1, and --gap2 parameters define the scoring scheme for matches, mismatches, and gap penalties.
By running this command, you will obtain the SP score, which provides an evaluation of the alignment quality.
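To make the idea of a sum-of-pairs score concrete, here is a simplified sketch. It is not the sp.py implementation: the exact roles of --gap1 and --gap2 are not described here, so this version assumes a single penalty for a residue paired with a gap and scores a gap paired with a gap as 0. The function names are hypothetical.

```python
from itertools import combinations


def pair_score(a, b, match=0, mismatch=1, gap=2):
    """Score one aligned column position for a pair of sequences."""
    if a == "-" and b == "-":
        return 0  # assumption: gap-gap pairs are not penalised
    if a == "-" or b == "-":
        return gap  # assumption: one flat penalty for residue-gap pairs
    return match if a == b else mismatch


def sp_score(sequences, match=0, mismatch=1, gap=2):
    """Sum pair_score over every column of every pair of aligned sequences."""
    if len({len(s) for s in sequences}) > 1:
        raise ValueError("aligned sequences must all have the same length")
    total = 0
    for s, t in combinations(sequences, 2):
        for a, b in zip(s.upper(), t.upper()):
            total += pair_score(a, b, match, mismatch, gap)
    return total


if __name__ == "__main__":
    print(sp_score(["ACGT", "A-GT", "ACCT"]))  # prints 6
```

With match 0 and positive mismatch and gap values, the score in this sketch acts as a penalty, so a lower value means the sequences agree in more columns.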
Data can be accessed in the data folder. All the data is compressed using xz compression. Before using it, please decompress the files.
Here are the methods to decompress the files on different operating systems:
Decompressing on Linux:
Open the terminal.
Navigate to the directory where the compressed file is located.
Run the following command to decompress the file:
xz -d filename.xz
Replace filename.xz with the name of the file you want to decompress.
Decompressing on Windows:
Please note that the decompressed files will occupy more disk space. Make sure you have enough disk space to store the uncompressed files.
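As a cross-platform alternative to the xz command, the files can also be decompressed with Python's standard lzma module. This is a minimal sketch with a hypothetical helper name, decompress_xz. Unlike xz -d, it keeps the compressed file in place, so the disk space note above applies even more strongly.

```python
import lzma
import shutil


def decompress_xz(src_path, dst_path=None):
    """Decompress src_path (*.xz) to dst_path; by default strip the .xz suffix."""
    if dst_path is None:
        if not src_path.endswith(".xz"):
            raise ValueError("expected a file name ending in .xz")
        dst_path = src_path[:-3]
    # Stream in chunks so large datasets are not loaded into memory at once.
    with lzma.open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return dst_path


if __name__ == "__main__":
    import sys

    print("wrote", decompress_xz(sys.argv[1]))
```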
If you need more data, you can visit http://lab.malab.cn/~cjt/MSA/datasets.html for more datasets.
FMAlign2 is supported by ZOU's Lab. If you have any suggestions or feedback, we encourage you to provide them through the issue page on the project's repository. You can also reach out via email to zpl010720@gmail.com.
We value your input and appreciate your contribution to improving the project. Thank you for taking the time to provide feedback, and we will address your concerns as soon as possible.
Pinglu Zhang, Huan Liu, Yanming Wei, Yixiao Zhai, Qinzhong Tian, Quan Zou, FMAlign2: a novel fast multiple nucleotide sequence alignment method for ultralong datasets, Bioinformatics, 2024, btae014, https://doi.org/10.1093/bioinformatics/btae014