A satDNA analysis pipeline.
SatXplor is configured to run on any Linux system. However, a few dependencies need to be installed before the pipeline can run.
Since SatXplor is a compilation of over 10 individual scripts and a Rust binary, the most elegant way of distributing and running SatXplor is to clone the repository directly and install the dependencies.
Clone the repository:
git clone https://github.com/mvolar/SatXplor.git
Navigate to the project directory:
cd SatXplor
Create and activate a virtual environment (optional but HIGHLY recommended):
python -m venv venv
source venv/bin/activate
Install python dependencies:
pip install -r requirements.txt
Install R dependencies:
Since installing R packages on Linux requires compiling many of them from source, installing all of them can take up to 20 minutes. It is therefore best to create a conda/mamba environment and use the precompiled R packages for your Linux distribution:
mamba create -n myenv r-base=4.3.3 -c conda-forge -y
mamba activate myenv
mamba install -c conda-forge r-biocmanager r-ggplot2 r-data.table r-dplyr r-umap r-stringr r-factominer r-ape r-optparse r-htmlwidgets r-igraph r-networkd3 r-circlize r-pheatmap r-scales bioconda::bioconductor-biostrings bioconda::bioconductor-complexheatmap -y
(Optional) Install both MAFFT and NCBI-BLAST through mamba:
mamba activate myenv
mamba install -c conda-forge -c bioconda mafft blast
If you are running an older Linux system, you need to compile the Rust binary manually on that system. First install Rust:
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
Restart your shell, navigate to the satxplor/satxplor/executables/ folder, and then run:
cargo build --release
cp ./target/release/kmer_edge_finder ./
The application should then run normally.
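Before starting a run, it can help to confirm that the copied binary is present and executable. The following is a minimal sketch, not part of SatXplor itself: the function name `binary_is_usable` is hypothetical, and the path passed to it is a placeholder for wherever `kmer_edge_finder` ends up on your system.

```python
import os
from pathlib import Path


def binary_is_usable(path):
    """Return True if `path` is an existing regular file with an execute bit set."""
    p = Path(path)
    return p.is_file() and os.access(p, os.X_OK)


if __name__ == "__main__":
    # Placeholder path: adjust it to the executables folder of your checkout.
    candidate = "kmer_edge_finder"
    status = "ready" if binary_is_usable(candidate) else "missing or not executable"
    print(f"{candidate}: {status}")
```

If the check reports the binary as not executable, `chmod +x` on the file is usually enough.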
As an alternative to the normal installation, SatXplor also comes with a Docker container:
docker pull mvolaric/satxplor
Start the container with your data folder mounted at `/mnt/data`:
docker run -it -v path/to/your/data_folder:/mnt/data satxplor
Inside the container, create `run_config.json` by using the provided helper script `setup_docker_run.py`:
python satxplor/setup_docker_run.py --input_genome_path genome.fasta \
--sat_raw satellites.fasta \
--final_results_dir output_folder
Then run `controller.py` from the interactive shell:
python satxplor/controller.py
After the run, the results should be visible in the /path/to/your/data_folder/output_folder directory.
To run the tests inside the container:
docker run -it -v path/to/your/data_folder:/mnt/data satxplor
python satxplor/run_full_tests.py
cp -r ./testing_data /mnt/data/
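Because the data folder is bind-mounted, every file under it has one path on the host and another under `/mnt/data` in the container. The sketch below illustrates that correspondence under stated assumptions: the helper names `to_container_path` and `to_host_path` are hypothetical and are not part of SatXplor or Docker.

```python
from pathlib import PurePosixPath

# Mount point used in the `docker run -v ...:/mnt/data` command.
CONTAINER_MOUNT = PurePosixPath("/mnt/data")


def to_container_path(host_path, host_data_dir):
    """Translate a path inside the mounted host folder to its path in the container.

    Raises ValueError if `host_path` is not located under `host_data_dir`.
    """
    relative = PurePosixPath(host_path).relative_to(PurePosixPath(host_data_dir))
    return CONTAINER_MOUNT / relative


def to_host_path(container_path, host_data_dir):
    """Translate a path under /mnt/data back to the corresponding host path."""
    relative = PurePosixPath(container_path).relative_to(CONTAINER_MOUNT)
    return PurePosixPath(host_data_dir) / relative


if __name__ == "__main__":
    data_dir = "/path/to/your/data_folder"
    print(to_container_path(f"{data_dir}/genome.fasta", data_dir))
    print(to_host_path("/mnt/data/output_folder", data_dir))
```

Files outside the mounted folder are not visible inside the container, which is why the helpers raise `ValueError` for them rather than guessing a location.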
## Usage
Running SatXplor is simple, you just edit the `run_config.json` file:
```json
{
    "INPUT_GENOME_PATH": "path/to/your/genome",
    "GENOME_PATH": "./input.fasta",
    "SAT_RAW": "path/to/your/sats",
    "SAT_FASTA_PATH": "./sats.fasta",
    "FINAL_RESULTS_DIR": "/path/to/final/results/dir",
    "OVERWRITE": true
}
```
`GENOME_PATH` and `SAT_FASTA_PATH` are the locations of the temporary copies of the genome and satellite files.
Then run the main `controller.py` file, which runs the pipeline and writes the results to `FINAL_RESULTS_DIR` and `SatXplor/results`. Note that both the Python virtual environment and the mamba environment need to be active:
(mypyenv) (myenv) python satxplor/controller.py
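If you prefer to generate the configuration instead of editing it by hand, the keys shown above can be written from Python. This is a minimal sketch under stated assumptions: `build_config` and `write_config` are hypothetical helpers, not part of SatXplor, and the output location of `run_config.json` is a parameter because the exact path the pipeline reads from is not stated here.

```python
import json
from pathlib import Path

# Keys taken from the run_config.json example in this README.
REQUIRED_KEYS = (
    "INPUT_GENOME_PATH",
    "GENOME_PATH",
    "SAT_RAW",
    "SAT_FASTA_PATH",
    "FINAL_RESULTS_DIR",
    "OVERWRITE",
)


def build_config(input_genome, sat_raw, results_dir, overwrite=True):
    """Build a config dict; the temporary-copy paths keep the README defaults."""
    return {
        "INPUT_GENOME_PATH": str(input_genome),
        "GENOME_PATH": "./input.fasta",
        "SAT_RAW": str(sat_raw),
        "SAT_FASTA_PATH": "./sats.fasta",
        "FINAL_RESULTS_DIR": str(results_dir),
        "OVERWRITE": bool(overwrite),
    }


def write_config(config, path="run_config.json"):
    """Check that all expected keys are present, then write the config as JSON."""
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError(f"missing config keys: {missing}")
    out = Path(path)
    out.write_text(json.dumps(config, indent=4) + "\n")
    return out


if __name__ == "__main__":
    cfg = build_config("path/to/your/genome", "path/to/your/sats", "/path/to/final/results/dir")
    print(write_config(cfg))
```

Writing through `json.dumps` guarantees the file is valid JSON, which rules out stray comments or trailing commas.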
To check that everything is installed correctly, run the following script. It searches for the external dependencies (MAFFT, BLAST) and tests that all required R libraries can be loaded.
python tests/tests.py
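For a quick manual check of the external tools, something along these lines can be used. It is only a sketch and does not reproduce the contents of `tests/tests.py`: the executable names in `DEFAULT_TOOLS` are assumptions (MAFFT and BLAST are named above, but the exact binaries the pipeline calls are not), and the function names are hypothetical.

```python
import shutil

# Assumed executable names; adjust to the binaries your installation provides.
DEFAULT_TOOLS = ("mafft", "blastn", "makeblastdb", "Rscript")


def find_tools(names=DEFAULT_TOOLS):
    """Map each tool name to its full path on PATH, or None if it is not found."""
    return {name: shutil.which(name) for name in names}


def missing_tools(names=DEFAULT_TOOLS):
    """Return the tool names that could not be found on PATH."""
    return [name for name, path in find_tools(names).items() if path is None]


if __name__ == "__main__":
    for name, path in find_tools().items():
        print(f"{name}: {path or 'NOT FOUND'}")
```

Run it with both environments active, since `shutil.which` only sees tools on the current `PATH`.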
SatXplor also ships with a small testing sample to check that everything runs normally:
python satxplor/run_all_tests.py
This runs on the testing_data/testing_data.tar.gz files and produces the normal SatXplor output.
If you use SatXplor please cite the following paper:
SatXplor – A comprehensive pipeline for satellite DNA analyses in complex genome assemblies
Marin Volarić, Nevenka Meštrović, Evelin Despot-Slade
bioRxiv 2024.08.09.607335; doi: https://doi.org/10.1101/2024.08.09.607335