Closed (o-william-white closed 4 years ago)
Hi Oliver, Afin did not run; that step would have put your SPAdes contigs together. Check that everything is installed correctly (or, more likely, that the libraries afin needs are installed). afin needs a C++ compiler with C++11 support and zlib.h. I use the GNU compiler, version 4.8.4 at least. You can run /data/home/mpx469/software/Fast-Plast/afin/afin to make sure it is working.
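The compiler and zlib requirements mentioned above can be probed before rebuilding afin. This is a minimal sketch, not part of Fast-Plast: the function name check_cxx11_zlib is hypothetical, and it assumes g++ is the compiler unless another one is passed as the first argument.

```shell
#!/usr/bin/env bash
# Sketch: check that a C++ compiler can build C++11 code that includes zlib.h
# and links against libz. check_cxx11_zlib is a hypothetical helper name.
check_cxx11_zlib() {
    local cxx="${1:-g++}"   # assumed default compiler; pass another as $1
    local tmpdir status
    tmpdir="$(mktemp -d)" || return 2
    # Tiny probe program: 'auto' needs C++11, zlibVersion() needs zlib.h and -lz.
    cat > "$tmpdir/probe.cpp" <<'EOF'
#include <zlib.h>
#include <iostream>
int main() {
    auto v = zlibVersion();
    std::cout << "zlib " << v << "\n";
    return 0;
}
EOF
    if "$cxx" -std=c++11 "$tmpdir/probe.cpp" -lz -o "$tmpdir/probe" 2> "$tmpdir/err.log"; then
        # Compilation worked; run the probe so the zlib version is printed.
        "$tmpdir/probe"
        status=$?
        rm -rf "$tmpdir"
        return "$status"
    else
        echo "compile failed with $cxx:" >&2
        cat "$tmpdir/err.log" >&2
        rm -rf "$tmpdir"
        return 1
    fi
}
```

Calling check_cxx11_zlib after the module load lines should print the zlib version if both requirements are met; a compile error naming zlib.h or -lz points at the missing piece.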
Hello,
Thanks for the reply. I thought it might have something to do with my installation.
I could get a help message from afin, so I thought it was working OK:
/data/home/mpx469/software/Fast-Plast/afin/afin -h
Usage: /data/home/mpx469/software/Fast-Plast/afin/afin -c contigsfile(s) -r readsfile(s) [-o outfile] [-m sort_char] [-s sub_len]
[-l search_loops] [-i min_cov] [-p min_overlap] [-t max_threads]
[-d initial_trim] [-e max_missed] [-f stop_ext] [-g mismatch] [-x extend_len]
[--silent] [--no_log] [--no_fusion] [--verbose] [--print_fused]
/data/home/mpx469/software/Fast-Plast/afin/afin -h [--help]
-c,--contigsfiles Space (or comma) separated list of files containing contigs
-r,--readsfiles Space (or comma) separated list of files containing reads
-o,--outfile Output will be printed to the outfile specified, with a .fa extension for the contigs and .log extension for the logfile
-m,--sort_char [default: 4] Sorts the reads by the first max_sort_char characters
-s,--sub_len [default: 100] Will focus on the current last contig_sub_len characters of the contig in each search
-l,--search_loops [default: 10] Will search against each contig a maximum of max_search_loops times to attempt extension
-i,--min_cov [default: 3] Will stop adding bp's once the coverage falls below min_cov
-p,--min_overlap [default: 20] Only those reads overlapping the contig by at least min_overlap bp's will be returned in each search
-t,--max_threads [default: 4] Will only run max_threads threads at a time
-d,--initial_trim [default: 0] Length to trim off the beginning and end of each contig at the start of the program
-e,--max_missed [default: 5] Maximum allowable mismatched bp's for each read when checking troubled contig fusions
-f,--stop_ext [default: .5] During extension, if the percentage of reads remaining after cleaning is below stop_ext, do not extend here
-g,--mismatch [default: .1] maximum percentage of mismatches allowed when fusing two contigs
-x,--extend_len [default: 40] Will add a max of extend_len bp's each search loop
--silent Suppress screen output
--no_log Suppress log file creation
--no_fusion Only extend, no attempt will be made to fuse contigs
--verbose Output additional information to logfile and/or screen (except if output to that location is suppressed)
--print_fused Print to file (_fused.fasta) fused contigs just before fusion, for inspecting the fusion locations
I initially had some trouble with my installation so I specified the paths in the perl script manually. See below:
grep "###directories" -A 13 /data/home/mpx469/software/Fast-Plast/fast-plast.pl
###directories
my $FPROOT = "$FindBin::RealBin";
my $AFIN_DIR = "$FPROOT/afin";
my $COVERAGE_DIR = "$FPROOT/Coverage_Analysis";
my $FPBIN = "$FPROOT/bin";
my $TRIMMOMATIC="/data/home/mpx469/software/Trimmomatic/Trimmomatic-0.39//trimmomatic-0.39.jar"; #path to trimmomatic executable
my $BOWTIE2="/share/apps/centos7/bowtie2/2.3.4/bin/bowtie2"; #path to bowtie2 executable
my $SPADES="/share/apps/centos7/spades/3.11.1/bin/spades.py"; #path to spades executable
my $BLAST="/share/apps/centos7/blast+/2.7.1/bin/"; #path to blast executable
my $SSPACE="/data/home/mpx469/software/sspace_basic/SSPACE_Basic.pl/SSPACE_Basic.pl"; #path to sspace exectuable
my $BOWTIE1="/share/apps/centos7/bowtie/1.2.0/bin/bowtie/bowtie"; #path to bowtie1 executable
my $JELLYFISH="/share/apps/centos7/jellyfish/2.2.6/bin/jellyfish/jellyfish"; #path to jellyfish2 excecutable
$ENV{'PATH'} = $PATH.':'.$BOWTIE1;
Does the input data need to be in the same directory as the script? Perhaps it wasn't able to find $FPROOT?
I am running the script in a different directory with the following input. Perhaps I am missing a specific library, as you mentioned above.
module load perl
module load tbb/2018_U2
module load gcc
export PATH=/data/home/mpx469/software/Fast-Plast/:$PATH
fast-plast.pl -1 fastq-dump-Bedadeti1-r1.fq.gz -2 fastq-dump-Bedadeti1-r2.fq.gz --name Bedadeti1 --bowtie_index Zingiberales --coverage_analysis --clean light --threads 12
I am not sure about the libraries you mentioned, so perhaps I should get in touch with my university support.
Best wishes Oliver
Hello again,
Just to update you, I installed zlib locally and provided the path to the directory in my script as follows:
export PATH=/data/home/mpx469/software/zlib/zlib-1.2.11/:$PATH
However, I got the same error. If I wanted to run afin independently to check it is working ok, is it possible to do this using the output I have already generated?
For example if I ran the following command, what would I input as the reads files?
/data/home/mpx469/software/Fast-Plast/afin/afin -c Bedadeti1/4_Afin_Assembly/filtered_spades_contigs.fsa -r <readsfiles>
Best wishes Ollie
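A note on the zlib step above: PATH is only searched for executables, so adding the zlib directory to it would not normally help a compiler find zlib.h or help afin load libz. With GCC, headers are usually found through -I or CPATH, libraries at link time through -L or LIBRARY_PATH, and shared libraries at run time through LD_LIBRARY_PATH. Below is a minimal sketch, assuming a Linux system where ldd is available, that lists any shared libraries a binary cannot resolve. The helper name missing_libs is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: read ldd output on STDIN and print the names of libraries that the
# dynamic loader reports as "not found". missing_libs is a hypothetical helper.
missing_libs() {
    awk '/not found/ { print $1 }'
}

# Example use (paths taken from the thread):
#   ldd /data/home/mpx469/software/Fast-Plast/afin/afin | missing_libs
# No output means every shared library was resolved.
```

If libz does show up as missing, pointing LD_LIBRARY_PATH at the directory that contains the zlib shared library is the usual remedy, rather than PATH.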
Hi Ollie,
If you run afin like this and use full paths to the files, you should be able to tell if it is working:
/data/home/mpx469/software/Fast-Plast/afin/afin -c filtered_spades_contigs.fsa -r ../1_Trimmed_Reads/Bedadeti1.trimmed* -l 50 -f .1 -d 100 -x 1065 -p 10 -i 1 -o Bedadeti1_afin
Data does not need to be in the same directory as Fast-Plast.
Look at all the output files, not just the error log. Some of these programs print useful information to STDOUT and STDERR.
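One way to follow the advice about STDOUT and STDERR is to keep both streams from a standalone afin run in separate files. This is a minimal sketch; run_logged and the log file names are hypothetical, and the afin flags in the usage comment are the ones quoted above.

```shell
#!/usr/bin/env bash
# Sketch: run any command, saving STDOUT and STDERR to separate log files.
# run_logged is a hypothetical helper; $1 is the log file prefix, the rest
# is the command to run.
run_logged() {
    local prefix="$1"; shift
    "$@" > "${prefix}.stdout.log" 2> "${prefix}.stderr.log"
    local status=$?
    echo "exit status $status; see ${prefix}.stdout.log and ${prefix}.stderr.log" >&2
    return "$status"
}

# Example use from inside 4_Afin_Assembly:
#   run_logged afin_test /data/home/mpx469/software/Fast-Plast/afin/afin \
#       -c filtered_spades_contigs.fsa -r ../1_Trimmed_Reads/Bedadeti1.trimmed* \
#       -l 50 -f .1 -d 100 -x 1065 -p 10 -i 1 -o Bedadeti1_afin
```

A non-zero exit status with an empty stderr log can be a hint that the process was killed from outside, for example by a resource limit.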
Best, Michael
Hi Michael,
Using the afin command you suggested, I found that this step was quite memory intensive and exceeded the memory limit set on my university computing facility. When I increased the limit it ran without issue.
Many thanks for your thoughts and for sharing the software.
Best wishes Ollie
Glad it worked out. Afin can be memory intensive; how much memory it needs depends on the data, since it reads all of it into memory.
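To choose a memory limit for the afin step, it can help to measure the peak memory of one run. This is a minimal sketch, assuming GNU time is installed as /usr/bin/time (its -v and -o options are not available in the BSD or macOS time). The helper name peak_rss_mb and the log file name are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: extract the peak resident memory (in MB) from a GNU time -v report.
# peak_rss_mb is a hypothetical helper; $1 is the report file.
peak_rss_mb() {
    # GNU time -v reports "Maximum resident set size (kbytes): N".
    awk -F': ' '/Maximum resident set size/ { printf "%d\n", $2 / 1024 }' "$1"
}

# Example use:
#   /usr/bin/time -v -o afin.time.log \
#       /data/home/mpx469/software/Fast-Plast/afin/afin -c ... -r ... -o Bedadeti1_afin
#   peak_rss_mb afin.time.log
#   ulimit -v    # virtual memory limit of the current shell, in kbytes
```

On a cluster the effective limit is often set by the job scheduler rather than by the shell, so the measured peak is best compared with the memory requested for the job.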
Best, Michael
Hello,
Not really an issue with the program itself, but I was wondering whether I might be able to get some feedback on a plastome assembly based on downloaded SRA data.
The script seems to run most of the way through without issue. However, I get a message saying that it could not properly orientate the genome, and I wanted to check whether this was due to an error on my part and how I might optimise the assembly.
I ran the assembly with the command as follows
Below is the Fast-Plast_Progress.log
I checked the file /data/scratch/mpx469/fast-plast/Bedadeti1/Final_Assembly/Bedadeti1_afin_iter2.fa but found it was empty.
Below is the results_error.log file
Note that I found three filtered contigs in the file 4_Afin_Assembly/filtered_spades_contigs.fsa. I annotated these contigs using GeSeq and they look like the LSC, IR and SSC.
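The filtered contigs can also be summarised from the command line before annotation. This is a minimal sketch that prints each contig name with its sequence length, using only awk. The helper name fasta_lengths is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print "name<TAB>length" for every record in a FASTA file.
# fasta_lengths is a hypothetical helper; $1 is the FASTA file.
fasta_lengths() {
    awk '
        /^>/ {
            # New header: emit the previous record, then start counting again.
            if (name != "") print name "\t" len
            name = substr($1, 2)   # header up to the first space, without ">"
            len = 0
            next
        }
        { len += length($0) }      # sequence line: add its length
        END { if (name != "") print name "\t" len }
    ' "$1"
}

# Example use:
#   fasta_lengths Bedadeti1/4_Afin_Assembly/filtered_spades_contigs.fsa
#   [ -s Bedadeti1/Final_Assembly/Bedadeti1_afin_iter2.fa ] || echo "afin output is empty"
```

The lengths can then be set against the GeSeq annotation to judge whether the three contigs plausibly correspond to the LSC, IR and SSC.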
Was fast-plast unable to complete the assembly simply because the data to link the contigs was missing? Or is there a way I can optimise the assembly?
Best wishes Oliver