Status: Closed (mmpust closed this issue 2 years ago)
Okay, I solved the error. You have to provide the full path, including the FASTA file name, for both the -i and -o parameters. Like this:
-i /home/input/path/FASTA.fa
It might help to mention this in the parameter description.
Thanks for developing the tool!
Marie
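For anyone scripting this, here is a minimal sketch of the workaround above: it expands both paths to absolute paths before building the command line. The helper name build_alfatclust_command is hypothetical and not part of ALFATClust; only the -i and -o flags are taken from this thread.

```python
import os
import sys


def build_alfatclust_command(script, input_fasta, output_file):
    """Build an argument list with -i and -o expanded to absolute paths.

    Hypothetical helper for illustration; it only assumes the -i and -o
    flags mentioned in this thread.
    """
    return [
        sys.executable,  # the Python interpreter running this script
        os.path.abspath(os.path.expanduser(script)),
        "-i", os.path.abspath(os.path.expanduser(input_fasta)),
        "-o", os.path.abspath(os.path.expanduser(output_file)),
    ]


# Example: relative paths are resolved against the current directory.
cmd = build_alfatclust_command("alfatclust.py", "fasta.fa", "fasta_clust.fa")
print(" ".join(cmd))
# To actually run it: import subprocess; subprocess.run(cmd, check=True)
```

Building the command as a list (rather than one string) avoids quoting problems if a path contains spaces.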
If I now add additional (optional) parameters, the same error message appears:
python /home/programs/ALFATClust/main/alfatclust.py \
-i /home/seq_file_path/fasta.fa \
-o /home/output/fasta_clust.fa \
--seed 1 \
--evaluate /home/evaluate/evaluate.csv
Error message:
---------------------------------------------
Estimated similarity range = [0.95, 0.75]
Estimated similarity step size = 0.025
Default DNA k-mer size = 17
Default protein k-mer size = 9
Default DNA sketch size = 2000
Default protein sketch size = 2000
Min. estimated similarity considered = 0.55
No. of threads = 16
---------------------------------------------
Validating input sequence file '/home/seq_file_path/fasta.fa' ..
Pre-clustering sequences into subsets...
38526 individual subsets to be clustered
Process aborted due to error occurred: Unknown format fasta
Do you have any idea why this is happening? If I run the same command without the '--evaluate' parameter, it runs fine and the final output is:
---------------------------------------------
Estimated similarity range = [0.95, 0.75]
Estimated similarity step size = 0.025
Default DNA k-mer size = 17
Default protein k-mer size = 9
Default DNA sketch size = 2000
Default protein sketch size = 2000
Min. estimated similarity considered = 0.55
No. of threads = 16
---------------------------------------------
Validating input sequence file '/home/seq_file_path/fasta.fa' ..
Pre-clustering sequences into subsets...
38539 individual subsets to be clustered
38539 / 38539 subsets processed
Process completed. No. of sequence clusters = 45856
Output file:
#Cluster 1
k141_14861_1 # 2 # 136 # -1 # ID=5400_1;partial=10;start_type=ATG;rbs_motif=GGAGG;rbs_spacer=5-10bp;gc_cont=0.407
#Cluster 2
k141_33498_1 # 1 # 384 # -1 # ID=5402_1;partial=11;start_type=Edge;rbs_motif=None;rbs_spacer=None;gc_cont=0.490
#Cluster 3
k141_33499_1 # 1 # 393 # 1 # ID=5403_1;partial=11;start_type=Edge;rbs_motif=None;rbs_spacer=None;gc_cont=0.412
But where are the clustered sequences?
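Judging only from the excerpt above, the output file seems to list sequence headers under each "#Cluster N" line rather than the sequences themselves. Assuming that layout, and assuming the first whitespace-delimited token of each header matches the ID in the input FASTA, a rough sketch for pulling the sequences back out could look like this. The helper names parse_clusters and read_fasta are hypothetical and not part of ALFATClust.

```python
from collections import defaultdict


def parse_clusters(lines):
    """Map each cluster number to the sequence IDs listed under it.

    Assumes the layout shown in this thread: a '#Cluster N' line
    followed by one header line per member sequence.
    """
    clusters = defaultdict(list)
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#Cluster"):
            current = int(line.split()[1])
        elif current is not None:
            # Keep only the first token, e.g. 'k141_14861_1'.
            clusters[current].append(line.split()[0])
    return dict(clusters)


def read_fasta(lines):
    """Map each sequence ID (first token of the header) to its sequence."""
    chunks = {}
    name = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            parts = line[1:].split()
            name = parts[0] if parts else ""
            chunks[name] = []
        elif name is not None and line:
            chunks[name].append(line)
    return {seq_id: "".join(parts) for seq_id, parts in chunks.items()}


def sequences_by_cluster(cluster_lines, fasta_lines):
    """Join both files: cluster number -> list of (ID, sequence) pairs."""
    seqs = read_fasta(fasta_lines)
    return {
        number: [(seq_id, seqs.get(seq_id, "")) for seq_id in members]
        for number, members in parse_clusters(cluster_lines).items()
    }
```

With real files, the two arguments would be open file handles, for example open("/home/output/fasta_clust.fa") and open("/home/seq_file_path/fasta.fa").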
Hi! Since you run ALFATClust in conda, we have added a conda environment file so that users can easily create their own environment without worrying about package compatibility. We also updated the sequence cluster evaluation module to fix an error that may occur in some scenarios.
We recommend that you create your conda environment using our environment file. Please let us know if the problem persists after the update. Thanks.
I am trying to run ALFATClust with a FASTA.fa (DNA) input file:
I get the following error:
What is the problem? Thanks, Marie