ndierckx / NOVOPlasty

NOVOPlasty - The organelle assembler and heteroplasmy caller

Novoplasty on metagenomic data #229

Open pguenzi-tiberi opened 3 months ago

pguenzi-tiberi commented 3 months ago

Hello,

I ran NOVOPlasty on metagenomic data, using the following configuration file:

============

Project:

Project name = MS_assembly_chloroplast
Type = chloro
Genome Range = 200000-500000
K-mer = 33
Max memory =
Extended log = 0
Save assembled reads = no
Seed Input = /home/guenzitp/work_dir_bettik/results/Organelle_Assembly/rbcl_nucleotides.fasta
Extend seed directly = yes
Reference sequence =
Variance detection =
Chloroplast sequence =

Dataset 1:

Read Length = 300
Insert size = 100
Platform = illumina
Single/Paired = PE
Combined reads =
Forward reads = /bettik/guenzitp/data/RQ/MiSeq/RQ_tr_1.fastq
Reverse reads = /bettik/guenzitp/data/RQ/MiSeq/RQ_tr_2.fastq
Store Hash =

Heteroplasmy:

MAF =
HP exclude list =
PCR-free =

Optional:

Insert size auto = yes
Use Quality Scores = no
Reduce ambigious N's =
Output path = /home/guenzitp/work_dir_bettik/results/Organelle_Assembly/MS_novoplasty_rbcl_nucleo_only_kmer33

=================
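NOVOPlasty configuration files use a `Key = value` layout, one setting per line, grouped under section headers such as `Project:` and `Dataset 1:`. As a minimal sketch for double-checking a config before a run (for example, which fields were left blank and what genome range was requested), the hypothetical Python helpers below parse that layout into nested dictionaries. The function names are illustrative only and are not part of NOVOPlasty.

```python
def parse_novoplasty_config(text):
    """Parse 'Key = value' settings grouped under 'Section:' header lines.

    Assumption: one setting per line; a header is a line ending in ':' that
    contains no '='; lines made only of '=' or '-' are separators.
    """
    config = {}
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or set(line) <= {"=", "-"}:
            continue  # blank line or ===== / ----- separator
        if "=" in line:
            # Split on the first '=' only, so values may themselves contain '='.
            key, _, value = line.partition("=")
            config.setdefault(section, {})[key.strip()] = value.strip()
        elif line.endswith(":"):
            section = line[:-1].strip()
            config.setdefault(section, {})
    return config


def empty_settings(config):
    """List (section, key) pairs whose value was left blank."""
    return [(section, key)
            for section, items in config.items()
            for key, value in items.items()
            if value == ""]


def genome_range(config):
    """Turn the 'Genome Range' setting (e.g. '200000-500000') into two ints."""
    low, high = config["Project"]["Genome Range"].split("-")
    return int(low), int(high)


example = """Project:
-----------------------
Project name = MS_assembly_chloroplast
Type = chloro
Genome Range = 200000-500000
K-mer = 33
Max memory =

Dataset 1:
-----------------------
Read Length = 300
Insert size = 100
"""

cfg = parse_novoplasty_config(example)
print(genome_range(cfg))    # (200000, 500000)
print(empty_settings(cfg))  # [('Project', 'Max memory')]
```

A blank value is simply reported here; whether NOVOPlasty then falls back to a default is up to the tool, not this sketch.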

And here is the output log. I get a lot of small contigs, and their combined length is greater than the chloroplast genome size. Do you know why this doesn't work? I've tried various options (Extend seed directly = yes or no, K-mer = 20 or 33), but it never works.

===================


NOVOPlasty: The Organelle Assembler
Version 4.3.1
Author: Nicolas Dierckxsens, (c) 2015-2020

Input parameters from the configuration file: Verify if everything is correct

Project:

Project name = MS_assembly_chloroplast
Type = chloro
Genome range = 200000-500000
K-mer = 33
Max memory =
Extended log = 0
Save assembled reads = no
Seed Input = /home/guenzitp/work_dir_bettik/results/Sanguina_Organelle_Assembly/rbcl_nucleotides.fasta
Extend seed directly = yes
Reference sequence =
Variance detection =
Chloroplast sequence =

Dataset 1:

Read Length = 300
Insert size = 100
Platform = illumina
Single/Paired = PE
Combined reads =
Forward reads = /bettik/guenzitp/data/Sanguina/RQ/MiSeq/Sanguina_RQ_tr_1.fastq
Reverse reads = /bettik/guenzitp/data/Sanguina/RQ/MiSeq/Sanguina_RQ_tr_2.fastq
Store Hash =

Heteroplasmy:

Heteroplasmy =
HP exclude list =
PCR-free =

Optional:

Insert size auto = yes
Use Quality Scores =
Output path = /home/guenzitp/work_dir_bettik/results/Sanguina_Organelle_Assembly/MS_novoplasty_rbcl_nucleo_only_kmer33

Reading Input......OK

Building Hash Table......OK

Subsampled fraction: 99.96 %
Forward reads without pair: 28207
Reverse reads without pair: 8711
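The log reports forward and reverse reads that lack a mate. To check pairing in the input FASTQ files independently, one minimal sketch is to compare read IDs between the two files. It assumes 4-line FASTQ records and read IDs that are identical in both files apart from an optional `/1` or `/2` suffix. This is not how NOVOPlasty computes its own counts (which are reported after subsampling), so the numbers need not match the log, and `read_ids`/`count_unpaired` are hypothetical helpers.

```python
import io
from itertools import islice


def read_ids(handle):
    """Yield read IDs from a FASTQ stream with 4 lines per record.

    The ID is the header text up to the first whitespace, with a trailing
    '/1' or '/2' mate suffix removed (older Illumina naming).
    """
    while True:
        record = list(islice(handle, 4))
        if not record:
            return  # end of file
        if len(record) < 4 or not record[0].startswith("@"):
            raise ValueError("truncated or malformed FASTQ record")
        fields = record[0][1:].split()
        if not fields:
            raise ValueError("empty FASTQ header")
        read_id = fields[0]
        if read_id.endswith(("/1", "/2")):
            read_id = read_id[:-2]
        yield read_id


def count_unpaired(forward, reverse):
    """Return (forward IDs missing from reverse, reverse IDs missing from forward)."""
    fwd = set(read_ids(forward))
    rev = set(read_ids(reverse))
    return len(fwd - rev), len(rev - fwd)


r1 = io.StringIO("@a/1\nACGT\n+\nIIII\n@b/1\nACGT\n+\nIIII\n@c/1\nACGT\n+\nIIII\n")
r2 = io.StringIO("@a/2\nTTTT\n+\nIIII\n@c/2\nTTTT\n+\nIIII\n")
print(count_unpaired(r1, r2))  # (1, 0): read 'b' has no reverse mate
```

For real data, pass file handles opened on the forward and reverse FASTQ files instead of the `io.StringIO` objects. Holding every ID in a set is simple but memory-hungry for tens of millions of reads; a sort-and-merge comparison scales better.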

Start Assembly...

------------Assembly 1 finished: Contigs are automatically merged in Merged_contigs file------------

Contig 01 : 9267 bp
Contig 02 : 2783 bp
Contig 03 : 2592 bp
Contig 04 : 318 bp
Contig 05 : 2615 bp
Contig 06 : 889 bp
Contig 07 : 694 bp
Contig 08 : 3792 bp
Contig 09 : 3554 bp
Contig 10 : 3554 bp
Contig 100 : 6996 bp
Contig 101 : 1279 bp
Contig 102 : 1279 bp
Contig 103 : 4023 bp
Contig 104 : 316 bp
Contig 105 : 9135 bp
Contig 106 : 9351 bp
Contig 107 : 115 bp
Contig 108 : 2122 bp
Contig 109 : 2261 bp
Contig 11 : 4729 bp
Contig 110 : 301 bp
Contig 12 : 311 bp
Contig 13 : 4908 bp
Contig 14 : 2016 bp (Check manually if the two contigs overlap to merge them together!)
Contig 14 : 5068 bp
Contig 15 : 2030 bp (Check manually if the two contigs overlap to merge them together!)
Contig 15 : 5068 bp
Contig 16 : 2210 bp (Check manually if the two contigs overlap to merge them together!)
Contig 16 : 5068 bp
Contig 17 : 1819 bp
Contig 18 : 1671 bp
Contig 19 : 1955 bp
Contig 20 : 2164 bp
Contig 21 : 2978 bp
Contig 22 : 2978 bp
Contig 23 : 5164 bp
Contig 24 : 5164 bp
Contig 25 : 739 bp
Contig 26 : 2993 bp
Contig 27 : 718 bp
Contig 28 : 7175 bp
Contig 29 : 682 bp
Contig 30 : 6923 bp
Contig 31 : 7001 bp
Contig 32 : 10174 bp
Contig 33 : 9985 bp
Contig 34 : 9831 bp
Contig 35 : 9606 bp
Contig 36 : 9825 bp
Contig 37 : 10987 bp (Check manually if the two contigs overlap to merge them together!)
Contig 37 : 8759 bp
Contig 38 : 6918 bp
Contig 39 : 6918 bp
Contig 40 : 6918 bp
Contig 41 : 9074 bp
Contig 42 : 307 bp (Check manually if the two contigs overlap to merge them together!)
Contig 42 : 8998 bp
Contig 43 : 3971 bp
Contig 44 : 3810 bp
Contig 45 : 308 bp (Check manually if the two contigs overlap to merge them together!)
Contig 45 : 8998 bp
Contig 46 : 1422 bp
Contig 47 : 1590 bp
Contig 48 : 12833 bp
Contig 49 : 12704 bp
Contig 50 : 2167 bp
Contig 51 : 1204 bp
Contig 52 : 11231 bp (Check manually if the two contigs overlap to merge them together!)
Contig 52 : 8720 bp
Contig 53 : 1034 bp
Contig 54 : 11031 bp (Check manually if the two contigs overlap to merge them together!)
Contig 54 : 12428 bp
Contig 55 : 14986 bp
Contig 56 : 14986 bp
Contig 57 : 682 bp
Contig 58 : 682 bp
Contig 59 : 11231 bp (Check manually if the two contigs overlap to merge them together!)
Contig 59 : 12428 bp
Contig 60 : 7622 bp
Contig 61 : 390 bp
Contig 62 : 4275 bp (Check manually if the two contigs overlap to merge them together!)
Contig 62 : 388 bp
Contig 63 : 4452 bp
Contig 64 : 8229 bp
Contig 65 : 8450 bp
Contig 66 : 10697 bp
Contig 67 : 10904 bp
Contig 68 : 3394 bp
Contig 69 : 3590 bp
Contig 70 : 661 bp
Contig 71 : 112 bp
Contig 72 : 429 bp
Contig 73 : 5990 bp
Contig 74 : 6251 bp
Contig 75 : 18412 bp
Contig 76 : 18319 bp
Contig 77 : 8140 bp
Contig 78 : 6735 bp
Contig 79 : 5968 bp
Contig 80 : 18785 bp
Contig 81 : 3224 bp
Contig 82 : 4996 bp
Contig 83 : 608 bp
Contig 84 : 3531 bp
Contig 85 : 4926 bp
Contig 86 : 479 bp
Contig 87 : 302 bp
Contig 88 : 2937 bp
Contig 89 : 3119 bp
Contig 90 : 1953 bp
Contig 91 : 1202 bp
Contig 92 : 1790 bp
Contig 93 : 1994 bp
Contig 94 : 4339 bp
Contig 95 : 8157 bp
Contig 96 : 1328 bp
Contig 97 : 2451 bp
Contig 98 : 2660 bp
Contig 99 : 6817 bp
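Several entries in the log ask the user to "Check manually if the two contigs overlap to merge them together!". As a minimal sketch of such a manual check, assuming the two contig sequences have already been read from the assembler's FASTA output, the hypothetical helpers below look for the longest exact end-to-start overlap in both orders and with the second contig reverse-complemented. This is a brute-force, exact-match illustration, not NOVOPlasty's own merging logic; overlaps containing sequencing errors need an aligner such as BLAST instead.

```python
# Complement table for DNA, keeping N (unknown base) as N.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def suffix_prefix_overlap(a, b, min_len=20):
    """Length of the longest exact match between the end of a and the start of b.

    Brute force: tries every length from longest down to min_len and returns
    0 if no overlap of at least min_len bases exists.
    """
    min_len = max(1, min_len)
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-length:] == b[:length]:
            return length
    return 0


def best_overlap(a, b, min_len=20):
    """Check both contig orders and both orientations of b.

    Returns (overlap_length, description) for the longest exact overlap found.
    """
    rc_b = reverse_complement(b)
    candidates = [
        (suffix_prefix_overlap(a, b, min_len), "end of A -> start of B"),
        (suffix_prefix_overlap(b, a, min_len), "end of B -> start of A"),
        (suffix_prefix_overlap(a, rc_b, min_len), "end of A -> start of revcomp(B)"),
        (suffix_prefix_overlap(rc_b, a, min_len), "end of revcomp(B) -> start of A"),
    ]
    return max(candidates, key=lambda c: c[0])


contig_a = "TTAGCATCGGAC"
contig_b = "CGGACTTGATCA"
print(best_overlap(contig_a, contig_b, min_len=4))
# (5, 'end of A -> start of B')
```

Checking the reverse complement matters because two contigs from the same molecule may be reported in opposite orientations.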

Total contigs : 120
Largest contig : 18785 bp
Smallest contig : 112 bp
Average insert size : 100 bp
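To put a number on "the total length is greater than the chloroplast size", the contig lengths can be pulled out of the log text and summed. The sketch below is a hypothetical helper, assuming contig entries look like `Contig 14 : 2016 bp` as in this log; it also lists contig numbers that are printed twice (the "Check manually" pairs). Note that a plain sum counts every contig, including any that may be redundant copies (for example, the several contigs of identical length), so it can overstate the amount of unique sequence.

```python
import re
from collections import Counter

# Matches log entries such as "Contig 14 : 2016 bp"; the capital C keeps it
# from matching summary lines like "Largest contig : 18785 bp".
CONTIG_LINE = re.compile(r"Contig\s+(\d+)\s*:\s*(\d+)\s*bp")


def contig_lengths(log_text):
    """Return (contig_id, length) pairs in the order they appear in the log."""
    return [(cid, int(length)) for cid, length in CONTIG_LINE.findall(log_text)]


def summarize(log_text, genome_range):
    """Summarize contig lengths and compare the total with (low, high) in bp."""
    pairs = contig_lengths(log_text)
    if not pairs:
        raise ValueError("no contig entries found in the log text")
    lengths = [n for _, n in pairs]
    low, high = genome_range
    total = sum(lengths)
    id_counts = Counter(cid for cid, _ in pairs)
    return {
        "contigs": len(lengths),
        "total_bp": total,
        "largest": max(lengths),
        "smallest": min(lengths),
        "within_range": low <= total <= high,
        # Contig numbers printed twice, as in the "Check manually ..." pairs.
        "repeated_ids": [cid for cid, n in id_counts.items() if n > 1],
    }


sample = "Contig 01 : 9267 bp Contig 02 : 2783 bp Contig 03 : 2592 bp"
print(summarize(sample, (200000, 500000)))
# {'contigs': 3, 'total_bp': 14642, 'largest': 9267, 'smallest': 2592,
#  'within_range': False, 'repeated_ids': []}
```

Pasting the full contig list from the log into `summarize`, together with the configured genome range, gives the totals to compare against the expected chloroplast size.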

-----------------------------------------Input data metrics-----------------------------------------

Total reads : 70768916
Aligned reads : 1282980
Assembled reads : 240094
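When comparing runs, it can help to express these read metrics as percentages of the total. The short calculation below uses the counts exactly as printed in the log; what NOVOPlasty counts as "aligned" or "assembled" is taken as given, and `percent` is just an illustrative helper.

```python
# Read counts copied from the "Input data metrics" section of the log.
total_reads = 70_768_916
aligned_reads = 1_282_980
assembled_reads = 240_094


def percent(part, whole):
    """Format part/whole as a percentage with two decimals."""
    return f"{part / whole:.2%}"


print("aligned:  ", percent(aligned_reads, total_reads))    # 1.81%
print("assembled:", percent(assembled_reads, total_reads))  # 0.34%
```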


Thank you for using NOVOPlasty!

Thank you for your time!

ndierckx commented 2 months ago

Hi, NOVOPlasty is not meant for metagenomic datasets. Are there multiple similar chloroplast genomes in your sample? What kind of dataset do you have, and what do you expect to get?