h3abionet / HPCBio-Refgraph_pipeline


QC outputs #11

Open cjfields opened 3 years ago

grendon commented 3 years ago

Most assemblies have metrics similar to this one:

---------------- Information for assembly 'NA19028.final.megahit_results/final.contigs.fa' ----------------

                                         Number of scaffolds        119
                                     Total size of scaffolds      91508
                                            Longest scaffold       4962
                                           Shortest scaffold        310
                                 Number of scaffolds > 1K nt         22  18.5%
                                Number of scaffolds > 10K nt          0   0.0%
                               Number of scaffolds > 100K nt          0   0.0%
                                 Number of scaffolds > 1M nt          0   0.0%
                                Number of scaffolds > 10M nt          0   0.0%
                                          Mean scaffold size        769
                                        Median scaffold size        601
                                         N50 scaffold length        747
                                          L50 scaffold count         34
                                                 scaffold %A      29.98
                                                 scaffold %C      20.00
                                                 scaffold %G      20.43
                                                 scaffold %T      29.59
                                                 scaffold %N       0.00
                                         scaffold %non-ACGTN       0.00
                             Number of scaffold non-ACGTN nt          0

                Percentage of assembly in scaffolded contigs       0.0%
              Percentage of assembly in unscaffolded contigs     100.0%
                      Average number of contigs per scaffold        1.0
Average length of break (>25 Ns) between contigs in scaffold          0

                                           Number of contigs        119
                              Number of contigs in scaffolds          0
                          Number of contigs not in scaffolds        119
                                       Total size of contigs      91508
                                              Longest contig       4962
                                             Shortest contig        310
                                   Number of contigs > 1K nt         22  18.5%
                                  Number of contigs > 10K nt          0   0.0%
                                 Number of contigs > 100K nt          0   0.0%
                                   Number of contigs > 1M nt          0   0.0%
                                  Number of contigs > 10M nt          0   0.0%
                                            Mean contig size        769
                                          Median contig size        601
                                           N50 contig length        747
                                            L50 contig count         34
                                                   contig %A      29.98
                                                   contig %C      20.00
                                                   contig %G      20.43
                                                   contig %T      29.59
                                                   contig %N       0.00
                                           contig %non-ACGTN       0.00
                               Number of contig non-ACGTN nt          0
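
For reference, the N50 and L50 values in the report above can be reproduced from a list of contig lengths roughly as follows. This is a minimal sketch under the usual definition (N50 is the length of the contig at which the running total of lengths, sorted longest first, reaches half of the assembly size; L50 is how many contigs that takes). It is not the stats script's actual code, and the function name `n50_l50` and the toy lengths are made up for illustration.

```python
def n50_l50(lengths):
    """Return (N50, L50) for an iterable of contig lengths."""
    ordered = sorted(lengths, reverse=True)  # longest contig first
    half = sum(ordered) / 2                  # half of the total assembly size
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            # 'length' is the N50, 'count' is the L50
            return length, count
    return 0, 0  # empty input


# Toy lengths, not taken from the NA19028 assembly
toy_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50_l50(toy_lengths))
```

With an N50 of 747 and a longest contig of 4962, half of the assembled bases sit in contigs shorter than about 750 nt, which is consistent with the small mean and median sizes reported.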

I still see some scaffolds that could be removed because they are artifacts, such as this one, which consists almost entirely of low-complexity sequence:

>k141_37 flag=1 multi=5.0000 len=359
CGGGGAGAGGGGGGTAGAAGTGGGAGGAGGGAGAAACAGAAAAAAAGAGAGAGAAAAACAAAGAGGTGAGAGGGAGGAGAGAGACAGAGGGAGAGAGGTGAGGGGGAGAGAAACAGAGAAAATGGGAGGTGGAGGGGAGAGAGAGAGGAGAGAGAGAAACAGAGGGAGAGAGAGAGGTGGGGGAGAGACAGGAGAGAGAGGTAAGCGGGGAGAGAGAAAAACAGGGAGAGAGGTTGGGGGTTGAGGGAGAGACAGAGAAACAGGGAGAGAGAGGCGGGAAGAGGTGGGAGAAGACACAGAAAAAACAGAGAAAATGAGAAAGAAAAGAGACAGGGTGGGGGAGAGAGAGAGGGAGAGAG
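
One way such contigs could be flagged is by scoring each sequence with the Shannon entropy of its overlapping k-mers and setting aside anything below a cutoff. The sketch below is only an illustration of that idea, not something the pipeline currently does: the function names, `k=3`, and the `min_entropy` cutoff are all assumptions, and the cutoff would need tuning against known artifacts like the contig above.

```python
import math
from collections import Counter


def kmer_entropy(seq, k=3):
    """Shannon entropy (bits) of the overlapping k-mers in seq."""
    seq = seq.upper()
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not kmers:
        return 0.0
    total = len(kmers)
    counts = Counter(kmers)
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_by_complexity(records, min_entropy=4.0, k=3):
    """Split records into (kept, flagged) header lists.

    min_entropy is a hypothetical cutoff; the maximum possible value
    for k=3 is 6 bits (64 equally frequent 3-mers).
    """
    kept, flagged = [], []
    for header, seq in records:
        target = kept if kmer_entropy(seq, k) >= min_entropy else flagged
        target.append(header)
    return kept, flagged


# Toy input: a homopolymer and a short tandem repeat
toy = [">homopolymer", "AAAAAAAAAA", ">repeat", "ACGT" * 25]
kept, flagged = split_by_complexity(read_fasta(toy), min_entropy=1.0)
print("kept:", kept, "flagged:", flagged)
```

An entropy score is a swapped-in simplification here; a dedicated low-complexity masker would likely be a better fit for a real QC step.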

cjfields commented 3 years ago

We will have a separate QC workflow for these steps.