Open devonorourke opened 6 years ago
Hi, Can you please provide the version used and the full log file? The input you provided is all it need to perform a fast scaffolding; assembly graph is optional to improve its accuracy.
Thanks for the reply.
version
Japsa: A Java Package for Statistical Sequence Analysis
Version 1.7-05b, Built on Mon Jul 09 15:42:09 EDT 2018 with javac 1.8.0_144
Log file from stdout:
[main] WARN japsa.tools.bio.np.NPScarfCmd - Not found any legal SPAdes output folder, assembly graph thus not included!
#Sort list of bridges
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException
at java.base/java.lang.System.arraycopy(Native Method)
at java.base/java.util.Arrays.copyOfRange(Arrays.java:4030)
at japsa.seq.Sequence.subSequence(Sequence.java:272)
at japsa.bio.hts.scaffold.ContigBridge$Connection.filling(ContigBridge.java:1201)
at japsa.bio.hts.scaffold.Scaffold.viewSequence(Scaffold.java:438)
at japsa.bio.hts.scaffold.ScaffoldGraph.printSequences(ScaffoldGraph.java:965)
at japsa.tools.bio.np.NPScarfCmd.main(NPScarfCmd.java:291)
and the main log file from the program:
========================== START =============================
contig 0 ======> 0 57033 contig_0
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 1 ======> 0 11664 contig_1
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 2 ======> 0 3924 contig_2
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 3 ======> 0 2163 contig_3
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 4 ======> 0 11109 contig_4
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 5 ======> 0 3597 contig_5
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 6 ======> 0 46348 contig_6
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 7 ======> 0 9902 contig_7
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 8 ======> 0 28300 contig_8
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 9 ======> 0 19290 contig_9
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 10 ======> 0 35361 contig_10
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 11 ======> 0 2978 contig_11
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 12 ======> 0 13273 contig_12
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 13 ======> 0 64590 contig_13
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 14 ======> 0 22464 contig_14
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 15 ======> 0 21664 contig_15
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 16 ======> 0 3191 contig_16
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 17 ======> 0 46376 contig_17
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 18 ======> 0 15823 contig_18
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 19 ======> 0 36682 contig_19
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 20 ======> 0 2301 contig_20
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 21 ======> 0 38428 contig_21
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 22 ======> 0 112504 contig_22
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 23 ======> 0 18737 contig_23
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 24 ======> 0 2478 contig_24
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 25 ======> 0 2096 contig_25
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 26 ======> 0 21758 contig_26
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 27 ======> 0 1767 contig_27
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 28 ======> 0 1488 contig_28
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 29 ======> 0 148562 contig_29
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 30 ======> 0 13734 contig_30
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 31 ======> 0 15223 contig_31
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 32 ======> 0 31993 contig_32
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 33 ======> 0 35876 contig_33
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 34 ======> 0 1782 contig_34
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 35 ======> 0 2909 contig_35
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 36 ======> 0 3489 contig_36
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 37 ======> 0 108005 contig_37
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 38 ======> 0 23854 contig_38
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 39 ======> 0 2960 contig_39
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 40 ======> 0 1400 contig_40
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 41 ======> 0 1683 contig_41
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 42 ======> 0 1637 contig_42
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 43 ======> 0 1529 contig_43
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 44 ======> 0 2810 contig_44
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 45 ======> 0 1187 contig_45
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 46 ======> 0 103828 contig_46
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 47 ======> 0 58065 contig_47
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 48 ======> 0 41333 contig_48
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 49 ======> 0 12025 contig_49
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 50 ======> 0 3734 contig_50
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 51 ======> 0 214484 contig_51
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 52 ======> 0 101119 contig_52
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 53 ======> 0 2790 contig_53
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 54 ======> 0 12789 contig_54
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 55 ======> 0 3603 contig_55
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 56 ======> 0 7372 contig_56
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 57 ======> 0 26805 contig_57
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 58 ======> 0 40508 contig_58
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 59 ======> 0 68746 contig_59
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 60 ======> 0 43533 contig_60
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 61 ======> 0 7689 contig_61
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 62 ======> 0 30740 contig_62
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 63 ======> 0 76787 contig_63
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 64 ======> 0 2552 contig_64
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 65 ======> 0 159498 contig_65
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 66 ======> 0 94585 contig_66
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 67 ======> 0 26973 contig_67
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 68 ======> 0 98250 contig_68
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 69 ======> 0 20657 contig_69
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 70 ======> 0 9166 contig_70
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 71 ======> 0 56353 contig_71
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 72 ======> 0 18272 contig_72
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 73 ======> 0 76385 contig_73
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 74 ======> 0 2163 contig_74
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 75 ======> 0 19157 contig_75
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 76 ======> 0 64880 contig_76
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 77 ======> 0 7124 contig_77
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 78 ======> 0 17870 contig_78
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 79 ======> 0 25924 contig_79
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 80 ======> 0 3119 contig_80
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 81 ======> 0 3007 contig_81
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 82 ======> 0 108357 contig_82
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 83 ======> 0 137239 contig_83
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 84 ======> 0 18846 contig_84
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 85 ======> 0 83176 contig_85
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 86 ======> 0 8538 contig_86
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 87 ======> 0 102171 contig_87
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 88 ======> 0 102584 contig_88
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 89 ======> 0 68312 contig_89
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 90 ======> 0 4491 contig_90
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 91 ======> 0 76669 contig_91
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 92 ======> 0 1976 contig_92
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 93 ======> 0 14133 contig_93
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 94 ======> 0 42122 contig_94
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 95 ======> 0 1865 contig_95
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 96 ======> 0 71343 contig_96
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 97 ======> 0 3452 contig_97
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 98 ======> 0 3142 contig_98
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 99 ======> 0 9282 contig_99
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 100 ======> 0 13018 contig_100
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 101 ======> 0 123637 contig_101
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 102 ======> 0 19738 contig_102
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 103 ======> 0 19721 contig_103
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 104 ======> 0 15656 contig_104
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 105 ======> 0 93369 contig_105
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 106 ======> 0 66779 contig_106
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 107 ======> 0 62539 contig_107
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 108 ======> 0 23076 contig_108
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 109 ======> 0 44246 contig_109
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 110 ======> 0 9789 contig_110
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 111 ======> 0 19394 contig_111
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 112 ======> 0 37531 contig_112
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 113 ======> 0 88731 contig_113
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 114 ======> 0 3254 contig_114
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 115 ======> 0 95948 contig_115
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 116 ======> 0 113561 contig_116
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 117 ======> 0 123301 contig_117
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 118 ======> 0 42704 contig_118
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 119 ======> 0 172348 contig_119
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 120 ======> 0 3029 contig_120
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 121 ======> 0 29671 contig_121
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 122 ======> 0 39772 contig_122
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 123 ======> 0 50605 contig_123
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 124 ======> 0 34707 contig_124
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 125 ======> 0 75815 contig_125
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 126 ======> 0 6169 contig_126
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 127 ======> 0 28989 contig_127
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 128 ======> 0 51429 contig_128
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 129 ======> 0 1645 contig_129
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 130 ======> 0 817 contig_130
Size = 1 sequence
============================ END ===========================
========================== START =============================
contig 131 ======>-74178 -115 contig_131 gaps = 115
contig 132 ======> 0 128995 contig_132
Size = 2 sequence
One thing to note: the .bam file has 1587
contigs, while the resulting scaffold file that npScarf produced has just 131 unique sequences.
Maybe I'm just making an oversight. Is the resulting out.fin.fasta
file exclusively the regions which were joined, excluding all other sequences in the original reference fasta which were not merged?
Since your contigs don't have information about the coverage, npScarf just relied on the length of contigs to determine anchor (significant single-copy contigs) vs artefact/repeatitive contigs. So the short contigs were considered insignificant thus not reported here.
Also looingk at the log file, only one bridge was created, meaning that there is only one useful alignment fo npScarf from your BAM - is it normal in this case? I think it's not so much useful being able to only join 2 contigs together. Anw, can you plz try clone the newest git version of japsa and run again with --verbose option and send the log file to me.
Another tip is to play around with parameters of npScarf since your genome might need different settings; especially the --maxRepeat is important since your data is lack of coverage information.
Thanks - can we back up a step? Where does npScarf pull the coverage info; could I supply a file that does that into the current command as a parameter? There should be about 17x coverage (estimated by a recent canu
assembly) for the genome in question.
I'll try rerunning shortly and send you an update.
Cheers
npScarf will try to pull the coverage info from the contigs' name output by short-read assemblers. For example, if contigs come from SPAdes/Velvet, they will have format like
(ID)_length_(y)_cov_(z)
; or
(ID) (length) (k-mer sum)
from AbySS...
If I understand it right, you're trying to use npScarf to finish Canu assembly, no? You can do that eventhough the task is not designed for the tool to work with. npScarf's main purpose is to complete the assembly from a short-read assembler (input Illumina data) by using the long-read data (nanopore reads). The idea is that due to repetitive elements, short-read assemblers are usually suffered from fragmentations; and since long reads can span long regions, they'd help to bridge those gaps. Canu itself takes long read data as input, so the bridging information is already used in its pipeline. So I'm afraid reusing the long reads to scaffold Canu's contigs wouldn't give significant improvement. Correct me if I'm wrong. Cheers,
Hi there,
just weighing in here because we're also considering using npscarf to scaffold a long-read assembly from CANU. Here's one reason I think it might be useful:
CANU is not diploid aware, so in our assembly we get a partially phased genome. We can deal with the partial phasing by deleting one copy of each of the phased regions and then re-polishing the final contigs with short reads. But the result is still a highly fragmented assembly, for which the long reads should still contain good scaffolding information.
So, in this use case I think npscarf might be worth trying. My question, can you specify the exact syntax required for the contig names?
Given our fasta files of contigs, I can easily re-map long or short reads, and rename contigs to include coverage information in the way that npScarf expects it.
Also, I'm assuming that npScarf is justing relying on relative coverage of contigs (in the sense that collapsed repeats will have higher coverage). Is that right?
Cheers,
Rob
The contig name follow the syntax from SPAdes/Velvet or AbySS (dev) assembler; the default one based on SPAdes ouput when you'll have for example >NODE_1_length_839658_cov_89.484 npScarf will try to extract the coverage information from that name and convert to read coverage.
It also relies on the length of the sequence to help guessing if it's repetitive or not. You can change this from --maxRepeat parameter.
It's around 8Kb by default for our bacteria datasets of interest; but you need to increase this figure if you're working with more complex genomes. It depends on the genomes you're trying to assembly, or just set it very long if you are not sure.
Good luck with your pipeline, even though last time when checking the run log, there weren't many useful alignments had been used to scaffold further your draft Canu assembly. I think it might be worth trying more sensitive aligner for more alignments too (e.g. i normally run minimap2 with -k15 -w5)
Cheers,
Hi,
I'm curious if it's possible to run npScarf with ONT reads aligned to a reference genome (consisting of unique contigs). I've aligned the long reads with
minimap2
to generate an indexed and sorted .bam file.It was my thought that all that was needed to accomplish this was to supply the .bam file along with the original contigs.fasta file which the reads were aligned to. Perhaps there is more? This was my command:
This generated a series of files (
out.fin.fasta
,out.fin.japsa
, and the.log
file), yet the number of unique scaffods (and total nucleotides) was orders of magnitude less than what was present in the original contigs.fa file.In addition, there was also a series of warnings and errors in the log file:
I'm wondering what to make of the errors. Are each of these exceptions a product of me not providing a graph assembly? Are there additional arguments I should be entering with the
jsa.np.npscarf
command?