mdcao / npScarf


Velvet contig: WARN japsa.tools.bio.np.NPScarfCmd - Not found any legal SPAdes output folder, #14

Open Yunxia-li opened 5 years ago

Yunxia-li commented 5 years ago

Hello, I have a contig file, Mtctgs.fasta, which I got from Velvet. It looks like this:

>NODE_10906_length_1007_cov_71.668320
ATCGACTAGCTAGCTACGTCAGCATGCTAGCTCAGCTACGACTAGCATCAGCTCG

I also have a BAM file, Mtctgs@pacbio.bam, computed with: bwa mem Mtctgs@pacbio.bam pacbio.fasta

Then I ran this: jsa.np.npscarf -seq Mtctgs.fasta -input Mtctgs@pacbio.bam -format bam -seq Mtctgs.fasta > Mtctgs-out

But I still got this warning and didn't get any output:

[main] WARN japsa.tools.bio.np.NPScarfCmd - Not found any legal SPAdes output folder, assembly graph thus not included!

Exception in thread "main" java.lang.IndexOutOfBoundsException: Index -1 out of bounds for length 381
    at java.base/jdk.internal.util.Preconditions.outOfBounds(Preconditions.java:64)
    at java.base/jdk.internal.util.Preconditions.outOfBoundsCheckIndex(Preconditions.java:70)
    at java.base/jdk.internal.util.Preconditions.checkIndex(Preconditions.java:248)
    at java.base/java.util.Objects.checkIndex(Objects.java:372)
    at java.base/java.util.ArrayList.get(ArrayList.java:458)
    at japsa.bio.hts.scaffold.ScaffoldGraph.makeConnections2(ScaffoldGraph.java:347)
    at japsa.tools.bio.np.NPScarfCmd.main(NPScarfCmd.java:284)

Then I checked the code and found the snippet below. Does that mean the spades.out folder has to contain both graphFile and pathFile?

```java
if (spadesFolder != null && graphFile.exists() && pathFile.exists())
    LOG.info("===> Use assembly graph and path from SPAdes!");
else {
    LOG.warn("Not found any legal SPAdes output folder, assembly graph thus not included!");
    spadesFolder = null;
}
```

Hoping for your reply, Yunxiali

hsnguyen commented 5 years ago

No, it is just a WARNING since the SPAdes folder is optional. If you don't have it, then a scaffolding algorithm without the assembly graph is invoked.

It looks to me like the problem comes from the BAM file: npscarf couldn't find the reference (contig) index in it. Please check the header of the BAM file again and let me know whether the reference names match the contig names.
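A minimal sketch of that check, assuming the BAM header has been dumped to text (for example with `samtools view -H`) and that a contig ID is the first whitespace-delimited token after `>` in the FASTA file. The function names are hypothetical helpers for illustration, not part of npScarf or htsjdk:

```python
def fasta_ids(fasta_lines):
    """Return contig IDs: the first whitespace-delimited token after '>'."""
    ids = []
    for line in fasta_lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            ids.append(tokens[0] if tokens else "")
    return ids


def header_ref_names(header_lines):
    """Return the SN values of the @SQ lines of a SAM header, in order."""
    names = []
    for line in header_lines:
        if not line.startswith("@SQ"):
            continue
        for field in line.split()[1:]:
            if field.startswith("SN:"):
                names.append(field[3:])
    return names


def compare_names(fasta_lines, header_lines):
    """Return (names only in the header, names only in the FASTA), both sorted."""
    contigs = set(fasta_ids(fasta_lines))
    refs = set(header_ref_names(header_lines))
    return sorted(refs - contigs), sorted(contigs - refs)


# Small in-memory example using names from this thread (sequences shortened).
sample_fasta = [
    ">NODE_10906_length_1007_cov_71.668320",
    "ATCGACTAGCTAGC",
    ">NODE_6355_length_946_cov_48.813953",
    "ATCGACTAGCTAGC",
]
sample_header = [
    "@SQ\tSN:NODE_10906_length_1007_cov_71.668320\tLN:1077",
    "@SQ\tSN:NODE_6355_length_946_cov_48.813953\tLN:1016",
]
only_in_header, only_in_fasta = compare_names(sample_fasta, sample_header)
print("in BAM header but not in FASTA:", only_in_header)
print("in FASTA but not in BAM header:", only_in_fasta)
```

Both lists being empty only shows that the sets of names agree; it does not rule out duplicated names in either file.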

Yunxia-li commented 5 years ago

Thanks for your reply. I used the contigs and the pacbio_reads to generate the BAM file. It looks like this (head -10):

@SQ SN:NODE_10906_length_1007_cov_71.668320 LN:1077
@SQ SN:NODE_6355_length_946_cov_48.813953 LN:1016
@SQ SN:NODE_30588_length_1250_cov_35.036800 LN:1320
@SQ SN:NODE_8407_length_2603_cov_34.215904 LN:2673
@SQ SN:NODE_6399_length_630_cov_97.301590 LN:700
@SQ SN:NODE_2006_length_997_cov_32.983952 LN:1067
@SQ SN:NODE_909_length_6013_cov_39.730751 LN:6083
@SQ SN:NODE_6279_length_1357_cov_44.332352 LN:1427
@SQ SN:NODE_626_length_3497_cov_33.510437 LN:3567
@SQ SN:NODE_2525_length_1366_cov_54.859444 LN:1436

tail -3 (here I use {sequence} to represent the real sequences):

a730c678-cc44-42df-aebf-88f476351019 0 NODE_8511_length_231609_cov_68.992912:::fragment_3 6058 60 57S28M1D26M6I39M1D7M2I30M2D50M1D18M2D33M1D11M1I9M2D30M1I139M4I17M1I12M2D22M1I100M2D49M2D20M1I15M1I12M2I9M1I29M2D12M1D50M3D7M3I30M1D6M1D4M2I26M1I18M1I13M2I50M1D29M2D48M1D10M1I37M2D39M1D55M2D33M2I32M1I35M3I37M1D12M1D20M1D26M1I7M1I6M1D42M3I93M1I68M2D11M2I40M1D8M3D11M1I64M1I83M1I23M1D15M1I50M1I66M1I171M1I77M1I15M2I44M2D37M2I51M3D13M1I31M1D6M1D59M2D25M1I11M4I22M1D18M1D7M1I3M2D28M1I4M1D52M4D17M1I29M1D39M1I12M2D42M1I34M1D27M1D57M2I39M1D2M2I23M1D10M1I29M1D16M2I81M2I50M1I56M1D19M1D12M1I10M1D32M1I5M2I4M5D92M1D2M1D152M1I27M1D56M1D144M4D9M1D57M2D26M1D17M1D3M2I24M4D11M2D8M2D24M1D75M2I19M1D11M1I27M1I40M1I45M1D69M1D20M1D3M3D4M1I16M1D12M2D4M2D13M1D41M2I15M1D67M2I22M1D14M3D10M1I18M2I17M1I59M1I10M3D13M1D6M1D8M1I10M2I20M1D5M2I2M2D5M2I66M1I39M1I8M1I16M1D94M2D17M1D43M1I13M1D41M3D59M1D54M3I31M1D10M1I15M1D7M2I37M5I4M5D24M1D48M2D47M1D12M2I11M1D16M5D16M2D60M1D12M1D29M2I43M1I9M1I34M1I13M1I5M3I3M4D32M1D19M2D25M1I4M7D4M4I5M1D2M1D7M1D33M2D62M1D49M1D7M1I41M3D2M3I32M1D31M1I15M1I8M1D8M1I17M1D2M2D17M1D35M2D15M1I16M1D50M1D30M2D18M3D16M1D4M2I7M1I31M1D12M3D34M1D9M1D16M2I26M1D25M2D38M2I65M1D38M1D18M1D32M1I36M1I18M1D3M3D33M2D52M1D13M1D76M1I16M2D8M2I12M1I13M1I40M1I5M4D44M2I27M1D33M1I33M3D9M3I6M1I49M1I1M3D38M1D9M1D8M1D19M1D28M2I10M1D14M1D8M1D5M1D90M1I5M1I12M2D16M1I21M4D19M1I16M1D12M1D7M3D41M1D24M1D17M1I1M2D38M1D8M7D2M1D50M2D37M1D63M4I55M1D16M2D23M2D94M1I13M2I24M1D10M5D23M1I45M1D24M1D33M3I6M2D31M1I5M1I28M1D38M1D31M1I2M1I16M2D38M1I68M1D10M3D28M1D2M1D22M4D21M2D68M9D3M3I96M1I7M1D106M2D23M2I27M1I31M4I11M2D78M1I24M1D55M1D2M2I9M5I5M4D20M1I59M3D26M1D31M2I69M1D45M2I8M1D30M2D40M1I6M4D14M2I32M3D15M1D5M1D5M1I25M1D19M1D2M2I33M1D86M2I19M1D16M1D20M1D8M1D22M1I17M1D9M1D17M1I7M1I11M1D3M2I11M1D3M1D20M1D33M1I26M4D64M1I131M2I5M1D34M3D2M1I4M3D51M1D4M1D43M2I66M2D37M1I24M1D29M1D8M1D18M1D30M1D25M37S 0 0 {sequence} NM:i:873 
MD:Z:28^C23G2A31G1G2A1^G37^CC2G40G6^C3G14^TC33^A3T2A1G11^CT0A11T0A2T0G34G4A140^CT49G28C21G0C5G8G5^TG0C31G0A15^GA85^TT0G11^A50^ACT1G7A27^C6^A111^C29^GA41C0A1A1A1^G47^CC39^C55^TT100T36^C1T10^G20^C39^T180C0G21^CC51^C6A1^ACC0G2T106A1A68^A425A0G11^CT35G41A10^TTT0G1A10T30^C6^T5A0G52^AG36G1G3T0G1A0T11^T0C7A1A1C5^A10^TT32^G0A1T8G38A1^TGAG46^A11C0T0G37^TC76^T27^G52A0G40G1^A25^T36A2^G203^A0A9T8^C22^C25G15^CCTTT1T22G0A15G0A39A7A1^G2^A21A157^A56^T29G1A33G0A47A0G28^TCTT9^A39G0A3C12^TT26^A6A10^T27^GGGG2G8^TT8^GC0A16A6^G65G28^T123^T54A14^C1G18^A3^TGC0C3G0G2G11^G12^CA4^CC1C0A10^G29G0A0T11G12^T67G21^G9C0T0T2^TTC104A9^GAC13^A6^A6A8T10G0C0G0T6A1^G0C6^TT0G4G58G34G1G0A12T0G17^T75G18^TT17^G56^G8G1A0T0A28^GCT0C2A23G0A30^C3A0T80^A25^A11A33A2^TTATG24^G9A38^AG39A7^A12C10^C16^CCCCG2C13^AA28G31^A12^C29G11G31G0G10A50^TTCC32^A3G0G14^AA29^ACCCGTT2G6^A2^T7^C33^CA62^A40G0T7^G48^GCC34^A54^C25^A2^AG17^C35^GA17G13^G50^T14A0C14^AG18^GGG16^G42^A0A0C0C3C0C0C3^CAA34^T9^A42^C25^GC14G86A1^G36G1^A18^G66A19^A3^AAG33^AA26G25^C13^C92^GC78^AGAG71^C36C0T28^AGA13T17A1A31^GAT38^A9^T8^C19^A15T0G21^G2G1A9^A8^C5^G11A95^GG16G20^AAGG1A33^A1C1A8^C7^CAG2G5G1A30^A24^A18^TT1A36^A8^AGCTAGC2^A13G36^AG2C27A0G5^A59A0G3G20G1G23A6^T16^AA0C0T21^AA2G54G27G1C43^C10^AAAAG68^C24^C1C37^AC25G33T4^C38^A49^AT8T2G77G0G1A13^A1T8^GAG0C27^A2^G22^TTTC1T19^TT68^CGAGAACCT3G56G0A0G26C1T0C6G6^C59A0G27G17^TG34C57^GG16A0G48G24G1A8^C55^G9C1G4^CTCG28G0A49^TCT2C23^C27G42A0C28^T53^C1T28^GG46^GTCC8T37^GGA15^A5^C7C2G19^C12G0T5^A0A27G6^C105^C16^A20^T8^C22A16^T0T0T7^G0T34^G0T13^A3^C1G18^C59^AAAT0G63G4G56G1G71^A2C31^ATC6^AAG51^A4^G28A6A62A0A0T0G7^TC37A0T22^A29^C8^A18^A30^C25 AS:i:6109 XS:i:0 ea74c4cc-6e8b-46fc-99bb-6b7e9b7c8ea6 0 NODE_656_length_23526_cov_83.054581 15970 60 
42S69M2I22M1D3M2D14M1I28M3D24M1D10M2I20M2D8M1I20M1D32M1I24M1I12M1I31M1D33M1D16M3D12M3I8M2I6M1I24M2D10M2D8M1I22M4D30M1D3M1D6M1D13M2D35M3D6M1D12M2D78M2D47M4D10M1D20M1D3M2D4M3I22M2D33M1I8M1D47M1I5M1D20M1D8M1D29M1I7M1D7M4D9M1D16M1I8M3I5M2I34M2D30M1D3M2D70M1D35M1D52M1D6M3I4M1D11M1D18M1D16M1I14M1D8M5D46M1D11M1I2M2I1M3D25M1D4M3D5M1I10M1I17M1I9M1I10M1I14M1D34M1I7M1D69M1I25M2I2M8D14M1D25M1I9M2D23M1I12M1D39M1I27M1I15M1I36M1D17M1I78M1D39M1D32M1I11M1I13M1I26M1D11M1I77M1I23M1I17M3I66M1D3M1I47M1I45M1D16M1D14M2D7M5I56M1I6M2D24M3D56M2D41M2I63M1D25M1D16M1I50M3I7M3I3M1D30M1D24M1D22M4D38M3D6M1I4M1I28M2I18M1D20M3I57M1D6M2I98M4I21M1I39M1I16M1I10M1I76M1I51M1D32M3I27M1D33M3D17M4D29M1I21M1D13M2I3M3D4M1D27M1I24M1I42M1I65M3D5M2D42M5D62M2D74M1D56M1I25M2I19M3D10M1D18M1D21M2D33M1D24M3I11M1D28M1D60M2D14M2D17M3I6M2I1M1D6M1D3M3I40M2D6M2D8M2D19M1I26M4I14M2D35M1D21M2D10M1I1M2D79M4D12M6I4M2I22M3D5M2I29M1D10M1I8M1D20M3D8M1D14M3D14M1D13M2D4M6I17M2D36M2I46M3D5M1I27M3D47M1D15M1D9M2D12M1D61M1D9M2I11M1D11M1D19M1I9M1D66M2D23M2I8M1I61M1I25M1I38M4I3M2D18M2D51M1D40M1D7M1D19M1I17M4D59M1D20M1D30M1I42M2I5M2I55M1D14M1I73M1D15M1D85M2D11M1I69M2D6M4D19M1I21M3I19M3I42M1I19M1D23M1I44M1D6M1I13M1D64M1D15M3D22M1D73M1D13M2I37M3I14M2D85M1D4M3D40M1D49M1I189M1I73M1I9M1I35M1D13M2I37M3I39M2I32M1I3M2D27M2D133M1I132M1D24M2I3M1I46M1I22M2D149M3I34M3D8M1D15M2D9M1D10M9435S 0 0 {sequence} NM:i:678 
MD:Z:88G2^A3^GA42^GGG24^A0A0T1G26^CG10G7G9^C99^C33^G0C4G0T3A3T1^TTT0G1C0T4G18G22^AC0A9^AA10A0T18^TCTT30^C3^C6^C13^CA23A11^CCT6^T12^AA8G17G51^TC13A33^GGAA3G6^A20^G3^AC26^AC0A0A39^C49T2^T20^A8^A36^A2G0C3^AAAG9^A0A0C17A2A40^AG25G0C3^G3^GA1C32C0T34^A35^G52^A10^C11^A18^A22G0A6^A8^AAAAG12A1T31^A0A13^CAA25^C2G1^CCC65^C41^C13G3A21C3T0C46T2A1^CAATCCCC14^C34^TT35^A18G6G13T0G22T3G49^A93G1^A37G1^A82^T37A0G90A64^T3T41A46G2^A16^T9A4^AA9A0T1T0G0A54^TT24^CGA0A34G20^TG104^C25^C13T0A1A47T1A9^A28A1^G13A10^A0A21^TTCT5A0G0C11G0G1A0A14^AGA0A55^A5A0G11A2G1A53^T164T45A72G31G1^A59^C33^GGA17^ACTA0A49^A0A0T12A1^CTG4^A27G23A8G0A41A54^TCT3C1^GC42^GAAGG19G0A41^TT74^T23A4T27G0T2C39^CCC10^C18^G0C20^AG33^G35^A14A0G0C9A1^G0G59^GA14^TT10G2A10^C6^A43^AC6^AG8^AA37G8G12^TT35^C21^CA11^CA18A0G38T0A16T0G1^TGGA0G4T1T2A1G25^GGG34^T1A16^A16G3^TTT8^A0A13^GAA1C12^C13^CT4G14A1^AT0C29A0G30A0G0C17^CTA7G0C23^TTT47^A0T14^A9^TC12^C61^A20^T11^C20G7^T13G52^AG88A0T0G67^CA18^TC51^T30T9^A7^T36^TAAT2C56^T20^T31G100^A16T48A0A1T0G17^A15^A15G0A68^GG3C55A20^TT6^AAAG0A35A64C14G3^A50A9A3A2^C7G11^T64^C15^AAA22^T73^A33A18G0T10^AG85^A4^AAA22G0A4G1A9^A250G0C0A0A101^T6T43C9A0G62^TT25A1^CG121G0A26A1T1C9G16A84^T25G69^TC3C29G0C0A51A95^CTG0A7^C15^GA9^G10 AS:i:4186 XS:i:0 SA:Z:NODE_656_length_23526_cov_83.054581,12,+,7550S15M2D9M1D10M1I6M1I24M2D12M1I18M1I18M3I57M1I30M2I17M1D9M2D29M3I3M1D34M1D10M1I54M1D6M3I14M2I47M6D28M1D1M3I15M1D72M1D44M1D18M1I22M2I16M1D3M1I13M4D1M8I72M1D6M2D22M1I27M2D5M1D60M1I11M2I29M1I22M1D15M2I57M1D49M2I19M1I23M2I2M1D39M2D99M3I20M1D7M1D7M1D48M1I36M1I15M1D26M4I3M1D14M1D66M1I24M2D6M2I13M1I13M1D16M2I3M1I13M1I54M1D13M1D11M6D17M6I47M1D43M5D1M8I5M2D25M1I20M1D40M1D30M1D65M1D11M1D12M3I14M1D20M2I28M1D9M1I17M4D103M7258S,60,206; ea74c4cc-6e8b-46fc-99bb-6b7e9b7c8ea6 2048 NODE_656_length_23526_cov_83.054581 12 60 
7550H15M2D9M1D10M1I6M1I24M2D12M1I18M1I18M3I57M1I30M2I17M1D9M2D29M3I3M1D34M1D10M1I54M1D6M3I14M2I47M6D28M1D1M3I15M1D72M1D44M1D18M1I22M2I16M1D3M1I13M4D1M8I72M1D6M2D22M1I27M2D5M1D60M1I11M2I29M1I22M1D15M2I57M1D49M2I19M1I23M2I2M1D39M2D99M3I20M1D7M1D7M1D48M1I36M1I15M1D26M4I3M1D14M1D66M1I24M2D6M2I13M1I13M1D16M2I3M1I13M1I54M1D13M1D11M6D17M6I47M1D43M5D1M8I5M2D25M1I20M1D40M1D30M1D65M1D11M1D12M3I14M1D20M2I28M1D9M1I17M4D103M7258H 0 0 {sequence} NM:i:206 MD:Z:15^GA9^G16G23^GC1A15G0A0C0C7A40G0A81^A9^AA32^A34^A64^A67^AAAAAG20A7^T16^C72^C44^G16G29G1G1A5^A14G1^CTTC73^T6^AC49^GA5^T118G0A2^T16T21A0G1A30^T93^A39^TT119^G7^A7^A22A61T0T13^A23A2G2^C14^C90^CC8G17A1G0C2^T0T33C51^A11G1^A11^TTGCTC1T1T35A3A20^T43^CTTTG6^GA4A27G8A1C1^A40^T30^C55G5A3^C11^A9A16^G0C17G29^T26^ACCA0G0C0C100 AS:i:1215 XS:i:0 SA:Z:NODE_656_length_23526_cov_83.054581,15970,+,42S69M2I22M1D3M2D14M1I28M3D24M1D10M2I20M2D8M1I20M1D32M1I24M1I12M1I31M1D33M1D16M3D12M3I8M2I6M1I24M2D10M2D8M1I22M4D30M1D3M1D6M1D13M2D35M3D6M1D12M2D78M2D47M4D10M1D20M1D3M2D4M3I22M2D33M1I8M1D47M1I5M1D20M1D8M1D29M1I7M1D7M4D9M1D16M1I8M3I5M2I34M2D30M1D3M2D70M1D35M1D52M1D6M3I4M1D11M1D18M1D16M1I14M1D8M5D46M1D11M1I2M2I1M3D25M1D4M3D5M1I10M1I17M1I9M1I10M1I14M1D34M1I7M1D69M1I25M2I2M8D14M1D25M1I9M2D23M1I12M1D39M1I27M1I15M1I36M1D17M1I78M1D39M1D32M1I11M1I13M1I26M1D11M1I77M1I23M1I17M3I66M1D3M1I47M1I45M1D16M1D14M2D7M5I56M1I6M2D24M3D56M2D41M2I63M1D25M1D16M1I50M3I7M3I3M1D30M1D24M1D22M4D38M3D6M1I4M1I28M2I18M1D20M3I57M1D6M2I98M4I21M1I39M1I16M1I10M1I76M1I51M1D32M3I27M1D33M3D17M4D29M1I21M1D13M2I3M3D4M1D27M1I24M1I42M1I65M3D5M2D42M5D62M2D74M1D56M1I25M2I19M3D10M1D18M1D21M2D33M1D24M3I11M1D28M1D60M2D14M2D17M3I6M2I1M1D6M1D3M3I40M2D6M2D8M2D19M1I26M4I14M2D35M1D21M2D10M1I1M2D79M4D12M6I4M2I22M3D5M2I29M1D10M1I8M1D20M3D8M1D14M3D14M1D13M2D4M6I17M2D36M2I46M3D5M1I27M3D47M1D15M1D9M2D12M1D61M1D9M2I11M1D11M1D19M1I9M1D66M2D23M2I8M1I61M1I25M1I38M4I3M2D18M2D51M1D40M1D7M1D19M1I17M4D59M1D20M1D30M1I42M2I5M2I55M1D14M1I73M1D15M1D85M2D11M1I69M2D6M4D19M1I21M3I19M3I42M1I19M1D23M1I44M1D6M1I13M1D64M1D15M
3D22M1D73M1D13M2I37M3I14M2D85M1D4M3D40M1D49M1I189M1I73M1I9M1I35M1D13M2I37M3I39M2I32M1I3M2D27M2D133M1I132M1D24M2I3M1I46M1I22M2D149M3I34M3D8M1D15M2D9M1D10M9435S,60,678;

I am sure that the header of the BAM file corresponds to the contig names in the contig file. This has left me confused.

hsnguyen commented 5 years ago

The exception is caused by a SAMRecord from your BAM file. According to the htsjdk SAMRecord API, when the reference name is not found in the sequence index, NO_ALIGNMENT_REFERENCE_INDEX (-1) is returned. Can you carefully check whether all the reference names in the BAM file (3rd column) exactly match the contig names in Mtctgs.fasta? In the first line of your tail -3 output, the contig name is "NODE_8511_length_231609_cov_68.992912:::fragment_3". If that is not a typo, the ":::fragment_3" part looks wrong to me.
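As a rough sketch of that check, assuming the alignments are available as tab-separated SAM text (for example from `samtools view -h`) and that the set of contig names has already been read from Mtctgs.fasta. `unknown_references` is a hypothetical helper, not something npScarf or htsjdk provides:

```python
def unknown_references(sam_lines, contig_names):
    """Count alignment records whose RNAME (3rd column) is not a known contig."""
    known = set(contig_names)
    counts = {}
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # not a complete alignment record
        rname = fields[2]
        if rname != "*" and rname not in known:  # '*' marks an unmapped read
            counts[rname] = counts.get(rname, 0) + 1
    return counts


# In-memory example; the records are shortened and CIGAR/sequence are placeholders.
contigs = {"NODE_656_length_23526_cov_83.054581"}
records = [
    "@SQ\tSN:NODE_656_length_23526_cov_83.054581\tLN:23596",
    "\t".join(["read1", "0", "NODE_8511_length_231609_cov_68.992912:::fragment_3",
               "6058", "60", "4M", "*", "0", "0", "ACGT", "*"]),
    "\t".join(["read2", "0", "NODE_656_length_23526_cov_83.054581",
               "15970", "60", "4M", "*", "0", "0", "ACGT", "*"]),
]
for name, n in unknown_references(records, contigs).items():
    print(f"{name}: {n} record(s) reference a contig missing from the FASTA")
```

Any name reported this way is a candidate for the record that produced the -1 index.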

Btw, have you tried SPAdes instead of Velvet?

Yunxia-li commented 5 years ago

Well, I checked my contig.fasta and found some duplicate IDs. After removing these duplicate IDs and rerunning the commands, it was successful. Thanks for your kind reply.
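For anyone hitting the same error, duplicate IDs can be spotted with something like the sketch below. It assumes the ID is the first whitespace-delimited token after `>`; `duplicate_fasta_ids` is a hypothetical helper name, and how the duplicates are then resolved (renaming or dropping records) is left open:

```python
from collections import Counter


def duplicate_fasta_ids(fasta_lines):
    """Return {contig ID: occurrence count} for IDs that appear more than once."""
    counts = Counter(
        line[1:].split()[0]
        for line in fasta_lines
        if line.startswith(">") and line[1:].split()
    )
    return {name: n for name, n in counts.items() if n > 1}


# In-memory example; in practice the lines would come from the contig file.
example = [
    ">NODE_1_length_100_cov_10.0",
    "ACGTACGT",
    ">NODE_2_length_200_cov_20.0",
    "ACGTACGT",
    ">NODE_1_length_100_cov_10.0",
    "TTGCAAGC",
]
for name, n in duplicate_fasta_ids(example).items():
    print(f"{name} appears {n} times")
```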