mrmckain / Fast-Plast

Automated de novo assembly of whole chloroplast genomes.
MIT License

Final Assembly not complete #36

Open NeMeh21 opened 5 years ago

NeMeh21 commented 5 years ago

Hi there! I have been getting an error during the final assembly step on two separate datasets. The error is as follows: "Checking coverage of final assembly. Final assembly is the last afin iteration. 79.0123456790123% of known angiosperm chloroplast genes were recovered in FP-L5_afin_iter2.fa. Could not properly orientate the plastome. Either your plastome does not have an IR or there was an issue with the assembly." How can I proceed, or what changes do I need to make so that the final assembly completes? Thanks.

mrmckain commented 5 years ago

Sorry for the delay on this. I missed the email notification. 1) Do you think you have enough data for a complete assembly? 2) What does the current assembly look like (length? total contigs?) 3) What lineage are you working in?

nsmt89 commented 4 years ago

Hi, sorry for jumping in. How can I check if I have enough data for assembly?

mrmckain commented 4 years ago

It depends on the sample, but if your coverage is at least 20X, you can usually get a plastome. Issues can arise if you are using by-catch from sequence capture or if you have too much data. There can also be an issue if the assembly breaks in the middle of a single-copy region or if the assembly loops around the whole plastome more than once. If you suspect the former, we will have to chat about how to overcome it. If you have the latter, see the troubleshooting section I am adding.
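As a rough way to check the 20X figure mentioned above, mean coverage can be estimated as total aligned bases divided by plastome length. The sketch below is only an illustration: the function names are made up here, and the read length and plastome length in the usage note are hypothetical values, not numbers from this thread.

```python
import math


def estimate_coverage(aligned_reads: int, read_length: int, plastome_length: int) -> float:
    """Mean depth = (aligned reads * read length) / plastome length."""
    if plastome_length <= 0:
        raise ValueError("plastome_length must be positive")
    return aligned_reads * read_length / plastome_length


def reads_needed(target_coverage: float, read_length: int, plastome_length: int) -> int:
    """Smallest number of plastid reads that reaches the target mean depth."""
    if read_length <= 0:
        raise ValueError("read_length must be positive")
    return math.ceil(target_coverage * plastome_length / read_length)
```

For example, with hypothetical 150 bp reads and a 150,000 bp plastome, `reads_needed(20, 150, 150000)` gives 20000 plastid reads for 20X. This is a mean over the whole plastome, so it says nothing about regions where coverage drops locally.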

shangshanzhizhe commented 4 years ago

Hi, I encountered the same problem. The NAME_afin_iter2.fa file mentioned in the message contains a single contig, 115,119 bp long. Here is some information that may help.

  1. The reference chloroplast genome I used is from a different genus in the same family; the reference is 159,347 bp long;
  2. The sequencing depth of my sample was about 10x. The bowtie2 result in NAME_results_error.log was:

    16667700 reads; of these:
      16492352 (98.95%) were paired; of these:
        15994654 (96.98%) aligned concordantly 0 times
        321885 (1.95%) aligned concordantly exactly 1 time
        175813 (1.07%) aligned concordantly >1 times
        15994654 pairs aligned concordantly 0 times; of these:
          85036 (0.53%) aligned discordantly 1 time
        ----
        15909618 pairs aligned 0 times concordantly or discordantly; of these:
          31819236 mates make up the pairs; of these:
            31640149 (99.44%) aligned 0 times
            27500 (0.09%) aligned exactly 1 time
            151587 (0.48%) aligned >1 times
      175348 (1.05%) were unpaired; of these:
        168512 (96.10%) aligned 0 times
        3746 (2.14%) aligned exactly 1 time
        3090 (1.76%) aligned >1 times
    4.08% overall alignment rate

  3. All programs seemed to work except BLASTN. Here is the error output in NAME_results_error.log:

    Use of uninitialized value $final_start in hash element at /data/00/user/user103/software/05.genomic/Fast-Plast/bin/sequence_based_ir_id.pl line 120, <$file> line 2.
    Use of uninitialized value $final_end in hash element at /data/00/user/user103/software/05.genomic/Fast-Plast/bin/sequence_based_ir_id.pl line 120, <$file> line 2.
    Argument "" isn't numeric in sort at /data/00/user/user103/software/05.genomic/Fast-Plast/bin/sequence_based_ir_id.pl line 123, <$file> line 2.
    Argument "" isn't numeric in subtraction (-) at /data/00/user/user103/software/05.genomic/Fast-Plast/bin/sequence_based_ir_id.pl line 125, <$file> line 2.

I'm going to build a phylogenetic tree from the chloroplast genomes. Could the incomplete assembly affect the phylogenetic result, or can I just use the afin result? I hope the information above helps. Thanks a lot!

Best, Shangzhe
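The bowtie2 summary above can be tallied by hand to see how many individual reads aligned. The sketch below copies the counts from that summary and counts each mate of an aligned pair as one read; the function and variable names are illustrative only. It reproduces the reported 4.08% overall alignment rate.

```python
from typing import Tuple

# Counts copied from the bowtie2 summary quoted in this comment.
SUMMARY = {
    "pairs": 16492352,
    "unpaired": 175348,
    "concordant_once": 321885,
    "concordant_multi": 175813,
    "discordant": 85036,
    "mates_once": 27500,
    "mates_multi": 151587,
    "unpaired_once": 3746,
    "unpaired_multi": 3090,
}


def tally(summary: dict) -> Tuple[int, int, float]:
    """Return (aligned reads, total reads, alignment rate), counting mates individually."""
    # Each aligned pair contributes two reads.
    aligned_pairs = (
        summary["concordant_once"] + summary["concordant_multi"] + summary["discordant"]
    )
    aligned = (
        2 * aligned_pairs
        + summary["mates_once"] + summary["mates_multi"]
        + summary["unpaired_once"] + summary["unpaired_multi"]
    )
    total = 2 * summary["pairs"] + summary["unpaired"]
    return aligned, total, aligned / total


aligned, total, rate = tally(SUMMARY)
print(f"{aligned} of {total} reads aligned ({rate:.2%})")
# prints: 1351391 of 33160052 reads aligned (4.08%)
```

Multiplying the aligned read count by the read length (not stated in this thread) and dividing by the reference length gives a rough mean depth over the reference.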

mrmckain commented 4 years ago

Hi Shangzhe,

Based on what you sent me, my guess is that the scripts that filter out mitochondrial contamination overfiltered. I say this because your contig of 115,119 bp is about 15 kb shorter than I would expect based on your close relative. This can happen with high coverage. Check the SPAdes contig file for a missing piece, probably the small single-copy region (~15-20 kb), which should have coverage similar to a piece that I suspect is ~80 kb. You can pull that contig into the filtered contigs file and rerun afin (see the tutorial). Let me know if this doesn't help.

Best, Michael

shangshanzhizhe commented 4 years ago

Hi Michael,

Thanks for your reply. Sorry, I couldn't work out how to rerun only afin and the steps that follow it. Can you give me more details? By the way, I checked the discarded contigs and found one named NODE_3_length_18798_cov_18.7286. Should I add it to the filtered contigs file?

Best, Shangzhe
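SPAdes contig names such as NODE_3_length_18798_cov_18.7286 encode the contig length and coverage, so candidates for a missing ~15-20 kb piece can be listed straight from the FASTA headers. The sketch below assumes that header pattern; the helper names and the default length window are illustrative, not part of Fast-Plast.

```python
import re
from typing import Iterable, Iterator, NamedTuple, Optional

# Assumed SPAdes header pattern: NODE_<id>_length_<bp>_cov_<coverage>
HEADER = re.compile(r"NODE_(\d+)_length_(\d+)_cov_([\d.]+)")


class Contig(NamedTuple):
    name: str
    length: int
    coverage: float


def parse_header(name: str) -> Optional[Contig]:
    """Parse length and coverage from a contig name; None if it does not match."""
    match = HEADER.match(name)
    if match is None:
        return None
    return Contig(name, int(match.group(2)), float(match.group(3)))


def candidates(fasta_lines: Iterable[str],
               min_len: int = 15000,
               max_len: int = 20000) -> Iterator[Contig]:
    """Yield contigs whose header length falls inside [min_len, max_len]."""
    for line in fasta_lines:
        if line.startswith(">"):
            contig = parse_header(line[1:].strip())
            if contig is not None and min_len <= contig.length <= max_len:
                yield contig


print(parse_header("NODE_3_length_18798_cov_18.7286"))
```

Passing an open file handle for the SPAdes contigs file to `candidates` lists every contig in the window, and the coverage of each hit can then be compared with that of the contigs that survived filtering.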

mrmckain commented 4 years ago

Hi Shangzhe,

Check out this page for running afin: https://github.com/afinit/afin. You will want to use the filtered spades contigs file with the other contig you found. For reads, use all the trimmed reads that Fast-Plast produced.

After you do that, look at the troubleshooting.md file on the Fast-Plast page to see what to do next.

Let me know if you still have trouble.

Best, Michael
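Combining the filtered contigs with the extra contig, as described above, amounts to copying one FASTA record from the SPAdes contigs into the filtered set before rerunning afin. The sketch below works on lines of text so it does not assume any file names or locations; the function names are hypothetical, and it does not check whether the contig is already present in the filtered set.

```python
from typing import Iterable, List


def extract_record(fasta_lines: Iterable[str], contig_name: str) -> List[str]:
    """Return the header and sequence lines of the record whose ID is contig_name."""
    record: List[str] = []
    keep = False
    for line in fasta_lines:
        if line.startswith(">"):
            # The record ID is the first whitespace-delimited token after '>'.
            fields = line[1:].split()
            keep = bool(fields) and fields[0] == contig_name
        if keep and line.strip():
            record.append(line.rstrip("\n"))
    if not record:
        raise KeyError(f"contig {contig_name!r} not found")
    return record


def merge_contigs(filtered_lines: Iterable[str],
                  spades_lines: Iterable[str],
                  contig_name: str) -> str:
    """Return FASTA text holding the filtered contigs plus the named contig."""
    merged = [line.rstrip("\n") for line in filtered_lines if line.strip()]
    merged.extend(extract_record(spades_lines, contig_name))
    return "\n".join(merged) + "\n"
```

In practice both inputs would be open file handles, and the returned text would be written to a new file so the original filtered contigs file stays untouched. That new file is then the contigs input for afin, together with the trimmed reads.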