mrmckain / Fast-Plast

Automated de novo assembly of whole chloroplast genomes.
MIT License

bowtie index not present #50

Open kmmahan opened 3 years ago

kmmahan commented 3 years ago

I am using --bowtie_index all since my Order (Tribonematales) is not present. My Progress.log says "85.1851851851852% of known angiosperm chloroplast genes were recovered in xx.afin_iter0.fa". Does Fast-Plast always report the % of angiosperm representatives? Is there a better way for me to use --bowtie_index? I ended up with 7 scaffolds. I BLASTed all the scaffolds: the 2 longest were closely related to organisms in my organism's Class, and the other 5 scaffolds were closely related to a peach tree.

Is there a way to take the two scaffolds that are related to my organism and improve the assembly to get a single circular contig?

kmmahan commented 3 years ago

I ended up using the closest relative of the indexes listed, Vaucheriales. It has been running for days:

11:36:16 Begin Iteration of afin: 3
11:36:16 OPTION VALUES
11:36:16 contig_sub_len: 100
11:36:16 extend_len: 112
11:36:16 max_search_loops: 50
11:36:16 max_sort_char: 4
11:36:16 min_cov: 1
11:36:16 min_overlap: 10
11:36:16 initial_trim: 100
11:36:16 max_missed: 5
11:36:16 mismatch_threshold: 0.100000
11:36:16 max_threads: 4
11:36:16 stop_ext: 0.100000
11:36:16 output file: T_minus_fastplast_chloroplast_Vaucheriales_afin
11:36:16 End initialization phase
11:36:16 Fuse contigs
11:38:08 Begin Extensions
11:40:31 Fuse contigs
11:40:31 Begin Extensions
11:42:35 Fuse contigs
11:42:35 Begin Extensions
11:44:54 Fuse contigs
11:44:54 Begin Extensions
11:47:17 Fuse contigs
11:47:17 Begin Extensions
18:36:00 Fuse contigs
18:36:01 Begin Extensions
26:12:21 Fuse contigs
26:12:21 Begin Extensions
33:40:48 Fuse contigs
33:40:57 Begin Extensions
41:08:35 Fuse contigs
41:08:35 Begin Extensions
48:39:48 Fuse contigs
48:39:48 Begin Extensions
56:02:43 Fuse contigs
56:02:43 Begin Extensions
63:24:58 Fuse contigs
63:24:58 Begin Extensions
70:56:01 Fuse contigs
70:56:02 Begin Extensions
78:21:37 Fuse contigs
78:21:37 Begin Extensions
85:42:59 Fuse contigs
85:42:59 Begin Extensions

Is this okay?
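To put numbers on the slowdown, here is a rough Python sketch that measures the time between consecutive "Begin Extensions" entries. It assumes the leading field in each log line is an hours:minutes:seconds counter that keeps increasing without wrapping at 24 hours, which is only a guess from the values shown. The function names are made up for illustration and are not part of Fast-Plast or afin.

```python
import re

# Assumed layout of one log entry: "HH:MM:SS message" (hours may exceed 24).
LINE_RE = re.compile(r"^(\d+):(\d{2}):(\d{2})\s+(.*)$")


def parse_entry(line):
    """Return (seconds, message) for a log line, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    hours, minutes, seconds, message = match.groups()
    return int(hours) * 3600 + int(minutes) * 60 + int(seconds), message


def extension_round_durations(lines):
    """Seconds elapsed between consecutive 'Begin Extensions' entries."""
    starts = []
    for line in lines:
        entry = parse_entry(line)
        if entry is not None and entry[1] == "Begin Extensions":
            starts.append(entry[0])
    return [later - earlier for earlier, later in zip(starts, starts[1:])]


# Short excerpt copied from the log above.
excerpt = """\
11:44:54 Begin Extensions
11:47:17 Fuse contigs
11:47:17 Begin Extensions
18:36:00 Fuse contigs
18:36:01 Begin Extensions
26:12:21 Fuse contigs
26:12:21 Begin Extensions
"""

for duration in extension_round_durations(excerpt.splitlines()):
    print(f"{duration / 3600:.2f} h")
```

On this excerpt the rounds go from a couple of minutes to roughly seven hours each, and that jump is what I am asking about.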

mrmckain commented 3 years ago

Fast-Plast was written using angiosperm data, and angiosperms were my main intended use. The gene set used to check completeness is the angiosperm set. I do not have any others included in the package.

You can use your own index if you want to pull in some existing plastome sequences. This is done with the --user_bowtie option.
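As a rough illustration of that workflow, the Python sketch below only assembles the two command lines and prints them. It assumes the index is built with Bowtie 2's bowtie2-build from a FASTA of reference plastomes, and that fast-plast.pl takes paired reads via -1 and -2 and a sample name via --name. Those three flags are assumptions here, so check the script's help output before relying on them. All file names and the index prefix are hypothetical.

```python
import shlex


def bowtie2_build_cmd(reference_fasta, index_prefix):
    """Command to build a Bowtie 2 index from a FASTA of reference plastomes."""
    return ["bowtie2-build", reference_fasta, index_prefix]


def fast_plast_cmd(reads_1, reads_2, sample_name, index_prefix,
                   script="fast-plast.pl"):
    """Command to run Fast-Plast against a user-supplied index.

    --user_bowtie comes from the comment above; -1, -2 and --name are
    assumptions about the script's interface.
    """
    return [
        "perl", script,
        "-1", reads_1,
        "-2", reads_2,
        "--name", sample_name,
        "--user_bowtie", index_prefix,
    ]


# Hypothetical inputs, for illustration only.
build = bowtie2_build_cmd("related_plastomes.fasta", "related_plastomes")
run = fast_plast_cmd("reads_R1.fastq", "reads_R2.fastq", "my_sample",
                     "related_plastomes")

# Print the commands for review rather than executing them.
print(shlex.join(build))
print(shlex.join(run))
```

Printing instead of executing makes it easy to review the commands before running them by hand.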

Blasting your contigs will tell you what they are similar to, not necessarily what they are related to. Depending on length and which portion of the plastome you are looking at, they might come up as similar to something you wouldn't expect. There is always the possibility of contamination. I say all this because you should definitely verify your contigs, but Fast-Plast assembles from your reads, so those contigs are coming from your reads.

Not all data will result in a circular complete plastome.

mrmckain commented 3 years ago

RE: your long run. That is a long time. If you have a lot of data, you should reduce it. The built-in subsampling option can do this.
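If you would rather reduce the data yourself before a run, something like the following Python sketch could work. It is not how Fast-Plast subsamples internally. It is a generic reservoir sample over read pairs, and it assumes uncompressed four-line FASTQ files with both mates in the same order. The function names are made up for illustration.

```python
import random
from itertools import islice


def read_fastq(handle):
    """Yield FASTQ records as tuples of four lines."""
    while True:
        record = tuple(islice(handle, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def subsample_pairs(path_1, path_2, out_1, out_2, n_pairs, seed=0):
    """Write a uniform random sample of up to n_pairs read pairs.

    Uses reservoir sampling, so the inputs are read once and only the
    sampled pairs are held in memory. Returns the number of pairs written.
    """
    rng = random.Random(seed)
    reservoir = []
    with open(path_1) as in_1, open(path_2) as in_2:
        pairs = zip(read_fastq(in_1), read_fastq(in_2))
        for index, pair in enumerate(pairs):
            if index < n_pairs:
                reservoir.append(pair)
            else:
                # Keep this pair with probability n_pairs / (index + 1).
                slot = rng.randint(0, index)
                if slot < n_pairs:
                    reservoir[slot] = pair
    with open(out_1, "w") as dest_1, open(out_2, "w") as dest_2:
        for record_1, record_2 in reservoir:
            dest_1.writelines(record_1)
            dest_2.writelines(record_2)
    return len(reservoir)
```

Calling subsample_pairs with your two read files, two output paths, and a target number of pairs gives a smaller paired data set with mates kept together. How many pairs to keep depends on your coverage, so treat any number as a starting point.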