mrmckain / Fast-Plast

Automated de novo assembly of whole chloroplast genomes.
MIT License

No Final_Assembly result #48

Open Dementieva521 opened 3 years ago

Dementieva521 commented 3 years ago

Attachments: A8_results_out.log, A8_Fast-Plast_Progress.log

The Final_Assembly folder was not generated, and I don't know why. Can you give me some suggestions? Thanks.

0:00:00 Max Iterations of afin: 1
0:00:00 Begin add_reads()
0:00:00 readfile: ../1_Trimmed_Reads/A_14.trimmed_P1.fq
0:01:29 readfile: ../1_Trimmed_Reads/A_14.trimmed_P2.fq
0:03:04 readfile: ../1_Trimmed_Reads/A_14.trimmed_UP.fq
0:03:04 Begin sort_reads()
0:14:34 Begin sort_rc()

mrmckain commented 3 years ago

Can you please explain what you mean? According to the logs, a Final_Assembly folder was created. Your data did not assemble into a single contig, which is why the pipeline passed it to SSPACE for scaffolding.
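To see whether a run ended in a single contig or in several scaffolds, you can count the records in the output FASTA. The sketch below is a minimal example, assuming a plain or gzipped FASTA such as final.scaffolds.fasta. The function names are hypothetical helpers, not part of Fast-Plast.

```python
import gzip


def read_fasta_lengths(path):
    """Return a list of (record_name, sequence_length) for each FASTA record."""
    opener = gzip.open if str(path).endswith(".gz") else open
    records = []
    name, length = None, 0
    with opener(path, "rt") as handle:
        for raw in handle:
            line = raw.strip()
            if not line:
                continue
            if line.startswith(">"):
                # A new header closes the previous record.
                if name is not None:
                    records.append((name, length))
                name, length = line[1:].strip(), 0
            else:
                # Sequence lines may be wrapped, so accumulate their lengths.
                length += len(line)
    if name is not None:
        records.append((name, length))
    return records


def summarize_assembly(path):
    """Summarize contig/scaffold count and lengths for one assembly FASTA."""
    lengths = [n for _, n in read_fasta_lengths(path)]
    return {
        "contigs": len(lengths),
        "total_length": sum(lengths),
        "longest": max(lengths, default=0),
        "single_contig": len(lengths) == 1,
    }
```

For example, `summarize_assembly("4_Afin_Assembly/final.scaffolds.fasta")` returns a small dictionary. `single_contig` set to False corresponds to the case described above, where scaffolding was attempted.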

If your lineage is not supported by the packaged plastomes, you might want to use your own for mapping reads.

Sending your error file would help. I also noticed a long string of T's in the logs that might be causing some problems.
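To check how common such T-runs are in the reads themselves, you can scan the FASTQ files. In RNA-seq libraries, reads that span poly(A) tails are one possible source. The sketch below assumes unwrapped four-line FASTQ records. The 20-base threshold (`MIN_POLY_T`) is a hypothetical choice, not a Fast-Plast setting.

```python
import gzip
import re

# Hypothetical threshold: flag reads containing at least this many consecutive T's.
MIN_POLY_T = 20


def count_poly_t_reads(fastq_path, min_run=MIN_POLY_T):
    """Return (flagged, total): reads whose sequence has a run of >= min_run T's."""
    pattern = re.compile("T{%d,}" % min_run)
    opener = gzip.open if str(fastq_path).endswith(".gz") else open
    total = flagged = 0
    with opener(fastq_path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # the sequence line of each 4-line FASTQ record
                total += 1
                if pattern.search(line.upper()):
                    flagged += 1
    return flagged, total
```

For instance, `count_poly_t_reads("../1_Trimmed_Reads/A_14.trimmed_P1.fq")` reports how many trimmed reads still carry a long T-run. A large fraction would suggest trimming those runs before assembly.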

-Michael

Dementieva521 commented 3 years ago

(1) I noticed that for most samples the output contains five folders (1_Trimmed_Reads, 2_Bowtie_Mapping, 3_Spades_Assembly, 4_Afin_Assembly, and Final_Assembly), and the Final_Assembly folder contains final.scaffolds.fasta and Chloroplast_gene_composition_of_final_contigs.txt. However, the data I asked about yesterday did not produce a Final_Assembly folder. I can find final.scaffolds.fasta in the 4_Afin_Assembly folder, but there is no Chloroplast_gene_composition_of_final_contigs.txt. I would like to know why.

(2) I am using Fast-Plast to recover chloroplast protein-coding genes from transcriptome data. Can you give me some suggestions so that I can recover more of these genes?

(3) Sorry, I uploaded the wrong log file yesterday. I have re-uploaded it; can you see if there is any problem? The command I used was:

perl fast-plast.pl -1 A14-1.fq.gz -2 A14-2.fq.gz --name A14 --bowtie_index All --coverage_analysis --clean light

Attachments: A_14_Fast-Plast_Progress.log, A_14_results_error.log, A_14_results_out.log

mrmckain commented 3 years ago

Sorry for the delay.

1) The Final_Assembly folder is missing because it is only created under certain circumstances. Ideally it would be created every time, but I will have to go back into the code and add its creation in a few more places. Basically, the assembly path your data took (ending with scaffolding in SSPACE) does not create a Final_Assembly folder, so the output was put into the 4_Afin_Assembly folder instead. Given your result, final.scaffolds.fasta is your final output file.
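If you process many samples, a script can pick up the final file from whichever folder the run used. The sketch below is a minimal example that takes the folder and file names from this thread. It assumes `run_dir` is the Fast-Plast output directory for one sample and checks Final_Assembly first. The helper name and search order are hypothetical.

```python
from pathlib import Path

# Folder names as described in this thread; the search order is an assumption.
CANDIDATE_DIRS = ("Final_Assembly", "4_Afin_Assembly")


def find_final_scaffolds(run_dir):
    """Return the path to final.scaffolds.fasta, preferring Final_Assembly, else None."""
    run_dir = Path(run_dir)
    for sub in CANDIDATE_DIRS:
        candidate = run_dir / sub / "final.scaffolds.fasta"
        if candidate.is_file():
            return candidate
    return None
```

Falling back to 4_Afin_Assembly mirrors the SSPACE path explained above. When the fallback is used, Chloroplast_gene_composition_of_final_contigs.txt should not be expected.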

2) Fast-Plast is really meant to be used on whole-genome data with the goal of getting a full plastome. You can use whatever database you want for read mapping, but you have to add it yourself by following the instructions on the main GitHub page. Transcriptome data will have coverage issues that break some parts of Fast-Plast. If you go this route, try normalizing the reads as suggested on the information page.
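Read normalization caps how many reads are kept from very highly covered regions, such as highly expressed transcripts, which evens out coverage. The sketch below is a toy illustration of the general idea only: a streaming, diginorm-style filter that keeps a read only if the median count of its k-mers, among reads kept so far, is below `max_cov`. It is not how Trinity's insilico_read_normalization.pl or BBMap's BBNorm work internally. It also does not merge reverse-complement k-mers. The defaults k=25 and max_cov=200 simply mirror the values in the Trinity command used later in this thread.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """All overlapping k-mers of seq (no reverse-complement merging)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def digital_normalize(reads, k=25, max_cov=200):
    """Keep a read only if the median abundance of its k-mers, counted over
    the reads kept so far, is below max_cov."""
    counts = Counter()
    kept = []
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:
            continue  # read shorter than k: no k-mers to estimate coverage from
        if median([counts[km] for km in read_kmers]) < max_cov:
            kept.append(read)
            counts.update(read_kmers)  # only kept reads contribute to coverage
    return kept
```

With ten identical reads and `max_cov=3`, only three copies survive, while a read from a different region is still kept. This is the coverage-flattening effect that normalization aims for.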

3) The issue is the data type. Transcriptome data isn't going to yield a full plastome, and the backend parts of Fast-Plast really expect something close to a complete plastome.

Hope this helps. Michael

Dementieva521 commented 3 years ago

Hello, I tried your suggestion and used Trinity or BBMap to normalize the transcriptome data, but Fast-Plast still could not assemble the chloroplast genome. I don't know whether my normalization method is wrong, so I would like your help.

Trinity normalization command:

insilico_read_normalization.pl --seqType fq --JM 20G --max_cov 200 --left A35-1.clean.fq.gz --right A35-2.clean.fq.gz --CPU 32 &

Fast-Plast command:

perl fast-plast.pl -1 A35-1.clean.fq.gz.normalized_K25_maxC200_minC0_maxCV10000.fq.gz -2 A35-2.clean.fq.gz.normalized_K25_maxC200_minC0_maxCV10000.fq.gz --name A35 --bowtie_index All --coverage_analysis --clean light &

Best,
Mint

Attachments: A35_Fast-Plast_Progress.log, A35_results_error.log, A35_results_out.log
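After normalizing paired files separately, one sanity check before passing them to `-1`/`-2` is that the left and right files still list the same reads in the same order. The thread does not identify this as the cause of the failure. The sketch below assumes unwrapped four-line FASTQ records. It treats the first word of each header, with any `/1` or `/2` suffix removed, as the read ID. The helper names are hypothetical.

```python
import gzip
from itertools import zip_longest


def fastq_ids(path):
    """Yield each record's read ID: first word of the header, minus a /1 or /2 suffix."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 0:  # the header line of each 4-line FASTQ record
                fields = line[1:].split()
                read_id = fields[0] if fields else ""
                if read_id.endswith(("/1", "/2")):
                    read_id = read_id[:-2]
                yield read_id


def check_pairs(left, right):
    """Return the index of the first out-of-sync record, or None if the files match."""
    # zip_longest pads the shorter file with None, so unequal lengths also mismatch.
    for index, (a, b) in enumerate(zip_longest(fastq_ids(left), fastq_ids(right))):
        if a != b:
            return index
    return None
```

`check_pairs` returning None means every record pairs up. An integer points to the first record where the two files disagree.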

mrmckain commented 3 years ago

I don't really recommend using transcriptome data. You should temper your expectations of what you can get out of the assembly.

There is an error at the step where the SPAdes assembly is supposed to be filtered. There seems to be an option error with SPAdes v3.14. I will have to look into it.
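Because the filtering error appears with SPAdes v3.14, it helps to confirm which version is installed. The thread does not say whether other versions are affected. The sketch below runs `spades.py --version` and parses the first `vX.Y[.Z]` it finds. The output wording differs between releases, so the regex is an assumption, and the helper names are hypothetical.

```python
import re
import shutil
import subprocess


def parse_spades_version(text):
    """Extract (major, minor, patch) from text such as 'SPAdes genome assembler v3.14.0'."""
    match = re.search(r"v(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        return None
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def installed_spades_version(executable="spades.py"):
    """Run `<executable> --version` and parse it; None if not found or not recognized."""
    path = shutil.which(executable)
    if path is None:
        return None
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    # Some releases print the version to stderr, so check both streams.
    return parse_spades_version(result.stdout + result.stderr)
```

A result whose first two fields are `(3, 14)` means the installed SPAdes is the release mentioned in this issue.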

Dementieva521 commented 3 years ago

Thank you very much.