ndierckx / NOVOPlasty

NOVOPlasty - The organelle assembler and heteroplasmy caller

Test data for heteroplasmy #140

Open stephanflemming opened 4 years ago

stephanflemming commented 4 years ago

Hi,

I am working on integrating NOVOPlasty into Galaxy. Where can I find the files used in the heteroplasmy test runs in this repository? For example, config_ERR1395547.txt references:

Forward reads         = Filtered_reads_ERR1395547_R1.fastq
Reverse reads         = Filtered_reads_ERR1395547_R2.fastq 

Thank you, Stephan

ndierckx commented 4 years ago

Hi,

Those files are not available on my GitHub. The original files are on ENA or NCBI; I could write down how to reproduce them. I will also upload a new version, 4.0, with significant improvements.

Greets,

Nicolas

ndierckx commented 4 years ago

Hi,

I uploaded 4.0. Are you interested in a more elaborate manual for the heteroplasmy analysis of that sample? I could add it to the wiki.

stephanflemming commented 4 years ago

Great, I will have a look! Thank you.

stephanflemming commented 4 years ago

Hi, I have some questions and a couple of remarks :-)

In the wiki, an output file Contigs_project.txt is mentioned. As far as I can see, Contigs_1_project.fasta is created instead. Does the "1" represent some kind of counter, and could files with higher numbers be produced? The same applies to Circularized_assembly_project.fasta / Circularized_assembly_1_project.fasta and Uncircularized_assemblies_1_project.fasta (which is not mentioned in the wiki, by the way).
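
If the number does turn out to be a counter (this is not confirmed anywhere in this thread), a wrapper could collect the numbered files with a pattern match instead of hard-coding "1". The sketch below is only an illustration: the helper name find_numbered_outputs and the assumed naming scheme prefix_N_project.fasta are inferred from the file names quoted above, not taken from NOVOPlasty itself.

```python
import re
from pathlib import Path


def find_numbered_outputs(directory, prefix, project, suffix=".fasta"):
    """Return {counter: path} for files named <prefix>_<n>_<project><suffix>.

    Hypothetical helper: the naming scheme is an assumption based on
    file names such as Contigs_1_project.fasta.
    """
    pattern = re.compile(
        rf"^{re.escape(prefix)}_(\d+)_{re.escape(project)}{re.escape(suffix)}$"
    )
    found = {}
    for path in Path(directory).iterdir():
        match = pattern.match(path.name)
        if match:
            found[int(match.group(1))] = path
    # Sort by counter so callers see the files in a stable order.
    return dict(sorted(found.items()))
```

Calling find_numbered_outputs(output_dir, "Contigs", "project") would then return every matching contig file keyed by its number, and files without a number (such as Contigs_project.txt) would simply be ignored.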

I couldn't produce the result files Merged_contigs_project.txt, Option_nr_project.txt, Possible_NUMTs_project.vcf, Possible_NUMTs_assemblies_project.fasta and Linkage_table_NUMTs_project.txt. Can you recommend a dataset for this? I just want to see if the wrapper works.

It seems that Platform: SE has an influence on the applied value of Insert size. How are these two parameters connected?

seed_input doesn't accept fasta.gz files, while chloroplast, forward, reverse, combined and reference allow that.
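
Until that is supported, one possible workaround on the wrapper side is to decompress a gzipped seed before handing it to NOVOPlasty. This is a minimal sketch using only the Python standard library; the helper name ensure_uncompressed and the choice of output directory are assumptions for illustration.

```python
import gzip
import shutil
from pathlib import Path


def ensure_uncompressed(seed_path, out_dir):
    """Return a path to a plain-text seed file.

    Hypothetical helper: if seed_path ends in .gz, it is decompressed
    into out_dir; otherwise the original path is returned unchanged.
    """
    seed_path = Path(seed_path)
    if seed_path.suffix != ".gz":
        return seed_path
    # Path.stem drops only the last suffix: "seed.fasta.gz" -> "seed.fasta"
    target = Path(out_dir) / seed_path.stem
    with gzip.open(seed_path, "rb") as src, open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target
```

The returned path could then be written into the config file as the seed input, leaving the compressed original untouched.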

Setting a value for Heteroplasmy doesn't have an effect, MAF needs to be set instead. The description is a bit confusing here.

Just for clarification: when using Heteroplasmy mode, are assembly results (contigs, contigs tmp, ...) produced?

Since a lot of result files may be created, which of them should be shown by default? In your opinion, does it make sense to hide some by default?

There are two typos in the README: "beeen", " size is know".

Assembled_reads_Result_R2.fasta and Assembled_reads_Result_R1.fasta are not mentioned in the wiki

Thank you! Stephan

ndierckx commented 4 years ago

Hi,

Sorry, I was on holiday and didn't have the time afterwards. Thanks for the remarks; I will try to fix some of the issues as soon as possible.

Greets,

Nicolas