Open ghost opened 5 years ago
Ah, interesting. nuclassemble currently returns the headers of the sequences that got extended, while the protein assembler, assemble, returns the header information from the extracted ORF. We will think about how to make the header information more consistent.
Hi Martin, any news on this issue?
Thanks, Antonio
@genomewalker we will discuss this issue tomorrow and update it here once we have a solution.
@genomewalker we agreed on a new header format: <uniq ID> len:<len> cycle:<0|1>. Each header will contain the uniq ID and the len field; the cycle field is optional (for nucleotide sequences).
This is now implemented (8a7d224), so both assembler workflows now report more consistent header information.
We are also planning to extend the header with an additional coverage field at some point, but this is not done yet:
<uniq ID> len:<len> cycle:<0|1> cov:<cov>
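For downstream scripts, the header layout described above could be parsed along these lines. This is only a minimal sketch, not part of Plass: the names `HEADER_RE`, `PlassHeader` and `parse_header` are made up for illustration, and it assumes that the uniq ID contains no whitespace, that fields are separated by whitespace, and that the planned cov value will be numeric.

```python
import re
from typing import NamedTuple, Optional

# Assumed layout (from the format discussed in this thread):
#   <uniq ID> len:<len> [cycle:<0|1>] [cov:<cov>]
# The leading ">" of a FASTA header line is accepted but not required.
HEADER_RE = re.compile(
    r"^>?(?P<uid>\S+)\s+len:(?P<len>\d+)"
    r"(?:\s+cycle:(?P<cycle>[01]))?"
    r"(?:\s+cov:(?P<cov>\S+))?\s*$"
)


class PlassHeader(NamedTuple):
    uid: str
    length: int
    cycle: Optional[bool]  # None when the cycle field is absent
    cov: Optional[float]   # None when the cov field is absent


def parse_header(line: str) -> PlassHeader:
    """Parse one header line; raise ValueError if it does not match."""
    m = HEADER_RE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognized header: {line!r}")
    cycle = m.group("cycle")
    cov = m.group("cov")
    return PlassHeader(
        uid=m.group("uid"),
        length=int(m.group("len")),
        cycle=None if cycle is None else cycle == "1",
        cov=None if cov is None else float(cov),  # assumes a numeric value
    )
```

Because cycle and cov are both optional in the pattern, the same function should accept protein headers (no cycle field) and headers written before a coverage field exists.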
Thank you very much @AnnSeidel! The coverage field will be very useful when added!
Regarding the nucleotide assembly, is it still considered experimental?
We have done significant further development on the nucleotide assembly in the last few months, but I would still consider it experimental. We are working on it.
Sorry if this is the wrong place to ask, but I am very curious to know: Has the request to include the coverage information into the header been implemented? In my testing I haven't seen it, but would love to include it in subsequent steps!
Hello,
I recently tried the following with this program:
plass nuclassemble reads_1.fastq.gz assembly_testnu.fas tmp
Output:
>1541_chr1_0_114757654_114757803_7891_JFMU01000067.1
AGCTGGAATTTCTAAAAAAGATATTAATGGCTTTATGATAAGAAAACTAAAGAATATTGAAATAA
However, when I run
plass assemble reads_1.fastq.gz assembly_testpep.fas tmp
the original headers are missing and I see this string instead:
>0 2+146 3
RLAFNSRKAMDNVTLTLELPPNAELTPFPGRQTISWTVDLKQGDNVLALPINVLFPGSGKLVAHLDDGTRRKTFSTAIPGNTEPSS*
Any ideas? Thank you