Incorrect headers for Nuc -> Protein

soedinglab / plass

sensitive and precise assembly of short sequencing reads

https://plass.mmseqs.com

GNU General Public License v3.0


Incorrect headers for Nuc -> Protein #11

Open ghost opened 5 years ago

ghost commented 5 years ago

Hello,

I recently ran the following with this program:

plass nuclassemble reads_1.fastq.gz assembly_testnu.fas tmp

Output:

>1541_chr1_0_114757654_114757803_7891_JFMU01000067.1
AGCTGGAATTTCTAAAAAAGATATTAATGGCTTTATGATAAGAAAACTAAAGAATATTGAAATAA

However, when I run

plass assemble reads_1.fastq.gz assembly_testpep.fas tmp

the original headers are missing and I see this string instead:

>0 2+146 3
RLAFNSRKAMDNVTLTLELPPNAELTPFPGRQTISWTVDLKQGDNVLALPINVLFPGSGKLVAHLDDGTRRKTFSTAIPGNTEPSS*

Any ideas? Thank you

martin-steinegger commented 5 years ago

Ah, interesting. nuclassemble currently returns the header of the sequence that got extended, while the protein assembler (assemble) returns the header information from the extracted ORF. We will think about how to make the header information more consistent.

genomewalker commented 5 years ago

Hi Martin, any news on the issue?

Thanks Antonio

martin-steinegger commented 5 years ago

@genomewalker we will have a discussion about this issue tomorrow. We will update this issue once we have a solution.

AnnSeidel commented 5 years ago

@genomewalker we agreed on a new header format: <uniq ID> len:<len> cycle:<0|1>. Each header will contain the uniq ID and the len field; the cycle field is optional (for nucleotide sequences).

This is now implemented (8a7d224), so both assembler workflows now produce more consistent header information.

We are also planning to extend the header by an additional coverage field at some point, but this is not done yet.

<uniq ID> len:<len> cycle:<0|1> cov:<cov>
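For downstream scripts, the header layout described above could be parsed roughly as follows. This is only a sketch based on the format as written in this thread: the names `PlassHeader` and `parse_header` are made up for illustration, whitespace-separated fields are assumed, and treating `cov` as a floating-point number is a guess, since that field is not implemented yet.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class PlassHeader:
    """Fields of a '<uniq ID> len:<len> cycle:<0|1> cov:<cov>' header (hypothetical helper)."""
    uniq_id: str
    length: int
    cycle: Optional[int] = None   # optional; described for nucleotide sequences
    cov: Optional[float] = None   # planned field; numeric type is an assumption


# Assumption: fields are separated by whitespace and appear in this order.
_HEADER_RE = re.compile(
    r"^>?(?P<uniq_id>\S+)"
    r"\s+len:(?P<length>\d+)"
    r"(?:\s+cycle:(?P<cycle>[01]))?"
    r"(?:\s+cov:(?P<cov>\S+))?\s*$"
)


def parse_header(line: str) -> PlassHeader:
    """Parse one FASTA header line; raise ValueError if it does not fit the format."""
    match = _HEADER_RE.match(line.strip())
    if match is None:
        raise ValueError(f"header does not match the expected format: {line!r}")
    cycle = match.group("cycle")
    cov = match.group("cov")
    return PlassHeader(
        uniq_id=match.group("uniq_id"),
        length=int(match.group("length")),
        cycle=int(cycle) if cycle is not None else None,
        cov=float(cov) if cov is not None else None,
    )
```

Calling `parse_header(">42 len:146")` yields a record with `cycle` and `cov` left as `None`, while an old-style header such as `>0 2+146 3` raises `ValueError`, which makes it easy to spot output produced before the format change.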

genomewalker commented 5 years ago

Thank you very much @AnnSeidel! The coverage field will be very useful when added!

Regarding the nucleotide assembly, is it still considered experimental?

AnnSeidel commented 4 years ago

We have done significant further development on the nucleotide assembly in the last months, but I would still consider it experimental. We are working on it.

ewaltari commented 1 year ago

Sorry if this is the wrong place to ask, but I am very curious to know: Has the request to include the coverage information into the header been implemented? In my testing I haven't seen it, but would love to include it in subsequent steps!