Open ucabuk opened 1 year ago
Hi,
I am not 100% sure I have understood your need, so please correct me if I am wrong. It seems you wish to split each FASTA record into multiple records, one per exon. If so, then indeed MetaEuk does not provide this kind of output, but it should be possible to write a script that creates such a FASTA file from the original one*. Each exon is described in the FASTA header, separated by pipes from the other exons. The numbers given for each exon are its original coordinates on the contig. Please note the possible short overlap between exons; there is one between the first and second exon in your example. Also note that, unlike the coordinates reported in the MetaEuk header, the GFF coordinates start at index 1, as is standard for that format. See https://github.com/soedinglab/metaeuk#the-metaeuk-header
*I could assist with this, if needed
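A minimal sketch of such a script is below. It is not part of MetaEuk. It assumes that each exon field in the header has the form low[taken_low]:high[taken_high]:length[taken_length], that the bracketed "taken" length is the number of nucleotides the exon contributes to the joined sequence in .codon.fas, and that exons are listed in the same order as they appear in that sequence. Please check these assumptions against the header description linked above. The function names (exon_lengths, split_codon_seq, split_protein) and the example header are hypothetical.

```python
import re

# Assumed exon field layout: low[taken_low]:high[taken_high]:length[taken_length]
EXON_FIELD = re.compile(r"^\d+\[\d+\]:\d+\[\d+\]:\d+\[(\d+)\]$")


def exon_lengths(header):
    """Return the assumed 'taken' nucleotide length of every exon field."""
    lengths = []
    for field in header.lstrip(">").strip().split("|"):
        match = EXON_FIELD.match(field)
        if match:  # non-exon fields (accessions, score, ...) are skipped
            lengths.append(int(match.group(1)))
    return lengths


def split_codon_seq(codon_seq, lengths):
    """Cut the joined nucleotide sequence into one piece per exon."""
    if sum(lengths) != len(codon_seq):
        raise ValueError("exon lengths do not add up to the sequence length")
    pieces, start = [], 0
    for n in lengths:
        pieces.append(codon_seq[start:start + n])
        start += n
    return pieces


def split_protein(protein, lengths):
    """Cut the protein into one piece per exon (three nucleotides per residue)."""
    if any(n % 3 for n in lengths):
        raise ValueError("an exon length is not a multiple of three")
    if sum(lengths) // 3 != len(protein):
        raise ValueError("exon lengths do not match the protein length")
    pieces, start = [], 0
    for n in lengths:
        pieces.append(protein[start:start + n // 3])
        start += n // 3
    return pieces


# Hypothetical header with two exons; the second is shortened from 9 to 6 nt.
header = ">P1|contigA|+|100|1e-10|2|0|20|0[0]:8[8]:9[9]|12[12]:20[17]:9[6]"
lengths = exon_lengths(header)
print(split_codon_seq("ATGGCCAAAGGGTTT", lengths))
print(split_protein("MAKGF", lengths))
```

Exon fields are found by pattern rather than by position, so a target accession that itself contains pipes does not shift the parsing. If the protein in .fas carries a trailing stop symbol, or the lengths do not add up, the functions raise an error instead of guessing where the boundaries are.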
Hello Eli,
I want to split one large predicted protein into its exons according to the GFF file. I have the following outputs produced by MetaEuk: .fas, .codon.fas, .headersMap.tsv and .gff. In the GFF file, the CDS coordinates are relative to the assembled contig, so I could not find where each exon ends in the protein (.fas) output. Basically, what I want to do is the following:
This protein contains more than one exon. I want to
to
I could not find this information in the MetaEuk GFF file. Its coordinates are based on the contigs, so I can use them to split the sequence in the .codon.fas file, but not the protein in the .fas output.
Does MetaEuk provide any coordinate information for splitting a long coding sequence into its exons?
Thank you !