Open HobnobMancer opened 2 years ago
@widdowquinn I can't add labels to the issue
Is this a feature you're OK with being added? If so, do you want me to add it via a new branch and PR, or add it to the HobnobMancer:issue_21_protein_ids branch and associated PR?
do you want me to add it via a new branch and PR
Yes, please!
Cool, I'll get on that! Should be submitted as a PR hopefully by this weekend
Summary:
The complete nucleotide sequence retrieved from NCBI is written to the output, including any terminal stop codons. These sequences often cannot be used for backthreading onto aligned protein sequences, because the terminal stop codons in the CDS have no counterpart in the protein sequence, so the two sequences no longer correspond.
Description:
A `--drop_stop_codons` flag could be added. When it is used, all terminal stop codons in the CDS are removed, so that the retrieved CDS matches the protein sequence for backthreading. Otherwise, additional parsing is required when using the `ncfp` output for backthreading nucleotide sequences onto aligned protein sequences.
Current Output:
The only output is the complete nucleotide sequence.
Expected Output:
When using the `--drop_stop_codons` flag, terminal stop codons are removed from the end of each CDS.
ncfp Version: v0.2.0
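
For illustration, the requested behaviour could be sketched roughly as below. This is a minimal sketch and not ncfp's implementation: the function name `drop_terminal_stop_codons` is hypothetical, and it assumes the standard genetic code's stop codons (TAA, TAG, TGA) and a CDS whose length is a whole number of codons.

```python
# Hypothetical helper, not part of ncfp: illustrates the proposed behaviour.
# Assumes the standard genetic code; other translation tables use different stops.
STOP_CODONS = frozenset({"TAA", "TAG", "TGA"})


def drop_terminal_stop_codons(cds: str) -> str:
    """Return cds with any run of trailing in-frame stop codons removed."""
    # Only trim when the sequence is a whole number of codons, so that
    # out-of-frame sequences are left untouched rather than mis-trimmed.
    if len(cds) % 3 != 0:
        return cds
    # Strip stop codons from the 3' end until a non-stop codon is reached.
    while len(cds) >= 3 and cds[-3:].upper() in STOP_CODONS:
        cds = cds[:-3]
    return cds


if __name__ == "__main__":
    print(drop_terminal_stop_codons("ATGGCCTAA"))  # ATGGCC
```

With the stop codon removed, the CDS length is exactly three times the protein length, which is what backthreading onto an aligned protein sequence needs.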