alephnull7 closed this 1 week ago.
> When performing back-translation, back-translated sequences are only added to the `aligned` list if the process was successful (not `None`). Within the translation of individual sequences, the assertions have been replaced with `None` returns, and `None` is returned if the nucleotide evaluation was unsuccessful (`None` was returned). This allows all successful back-translations to be saved to file, instead of a problem with any individual sequence raising an exception and preventing the `MultipleSeqAlignment` object from being saved to file. Now, the `No such file or directory` error should only occur if no translations were successful.
Yes, good catch! That is an important improvement indeed.
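The skip-on-failure pattern in the quoted description can be sketched in plain Python. This is a minimal sketch, not the PR's code: `back_translate`, `back_translate_all`, and `write_fasta` are hypothetical names, the only "nucleotide evaluation" performed is a codon-count check, and plain `(name, sequence)` tuples stand in for the Biopython records that would go into a `MultipleSeqAlignment`.

```python
from typing import Iterable, List, Optional, Tuple


def back_translate(protein_aln: str, cds: str) -> Optional[str]:
    """Thread an ungapped CDS onto one gapped protein alignment row.

    Returns None instead of failing an assert when the CDS does not fit.
    """
    residues = len(protein_aln.replace("-", ""))
    # Hypothetical "nucleotide evaluation": the CDS must supply one codon per residue.
    if len(cds) != 3 * residues:
        return None
    codons = (cds[i:i + 3] for i in range(0, len(cds), 3))
    # Each alignment gap becomes a gap codon; each residue consumes the next codon.
    return "".join("---" if aa == "-" else next(codons) for aa in protein_aln)


def back_translate_all(
    records: Iterable[Tuple[str, str, str]]
) -> List[Tuple[str, str]]:
    """Collect only the successful back-translations."""
    aligned = []
    for name, protein_aln, cds in records:
        result = back_translate(protein_aln, cds)
        if result is not None:  # a failed sequence is skipped, not fatal
            aligned.append((name, result))
    return aligned


def write_fasta(aligned: List[Tuple[str, str]], path: str) -> bool:
    """Write the surviving rows; write nothing if every sequence failed."""
    if not aligned:
        return False
    with open(path, "w") as handle:
        for name, seq in aligned:
            handle.write(f">{name}\n{seq}\n")
    return True
```

With `records = [("a", "M-K", "ATGAAA"), ("b", "MK", "ATGA")]`, `back_translate_all(records)` keeps only `("a", "ATG---AAA")`; record `b` fails the codon-count check and is dropped rather than aborting the whole run. A file is produced whenever at least one sequence succeeds.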
> Instead, I believe the intention was to add new entries to each key's list, which is what now occurs. Edit: I re-examined this, and the dictionaries would not share any keys, so the previous implementation would work as expected. Rather, the new approach is just more streamlined.
Good catch too! While it would be highly unlikely for a CDS key to have the same name as an intron key (and thus override it) or vice versa, using `dict.add()` is a safer approach in any event.
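The difference between the destructive merge and appending to each key's list can be shown in a few lines. Python's `dict.update()` replaces the value of any key present in both dictionaries, and the built-in `dict` has no `add()` method, so this sketch uses `setdefault(...).extend(...)` as one non-destructive pattern. It assumes every value is a list; `merge_into_lists` and the key names are hypothetical and not taken from the PR.

```python
def merge_into_lists(target: dict, source: dict) -> None:
    """Append each of source's list entries to target's list for the same key."""
    for key, values in source.items():
        # Creates an empty list for new keys, then extends it; never overwrites.
        target.setdefault(key, []).extend(values)


cds = {"shared": ["cds_seq"]}  # hypothetical key collision for illustration
introns = {"shared": ["intron_seq"], "other": ["x"]}

destructive = dict(cds)
destructive.update(introns)  # "shared" now holds only the intron entry

merged = {key: list(values) for key, values in cds.items()}
merge_into_lists(merged, introns)  # "shared" keeps both entries
```

After running, `destructive["shared"]` is `["intron_seq"]`, while `merged["shared"]` is `["cds_seq", "intron_seq"]`. When the two dictionaries never share keys, as noted in the thread, both approaches give the same result.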
Thank you for moving the code regarding the feature extraction of genes, introns, and intergenic spacers into their own classes (i.e., `GeneFeature`, `IntronFeature`, `IntergenicFeature`). The code is much cleaner because of it.
Changes directly addressing listed issues:
- Error message: 'gene'
- When performing back-translation, back-translated sequences are only added to the `aligned` list if the process was successful (not `None`). Within the translation of individual sequences, the assertions have been replaced with `None` returns, and `None` is returned if the nucleotide evaluation was unsuccessful (`None` was returned). This allows all successful back-translations to be saved to file, instead of a problem with any individual sequence raising an exception and preventing the `MultipleSeqAlignment` object from being saved to file. Now, the `No such file or directory` error should only occur if no translations were successful.
- … `rps12`, usually consisting of two or four total locations. In the former case, the two simple-location genes are inserted into the proper positions in the genes list; in the latter, three genes are inserted, as the annotation in IRb is duplicated. Later, when these items are added to the dictionary, only the first encountered `rps12` annotation in IRb and IRa is added.
- … `rps12` feature with more than two simple locations. This seems to occur when a subset of the location list consists of duplicate annotations for the same gene, with minor differences in location. Another cause appears to be exons 2 and 3 being annotated separately. A third case looks like annotations for the separate exons plus an additional annotation treating them as contiguous. Combinations of these occurrences have also been observed. Right now, to account for these variations, an `rps12` annotation is inserted into the desired position in the genes list if it begins after the previous gene and ends before the succeeding gene. If the annotation does not meet these criteria and overlaps an adjacent gene, it replaces that gene if it is the same gene and is longer. The aim is to place the longest possible annotation of the gene at a given position in the sequence. Other additions, such as merging overlapping annotations of the same gene, have been considered, but I wanted feedback on the current approach before going ahead with that.
- … `rps12`, in regard to duplicate annotations and split sequences, and are handled in the same way.

Other notable changes:
- … `PlastidData` methods, respectively.
- `BackTranslation` has been updated to set member variables that are used throughout the back-translation process, and its methods have been updated to reflect this.
- The `update` method is destructive: if a key exists in both dictionaries, the value assigned to that key is replaced by the value from the other dictionary. Instead, I believe the intention was to add new entries to each key's list, which is what now occurs. Edit: I re-examined this, and the dictionaries would not share any keys, so the previous implementation would work as expected. Rather, the new approach is just more streamlined.
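The `rps12` placement rule described in the list under "Changes directly addressing listed issues" (insert if the annotation fits between its neighbours, otherwise replace an overlapping adjacent annotation of the same gene only if the new one is longer) can be sketched as follows. This is an illustration under stated assumptions, not the PR's implementation: genes are hypothetical `(name, start, end)` tuples kept sorted by start, `place_gene` and `genes_to_dict` are hypothetical helpers, length is measured as `end - start`, and only the two adjacent genes are checked, matching the "adjacent gene" wording.

```python
import bisect
from typing import Dict, List, Tuple

Gene = Tuple[str, int, int]  # hypothetical (name, start, end) representation


def place_gene(genes: List[Gene], new: Gene) -> None:
    """Insert or substitute one simple-location annotation in a start-sorted list."""
    name, start, end = new
    i = bisect.bisect_left([g[1] for g in genes], start)
    prev = genes[i - 1] if i > 0 else None
    nxt = genes[i] if i < len(genes) else None

    # Fits cleanly: begins after the previous gene and ends before the next one.
    if (prev is None or start > prev[2]) and (nxt is None or end < nxt[1]):
        genes.insert(i, new)
        return

    # Otherwise replace an overlapping neighbour only if it is the same gene and shorter.
    for j, neighbour in ((i - 1, prev), (i, nxt)):
        if neighbour is None:
            continue
        n_name, n_start, n_end = neighbour
        overlaps = start <= n_end and n_start <= end
        if overlaps and n_name == name and end - start > n_end - n_start:
            genes[j] = new
            return
    # Annotations that neither fit nor win a replacement are dropped.


def genes_to_dict(genes: List[Gene]) -> Dict[str, Tuple[int, int]]:
    """Keep only the first encountered annotation for each gene name."""
    out: Dict[str, Tuple[int, int]] = {}
    for name, start, end in genes:
        out.setdefault(name, (start, end))
    return out
```

For example, starting from `[("psbA", 0, 100), ("rps12", 500, 600), ("rbcL", 1000, 1500)]`, placing `("rps12", 480, 650)` replaces the shorter `rps12` entry, while a later `("rps12", 520, 560)` overlaps the longer annotation and is discarded. Replacing in place keeps the list sorted by start because the new start always lies between the neighbours' starts.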