Open manulera opened 4 months ago
Similar problem here
hom1 = 'ACCGGTTACCGGTT'
hom2 = 'CGCGTTACGCGTTA'
template = Dseqrecord(f'aaa{hom1}AATTGGAA{hom2}tttt')
insert = Dseqrecord(f'{hom1}AAA{hom2}')
asm = Assembly((template, insert), limit=14, use_all_fragments=True)
for a in asm.get_insertion_assemblies():
print(assembly2str(a))
The problem here is that homology regions on the left and right overlap (see the x
):
temp aaaACCGGTTACCGGTTAATTGGAACGCGTTACGCGTTAtttt
|||||||||||||||||x
insr ACCGGTTACCGGTTAAACGCGTTACGCGTTA
x|||||||||||||||
temp aaaACCGGTTACCGGTTAATTGGAACGCGTTACGCGTTAtttt
This confuses format_insertion_assembly
, which ends up returning wrong one as valid, and skipping right
>> ('1[3:19]:2[0:16]', '2[15:31]:1[23:39]')
Discarded but should be kept
>> ('1[23:39]:2[15:31]', '2[0:16]:1[3:19]')
Kept but should be discarded, turned into '2[0:16]:1[3:19]', '1[23:39]:2[15:31]')
Sometimes there is unexpected behaviour.
I think this can be caused by the extension of the overlap (instead of the minimal overlap, we extend to encompass as many as possible). Illustrated in the example below from the frontend (pseudocode). However, not sure if the
1[50:90]:2[0:40]', '2[43:83]:1[50:90]'
should ever be returned, if 2 is not circular.This returns 3 assemblies:
But it should return 6, because there could be a single