marjanUofT opened this issue 2 months ago.
We don't have this functionality implemented, but it seems pretty straightforward.
On Fri, Sep 6, 2024, 7:47 PM Marjan Mohammadi wrote:
The task is to generate new sequences by introducing a random number of mutations at random positions within a specific range of positions, such as [29, 110).
I have the wild-type (WT) sequences, and I need to:
- Apply a random number of mutations.
- Mutate random positions within the defined range.

I am unsure whether any of the available models can do this directly. If no suitable model exists, I can implement a method that picks random positions to mutate and then pass each mutated sequence to a model.
The goal is to generate at least 1,000 new sequences.
View it on GitHub: https://github.com/microsoft/evodiff/issues/46
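If a model-free baseline is enough, the random-mutation step described above can be done directly in plain Python. The sketch below is hypothetical (`random_mutant`, `generate_library` and `sample_poisson` are illustrative names, not part of evodiff). It assumes 0-based indexing, one-letter amino-acid codes, and a half-open window [start, end); the number of substitutions is drawn from a Poisson distribution, picked positions are distinct, and each picked residue is replaced with a different standard amino acid.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def sample_poisson(rng, lam):
    """Draw from Poisson(lam) with Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def random_mutant(wt, start, end, mean_mutations=15, rng=None):
    """Return (mutant, positions): wt with substitutions at distinct positions in [start, end)."""
    rng = rng or random.Random()
    window = range(start, end)
    # At least one mutation, and at most one per position in the window
    n = min(max(1, sample_poisson(rng, mean_mutations)), len(window))
    positions = sorted(rng.sample(window, n))
    residues = list(wt)
    for i in positions:
        # Choose a residue different from the current one so every picked site really changes
        residues[i] = rng.choice([aa for aa in AMINO_ACIDS if aa != residues[i]])
    return "".join(residues), positions


def generate_library(wt, start, end, n_seqs=1000, seed=0, **kwargs):
    """Collect n_seqs distinct mutants of wt (assumes the window allows that many)."""
    rng = random.Random(seed)
    library = set()
    while len(library) < n_seqs:
        mutant, _ = random_mutant(wt, start, end, rng=rng, **kwargs)
        library.add(mutant)
    return sorted(library)
```

For example, `generate_library(wt, 29, 110, n_seqs=1000)` returns 1,000 distinct sequences that differ from `wt` only at positions 29 to 109. These could also serve as a random-mutagenesis control next to model-generated sequences.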
Thank you, @yangkky, for your quick response. Below is the code I used to generate lists of mutation positions, with the number of mutations per list drawn from a Poisson distribution, and to fill the masked positions with the "OA_DM_38M" model. It should help anyone trying to do something similar:
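import numpy as np

def generate_mutation_lists(num_lists=1000, mean_mutations=15, min_pos=21, max_pos=110):
    """Return num_lists (start_ids, end_ids) pairs. Each pair marks single-residue
    spans [start, start + 1) at distinct positions in the half-open range [min_pos, max_pos)."""
    candidates = np.arange(min_pos, max_pos)
    all_lists = []
    for _ in range(num_lists):
        # Number of mutations ~ Poisson(mean_mutations), kept between 1 and the range size
        # so every list changes at least one residue and np.random.choice below cannot fail
        num_mutations = int(np.clip(np.random.poisson(mean_mutations), 1, len(candidates)))
        # Sample without replacement so no position is picked twice, then sort
        start_list = np.sort(np.random.choice(candidates, size=num_mutations, replace=False))
        end_list = start_list + 1  # each span covers exactly one residue
        all_lists.append((start_list.tolist(), end_list.tolist()))
    return all_lists

# Assumes `sequences` (the WT sequences), `model`, `tokenizer`, `device` and the helpers
# mask_sequences, tokenize_sequences and generate_unique_sequences are defined earlier
# in the script; they are not shown here.
mutation_lists = generate_mutation_lists()
total_num_gen_seqs = 1000
generated_sequences = {}
for idx, (start_ids, end_ids) in enumerate(mutation_lists[:total_num_gen_seqs]):
    # Wrap each list so there is one list of spans per input sequence
    start_ids = [start_ids]
    end_ids = [end_ids]
    masked_sequences = mask_sequences(sequences, start_ids, end_ids)
    tokenized_sequences = tokenize_sequences(masked_sequences, tokenizer, device)
    new_sequences = generate_unique_sequences(
        model, tokenized_sequences, start_ids, end_ids, sequences, tokenizer, num_gen_seqs=1
    )
    generated_sequences[idx] = new_sequences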
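The thread does not show `mask_sequences`, `tokenize_sequences` or `generate_unique_sequences`. As a rough illustration of what the masking step has to do, here is a hypothetical `mask_sequences` with the same call shape as in the loop above: for each input sequence it replaces every span [start, end) with a mask character. The `"#"` default is an assumption; check which character or token id your evodiff tokenizer actually treats as the mask before relying on it.

```python
from typing import List, Sequence


def mask_sequences(
    sequences: Sequence[str],
    start_ids: List[List[int]],
    end_ids: List[List[int]],
    mask_char: str = "#",  # assumed mask character; verify against your tokenizer
) -> List[str]:
    """Hypothetical helper: replace each span [start, end) of each sequence with mask_char."""
    masked = []
    # zip pairs sequence i with its own list of starts and ends
    for seq, starts, ends in zip(sequences, start_ids, end_ids):
        residues = list(seq)
        for s, e in zip(starts, ends):
            residues[s:e] = mask_char * (e - s)  # same length, so positions do not shift
        masked.append("".join(residues))
    return masked
```

Because `zip` stops at the shortest input, passing several WT sequences with a single wrapped list of spans (as in the loop above) masks only the first sequence; pass one span list per sequence if you want each one masked.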
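To check that the goal of at least 1,000 new sequences is actually met, it helps to drop duplicates and unchanged WT copies and to confirm that every variant differs from the WT only inside the chosen window. A minimal sketch, assuming the model outputs have already been collected into a flat list of strings (the thread does not show what `generate_unique_sequences` returns); `summarize_library` is a hypothetical name.

```python
from typing import Iterable, List, Tuple


def summarize_library(wt: str, variants: Iterable[str], start: int, end: int) -> Tuple[List[str], List[str]]:
    """Split unique, non-WT variants into (valid, invalid).

    A variant is valid if it has the WT length and every difference from
    the WT lies in the half-open window [start, end)."""
    unique = sorted(set(variants) - {wt})  # drop exact duplicates and unchanged WT copies
    valid, invalid = [], []
    for v in unique:
        diffs = [i for i, (a, b) in enumerate(zip(wt, v)) if a != b]
        ok = len(v) == len(wt) and all(start <= i < end for i in diffs)
        (valid if ok else invalid).append(v)
    return valid, invalid
```

For example, `valid, invalid = summarize_library(wt, all_variants, 29, 110)` followed by a check that `len(valid) >= 1000` tells you whether more sequences need to be generated.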