specific_genome.py and get_pruned_dict.py are both now working as expected. Both scripts now stream each pileup line by line rather than reading the whole file into memory, which reduces memory usage substantially.
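A minimal sketch of the streaming approach, for illustration only. It assumes the tab-separated samtools mpileup column layout (chromosome, position, reference base, depth, read bases, qualities); the function name stream_pileup is hypothetical and is not taken from either script.

```python
import io


def stream_pileup(handle):
    """Yield (chrom, pos, ref, depth, bases) one pileup line at a time.

    Hypothetical helper: only the current line is held in memory,
    instead of loading the whole pileup with read() or readlines().
    """
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        chrom, pos, ref, depth, bases = fields[:5]
        yield chrom, int(pos), ref, int(depth), bases


# Small in-memory stand-in for a pileup file (assumed column layout).
example = io.StringIO("chr1\t1\tA\t3\t...\tIII\nchr1\t2\tC\t2\t.,\tII\n")
records = list(stream_pileup(example))
```

With a real file, the same generator can be driven by an open file handle, so memory use stays roughly constant regardless of pileup size.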
The test script has been expanded with relevant examples covering deletions, minread/threshold issues, etc.
In specific_genome.py, deletions (*) are now replaced by Ns for downstream mapping purposes.
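The substitution described for specific_genome.py could look roughly like the following. This is a sketch under the assumption that each site has already been reduced to a single called character; mask_deletion is a hypothetical name, not the script's own function.

```python
def mask_deletion(base):
    """Return 'N' for a deletion call ('*'), otherwise the base unchanged.

    Hypothetical helper illustrating the '*' -> 'N' replacement.
    """
    return "N" if base == "*" else base


# Example: a short run of called sites containing one deletion.
consensus = "".join(mask_deletion(b) for b in "AC*GT")
```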
In get_pruned_dict.py, deletions are replaced by '-', as this can be used as a character in the subsequent alignment. Only A, T, C, G and '-' are passed through to the final alignment of sites. Any site called as N (Ns in the assembly, or Ns due to minread/threshold issues) is not passed through.
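The pruning rule above can be sketched as follows. The data layout (a dict mapping position to a single called character for one sample) and the names ALLOWED and prune_calls are assumptions made for illustration; the real script may structure its dictionary differently.

```python
# Characters allowed through to the final alignment of sites.
ALLOWED = frozenset("ATCG-")


def prune_calls(calls):
    """Return a pruned copy of a position -> call mapping.

    Hypothetical helper: deletions ('*') become '-', and any call
    outside A, T, C, G, '-' (for example N) is dropped.
    """
    pruned = {}
    for pos, base in calls.items():
        if base == "*":
            base = "-"  # deletion is kept as an alignment gap character
        if base in ALLOWED:
            pruned[pos] = base
    return pruned


# Example: position 2 is a deletion, position 3 was called as N.
pruned = prune_calls({1: "A", 2: "*", 3: "N", 4: "G"})
```

Here position 2 is kept as '-' and position 3 is removed, while the A and G calls pass through unchanged.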
I will now move on to ensure that the get_alignment script is handling the data as expected.