rachelss / SISRS

Site Identification from Short Read Sequences.

Pull Request: Finished editing pileup file scripts #38

Closed: BobLiterman closed this pull request 6 years ago

BobLiterman commented 6 years ago

specific_genome.py and get_pruned_dict.py are both now working as expected. In both scripts, each pileup is now streamed rather than read into memory in full, which substantially reduces memory usage.
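For illustration, streaming a pileup could look roughly like the sketch below. It is not the code from this PR: the function name stream_pileup is hypothetical, and it assumes the usual tab-separated pileup layout (contig, position, reference base, depth, read bases, qualities).

```python
import io

def stream_pileup(handle):
    """Yield (contig, position, read_bases) one pileup line at a time.

    Hypothetical helper: only the current line is held in memory,
    instead of loading the whole pileup up front.
    """
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # skip malformed or truncated lines
        contig, pos, _ref, _depth, bases = fields[:5]
        yield contig, int(pos), bases

# Small in-memory stand-in for a pileup file (made-up data).
example = io.StringIO(
    "contig1\t1\tN\t3\tAAA\tIII\n"
    "contig1\t2\tN\t2\tC*\tII\n"
)
sites = list(stream_pileup(example))
```

With a real file, the same generator would be fed an open file handle, so memory use stays roughly constant regardless of pileup size.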

The test script has been expanded to include relevant examples, including how to handle deletions, minread/threshold issues, etc.

In specific_genome.py, deletions (*) are now replaced by Ns for downstream mapping purposes.
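A minimal sketch of how a site call with a configurable deletion character might work. This is an assumption-laden illustration, not the script's actual logic: call_site, its parameters, and the default values for minread and threshold are all hypothetical, and the read bases are assumed to be already reduced to A, C, G, T and *.

```python
from collections import Counter

def call_site(bases, minread=3, threshold=1.0, deletion_char="N"):
    """Return a single call for one site (hypothetical helper).

    bases: string of read bases, assumed pre-cleaned to A/C/G/T/*.
    minread and threshold defaults are illustrative assumptions.
    """
    counts = Counter(b.upper() for b in bases if b.upper() in "ACGT*")
    total = sum(counts.values())
    if total < minread:
        return "N"  # too few reads to call the site
    base, n = counts.most_common(1)[0]
    if n / total < threshold:
        return "N"  # most common base is not supported strongly enough
    # A called deletion (*) is written out as the chosen replacement.
    return deletion_char if base == "*" else base
```

Under these assumptions, call_site("***") gives N for mapping purposes, while call_site("***", deletion_char="-") gives a gap character suitable for an alignment.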

In get_pruned_dict.py, deletions are replaced by '-', since '-' is a valid character in the subsequent alignment. Only A, T, C, G, and - are passed through to the final alignment of sites. Any site called as N (Ns in the assembly, or Ns due to minread/threshold issues) is not passed through.
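The filtering rule described here can be sketched as follows. The names ALLOWED and prune_sites, and the "contig/position" key format, are hypothetical and chosen only for this example.

```python
# Characters allowed into the final alignment of sites.
ALLOWED = set("ATCG-")

def prune_sites(calls):
    """Keep only sites whose call is A, T, C, G or '-'.

    calls: dict mapping a site key to its single-character call
    (hypothetical structure). Sites called as N are dropped.
    """
    return {site: base for site, base in calls.items() if base in ALLOWED}

# Made-up example calls.
calls = {"contig1/1": "A", "contig1/2": "N", "contig1/3": "-"}
pruned = prune_sites(calls)
```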

I will now move on to ensure that the get_alignment script is handling the data as expected.

BobLiterman commented 6 years ago

I had left a stray print statement in one of the scripts; it has now been removed.

BobLiterman commented 6 years ago

get_pruned_dict.py was missing an import of os; this has been added.