A note on the conflicts: It appears that @anderspitman also found my README issues, and his changes mirror my own. These were in a previous pull request that wasn't taken up, so feel free to accept whichever version you want.
The majority of this pull request deals with specific_genome.py, which previously consumed a large amount of memory. With SE mapping, there is only a single .pileups file in a directory, so the file can be streamed line by line rather than loaded into memory in its entirety.
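For reviewers, here is a minimal sketch of the streaming idea, not the exact code in specific_genome.py. The function name `iter_pileup_lines` and the check for exactly one matching file are assumptions made for illustration.

```python
import glob
import os


def iter_pileup_lines(directory):
    """Yield lines from the single .pileups file in `directory`, one at a time.

    Hypothetical helper: the name and the exactly-one-file check are
    assumptions for this sketch, not the code in specific_genome.py.
    """
    matches = glob.glob(os.path.join(directory, "*.pileups"))
    if len(matches) != 1:
        raise ValueError(
            "expected exactly one .pileups file, found %d" % len(matches)
        )
    with open(matches[0]) as handle:
        # Iterating over the file object reads lazily, so only the
        # current line is held in memory rather than the whole file.
        for line in handle:
            yield line.rstrip("\n")
```

Because this is a generator, a caller can process each record inside a plain `for` loop and memory use stays roughly constant regardless of file size.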
Additionally, previous versions of this script (and some subsequent scripts, which I will amend shortly) only accounted for indels with single-digit lengths, whereas real data can contain indels with double-digit lengths. I have amended the code to handle double-digit indels.
'D' is used as a placeholder where the pileup file specifies a deletion, in accordance with downstream scripts. It seems more appropriate than 'N', which conventionally denotes an unknown or ambiguous base.
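To illustrate the parsing change, here is a rough sketch of how multi-digit indel lengths and the 'D' placeholder might be handled. It is not the actual implementation: the name `clean_bases` is hypothetical, and it assumes the standard samtools pileup read-base encoding (`^` followed by a mapping-quality character, `$` at a read end, `+<len><seq>` and `-<len><seq>` for indels, and `*` for a deleted base). Mapping `*` to 'D' is my assumption about where the placeholder applies.

```python
def clean_bases(bases):
    """Return the per-read base calls from a pileup read-base string.

    Hypothetical sketch: read start/end markers and indel sequences are
    dropped, and '*' (a deleted base) is replaced with 'D'.
    """
    cleaned = []
    i = 0
    n = len(bases)
    while i < n:
        ch = bases[i]
        if ch == "^":
            # '^' is followed by one mapping-quality character; skip both.
            i += 2
        elif ch == "$":
            # End-of-read marker carries no base call.
            i += 1
        elif ch in "+-":
            # Read ALL digits of the indel length, not just the first one,
            # so lengths of 10 or more are handled correctly.
            j = i + 1
            while j < n and bases[j].isdigit():
                j += 1
            length = int(bases[i + 1:j])
            # Skip the indel sequence itself.
            i = j + length
        elif ch == "*":
            # Assumption: '*' marks a deleted base and becomes 'D'.
            cleaned.append("D")
            i += 1
        else:
            cleaned.append(ch)
            i += 1
    return "".join(cleaned)
```

For example, `clean_bases(".,+12ACGTACGTACGTa*")` returns `".,aD"`: the 12-base insertion is skipped in full, whereas a parser that reads only one digit would treat the length as 1 and misread the rest of the inserted sequence as base calls.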
I have also added a test script (test_getCleanBases.py) which can be used with py.test to test the base parsing.