sanger-pathogens / snp-sites

Finds SNP sites from a multi-FASTA alignment file
http://sanger-pathogens.github.io/snp-sites/

Opens <file> twice? #26

Closed tseemann closed 8 years ago

tseemann commented 8 years ago

I think the code opens the alignment file twice: once to load the first sequence (the 'reference'), and then again to skip over that first sequence and read the rest.

Is there a way this could only open it once to allow piping of stdin to stdout so it can be used as a pipe filter?

andrewjpage commented 8 years ago

Previously it opened the file half a dozen times; I've reduced that to 2 passes. This is to keep memory usage to a minimum. The first pass identifies the columns which contain SNPs and gathers some metadata. The second pass pulls out the data for the columns with SNPs. Sorry, I don't think I can accommodate stdin, since I think it would involve storing all the sequences in memory.
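
For illustration, here is a minimal sketch of the two-pass idea described above. It is written in Python rather than the project's C, and it is not the actual snp-sites code: the function names (`read_fasta`, `find_snp_columns`, `extract_snp_columns`) are hypothetical. It assumes every sequence in the alignment has the same length, and it treats any character that differs from the first sequence as a SNP, which is probably simpler than what the real tool does with gaps, ambiguous bases and case.

```python
def read_fasta(path):
    """Yield (name, sequence) pairs, holding one record in memory at a time."""
    name, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if name is not None:
                    yield name, "".join(chunks)
                name, chunks = line[1:], []
            elif line:
                chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def find_snp_columns(path):
    """Pass 1: compare each sequence with the first one and flag differing columns."""
    reference, is_snp = None, []
    for _, seq in read_fasta(path):
        if reference is None:
            # The first sequence acts as the reference (assumption of this sketch).
            reference = seq
            is_snp = [False] * len(seq)
            continue
        for i, base in enumerate(seq):
            if base != reference[i]:
                is_snp[i] = True
    return [i for i, flag in enumerate(is_snp) if flag]


def extract_snp_columns(path, columns):
    """Pass 2: re-read the file and keep only the flagged columns of each sequence."""
    for name, seq in read_fasta(path):
        yield name, "".join(seq[i] for i in columns)
```

With this layout, memory holds only the reference, one flag per column and the sequence currently being read. The second pass has to read the input again from the start, which a regular file allows but a pipe does not; that is why a single-pass stdin version would need to buffer the sequences somewhere.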