markschl / seq_io

FASTA and FASTQ parsing in Rust
MIT License

Whole Genome Alignments #7

Open noahaus opened 3 years ago

noahaus commented 3 years ago

Hello,

I'm using seq_io to do some sliding window analysis of a 4 Mb genome, but the size of the sequence remains limited. How should I configure my Rust program so that it can handle a sequence about 4 Mb long?

markschl commented 3 years ago

seq_io automatically grows its internal buffer when sequences are longer, so this should work out of the box. Buffer growth can also be configured, e.g. an upper limit can be set (which may make sense; in 0.4-alpha I introduced such a limit by default). Due to the design of this library, a sequence record always needs to fit into the buffer completely.

Assuming that your input is FASTA, you could iterate over the sequence lines using seq_lines(). That would be the fastest option, but it is probably complicated for sliding window analysis, because windows have to continue across line ends. The full (contiguous) sequence can be accessed using e.g. full_seq(). In the latest alpha release there is also full_seq_given(), which allows avoiding unnecessary and slow allocations.

If you need to access several sequences from the alignment at once, you need to copy them somewhere anyway. The fastest way to do this in 0.4 would be clone_into_owned().
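To illustrate why a contiguous sequence makes sliding windows simpler than working line by line, here is a minimal sketch in plain Rust. It deliberately does not call seq_io: the names `join_seq_lines` and `gc_windows` are hypothetical helpers, the hard-coded lines stand in for the sequence lines of one FASTA record, and GC content is just an example statistic. The reusable buffer only mirrors the general idea of avoiding a fresh allocation per record; it is not the library's implementation.

```rust
/// Joins the sequence lines of one record into a single contiguous buffer.
/// `buf` is cleared and reused, so it only reallocates when a record is
/// longer than any record seen before.
fn join_seq_lines<'a>(lines: impl Iterator<Item = &'a [u8]>, buf: &mut Vec<u8>) {
    buf.clear();
    for line in lines {
        buf.extend_from_slice(line);
    }
}

/// Returns the GC fraction of every window of length `window`,
/// advancing the window start by `step` bases each time.
fn gc_windows(seq: &[u8], window: usize, step: usize) -> Vec<f64> {
    assert!(window > 0 && step > 0);
    seq.windows(window)
        .step_by(step)
        .map(|w| {
            let gc = w
                .iter()
                .filter(|&&b| matches!(b, b'G' | b'C' | b'g' | b'c'))
                .count();
            gc as f64 / window as f64
        })
        .collect()
}

fn main() {
    // Stand-in for the sequence lines of a single FASTA record
    // (line terminators already removed).
    let lines: [&[u8]; 3] = [b"ACGT", b"GGCC", b"AT"];

    let mut buf = Vec::new();
    join_seq_lines(lines.iter().copied(), &mut buf);

    // The windows cross the original line boundaries without special handling.
    let gc = gc_windows(&buf, 4, 2);
    println!("{:?}", gc); // prints [0.5, 0.75, 1.0, 0.5]
}
```

For a 4 Mb genome the joined buffer is only a few megabytes, so holding one record contiguously in memory is usually not a concern; a line-by-line approach would instead have to carry the last `window - 1` bases over from each line to the next.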

I hope that this answers your question...