The main new contribution here is fasta/subsample_fasta.py, a utility script for extracting a specified number of randomly chosen sequences from a multi-FASTA file. It scans the input file twice: once to count the sequences, and again to print the randomly selected ones. The random number seed may be specified on the command line when reproducible behavior is desired.
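The two-pass approach described above can be sketched roughly as follows. This is a minimal illustration, not the code in fasta/subsample_fasta.py: the function name `subsample_fasta` and its parameters (`path`, `num_seqs`, `seed`, `out`) are hypothetical, and it assumes every record starts with a `>` header line.

```python
import random
import sys


def subsample_fasta(path, num_seqs, seed=None, out=None):
    """Write num_seqs randomly chosen records from a multi-FASTA file to out.

    Hypothetical sketch of a two-pass subsampler; names are illustrative only.
    """
    if out is None:
        out = sys.stdout

    # Pass 1: count records by counting header lines.
    with open(path) as fh:
        total = sum(1 for line in fh if line.startswith(">"))

    if num_seqs > total:
        raise ValueError(f"requested {num_seqs} sequences, file has {total}")

    # A dedicated generator keeps the seed local to this call,
    # so the same seed always selects the same record indices.
    rng = random.Random(seed)
    chosen = set(rng.sample(range(total), num_seqs))

    # Pass 2: stream the file again, echoing only the chosen records.
    index = -1
    keep = False
    with open(path) as fh:
        for line in fh:
            if line.startswith(">"):
                index += 1
                keep = index in chosen
            if keep:
                out.write(line)
```

Because only the set of chosen indices is held in memory, a sketch like this handles files much larger than RAM, at the cost of reading the input twice. Selected records come out in their original file order.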
The rest of the pull request is my personal sandbox, which contains all the timing experiments, test code, and documentation for an alternate/revised version of sandbox/jorvis/group_rnaseq_transcripts_by_read_alignment.py.