jorvis / biocode

Bioinformatics code libraries and scripts
MIT License

Add fasta/subsample_fasta.py and sandbox/jcrabtree. #39

Closed: jonathancrabtree closed this pull request 8 years ago

jonathancrabtree commented 8 years ago

The main new contribution here is fasta/subsample_fasta.py, a utility script that extracts a specified number of randomly chosen sequences from a multi-FASTA file. It scans the input multi-FASTA file twice: once to count the sequences in the file, and again to print the randomly selected sequences. The random number seed may be specified on the command line when reproducible output is desired.
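The two-pass approach described above can be sketched in Python as follows. This is a minimal illustration, not the code in fasta/subsample_fasta.py: the function names `count_sequences` and `subsample_fasta`, their parameters, and the assumption that every line starting with `>` begins a new record are hypothetical choices made for this sketch. The first pass counts records; the script then picks record indices at random with a seeded generator, and the second pass writes only the chosen records.

```python
import random
from typing import Optional, TextIO


def count_sequences(path: str) -> int:
    """First pass (hypothetical helper): count records by their '>' header lines."""
    with open(path) as fh:
        return sum(1 for line in fh if line.startswith(">"))


def subsample_fasta(path: str, n: int, out: TextIO, seed: Optional[int] = None) -> None:
    """Second pass (hypothetical helper): write n randomly chosen records to `out`.

    Selected records are written in their original file order. Passing the same
    seed selects the same records on every run.
    """
    total = count_sequences(path)
    if n > total:
        raise ValueError(f"requested {n} sequences but file has only {total}")

    rng = random.Random(seed)                     # seeded generator for reproducibility
    chosen = set(rng.sample(range(total), n))     # 0-based indices of records to keep

    index = -1       # index of the record currently being read
    keep = False     # True while inside a selected record
    with open(path) as fh:
        for line in fh:
            if line.startswith(">"):
                index += 1
                keep = index in chosen
            if keep:
                out.write(line)                   # header and all sequence lines
```

Because only the set of chosen indices is held in memory, this pattern works on files too large to load at once; the cost is reading the file twice. A single-pass alternative such as reservoir sampling avoids the second read but has to keep the selected records themselves in memory.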

The rest of the pull request is my personal sandbox, which contains all the timing experiments, test code, and documentation for an alternate/revised version of sandbox/jorvis/group_rnaseq_transcripts_by_read_alignment.py.