omgenomics / bio-data-zoo

Example genomics data for tool developers
MIT License

Bio Data Zoo

This repo contains example data in a range of genomics file formats. It is intended to make software testing easier for bioinformatics tool developers, and includes valid files, edge cases, and invalid files.
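A typical way to use the zoo is to run a tool's parser over selected files and check that valid files are accepted and invalid ones are rejected. The following is a minimal Python sketch. `check_parser` is a hypothetical helper, not part of this repo. Because the repo's directory layout isn't described here, the sketch assumes you list each file yourself together with whether it should parse.

```python
from pathlib import Path
from typing import Callable, Iterable, List, Tuple


def check_parser(parse: Callable[[Path], object],
                 cases: Iterable[Tuple[Path, bool]]) -> List[str]:
    """Run `parse` on each file and describe every unexpected outcome.

    `cases` holds (path, should_parse) pairs. A file counts as accepted if
    `parse` returns normally and as rejected if it raises any exception.
    An empty result means every file behaved as expected.
    """
    failures = []
    for path, should_parse in cases:
        try:
            parse(path)
            accepted = True
        except Exception:  # any error is treated as a rejection
            accepted = False
        if accepted != should_parse:
            expected = "accept" if should_parse else "reject"
            failures.append(f"{path}: expected the parser to {expected} this file")
    return failures
```

With your own parser (here the hypothetical `my_vcf_parser`), a call might look like `check_parser(my_vcf_parser, [(Path("basic.vcf"), True)])`, with the path adjusted to wherever you unpacked the repo. Treating any exception as a rejection keeps the helper parser-agnostic. A stricter suite could check for a specific error type instead.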

Browse

Browse the data on 42basepairs: https://42basepairs.com/browse/r2/bio-data-zoo

Download

Download this repo as a zip file: https://github.com/omgenomics/bio-data-zoo/archive/refs/heads/main.zip
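To fetch the archive from a script instead, a minimal standard-library Python sketch might look like the following. The helper names are hypothetical. `fetch_zoo_zip` needs network access, and `zip_members_with_suffix` picks out files of one format by extension.

```python
import io
import urllib.request
import zipfile
from pathlib import Path
from typing import List, Tuple

# Zip URL given in the Download section.
ZOO_ZIP_URL = "https://github.com/omgenomics/bio-data-zoo/archive/refs/heads/main.zip"


def fetch_zoo_zip(url: str = ZOO_ZIP_URL) -> bytes:
    """Download the repo snapshot and return the raw zip bytes (needs network access)."""
    with urllib.request.urlopen(url) as resp:
        return resp.read()


def extract_zip_bytes(data: bytes, dest: Path) -> None:
    """Unpack an in-memory zip archive into the directory `dest`."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        zf.extractall(dest)


def zip_members_with_suffix(data: bytes, suffixes: Tuple[str, ...]) -> List[str]:
    """List archive members whose names end with any of `suffixes`, sorted."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return sorted(name for name in zf.namelist() if name.endswith(suffixes))
```

For example, `data = fetch_zoo_zip()` followed by `extract_zip_bytes(data, Path("bio-data-zoo"))` unpacks everything, and `zip_members_with_suffix(data, (".vcf", ".vcf.gz"))` lists the VCF files. Reading the whole archive into memory keeps the sketch short. Streaming to a file on disk would scale better for large archives.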

Formats included

Format  Extensions
FASTA   .fa, .fa.gz
FASTQ   .fastq, .fastq.gz
BAM     .bam, .bam.bai, .bam.csi, .sam, .sam.gz, .sam.gz.csi, .sam.gz.tbi
VCF     .vcf, .vcf.gz, .vcf.gz.csi, .vcf.gz.tbi, .bcf, .bcf.csi
BED     .bed, .bed.gz, .bed.gz.csi, .bed.gz.tbi
CRAM    TODO: .cram, .crai, different CRAM versions
GFF     TODO: .gff3, .gtf, .gff, .gff.gz, .gff.gz.tbi
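When a test harness needs to know which format a zoo file belongs to, the table above can be turned into a lookup. This is a minimal Python sketch with several limits. `classify` is a hypothetical helper. It copies only the rows whose files are already included, since CRAM and GFF are still TODO. It keeps the table's grouping, so SAM files map to the BAM row. Index files (.bai, .csi, .tbi) are flagged separately, and matching is case-sensitive.

```python
from typing import Dict, List, Optional, Tuple

# Extensions copied from the "Formats included" table; CRAM and GFF are TODO there.
EXTENSIONS: Dict[str, List[str]] = {
    "FASTA": [".fa", ".fa.gz"],
    "FASTQ": [".fastq", ".fastq.gz"],
    "BAM": [".bam", ".bam.bai", ".bam.csi",
            ".sam", ".sam.gz", ".sam.gz.csi", ".sam.gz.tbi"],
    "VCF": [".vcf", ".vcf.gz", ".vcf.gz.csi", ".vcf.gz.tbi", ".bcf", ".bcf.csi"],
    "BED": [".bed", ".bed.gz", ".bed.gz.csi", ".bed.gz.tbi"],
}

# Suffixes of index files (BAM index, CSI index, tabix index).
INDEX_SUFFIXES = (".bai", ".csi", ".tbi")


def classify(filename: str) -> Optional[Tuple[str, bool]]:
    """Return (table row, is_index) for a file name, or None if no listed extension matches."""
    matches = [(len(ext), fmt, ext)
               for fmt, exts in EXTENSIONS.items()
               for ext in exts
               if filename.endswith(ext)]
    if not matches:
        return None
    # Prefer the longest matching extension so the lookup stays unambiguous
    # even if overlapping extensions (e.g. a bare ".gz") are added later.
    _, fmt, ext = max(matches)
    return fmt, ext.endswith(INDEX_SUFFIXES)
```

For example, `classify("basic.vcf")` returns `("VCF", False)` and `classify("basic.vcf.gz.tbi")` returns `("VCF", True)`. A `.cram` file returns `None` until that row is filled in.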

Data Source

Path                   Source                  Preview file            Download file
basic_R1.fastq         s3://1000genomes        Preview on 42basepairs  Download
basic.bam              s3://1000genomes        Preview on 42basepairs  Download
basic_multisample.vcf  s3://human-pangenomics  Preview on 42basepairs  Download
basic.vcf              s3://human-pangenomics  Preview on 42basepairs  Download
basic.bed              s3://human-pangenomics  Preview on 42basepairs  Download

Contributing

See the CONTRIBUTING docs.