samtools / htslib

C library for high-throughput sequencing data formats
Other
806 stars 446 forks source link

Long term tracking issue: consistency of API #297

Open jkbonfield opened 8 years ago

jkbonfield commented 8 years ago

In writing a quick bam_consensus tool I ended up with a structure like this:

typedef struct {
    // sam, bam AND hts! All the weasles in a single bag :)
    samFile *fp;
    bam_hdr_t *header;
    hts_itr_t *iter;
} pileup_cd;

It's pretty bizarre to have so many prefixes in there. Add to this the fact that the bam_hdr_t struct is read using sam_hdr_read and destroyed with bam_hdrdestroy, and the index for the iterator being read using samindexload and destroyed with htsidx_destroy, we have both a mismash of sam/bam/hts format and index/idx in function names.

Most of this is due to backwards compatibility, but it would be good to make all the orthogonal API functions exist too so the code is more readable and then to document which ones are supported and which are deprecated. It may also just be an issue of fixing the existing functions. Eg hts_idx_load does exist, but it's not actually the opposite of hts_idx_destroy (load doesn't load CRAM indices, but destroy does destroy them).

It's a long term goal though and not something we have a great deal of time to do right now.

adamspargo commented 8 years ago

Most of this is due to backwards compatibility, but it would be good to make all the orthogonal API functions exist too so the code is more readable and then to document which ones are supported and which are deprecated. It may also just be an issue of fixing the existing functions. Eg hts_idx_load does exist, but it's not actually the opposite of hts_idx_destroy (load doesn't load CRAM indices, but destroy does destroy them).

+1

Future development would be much faster once the basics are sane.


Dr Adam Spargo room: N312 Vertebrate Resequencing email: aws@sanger.ac.uk Wellcome Trust Sanger Institute Tel: +44 (0)1223 834244 Hinxton, Cambridge CB10 1SA ext: 8633

The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.

dlaehnemann commented 8 years ago

I agree.

A suggestion for functions and structures dedicated to sam, bam and cram files, headers or records alike: aln_...