samtools / htsjdk

A Java API for high-throughput sequencing data (HTS) formats.
http://samtools.github.io/htsjdk/

proposal for a package htsjdk.tribble.gtf #1662

Open lindenb opened 1 year ago

lindenb commented 1 year ago

Hi all,

Here is a proposal for a new package, htsjdk.tribble.gtf.

Description

htsjdk contains a gff3 codec but is still missing a GTF codec. I wrote a few classes that parse GTF. Some parts were copied from the gff3 package.

# constants
src/main/java/htsjdk/tribble/gtf/GtfConstants.java

# an interface describing a GTF record
# it comes with default methods to get the gene_id, gene_name, transcript_id, etc...
src/main/java/htsjdk/tribble/gtf/GtfFeature.java

# an implementation of GtfFeature
src/main/java/htsjdk/tribble/gtf/GtfFeatureImpl.java

# the GTF parser
src/main/java/htsjdk/tribble/gtf/GtfCodec.java 
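
To give an idea of the "interface with default methods" approach, here is a minimal, self-contained sketch. It is not the code from the proposal: `GtfSketch`, `SimpleGtfFeature`, `parseLine` and `parseAttributes` are hypothetical names used for illustration only, and the attribute parsing is deliberately naive (it splits on `;`, so it would break on a quoted value containing a semicolon, and it keeps only the first value of a repeated key).

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class GtfSketch {

    /** Hypothetical, simplified stand-in for a GTF record interface. */
    interface SimpleGtfFeature {
        String getContig();
        String getType();
        int getStart();
        int getEnd();
        Map<String, String> getAttributes();

        // Default accessors built on the attribute map; they return null when the key is absent.
        default String getGeneId() { return getAttributes().get("gene_id"); }
        default String getGeneName() { return getAttributes().get("gene_name"); }
        default String getTranscriptId() { return getAttributes().get("transcript_id"); }
    }

    /** Parses one GTF data line made of 9 tab-separated columns. */
    static SimpleGtfFeature parseLine(final String line) {
        final String[] cols = line.split("\t", -1);
        if (cols.length != 9) {
            throw new IllegalArgumentException("expected 9 columns, got " + cols.length);
        }
        final String contig = cols[0];
        final String type = cols[2];
        final int start = Integer.parseInt(cols[3]); // 1-based, inclusive
        final int end = Integer.parseInt(cols[4]);   // 1-based, inclusive
        final Map<String, String> attributes = parseAttributes(cols[8]);
        return new SimpleGtfFeature() {
            @Override public String getContig() { return contig; }
            @Override public String getType() { return type; }
            @Override public int getStart() { return start; }
            @Override public int getEnd() { return end; }
            @Override public Map<String, String> getAttributes() { return attributes; }
        };
    }

    /** Naive parser for the 9th column: key "value"; key "value"; ... */
    static Map<String, String> parseAttributes(final String column) {
        final Map<String, String> map = new LinkedHashMap<>();
        for (final String field : column.split(";")) {
            final String f = field.trim();
            if (f.isEmpty()) continue;
            final int space = f.indexOf(' ');
            if (space < 0) continue; // key without a value: skipped in this sketch
            final String key = f.substring(0, space);
            String value = f.substring(space + 1).trim();
            // Strip the surrounding double quotes, if any.
            if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                value = value.substring(1, value.length() - 1);
            }
            map.putIfAbsent(key, value); // simplification: first value wins for repeated keys
        }
        return map;
    }

    public static void main(final String[] args) {
        // Made-up record, not taken from the GENCODE test files.
        final String line = "chr1\tsrc\tgene\t100\t200\t.\t+\t.\tgene_id \"G1\"; gene_name \"ABC\";";
        final SimpleGtfFeature feature = parseLine(line);
        System.out.println(feature.getGeneId() + " " + feature.getGeneName());
    }
}
```

Keeping the accessors as default methods means any implementation only has to expose the raw attribute map; `getGeneId()` and friends come for free.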

some tests have been added:

# test the properties of GtfFeature
src/test/java/htsjdk/tribble/gtf/GtfFeatureTest.java

# test the Codec
src/test/java/htsjdk/tribble/gtf/GtfCodecTest.java

and some files for the tests were added too:

./src/test/resources/htsjdk/tribble/gtf/gencode.v43.annotation.gtf
./src/test/resources/htsjdk/tribble/gtf/gencode.vM32.annotation.gtf.gz.tbi
./src/test/resources/htsjdk/tribble/gtf/gencode.vM32.annotation.gtf.gz

Comparison with the gff3 package:

Things to think about before submitting: