jorvis / biocode

Bioinformatics code libraries and scripts
MIT License

Model UTRs explicitly #14

Open jorvis opened 10 years ago

jorvis commented 10 years ago

Kyle - This is something you requested, but could you add a comment with a bit more information? Do you just need the class to be created, or do you already have a file where UTRs could be included? (I expect a GFF file where the mRNA/exon feature coordinates extend beyond the range of the CDS ones.)

Keep in mind the GFF specification (scroll down to the section labeled "The Canonical Gene") http://www.sequenceontology.org/gff3.shtml

And the SO definition: http://www.sequenceontology.org/miso/current_release/term/SO:0000203
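To illustrate the parenthetical above: under the GFF3 canonical gene model, UTRs are the parts of the exons that lie outside the CDS range. A minimal Python sketch of that derivation follows, assuming 1-based inclusive coordinates as in GFF3. The function name `derive_utrs` and the tuple representation are hypothetical and are not part of biocode.

```python
def derive_utrs(exons, cds, strand):
    """Split exon sequence lying outside the CDS range into UTR intervals.

    exons, cds: lists of (start, end) tuples, 1-based and inclusive (GFF3 style).
    strand: '+' or '-'.
    Returns (five_prime, three_prime), each a list of (start, end) tuples.
    """
    cds_start = min(start for start, _ in cds)
    cds_end = max(end for _, end in cds)

    left, right = [], []  # exon pieces before / after the CDS in genomic order
    for start, end in sorted(exons):
        if start < cds_start:
            left.append((start, min(end, cds_start - 1)))
        if end > cds_end:
            right.append((max(start, cds_end + 1), end))

    # On the reverse strand the genomic right side is the 5' end.
    if strand == '-':
        return right, left
    return left, right


# Hypothetical coordinates, for illustration only.
exons = [(100, 200), (300, 400), (500, 600)]
cds = [(150, 200), (300, 400), (500, 550)]
five_prime, three_prime = derive_utrs(exons, cds, '+')
```

With these made-up coordinates the forward-strand call yields one 5' UTR piece, (100, 149), and one 3' UTR piece, (551, 600).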

ktretina commented 10 years ago

Hi Joshua,

I'd like UTRs to be added to this file (which certainly has genes with UTRs):

/usr/local/projects/t_parva/annotation/Muguga-FINAL/t_parva.IGS.annotation.formatted.split.newIDs.withFunctional.20140519_1.gff3

I'd also like the class to be created because I have a bunch of motifs and I'm looking to see where they are found in the genome, including UTRs. I imagine using the class something like this, but please let me know what you think.

    for gene in sorted(assemblies[contig].genes()):
        for mRNA in gene.mRNAs():
            for five_prime_UTR in mRNA.five_prime_UTRs():
                # do something, like see if the UTR overlaps my motif
                pass
            for three_prime_UTR in mRNA.three_prime_UTRs():
                # do something else
                pass
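One possible way to fill in the "do something" step is a closed-interval overlap test between each UTR and a list of motif hits. This is only a sketch: `overlaps`, `motifs_in_utrs`, and the plain (start, end) tuples are assumed names chosen for illustration, not biocode's API.

```python
def overlaps(a, b):
    """True if closed intervals a=(start, end) and b=(start, end) share a position."""
    return a[0] <= b[1] and b[0] <= a[1]


def motifs_in_utrs(utrs, motif_hits):
    """Return (utr, motif) pairs for every motif hit that overlaps a UTR."""
    return [(utr, motif) for utr in utrs for motif in motif_hits if overlaps(utr, motif)]


# Hypothetical coordinates on a single contig (1-based, inclusive).
five_prime_utrs = [(100, 149)]
motif_hits = [(140, 147), (148, 155), (300, 310)]
hits = motifs_in_utrs(five_prime_utrs, motif_hits)
```

Here the first two motif hits are reported (the second only partially overlaps the UTR), while the third is skipped. Whether partial overlaps should count is a design choice that depends on the motif analysis.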

Does that make sense? Let me know if you are looking for anything else. Thanks, Kyle

On Thu, Jun 5, 2014 at 5:17 PM, Joshua Orvis notifications@github.com wrote:

[quoted text of the first comment above]

jorvis commented 10 years ago

The basic classes have now been added, but there is nothing to iterate over them yet.
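As a rough idea of what the missing iteration could look like, here is a toy sketch that supports the loop Kyle proposed. Every class and method in it (`UTR`, `FivePrimeUTR`, `ThreePrimeUTR`, `MRNA`, `add_UTR`) is hypothetical and written only for this illustration. Only the accessor names `five_prime_UTRs()` and `three_prime_UTRs()` come from the request above, and none of this reflects how biocode actually implements them.

```python
class UTR:
    """Hypothetical UTR feature with 1-based inclusive coordinates."""

    def __init__(self, start, end, strand):
        self.start = start
        self.end = end
        self.strand = strand


class FivePrimeUTR(UTR):
    pass


class ThreePrimeUTR(UTR):
    pass


class MRNA:
    """Hypothetical mRNA container that stores UTR children."""

    def __init__(self, mrna_id):
        self.id = mrna_id
        self._utrs = []

    def add_UTR(self, utr):
        self._utrs.append(utr)

    def _utrs_of(self, cls):
        # Return children of the requested type, ordered by start coordinate.
        matching = (u for u in self._utrs if isinstance(u, cls))
        return sorted(matching, key=lambda u: u.start)

    def five_prime_UTRs(self):
        return self._utrs_of(FivePrimeUTR)

    def three_prime_UTRs(self):
        return self._utrs_of(ThreePrimeUTR)


# Made-up example feature.
mrna = MRNA("mRNA.1")
mrna.add_UTR(ThreePrimeUTR(551, 600, "+"))
mrna.add_UTR(FivePrimeUTR(100, 149, "+"))

for utr in mrna.five_prime_UTRs():
    print(mrna.id, "5' UTR", utr.start, utr.end)
```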