Closed pezhmansafdari closed 10 years ago
At first I wasn't sure what to do here, so I turned to the definition of intergenic_region from the Sequence Ontology:
http://www.sequenceontology.org/browser/current_svn/term/SO:0000605 "A region containing or overlapping no genes that is bounded on either side by a gene, or bounded by a gene and the end of the chromosome."
Based on this, yes, the region from each contig end to its nearest gene should be counted as intergenic space. I can make this change, but short contigs with no annotated genes would still NOT be counted. Do you agree?
Hi,
Yes, I agree. If someone wants to include short contigs with no genes, they should calculate that separately.
Thanks, Pezhman
(In reply to Joshua Orvis's comment of Sat, Jun 21, 2014 at 9:24 PM: https://github.com/jorvis/biocode/issues/18#issuecomment-46760967)
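For readers following along, the rule agreed on in this exchange (terminal regions count as intergenic, gene-less contigs are skipped) can be sketched roughly as below. This is only an illustration, not the code in report_gff_intron_and_intergenic_stats.py: the function name `intergenic_lengths` and the `(start, end)` tuple layout are assumptions made for the example, with coordinates taken as 1-based and inclusive as in GFF3.

```python
def intergenic_lengths(contig_length, genes):
    """Return the lengths of the intergenic regions on one contig.

    genes: list of (start, end) pairs, 1-based inclusive (GFF3 style).
    Hypothetical helper for illustration; a contig with no genes
    returns an empty list, so it contributes no intergenic space.
    """
    if not genes:
        return []
    lengths = []
    prev_end = 0  # position just before the first base of the contig
    for start, end in sorted(genes):
        # A gap exists only if this gene starts past the covered span.
        if start > prev_end + 1:
            lengths.append(start - prev_end - 1)
        # max() keeps overlapping or nested genes from shrinking the span.
        prev_end = max(prev_end, end)
    # Region between the last gene and the end of the contig.
    if prev_end < contig_length:
        lengths.append(contig_length - prev_end)
    return lengths


print(intergenic_lengths(1000, [(101, 300), (501, 800)]))  # [100, 200, 200]
```

With the two terminal regions included, the intergenic lengths (500) plus the gene lengths (500) add up to the contig length (1000).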
My apologies, I forgot to update this issue. About a week ago I committed a version that fixes this, but it currently requires your FASTA data to be embedded at the end of your GFF3 file (per the specification). I'm marking this as closed since the commit fixes the issue, but I'm also adding a --fasta option that will allow the sequences to be in a separate file.
New version with --fasta now committed.
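The contig lengths are needed to measure the terminal regions, which is why the sequence data matters here. As a rough sketch of how lengths could be read from a GFF3 file with an embedded `##FASTA` section, assuming a hypothetical helper named `contig_lengths_from_gff3` (this is not taken from the script itself):

```python
def contig_lengths_from_gff3(path):
    """Map sequence ID -> length using the ##FASTA section of a GFF3 file.

    Hypothetical helper for illustration. Returns an empty dict if the
    file has no embedded FASTA section.
    """
    lengths = {}
    in_fasta = False
    current = None
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not in_fasta:
                # Everything before the ##FASTA directive is annotation.
                if line.startswith("##FASTA"):
                    in_fasta = True
                continue
            if line.startswith(">"):
                # The sequence ID is the first word of the header line.
                current = line[1:].split()[0]
                lengths[current] = 0
            elif current is not None:
                lengths[current] += len(line)
    return lengths
```

The same loop, minus the `##FASTA` check, would cover the case where the sequences live in a separate FASTA file.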
Hi, when I calculate the intergenic space of a contig with report_gff_intron_and_intergenic_stats.py and add the total length of the genes on that contig to it, the result is shorter than the total length of the contig. My assumption is that the script does not count the region from the start of the contig to the start of the first gene, or the region from the end of the last gene to the end of the contig.
Cheers, Pezhman
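The discrepancy described in this report can be reproduced with a small worked example. The numbers below are made up for illustration (a 1000 bp contig with two non-overlapping genes, 1-based inclusive coordinates); they are not from the reporter's data.

```python
contig_length = 1000
genes = [(101, 300), (501, 800)]  # hypothetical, sorted, non-overlapping

# Total bases covered by genes.
gene_total = sum(end - start + 1 for start, end in genes)

# Intergenic space counted only between neighbouring genes.
internal = sum(b[0] - a[1] - 1 for a, b in zip(genes, genes[1:]))

# The two terminal regions that the report suspects are being left out.
leading = genes[0][0] - 1
trailing = contig_length - genes[-1][1]

shortfall = contig_length - (gene_total + internal)
print(gene_total, internal, shortfall, leading + trailing)  # 500 200 300 300
```

The shortfall equals the sum of the leading and trailing regions, which is consistent with the assumption stated in the report.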