mjy / obo_parser

An OBO file parser.
MIT License

Does not parse GO ontology 1.2 #1

Closed by translunar 13 years ago

translunar commented 13 years ago

Used this gem successfully on the TAIR (arabidopsis.org) GO slim for Arabidopsis thaliana. Worked really well!

It didn't work so well when I tried it on the full Gene Ontology:

> parse_obo_file(File.read("GO.obo"))
RuntimeError: cytochrome c
from /home/jwoods/obo_parser/lib/tokens.rb:69:in `initialize'
from /home/jwoods/obo_parser/lib/lexer.rb:52:in `new'
from /home/jwoods/obo_parser/lib/lexer.rb:52:in `match'
from /home/jwoods/obo_parser/lib/lexer.rb:38:in `block in read_next_token'
from /home/jwoods/obo_parser/lib/lexer.rb:37:in `each'
from /home/jwoods/obo_parser/lib/lexer.rb:37:in `read_next_token'
from /home/jwoods/obo_parser/lib/lexer.rb:10:in `peek'
from /home/jwoods/obo_parser/lib/parser.rb:32:in `parse_term'
from /home/jwoods/obo_parser/lib/parser.rb:16:in `parse_file'
from /home/jwoods/obo_parser/lib/obo_parser.rb:169:in `parse_obo_file'
from (irb):3
from /home/jwoods/.rvm/gems/ruby-1.9.2-p180/bundler/gems/rails-f064664de72a/railties/lib/rails/commands/console.rb:45:in `start'
from /home/jwoods/.rvm/gems/ruby-1.9.2-p180/bundler/gems/rails-f064664de72a/railties/lib/rails/commands/console.rb:8:in `start'
from /home/jwoods/.rvm/gems/ruby-1.9.2-p180/bundler/gems/rails-f064664de72a/railties/lib/rails/commands.rb:44:in `<top (required)>'
from script/rails:6:in `require'
from script/rails:6:in `<main>'

I do notice that the full GO file has a lot of escaped characters in it, which the code says it can't handle. Is there a workaround?

Many thanks for this! And BTW, I'm working on a hacked-together fork that has some graph code.
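
On the escaped characters mentioned above: a minimal sketch of how backslash escapes in a single OBO tag value could be decoded, assuming the escape set from the OBO 1.2 format (\n, \W, \t, and a backslash before a literal character such as a colon, comma, quote, or bracket). `OBO_ESCAPES` and `unescape_obo` are hypothetical names and are not part of obo_parser. The sketch is meant for values that have already been extracted, not as a preprocessing pass over the whole file, and it leaves a backslash at the end of a line untouched.

```ruby
# Hypothetical helper, not part of obo_parser.
# Escapes that stand for a different character than the one after the backslash.
OBO_ESCAPES = {
  'n' => "\n", # newline
  'W' => ' ',  # single space
  't' => "\t"  # tab
}.freeze

# Decode backslash escapes in one tag value.
# Any other escaped character (":", ",", "\"", "\\", brackets, ...) maps to itself.
def unescape_obo(value)
  value.gsub(/\\(.)/) { OBO_ESCAPES.fetch($1, $1) }
end
```

For example, `unescape_obo('a\:b')` returns `"a:b"`.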

mjy commented 13 years ago

Thanks for the feedback! I noticed I had not pushed a token tweak. I added http://www.geneontology.org/ontology/obo_format_1_2/gene_ontology_ext.obo as go.obo in the tests, and parse_obo_file() seems to complete the parse now, but I haven't actually written any tests against its content. For several ontologies I found that just one or two fixes in the xrefs allowed for a full parse. I added tickets for those problems in some cases (for cell, IIRC). Looking forward to seeing the graph code.
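
Since there are no tests against the parsed content yet, one rough starting point is an independent count of stanza headers in the raw file, which could then be compared with whatever the parser reports. This is only a sketch: `count_stanzas` is a hypothetical helper, it assumes stanza headers such as [Term] and [Typedef] sit alone on their own line, and it makes no assumption about the object parse_obo_file() returns.

```ruby
# Hypothetical helper, not part of obo_parser.
# Count stanza headers (e.g. [Term], [Typedef]) in raw OBO text.
def count_stanzas(obo_text)
  counts = Hash.new(0)
  obo_text.each_line do |line|
    # A header is a bracketed word alone on the line.
    m = line.match(/\A\[(\w+)\]\s*\z/)
    counts[m[1]] += 1 if m
  end
  counts
end
```

Calling `count_stanzas(File.read("go.obo"))` gives a hash keyed by stanza type, which can be checked by hand against the number of terms the parser produced.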