louismullie / treat

Natural language processing framework for Ruby.

Add support for SQL databases #42

Open louismullie opened 11 years ago

jbnunn commented 11 years ago

I can get to this one quicker than I can get to #18, if you want.

louismullie commented 11 years ago

I can take care of #18. There's @tomcartwrightuk who wants to work on this as well. As I discussed with him by e-mail, the best way to approach this would be to use Sequel as an abstraction layer, and then one of nested sets/path enumeration/closure tables to represent trees in the DB.
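To make two of the options named above concrete, here is a minimal pure-Ruby sketch of the rows that nested sets and path enumeration would store for a tiny tree. The `Node` struct, the column names (`lft`, `rgt`, `path`) and the sample tree are assumptions for illustration only; they are not Treat's entity classes, and no Sequel calls are shown.

```ruby
# Hypothetical in-memory tree node; a stand-in, not Treat's entity class.
Node = Struct.new(:name, :children)

# Nested sets: give each node left/right bounds in one depth-first walk.
# A node's descendants are exactly the rows whose bounds fall inside its own.
def nested_set_rows(node, counter = [0], rows = [])
  left = (counter[0] += 1)
  node.children.each { |child| nested_set_rows(child, counter, rows) }
  right = (counter[0] += 1)
  rows << { name: node.name, lft: left, rgt: right }
  rows
end

# Path enumeration: store the full path from the root alongside each node.
# A node's descendants are the rows whose path starts with its own path.
def path_rows(node, prefix = "", rows = [])
  path = prefix.empty? ? node.name : "#{prefix}/#{node.name}"
  rows << { name: node.name, path: path }
  node.children.each { |child| path_rows(child, path, rows) }
  rows
end

SAMPLE_TREE = Node.new("document", [
  Node.new("paragraph", [
    Node.new("sentence_1", []),
    Node.new("sentence_2", [])
  ])
])

NESTED_SET_ROWS = nested_set_rows(SAMPLE_TREE)
PATH_ROWS = path_rows(SAMPLE_TREE)
```

Both layouts make subtree reads cheap. The trade-off is on writes: inserting a node into a nested set renumbers the bounds of other rows, while moving a subtree under path enumeration rewrites the paths of everything below it.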

tomcartwrightuk commented 11 years ago

I do not have any experience writing tree structures to relational databases, so I am not well qualified to choose a method. @jbnunn - do you have any opinions or suggestions? I found this useful rundown of the available gems that can format the data using the various methods, but it doesn't go into much detail. [1]

Presumably we need to make some decisions on the following:

Some of those might be irrelevant and I have probably missed things off, so chime in.

[1] http://hightechsorcery.com/2011/07/storing-hierarchical-tree-data-in-sql-using-ruby-on-rails/

louismullie commented 11 years ago

After looking at various options, my gut feeling is that closure tables are our best option. There's an actively maintained gem that looks to be of high quality at https://github.com/mceachen/closure_tree. However, I don't think we want to be dealing with ActiveRecord models. The best option may be to pull the necessary code from the gem.
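For anyone unfamiliar with closure tables, here is a rough pure-Ruby sketch of the idea, independent of closure_tree and ActiveRecord. The `PARENTS` hash and the `[ancestor, descendant, depth]` row layout are assumptions for illustration and do not reflect the gem's actual schema.

```ruby
# Hypothetical parent links (child => parent); nil marks the root.
PARENTS = {
  "document"   => nil,
  "paragraph"  => "document",
  "sentence_1" => "paragraph",
  "sentence_2" => "paragraph"
}.freeze

# Build one [ancestor, descendant, depth] row for every ancestor/descendant
# pair, including each node paired with itself at depth 0.
def closure_rows(parents)
  parents.keys.flat_map do |node|
    rows = []
    ancestor = node
    depth = 0
    while ancestor
      rows << [ancestor, node, depth]
      ancestor = parents[ancestor]
      depth += 1
    end
    rows
  end
end

# Descendants of a node: a single filter on the ancestor column, no recursion.
def descendants(rows, name)
  rows.select { |anc, _desc, depth| anc == name && depth > 0 }
      .map { |_anc, desc, _depth| desc }
end

CLOSURE_ROWS = closure_rows(PARENTS)
```

In a database the same lookup would be a plain `WHERE ancestor = ?` query against the closure table. The cost is storage: every node gets one row per ancestor plus one for itself.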

louismullie commented 11 years ago

Any updates, guys?

kshahkshah commented 11 years ago

I'm looking at this now and am taking a stab at it within my own repo. However, I think closure_tree is very tightly integrated with ActiveRecord and it would be difficult to crib the code and just run with it.

voronoipotato commented 11 years ago

It looks like it primarily uses it for "ActiveSupport::Concern" to handle its dependencies and then uses ActiveRecord once to pass that data on. I mean, we should use some kind of ORM, right? What are the downsides of ActiveRecord compared to DataMapper or some other ORM?

louismullie commented 11 years ago

I don't have a lot of experience with ActiveSupport, but the closure_tree gem looks very, very well coded and maintained. I think we should give it a shot!

kshahkshah commented 11 years ago

I'm still exploring this. The gem is indeed well maintained, with very clean injection of its modules.

I've done the serialization end and am working on deserialization, then I'll abstract it and push it to my branch.

There are open questions on how to store the data, though. The first is redundancy. I have an AR model named Entity which uses STI with subclasses SentenceEntity, ParagraphEntity, DocumentEntity, etc. Each row also has a text column named 'content'. So of course the DocumentEntity has all the content, the ParagraphEntity just its paragraph of content, and so on. Should these be linked in some way? Should editing and saving a SentenceEntity bubble up, for instance?
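
One possible answer to the redundancy question, sketched in plain Ruby rather than ActiveRecord: store text only on the leaves and have parents derive their content from their children, so an edit to a sentence shows up in its paragraph and document without any explicit bubbling. `SketchEntity` and its `separator` option are hypothetical names for illustration, not the Entity model described above.

```ruby
# Hypothetical plain-Ruby stand-in for the Entity rows; not the STI model itself.
class SketchEntity
  attr_reader :children
  attr_writer :content

  def initialize(content: nil, children: [], separator: " ")
    @content = content
    @children = children
    @separator = separator
  end

  # Leaves return their stored text; parents rebuild theirs from their children.
  def content
    return @content if children.empty?
    children.map(&:content).join(@separator)
  end
end

sentence_1 = SketchEntity.new(content: "Hello world.")
sentence_2 = SketchEntity.new(content: "Goodbye.")
paragraph  = SketchEntity.new(children: [sentence_1, sentence_2])
document   = SketchEntity.new(children: [paragraph], separator: "\n\n")

sentence_2.content = "See you soon."
puts document.content
```

The trade-off is that reading a document's content walks the whole tree, and the original whitespace between sentences is lost unless it is stored somewhere. Keeping a cached 'content' column on parents avoids that, but then a save on a child does need to bubble up.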