louismullie / treat

Natural language processing framework for Ruby.

Why the strange syntax? #17

Closed bluemont closed 11 years ago

bluemont commented 12 years ago

For example, Treat:

Why? Is there a rationale? I find it to be a mismatch with typical Ruby style.

louismullie commented 12 years ago

The entity syntax has a rationale:

a) Aliasing all classes in Treat::Entities under the global namespace is likely to cause conflicts, because the class names are so generic (Sentence, Word, etc.). The current syntax defines them as methods instead, so they won't conflict with any existing classes in the global namespace.

b) Even if we did alias the classes under the global namespace, Word("something") is actually an alias for Treat::Entities::Word.build("something"), which itself calls the constructor. It seems it would be tedious to constantly call Word.build(...).

Of course, that leaves the word "something" option open (basically the same idea, but with a lowercase letter). Although this seems to fit better with idiomatic Ruby style, I find it doesn't show as clearly that a new entity is being built as a result of the method call. What do you think?
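To make the capitalized-method idea concrete, here is a minimal sketch in the style of Kernel#Integer or Kernel#Array. The Demo module, its Word class and the Session class are hypothetical stand-ins, not Treat's actual code, and the builder is mixed into a single class only to keep the sketch self-contained.

```ruby
# Hypothetical stand-in for Treat::Entities::Word; not Treat's real class.
module Demo
  class Word
    attr_reader :value

    def initialize(value)
      @value = value
    end

    # Factory method that wraps the constructor.
    def self.build(value)
      new(value.to_s)
    end
  end

  # Mixin exposing a capitalized builder method, like Kernel#Integer.
  module Builders
    def Word(value)
      Demo::Word.build(value)
    end
  end
end

class Session
  include Demo::Builders

  def run
    # Parsed as a method call, not a constant lookup, because an
    # argument follows the capitalized name.
    Word 'something'
  end
end

w = Session.new.run
puts w.value
```

A bare Word with no argument or parentheses would still be resolved as a constant, which is why the method and a class of the same name can coexist.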

Concerning do, I agree with you. I just couldn't find a better name - any ideas?

louismullie commented 12 years ago

I tried implementing the syntax using lowercase functions, and found it to break a lot of stuff, including Mongoid and some native Ruby functions, in addition to several gems. Therefore I think we'll keep the current DSL for entity creation. I think the convenience it adds and the keystrokes it saves outweigh the "oddness" of the syntax. If you ever think of any other alternatives, I'll be glad to hear them.

As for do, I'm going to change it for chain, since that's more descriptive. I'll close the issue when I do.

bluemont commented 12 years ago

Thanks for taking a look at this! I don't see why lowercase methods have to break anything, though.

louismullie commented 12 years ago

As an example, Mongoid looks at all methods defined on Object and then prohibits naming fields with the same names. Therefore, one couldn't define a Mongoid field named :email or :url, since these methods would be defined.

I haven't looked into why, but one of these method names also breaks Ruby's DateTime parse() function.

Another example is a gem that used method_missing to catch a method named :symbol; but it wouldn't catch it, since it's now defined on every object.
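The method_missing case can be reproduced in a few lines. This is only a sketch: Recorder is a hypothetical proxy class, not the gem in question, and the symbol method added to Object stands in for a lowercase entity builder.

```ruby
# Hypothetical proxy that relies on method_missing; not a real library.
class Recorder
  def calls
    @calls ||= []
  end

  def method_missing(name, *args)
    calls << name
    self
  end

  def respond_to_missing?(name, include_private = false)
    true
  end
end

before = Recorder.new
before.symbol                 # undefined, so method_missing records it
puts before.calls.inspect     # [:symbol]

# Simulate a library that adds a lowercase builder to every object.
Object.class_eval do
  def symbol(*args)
    :from_object
  end
end

after = Recorder.new
result = after.symbol         # resolved through Object; method_missing is skipped
puts result.inspect           # :from_object
puts after.calls.inspect      # []
```

Ruby only falls back to method_missing after normal method lookup fails, so any method defined on Object silently takes precedence in every class that inherits from it.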

Bottom line is, the names of the textual entity models are common enough that they are likely to conflict with libraries that we use, or with libraries that users will use alongside Treat. I feel like that would be a bit of a pain in the ass to debug.

I still do agree with you, though - I'll try to think of another way.

bluemont commented 12 years ago

Oh... I wouldn't recommend putting the methods into Object.

There are other ways... for example, look at the Rails router style. Here is how it works:

def draw(&block)
  clear! unless @disable_clear_and_finalize
  eval_block(block)
  finalize! unless @disable_clear_and_finalize
  nil
end

The key is the eval_block(block). Would you consider this approach? It is much less invasive.
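As a rough illustration of that approach applied to entity creation: as far as I can tell, eval_block in Rails runs the block against a mapper object with instance_exec, so the DSL methods live on that object rather than on Object. The sketch below copies only that shape; EntityBuilder, draw, word and sentence are hypothetical names, not Treat's or Rails' actual code.

```ruby
# Hypothetical names throughout; none of this is Treat's or Rails' code.
class EntityBuilder
  attr_reader :entities

  def initialize
    @entities = []
  end

  def word(text)
    @entities << [:word, text]
  end

  def sentence(text)
    @entities << [:sentence, text]
  end

  # Same shape as the router's draw: create a scoped object, run the
  # block against it, and hand back the result.
  def self.draw(&block)
    builder = new
    builder.instance_exec(&block)  # self is the builder inside the block
    builder.entities
  end
end

entities = EntityBuilder.draw do
  word 'hello'
  sentence 'Welcome to Treat!'
end

p entities
```

The lowercase methods exist only on EntityBuilder instances, so nothing outside the block can collide with them.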

louismullie commented 12 years ago

Are you pointing toward something like the following?

build do
    word 'hello'
end

Seems pretty lengthy versus:

Word 'hello'

bluemont commented 12 years ago

Maybe. Can you share some full examples, in context? I read https://github.com/louismullie/treat/wiki/Manual but I am not sure where (in what block) the commands would live. Did I overlook something?

A question... how often would a wrapping block be necessary? If it is only one time, great... Think of Rails -- each part (router, controller, model) has a separate DSL and a block or scope for it.

Polluting Object causes a "ticking time bomb" and is unadvisable. Terrible is the word I'm thinking of. The code has a lot of Object.class_eval -- might be time to get rid of it. With some examples with full context, I can recommend a better DSL.

louismullie commented 12 years ago

The only place where Object.class_eval is used is actually for the purpose we're discussing. The only use for this particular syntax is in creating entities, which can mean anything from creating a word from a string to creating a document from a file or a DB record, a collection, etc. In other words, it's syntactic sugar for Treat::Entities::SomeEntity.from_anything(). Refer to the "Entities" section of the manual:

# Create a word
word = Word 'run'

# Create a phrase
phrase = Phrase 'am running'

# Create a sentence
sentence = Sentence 'Welcome to Treat!'

# Create a section
section = Section "A small text\nA factitious paragraph."

# Create a document from file, url or DB record
d = Document 'text.X'   
d = Document 'http://www.example.com/XX/XX'
d = Document id: 1033232343

# Create a collection from a folder
c = Collection 'existing_folder'

bluemont commented 12 years ago

Yes, I saw those examples. Why not wrap these in a context, like the Rails router? What is the use case for needing to pollute Object?

louismullie commented 11 years ago

@bluemont What do you think of this solution?

bluemont commented 11 years ago

I just took a quick look. I have "sharp" opinions about DSLs.

The 'design decisions' about DSLs are discussed if you look in the right places. To summarize one key distinction:

  1. do you change self so that no block variable has to be used?
  2. do you use a block variable so that the external context (self) is still available?
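The trade-off between the two is easiest to see side by side. A common pattern, sketched here with hypothetical Scope and Caller classes, is to check the block's arity and support both styles.

```ruby
# Hypothetical DSL host; not Treat's API.
class Scope
  attr_reader :words

  def initialize
    @words = []
  end

  def word(text)
    @words << text
  end

  # A block that takes a parameter is handed the scope; a block
  # without one is evaluated with self set to the scope.
  def self.build(&block)
    scope = new
    if block.arity == 1
      block.call(scope)
    else
      scope.instance_eval(&block)
    end
    scope.words
  end
end

class Caller
  def greeting
    'hello'
  end

  def with_block_variable
    # self is still the Caller, so greeting resolves normally.
    Scope.build { |s| s.word greeting }
  end

  def with_changed_self
    # Inside the block self is the Scope, so the caller's methods are
    # not reachable; capture what is needed in a local first.
    text = greeting
    Scope.build { word text }
  end
end

p Caller.new.with_block_variable
p Caller.new.with_changed_self
```

Style 1 reads more cleanly, but the caller's own methods and instance variables disappear inside the block; style 2 costs a block variable and keeps them.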

I'm a fan of the following statements, as much as possible:

  1. Don't add methods to Object unless there is no other way. And there is usually another way!
  2. When possible, use o = Object.new; o.extend(CustomModule) to add new behaviors to a particular object.
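A minimal sketch of the second statement, assuming a hypothetical EntityDSL module (the word method is illustrative and does not build a real Treat entity):

```ruby
# Hypothetical module; illustrative only.
module EntityDSL
  def word(text)
    [:word, text]
  end
end

dsl = Object.new
dsl.extend(EntityDSL)          # only this one object gains #word

plain = Object.new

puts dsl.respond_to?(:word)    # true
puts plain.respond_to?(:word)  # false
p dsl.word('hello')
```

Because extend adds the module to the object's singleton class, every other object, including other instances of the same class, is left untouched.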

The following, which I mentioned above, is not very 'verbose', since you only write build once:

build do
  word 'hello'
  # lots of other stuff
end

Have you brainstormed with other users? Other DSL creators?

P.S. I highly recommend a code read of Mongoid if you want to see some great code!

Caveat: I haven't been using Treat, so you know better than I what syntax / DSL you are going for. I'm using Java tools for NLP right now. Sorry :/

louismullie commented 11 years ago

Good point on not modifying Object... I changed it so that including the DSL only modifies the including class. I think this DSL looks pretty clean, and I'm satisfied with it. Some random examples:

question = question(:is_spam, :sentence)
problem = problem(question, 
  feature(:punctuation_count), 
  feature(:word_count))

d = document('http://en.wikipedia.org/wiki/NOD_mouse').
  apply(:chunk, :segment, :tokenize, :tag, :category)
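For readers wondering what "only modifies the including class" looks like mechanically, here is a rough sketch. ScopedDSL, Entity and Analyzer are hypothetical names, and this is not Treat's actual implementation.

```ruby
# Hypothetical stand-in for a DSL module; not Treat's actual code.
module ScopedDSL
  Entity = Struct.new(:type, :value)

  def word(value)
    Entity.new(:word, value)
  end

  def sentence(value)
    Entity.new(:sentence, value)
  end
end

class Analyzer
  include ScopedDSL   # only Analyzer and its subclasses gain the builders

  def run
    [word('hello'), sentence('Welcome to Treat!')]
  end
end

p Analyzer.new.run.map(&:type)
puts Object.new.respond_to?(:word)   # false: Object itself is untouched
```

Since the lowercase builders are no longer defined on Object, checks like the Mongoid field-name validation mentioned earlier in the thread would not see them on unrelated classes.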

Thank you very much for your advice on all of this.