xjl219 / boilerpipe

Automatically exported from code.google.com/p/boilerpipe

Unconventional operator used for boolean logic #11

Closed by GoogleCodeExporter 9 years ago

GoogleCodeExporter commented 9 years ago
Just a small suggestion to help others with reading the code.
I opened ArticleExtractor and saw the following:
  return TerminatingBlocksFinder.INSTANCE.process(doc)
      | new DocumentTitleMatchClassifier(doc.getTitle()).process(doc)
      | NumWordsRulesClassifier.INSTANCE.process(doc)
      | IgnoreBlocksAfterContentFilter.DEFAULT_INSTANCE.process(doc)
      | BlockProximityFusion.MAX_DISTANCE_1.process(doc)
      | BoilerplateBlockFilter.INSTANCE.process(doc)
      | BlockProximityFusion.MAX_DISTANCE_1_CONTENT_ONLY.process(doc)
      | KeepLargestFulltextBlockFilter.INSTANCE.process(doc)
      | ExpandTitleToContentFilter.INSTANCE.process(doc);

This was very confusing to me. The | operator in Java is usually reserved for 
bitwise operations, but here it appears to be used as a boolean OR, for which 
|| is typically used. I was surprised this even compiles, though it turns out 
it is valid and functions exactly the same. It would really help readability 
to replace the | with || throughout, since that is the standard Java 
convention.

Original issue reported on code.google.com by benjamin...@gmail.com on 21 Nov 2010 at 7:37

GoogleCodeExporter commented 9 years ago
Hi Benjamin,

thanks for your comment.

The difference between "|" and the "conditional OR" ("||") is that "|", which 
on boolean operands acts as a non-short-circuit logical OR rather than a 
bitwise one, always evaluates both of its operands, whereas "||" stops 
evaluating as soon as one operand evaluates to "true".

Check this example class (attached) to see the difference.
Also cf. http://docstore.mik.ua/orelly/java/langref/ch04_10.htm and 
http://docstore.mik.ua/orelly/java/langref/ch04_11.htm for some explanation of 
the differences between | and ||.
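
A minimal sketch of the difference (this is not the original attachment; the class OrOperatorDemo and its step() helper are made up for illustration and merely stand in for a filter's process(doc) call):

```java
public class OrOperatorDemo {
    // Counts how many operands were actually evaluated.
    private static int calls;

    // Hypothetical stand-in for a filter call: records that it ran,
    // then returns the given result.
    private static boolean step(boolean result) {
        calls++;
        return result;
    }

    // "|" on booleans: every operand is evaluated, left to right.
    static int callsWithSingleBar() {
        calls = 0;
        boolean changed = step(true) | step(false) | step(true);
        return calls;
    }

    // "||": evaluation stops at the first operand that is true.
    static int callsWithDoubleBar() {
        calls = 0;
        boolean changed = step(true) || step(false) || step(true);
        return calls;
    }

    public static void main(String[] args) {
        System.out.println("|  evaluated " + callsWithSingleBar() + " operands"); // 3
        System.out.println("|| evaluated " + callsWithDoubleBar() + " operands"); // 1
    }
}
```

Both expressions yield true; only the number of calls that actually run differs.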

In conclusion, it would simply be incorrect to use || in this context, as the 
filter pipe would not get evaluated completely: evaluation would stop at the 
first filter that returns "true". (You probably now also see a connection to 
bash-style pipes, which is more than just a coincidence, and now you also know 
why this software is called boiler*pipe* ;-)
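
To illustrate what would go wrong in a filter chain, here is a rough sketch. The Filter interface and the tagging() filters are hypothetical and are not boilerpipe's actual classes. Each filter mutates the document and returns whether it changed anything, which is the situation assumed here:

```java
import java.util.ArrayList;
import java.util.List;

public class PipelineSketch {
    // Hypothetical filter: modifies the document and reports whether it changed it.
    interface Filter {
        boolean process(List<String> doc);
    }

    // Hypothetical filter that appends a tag to the document and always reports a change.
    static Filter tagging(String tag) {
        return doc -> {
            doc.add(tag);
            return true;
        };
    }

    // Chained with "|": all three filters run on the document.
    static List<String> runWithSingleBar() {
        List<String> doc = new ArrayList<>();
        boolean changed = tagging("a").process(doc)
                | tagging("b").process(doc)
                | tagging("c").process(doc);
        return doc;
    }

    // Chained with "||": the first filter returns true, so the rest never run.
    static List<String> runWithDoubleBar() {
        List<String> doc = new ArrayList<>();
        boolean changed = tagging("a").process(doc)
                || tagging("b").process(doc)
                || tagging("c").process(doc);
        return doc;
    }

    public static void main(String[] args) {
        System.out.println(runWithSingleBar()); // [a, b, c]
        System.out.println(runWithDoubleBar()); // [a]
    }
}
```

With || the document only ever sees the first filter, even though the overall boolean result is the same.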

Cheers,
Christian

Original comment by ckkohl79 on 21 Nov 2010 at 12:10

GoogleCodeExporter commented 9 years ago
Wow.  Learned something new today!  Thanks =)

Original comment by benjamin...@gmail.com on 21 Nov 2010 at 8:03

GoogleCodeExporter commented 9 years ago
You are welcome :)

Best,
Christian

Original comment by ckkohl79 on 21 Nov 2010 at 8:07