kite-sdk / kite

Kite SDK
http://kitesdk.org/docs/current/
Apache License 2.0

KITE-1047 clarify language on performance in the morphlines introduction #401

Closed busbey closed 9 years ago

busbey commented 9 years ago

Ugh, the non-rendered view is terrible.

before:

All morphline commands are implemented efficiently. Ballpark-wise, a simple command like readJson or readAvro or grok can process O(100k) records per second per CPU core, and we've seen O(50k) simple xqueries/sec/core. A command that does almost nothing runs at something like O(5 M) records/sec/core - the overhead of passing records among commands is no more than a Java method invocation and hence close to zero - negligible. Considering that Lucene indexing inside Solr only runs at something like O(1k) records/sec/core ballpark this means that typically Lucene (rather than a morphline) is by far the main ingestion bottleneck (unless your morphline somehow samples or filters or otherwise throws away 99% of the records while running in a huge yet latency sensitive streaming MR or Impala job).

after:

Performance is a primary concern for the implementation of all morphline commands. Ballpark-wise, a simple command like readJson or readAvro or grok can process approximately 100k records per second per CPU core, and we've seen approximately 50k simple xqueries per second per core. A command that does almost nothing runs at approximately 5M records per second per core - the overhead of passing records among commands is no more than a Java method invocation and hence close to zero - negligible. Considering that Lucene indexing inside Solr only runs at approximately 1k records per second per core, this means that typically Lucene (rather than a morphline) is by far the main ingestion bottleneck (unless your morphline somehow samples or filters or otherwise throws away 99% of the records while running in a huge yet latency sensitive streaming MR or Impala job).
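
The bottleneck argument in the paragraph above can be checked with a little arithmetic. The sketch below is only an illustration and not part of Kite: it assumes the morphline and the Lucene indexing run one after the other on a single core, so their per-record costs simply add up. The function name `time_share` and the `keep_fraction` parameter are hypothetical; the rates are the ballpark figures quoted above.

```python
def time_share(morphline_rate, lucene_rate, keep_fraction=1.0):
    """Estimate end-to-end throughput for a morphline feeding Lucene.

    Assumption (not from the Kite docs): both stages run serially on one
    core, so the time per input record is the sum of the stage costs.

    Returns (input records per second per core,
             fraction of total time spent inside the morphline).
    """
    morphline_cost = 1.0 / morphline_rate      # seconds per input record
    lucene_cost = keep_fraction / lucene_rate  # only kept records are indexed
    total = morphline_cost + lucene_cost
    return 1.0 / total, morphline_cost / total


# Every record is indexed: the morphline accounts for about 1% of the time.
throughput_all, share_all = time_share(100_000, 1_000)

# The morphline discards 99% of the records: the two stages now cost the same.
throughput_filtered, share_filtered = time_share(100_000, 1_000, keep_fraction=0.01)
```

Under these assumptions the morphline only becomes a comparable cost once roughly 99% of the records are dropped before indexing, which matches the exception named in the parenthetical.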

busbey commented 9 years ago

The failures appear to be related to the HBase module and are not part of this PR.