All morphline commands are implemented efficiently. Ballpark-wise, a simple command like readJson or readAvro or grok can process O(100k) records per second per CPU core, and we've seen O(50k) simple xqueries/sec/core. A command that does almost nothing runs at something like O(5 M) records/sec/core - the overhead of passing records among commands is no more than a Java method invocation and hence close to zero - negligible. Considering that Lucene indexing inside Solr only runs at something like O(1k) records/sec/core ballpark this means that typically Lucene (rather than a morphline) is by far the main ingestion bottleneck (unless your morphline somehow samples or filters or otherwise throws away 99% of the records while running in a huge yet latency sensitive streaming MR or Impala job).
after:
Performance is a primary concern for the implementaiton of all morphline commands. Ballpark-wise, a simple command like readJson or readAvro or grok can process approximately 100k records per second per CPU core, and we've seen approximately 50k simple xqueries per second per core. A command that does almost nothing runs at something like approximately 5M records per second per core - the overhead of passing records among commands is no more than a Java method invocation and hence close to zero - negligible. Considering that Lucene indexing inside Solr only runs at something like approximately 1k records per second per core this means that typically Lucene (rather than a morphline) is by far the main ingestion bottleneck (unless your morphline somehow samples or filters or otherwise throws away 99% of the records while running in a huge yet latency sensitive streaming MR or Impala job).
ugh the non rendered is terrible.
before:
after: