grafana / loki

Like Prometheus, but for logs.
https://grafana.com/loki
GNU Affero General Public License v3.0

introduce columnar chunk format #5723

Open · flashmouse opened this issue 2 years ago

flashmouse commented 2 years ago

Is your feature request related to a problem? Please describe.
Loki's index design is quite simple: every series must have at least one chunk in the ingester's memory, which is then flushed directly to S3 or a NoSQL DB as a single key-value pair. Although this design avoids the complexity of compaction found in LSM-like storage systems, it also puts a lower ceiling on Loki's active series, and query latency depends almost entirely on disk and network bandwidth. I believe many Loki users face the problem that queries are very slow in many situations.

I think anything that reduces network/disk bandwidth usage during queries could significantly improve the query experience.

Describe the solution you'd like
The current chunk format is an NSM-like (row-oriented) format: to apply a filter, Loki must read every whole log line to decide whether it should be returned to the user. But in most situations the user's filter is precise and the keyword is quite short: "I want to check one user's logs, so filter on uid=xxx", "I want to check whether remote_request=Kafka has something wrong", and so on. Developers know these keywords and labels, but Loki can't index them, because uid/tid etc. are so-called high-cardinality labels.
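Just to make the cost concrete, here is a minimal, hypothetical sketch of what a row-oriented filter scan looks like (not Loki's actual code, and gzip is only one of its compression options): every line in the chunk has to be decompressed and inspected, even though the filter only cares about a short field like uid=xxx.

```go
// Hypothetical sketch of a row-oriented (NSM-like) chunk scan; not Loki's
// real implementation, only the shape of the work it has to do.
package sketch

import (
	"bufio"
	"bytes"
	"compress/gzip"
	"strings"
)

// scanRowChunk decompresses the whole chunk and returns the lines that
// contain the filter keyword, e.g. "uid=xxx".
func scanRowChunk(compressedChunk []byte, keyword string) ([]string, error) {
	zr, err := gzip.NewReader(bytes.NewReader(compressedChunk))
	if err != nil {
		return nil, err
	}
	defer zr.Close()

	var matches []string
	sc := bufio.NewScanner(zr)
	for sc.Scan() {
		line := sc.Text()
		if strings.Contains(line, keyword) { // whole line read just for this check
			matches = append(matches, line)
		}
	}
	return matches, sc.Err()
}
```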

I recommend the chunk format implement a DSM-like columnar format, where uid, tid, remote-system, etc. could be independent columns.
When a user queries, the label matcher may be something like {label1=value1, ... uid=xxx}. To handle this request, Loki could read only the chunk's uid column, then decide for which rows it needs to read the whole log line.
In this situation, disk bandwidth usage should be much lower than before, and of course queries should return faster.
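A minimal sketch of that read path, with hypothetical types that do not exist in Loki today: the small uid column is scanned first, and the expensive line column is only touched for the matching rows.

```go
// Hypothetical sketch of a DSM-like columnar chunk; none of these types exist
// in Loki. The point is the read path: scan only the narrow uid column to find
// matching row numbers, then materialize just those log lines.
package sketch

type ColumnChunk struct {
	UID   []string // one entry per log row, e.g. the uid extracted at write time
	Lines []string // the full log lines, stored as their own column
}

// filterByUID reads only the uid column to select rows, then fetches the
// matching lines; reads of the wide Lines column are limited to the hits.
func filterByUID(c *ColumnChunk, uid string) []string {
	var rows []int
	for i, v := range c.UID { // cheap: short values, good locality
		if v == uid {
			rows = append(rows, i)
		}
	}
	out := make([]string, 0, len(rows))
	for _, r := range rows {
		out = append(out, c.Lines[r]) // expensive column touched only for hits
	}
	return out
}
```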

When chunks are stored in S3, Loki has to read the whole chunk back locally anyway, so this approach may not bring as much of an advantage as expected. Instead, I recommend saving chunks in Parquet format, so the query could be executed on the S3 side and Loki wouldn't need to fetch the whole chunk locally.
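Purely as an illustration of that pushdown (not something Loki supports), S3 Select can evaluate a SQL predicate against a Parquet object server-side, so only the matching rows leave S3. A sketch using the AWS SDK for Go v1, where the bucket, key, and column names are placeholders:

```go
// Illustration only: pushing the uid predicate down to S3 with S3 Select over
// a Parquet object. Loki does not do this today; bucket, key, and column names
// are placeholders.
package sketch

import (
	"fmt"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
)

func selectLinesByUID(bucket, key, uid string) error {
	svc := s3.New(session.Must(session.NewSession()))

	out, err := svc.SelectObjectContent(&s3.SelectObjectContentInput{
		Bucket:         aws.String(bucket),
		Key:            aws.String(key),
		ExpressionType: aws.String(s3.ExpressionTypeSql),
		// Only matching rows are returned; the rest of the chunk never leaves S3.
		// (Naive string interpolation here only for brevity in the sketch.)
		Expression:          aws.String(fmt.Sprintf("SELECT s.line FROM S3Object s WHERE s.uid = '%s'", uid)),
		InputSerialization:  &s3.InputSerialization{Parquet: &s3.ParquetInput{}},
		OutputSerialization: &s3.OutputSerialization{JSON: &s3.JSONOutput{}},
	})
	if err != nil {
		return err
	}
	defer out.EventStream.Close()

	for event := range out.EventStream.Events() {
		if records, ok := event.(*s3.RecordsEvent); ok {
			fmt.Printf("%s", records.Payload) // newline-delimited JSON rows
		}
	}
	return out.EventStream.Err()
}
```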

Generally, a columnar format means Loki's logs would need a defined schema, whose fields would look like non-indexed labels. Because log-system users are almost all developers, defining a suitable schema in this situation is not hard work.
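For example, a per-tenant declaration could be as small as this (entirely hypothetical, nothing like it exists in Loki's config); the fields behave like labels that are stored as columns but never indexed:

```go
// Hypothetical per-tenant schema declaration; illustration only.
package sketch

// ColumnDef names one field to extract from each log line into its own column.
type ColumnDef struct {
	Name string // e.g. "uid", "tid", "remote_system"
	Type string // e.g. "string", "int64"
}

// ChunkSchema lists the extracted columns. High-cardinality fields stay out of
// the index entirely; they are only scannable inside the chunk.
type ChunkSchema struct {
	Columns []ColumnDef
}
```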

liguozhong commented 2 years ago

If I don't understand it wrong, you expect to define the schema at write time. But you can never predict which fields the user will expect in the schema, and Loki would fall into the problem of the traditional logging system. Defining the schema at write time incurs huge storage costs, while users query logs very infrequently; if you want to hit more predefined schemas, this model makes the cost even higher.

I don't expect to see loki go in the direction of defining schemas at write time. In my opinion, loki has achieved great success through the design of the schema defined at read time.

"just give me all the logs, I want to grep".

I think the general direction of loki's future development should continue to be defining the schema at read time, so that it succeeds as the lowest-cost logging system.

For example: caching, more partition definitions, more query goroutine concurrency, more query replicas, better compression ratios, etc.
Autoscaling the read path: https://github.com/grafana/loki/issues/5669
[Frontend] Log Result Cache: https://github.com/grafana/loki/pull/5502

flashmouse commented 2 years ago

Hi @liguozhong, I believe you haven't understood my idea clearly. Some storage systems (ES, etc.) have a heavy write cost because they generate lots of indexes, which means they must compute and write data more than once, so they use much more CPU and disk bandwidth on the write path. I only mean writing the data in a columnar format instead of row by row; think of the difference between ClickHouse and MySQL's InnoDB. There are many papers and blog posts describing the differences and the advantages/disadvantages of column-oriented versus row-oriented storage systems. Writing data in a columnar format isn't a strict schema; it's flexible, and it simply benefits from data locality when querying. I think Loki hasn't planned to do this only because it doesn't want to implement a system with a complex compaction step, which can lead to write amplification and makes the architecture harder to maintain.

I'm currently writing code for a so-called "time series database", and I've observed that many systems want to avoid the complexity of merge/compaction, but in the end they change their implementation. Having no merge causes many problems:

  1. The system cannot write data out of order. Our colleagues have already hit this when using Loki: it causes a chunk to take much longer to be saved than the configured time.
  2. Loki has to write many small chunks, or it will drain the ingester's memory very fast (this also makes columnar storage less efficient) and write lots of index entries.
  3. Loki's queries currently cost a lot of disk/network bandwidth; if you want fast results, you must spend far, far more bandwidth than any other storage system I have ever seen. IMHO, only a storage system like S3 can handle this usage scenario; any database (Cassandra etc.) on normal hardware and network will hang forever. And even S3 has its limits.

Many years ago MapReduce appeared, and now almost nobody uses it any more, because it is hard to write and its performance is very poor; instead, developers work hard on more efficient query optimization and storage formats.

I believe Loki will keep getting better; I'm just discussing some rough ideas with the community :)

sandstrom commented 1 year ago

For reference, here is a related issue: https://github.com/grafana/loki/issues/91