timeplus-io / proton

A streaming SQL engine, a fast and lightweight alternative to ksqlDB and Apache Flink, 🚀 powered by ClickHouse.
https://timeplus.com
Apache License 2.0
1.42k stars, 54 forks

Proton Ingest REST API supports raw/streaming modes #418

Open: jovezhong opened this issue 7 months ago

jovezhong commented 7 months ago

Use case

Today, the ingest API (https://docs.timeplus.com/proton-ingest-api) expects a special format as the POST payload:

{
  "columns": ["id","name"],
  "data": [
    [1,"hello"],
    [2,"world"]
  ]
}

This compact format avoids repeating the column names in every row. But it is unique to Proton, which makes it hard to integrate other HTTP services or clients with Proton.
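To make the integration cost concrete: a generic HTTP client that holds ordinary JSON records has to reshape them before posting. The sketch below assumes column order is taken from the first record; the helper name toCompactPayload is hypothetical and not part of any Proton SDK.

```typescript
// Hypothetical helper (not part of any Proton SDK): reshape ordinary JSON
// records into Proton's compact {columns, data} ingest payload.
type Row = Record<string, unknown>;

interface CompactPayload {
  columns: string[];
  data: unknown[][];
}

function toCompactPayload(rows: Row[]): CompactPayload {
  // Assumption: column order is taken from the first record. Fields that only
  // appear in later records are dropped; missing fields are sent as null.
  const columns = rows.length > 0 ? Object.keys(rows[0]) : [];
  const data = rows.map(row => columns.map(col => (col in row ? row[col] : null)));
  return { columns, data };
}

const payload = toCompactPayload([
  { id: 1, name: "hello" },
  { id: 2, name: "world" },
]);
console.log(JSON.stringify(payload));
// {"columns":["id","name"],"data":[[1,"hello"],[2,"world"]]}
```

Every client needs some variant of this glue, which is the friction the issue describes.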

A few examples:

The workaround today is to set up a proxy server that wraps the arbitrary JSON payload as a text string in the raw column:

{
  "columns": ["raw"],
  "data": [["TEXT"]]
}

Sample code for such a proxy:

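import { routes } from '@stricjs/app';

// Proxy that accepts any text body and forwards it to Proton's ingest API,
// wrapped as a single value of the `raw` column in the compact payload format.
export function main() {
    // HOST and STREAM come from environment variables.
    const protonIngestEndpoint = `http://${process.env.HOST}:3218/proton/v1/ingest/streams/${process.env.STREAM}`;
    return routes()
        .post('/', c => {
            return c.text().then(text => {
                return fetch(protonIngestEndpoint, {
                    method: "POST",
                    // Only double quotes are escaped; backslashes, newlines and other
                    // control characters in `text` can still produce invalid JSON.
                    body: `{"columns": ["raw"],"data": [["${text.replaceAll('"', '\\\"')}"]]}`,
                    headers: { "Content-Type": "application/json" },
                }).then(protonResp => new Response('status code ' + protonResp.status));
            });
        });
}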

But this solution is not great, and it is also risky: it only replaces " with \", while other characters in the TEXT (backslashes, newlines and other control characters) can still break the JSON format.
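Until a raw mode exists, the escaping risk in the proxy can at least be removed by letting JSON.stringify build the whole payload instead of splicing text into a template string. A minimal sketch; wrapAsRaw and naiveWrap are hypothetical names, and naiveWrap only mimics the quote-only escaping of the workaround for comparison.

```typescript
// Hypothetical proxy helper: wrap any text body as the single value of the
// "raw" column in Proton's compact ingest payload.
function wrapAsRaw(text: string): string {
  // JSON.stringify escapes quotes, backslashes, newlines and other control
  // characters, so the payload is valid JSON whatever the input contains.
  return JSON.stringify({ columns: ["raw"], data: [[text]] });
}

// The workaround's approach, for comparison: only double quotes are escaped.
function naiveWrap(text: string): string {
  return `{"columns": ["raw"],"data": [["${text.replace(/"/g, '\\"')}"]]}`;
}

// A body containing quotes, a newline and a Windows-style path.
const tricky = 'He said "hi"\nC:\\temp';

// Round trip: parsing the safe payload gives back exactly the original text.
console.log(JSON.parse(wrapAsRaw(tricky)).data[0][0] === tricky); // true

try {
  JSON.parse(naiveWrap(tricky));
  console.log("naive payload parsed");
} catch {
  // A raw newline is not allowed inside a JSON string, so this branch runs.
  console.log("naive payload is invalid JSON");
}
```

In the proxy, the template-string body could be replaced by a call to wrapAsRaw with the request text. Note that the naive version can also corrupt data without failing: on its own, the \t in C:\temp would be read back as a tab character.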

Describe the solution you'd like

Similar to the Neutron REST API, we should support multiple modes (the default remains the current one).
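One way the server side could interpret such a mode, sketched under assumptions: the mode arrives as a format query parameter (as in the curl example in this issue), raw stores the whole body as one row in a raw column, and lines stores each non-empty line as its own row. The names IngestFormat and parseIngestBody are illustrative, not Proton's actual implementation.

```typescript
// Hypothetical ingest modes; "compact" stands for today's {columns, data} payload.
type IngestFormat = "compact" | "raw" | "lines";

interface Batch {
  columns: string[];
  data: unknown[][];
}

// Turn a request body into a batch of rows according to the requested mode.
function parseIngestBody(body: string, format: IngestFormat = "compact"): Batch {
  switch (format) {
    case "compact": {
      // Current behaviour: the body already names the columns and carries the rows.
      const parsed = JSON.parse(body) as Batch;
      return { columns: parsed.columns, data: parsed.data };
    }
    case "raw":
      // The whole body, unparsed, becomes one row in a single `raw` column.
      return { columns: ["raw"], data: [[body]] };
    case "lines":
      // Each non-empty line (for example NDJSON) becomes its own row.
      return {
        columns: ["raw"],
        data: body
          .split(/\r?\n/)
          .filter(line => line.length > 0)
          .map(line => [line]),
      };
    default:
      // Query strings are untrusted, so reject unknown modes at runtime too.
      throw new Error(`unsupported format: ${String(format)}`);
  }
}

const batch = parseIngestBody('{"a":1}\n{"a":2}\n', "lines");
console.log(batch.data.length); // 2
```

Keeping compact as the default means existing clients are unaffected, while raw and lines never parse the body, so no escaping is needed on the client side.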

When this is ready, loading data from Wikipedia into Proton will be as simple as:

curl -s -H 'Accept: application/json' https://stream.wikimedia.org/v2/stream/recentchange |
  while read -r line; do
    echo "$line" | curl -s -X POST -d @- 'http://localhost:3218/proton/v1/ingest/streams/wiki?format=raw'
  done
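The same loop, sketched in TypeScript for a client that cannot shell out. It only builds the requests; each one could then be sent with fetch(req.url, { method: "POST", body: req.body }). The helper names are hypothetical, and format=raw is the parameter proposed in this issue, not a released option.

```typescript
// Hypothetical client-side equivalent of the shell loop: one raw-mode request
// per line of the upstream event stream. `format=raw` is the proposed query
// parameter from this issue, not an existing Proton option.
interface IngestRequest {
  url: string;
  body: string; // to be sent with method POST
}

function rawIngestUrl(host: string, stream: string): string {
  return `http://${host}:3218/proton/v1/ingest/streams/${encodeURIComponent(stream)}?format=raw`;
}

// Split a chunk of upstream text into one request per non-blank line.
function toRawRequests(text: string, host: string, stream: string): IngestRequest[] {
  const url = rawIngestUrl(host, stream);
  return text
    .split(/\r?\n/)
    .filter(line => line.trim().length > 0) // blank lines are skipped here
    .map(line => ({ url, body: line }));
}

const requests = toRawRequests('{"a":1}\n\n{"b":2}\n', "localhost", "wiki");
console.log(requests.length); // 2
console.log(requests[0].url); // http://localhost:3218/proton/v1/ingest/streams/wiki?format=raw
```

Like the shell version, this sends one HTTP request per event; a lines mode would let a client batch many events into a single request instead.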

One thing to discuss: for the raw and lines modes the stream schema is fixed, so can we auto-create the stream if it doesn't exist?
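Because the schema is fixed, auto-creation reduces to one statement with a single string column. A minimal sketch of how a server might build it; the exact DDL syntax is an assumption not verified against the Proton SQL reference, and rawStreamDdl is a hypothetical name.

```typescript
// Hypothetical: the DDL a server could run to auto-create a raw/lines stream.
// ASSUMPTION: Proton accepts `CREATE STREAM IF NOT EXISTS <name> (raw string)`;
// check the Proton SQL reference before relying on this exact statement.
const IDENTIFIER = /^[A-Za-z_][A-Za-z0-9_]*$/;

function rawStreamDdl(stream: string): string {
  // Accept only plain identifiers so the name can be spliced in without quoting.
  if (!IDENTIFIER.test(stream)) {
    throw new Error(`invalid stream name: ${stream}`);
  }
  // The schema is fixed for these modes: a single string column named raw.
  return `CREATE STREAM IF NOT EXISTS ${stream} (raw string)`;
}

console.log(rawStreamDdl("wiki")); // CREATE STREAM IF NOT EXISTS wiki (raw string)
```

Rejecting unusual names, rather than quoting them, keeps the sketch simple; a real implementation would also need to decide who is allowed to create streams implicitly.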

jovezhong commented 6 months ago

(Jove Github Bot) assuming it is not done, deferred this ticket to the next sprint.