mayurjain0312 opened this issue 6 years ago
Can you please describe some of the symptoms you're experiencing? `table.mutate_rows`
is a synchronous operation, so you're limited to one call at a time. How many rows per second are you seeing on the cloud console?
Is this the only "bulk write" option that Bigtable offers? From the console, I can see around 800 rows written per second. (Production instance with 3 nodes)
We are currently using MongoDB as our NoSQL database and we would prefer moving to Bigtable. As far as the speed of writing data to the database is concerned, I see that MongoDB far exceeds Bigtable (unless I am not using Bigtable in the right manner).
@sduskis: any updates?
A 3-node Cloud Bigtable cluster can handle writing at least 30K rows per second, and each additional node adds 10K rows per second. We tested Cloud Bigtable with up to 3,500 nodes and saw a linear increase in performance.
The 800 rows per second is a limitation of the Python client and whatever VM you're on. The process sends 300 rows and waits for a response. You can run multiple processes that each read in a different file to improve total throughput against your cluster.
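For example, something along these lines: a rough sketch of the "one process per input file" approach, assuming a simple `row_key,value` CSV layout; the project, instance, table, column family, and file names are placeholders.

```python
# Rough sketch of the "one process per input file" approach. All identifiers
# (project, instance, table, column family, file names) are placeholders, and
# the CSV layout is assumed to be row_key,value.
import csv
import multiprocessing

from google.cloud import bigtable

BATCH_SIZE = 300


def load_file(path):
    # Each process builds its own client; gRPC channels should not be shared
    # across forked processes.
    client = bigtable.Client(project="my-project")
    table = client.instance("my-instance").table("my-table")

    batch, written = [], 0
    with open(path) as f:
        for row_key, value in csv.reader(f):
            row = table.direct_row(row_key.encode("utf-8"))
            row.set_cell("cf1", b"col", value.encode("utf-8"))
            batch.append(row)
            if len(batch) == BATCH_SIZE:
                table.mutate_rows(batch)  # blocks until this batch is acknowledged
                written += len(batch)
                batch = []
    if batch:
        table.mutate_rows(batch)
        written += len(batch)
    return path, written


if __name__ == "__main__":
    files = ["rows-00.csv", "rows-01.csv", "rows-02.csv", "rows-03.csv"]
    pool = multiprocessing.Pool(processes=len(files))
    for path, written in pool.map(load_file, files):
        print("%s: wrote %d rows" % (path, written))
    pool.close()
    pool.join()
```

Each worker still blocks on its own `mutate_rows` calls, but the batches go out in parallel, so total throughput against the cluster scales with the number of processes.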
The Java client performs far better in these situations, since it has a robust async implementation. We have not ported that functionality to Python yet. Here is an example with the Java version.
(FWIW, I'm the primary developer on the Java client)
@sduskis: Thanks for responding. Do you know when this async functionality for Python will be ready?
We don't have an ETA for this functionality in Python.
Hi,
Is there any progress on this feature? I would really like to scale my writes to Bigtable with Python code.
We still do not have concrete plans to implement this feature.
Hello! Are there any updates on plans to implement this feature?
Async is quickly becoming the norm in modern Python development. :)
Bigtable I/O is consistently the only blocking I/O I encounter on a regular basis.
DynamoDB has an async wrapper in the form of aioboto3.
@sduskis I can help implement this if you can provide me a little guidance. Can you provide a rough sketch?
@cwbeitel There's a lot that goes into this. There's a Java implementation that would be similar, but I'm not sure how much it would help (here).
Here are some constraints that I think are important:
@sduskis Yeah, that's fair, I hear those thoughts. Thanks for sharing the Java implementation, that's helpful. Looks like the Pub/Sub Python client code, e.g. the publisher's message batcher, would also be helpful to emulate. A rough sketch of what I have in mind is below.
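Just a sketch of a thread-based batcher in the spirit of the Pub/Sub publisher; no retries or error handling, and the flush parameters are arbitrary.

```python
# Sketch of a Pub/Sub-publisher-style batcher: callers enqueue DirectRows
# without blocking, and a background thread flushes them in batches via
# table.mutate_rows. No retries or error handling; parameters are arbitrary.
import threading
import time


class MutationBatcher(object):
    def __init__(self, table, max_batch=300, flush_interval=1.0):
        self._table = table
        self._max_batch = max_batch
        self._flush_interval = flush_interval
        self._pending = []
        self._lock = threading.Lock()
        self._stopped = False
        self._worker = threading.Thread(target=self._loop)
        self._worker.daemon = True
        self._worker.start()

    def add(self, row):
        # Cheap from the caller's point of view: just append under a lock.
        with self._lock:
            self._pending.append(row)

    def _flush_once(self):
        with self._lock:
            batch = self._pending[:self._max_batch]
            self._pending = self._pending[self._max_batch:]
        if batch:
            # The blocking mutate_rows call happens off the caller's thread.
            self._table.mutate_rows(batch)
        return len(batch)

    def _loop(self):
        while not self._stopped:
            time.sleep(self._flush_interval)
            self._flush_once()

    def close(self):
        self._stopped = True
        self._worker.join()
        while self._flush_once():
            pass
```

The idea is that the hot loop just calls `batcher.add(row)` and the writer calls `batcher.close()` at the end, so the caller never waits on a round trip.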
Here's a gist contextualizing this in the case of streaming deep learning training examples to a cbt table.
Having had a look at this I'm probably going to first try to accomplish the same using the Golang client given how involved it would be to do this with the Python client.
Any updates since last year?
I would like to express my wishes to make this a P1 priority feature request for 2021 ❤️
Any updates?
Hello, any updates? 🙃
Please show some love for the Bigtable async client!
Ubuntu 16.04
Python version and virtual environment information: `python --version` - Python 2.7.12
With a batch size of 300 and a total of 3 nodes in an instance, the write throughput is not good for Bigtable using the `mutate_rows` API call.
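Roughly what the write path looks like (a simplified sketch; the project, instance, table, and column names below are placeholders):

```python
# Simplified sketch of the current write path: build 300 DirectRows, then
# block on mutate_rows for each batch. Identifiers are placeholders.
from google.cloud import bigtable

client = bigtable.Client(project="my-project")
table = client.instance("my-instance").table("my-table")

rows = []
for i in range(300):
    row = table.direct_row(("row-%06d" % i).encode("utf-8"))
    row.set_cell("cf1", b"col", b"value")
    rows.append(row)

statuses = table.mutate_rows(rows)  # synchronous: waits for the whole batch
failures = [s for s in statuses if s.code != 0]
print("wrote %d rows, %d failures" % (len(rows) - len(failures), len(failures)))
```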