castarco opened this issue 3 years ago
@castarco thanks for the report and the details! I'm not sure in which stage the function hangs according to the pdb output. Can you update `connectorx` to the newest alpha version via `pip install -U connectorx==0.2.1a2`, and set the `RUST_LOG` environment variable before importing the library, as in the following example:

```python
import os
os.environ["RUST_LOG"] = "connectorx=debug,connectorx_python=debug"

import connectorx as cx
df = cx.read_sql(postgres_url, query, return_type="polars")
```

There should be some output logs from `connectorx` and `connectorx_python`. Can you show them here?
@wangxiaoying After updating the library to the current alpha version and setting the environment variable, I get the following logs once it enters the function; it hangs after the last line is printed:
```
[2021-10-02T19:56:00Z DEBUG connectorx_python::arrow] Protocol: binary
[2021-10-02T19:56:00Z DEBUG connectorx_python::arrow] Running dispatcher
[2021-10-02T19:56:00Z DEBUG connectorx::dispatcher] Fetching metadata
[2021-10-02T19:56:00Z DEBUG connectorx::sql] Transformed limit 1 query: SELECT jobs.job_id, jobs.is_remote, jobs.employment_type_id, jobs.language_id, jobs.category_id, jobs.country_id, jobs.industry_id, jobs.contract_type_id, jobs.function_level_id, jobs.number_of_employees_id FROM jobs WHERE jobs.target_market_id = 1 AND jobs.has_premium_orders = true ORDER BY jobs.job_id LIMIT 1
[2021-10-02T19:56:00Z DEBUG connectorx::dispatcher] Prepare partitions
[2021-10-02T19:56:00Z DEBUG connectorx::sql] Transformed count query: SELECT count(*) FROM (SELECT jobs.job_id, jobs.is_remote, jobs.employment_type_id, jobs.language_id, jobs.category_id, jobs.country_id, jobs.industry_id, jobs.contract_type_id, jobs.function_level_id, jobs.number_of_employees_id FROM jobs WHERE jobs.target_market_id = 1 AND jobs.has_premium_orders = true) AS CXTMPTAB_COUNT
[2021-10-02T19:56:00Z DEBUG connectorx::dispatcher] Allocate destination memory
[2021-10-02T19:56:00Z DEBUG connectorx::dispatcher] Create destination partition
[2021-10-02T19:56:00Z DEBUG connectorx::dispatcher] Partition 0, 298x10
[2021-10-02T19:56:00Z DEBUG connectorx::dispatcher] Start writing
[2021-10-02T19:56:00Z DEBUG connectorx::dispatcher] Finalize partition 0
[2021-10-02T19:56:00Z DEBUG connectorx::dispatcher] Partition 0 finished
```
It seems to me that the destructor is stuck at https://github.com/sfu-db/connector-x/blob/main/connectorx/src/dispatcher.rs#L153, because line 155's `debug!("Writing finished");` is never printed.
To narrow it down, could you please help us try `df = cx.read_sql(postgres_url, query, return_type="pandas")` to see whether it is the source (postgres) or the destination (arrow) that hangs the whole process?
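When comparing destinations like this, one hung attempt can block the whole session. A small sketch of how to bound each attempt with a timeout instead (the body of `run_query` is a placeholder; substitute the real `cx.read_sql` call and connection details when reproducing — note that a daemon thread stuck in native code cannot be killed, this only detects the hang):

```python
import threading

def run_query(return_type: str, results: dict) -> None:
    # Placeholder for: results[return_type] = cx.read_sql(url, query,
    #                       return_type=return_type)
    results[return_type] = "ok"

def finishes_within(return_type: str, timeout: float = 5.0) -> bool:
    """Return True if the call completed within `timeout` seconds."""
    results: dict = {}
    t = threading.Thread(target=run_query, args=(return_type, results),
                         daemon=True)
    t.start()
    t.join(timeout)
    # If the key never appeared, the call is still stuck in the thread.
    return return_type in results

for rt in ("pandas", "arrow", "polars"):
    print(rt, "finished" if finishes_within(rt) else "still hanging after timeout")
```

This keeps the comparison loop responsive even when one destination deadlocks.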
This is the output for the experiment you proposed (using `"pandas"` as the return type instead of `"arrow"` or `"polars"`), and it is the same output as before:
```
[2021-10-03T06:48:09Z DEBUG connectorx_python::pandas] Protocol: binary
[2021-10-03T06:48:09Z DEBUG connectorx_python::pandas] Running dispatcher
[2021-10-03T06:48:09Z DEBUG connectorx::dispatcher] Fetching metadata
[2021-10-03T06:48:09Z DEBUG connectorx::sql] Transformed limit 1 query: SELECT jobs.job_id, jobs.is_remote, jobs.employment_type_id, jobs.language_id, jobs.category_id, jobs.country_id, jobs.industry_id, jobs.contract_type_id, jobs.function_level_id, jobs.number_of_employees_id FROM jobs WHERE jobs.target_market_id = 1 AND jobs.has_premium_orders = true ORDER BY jobs.job_id LIMIT 1
[2021-10-03T06:48:09Z DEBUG connectorx::dispatcher] Prepare partitions
[2021-10-03T06:48:09Z DEBUG connectorx::sql] Transformed count query: SELECT count(*) FROM (SELECT jobs.job_id, jobs.is_remote, jobs.employment_type_id, jobs.language_id, jobs.category_id, jobs.country_id, jobs.industry_id, jobs.contract_type_id, jobs.function_level_id, jobs.number_of_employees_id FROM jobs WHERE jobs.target_market_id = 1 AND jobs.has_premium_orders = true) AS CXTMPTAB_COUNT
[2021-10-03T06:48:09Z DEBUG connectorx::dispatcher] Allocate destination memory
[2021-10-03T06:48:09Z DEBUG connectorx::dispatcher] Create destination partition
[2021-10-03T06:48:09Z DEBUG connectorx::dispatcher] Partition 0, 298x10
[2021-10-03T06:48:09Z DEBUG connectorx::dispatcher] Start writing
[2021-10-03T06:48:09Z DEBUG connectorx::dispatcher] Finalize partition 0
[2021-10-03T06:48:09Z DEBUG connectorx::dispatcher] Partition 0 finished
```
Here is the script I used to reproduce the problem:
```python
import connectorx as cx

url = "postgresql://blablablah"
query = "SELECT jobs.job_id, jobs.is_remote, jobs.employment_type_id, jobs.language_id, jobs.category_id, jobs.country_id, jobs.industry_id, jobs.contract_type_id, jobs.function_level_id, jobs.number_of_employees_id \nFROM jobs \nWHERE jobs.target_market_id = 1 AND jobs.has_premium_orders = true ORDER BY jobs.job_id"

df = cx.read_sql(url, query, return_type="pandas")
print(df)
```
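Since `pdb` cannot step into the native extension, the standard-library `faulthandler` module can at least show the last Python frame each thread is stuck in. The sketch below dumps the current stacks on demand; for the real hang, arming `faulthandler.dump_traceback_later(timeout)` before the `read_sql` call makes the dump fire automatically once the process appears stuck:

```python
import faulthandler
import tempfile

# On-demand dump of every Python thread's stack to a real file object
# (faulthandler needs a file descriptor, so StringIO won't work).
# For the real repro, call faulthandler.dump_traceback_later(60) before
# cx.read_sql() so the stacks are dumped automatically if the call is
# still running after 60 seconds.
with tempfile.TemporaryFile(mode="w+") as f:
    faulthandler.dump_traceback(file=f)
    f.seek(0)
    report = f.read()

print(report)
```

The dump lists each thread with its innermost Python frame first, which tells you whether the main thread is blocked inside the extension call or somewhere else.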
@castarco Thanks. This helps us narrow down the bug to either rayon or the Postgres side.
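One cheap experiment for the rayon hypothesis: rayon-based libraries typically honor the `RAYON_NUM_THREADS` environment variable when the global thread pool is first initialized, so pinning it to 1 before the import may help confirm or rule out a thread-pool deadlock. A sketch (whether connectorx's embedded rayon pool reads this variable is an assumption, not something confirmed in this thread):

```python
import os

# Rayon reads RAYON_NUM_THREADS when its global thread pool is created,
# so both variables must be set before the library initializes anything.
os.environ["RAYON_NUM_THREADS"] = "1"
os.environ["RUST_LOG"] = "connectorx=debug,connectorx_python=debug"

# Import only after the environment variables are in place:
# import connectorx as cx
# df = cx.read_sql(postgres_url, query, return_type="pandas")
```

If the hang disappears with a single worker thread, that points at a deadlock between pool threads rather than at the Postgres source.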
**Problem:**

`_read_sql` (used via `polars`) ends up hanging in my production system (specified in the context section), while it works perfectly fine in my local environment (macOS, PostgreSQL 13, Python 3.9.7). I couldn't get past that function: as I guess it's implemented in Rust, `pdb` is obviously unable to step inside. Regarding the DB state, it's arguably the same in the local dev env and in production, as I run a set of deterministic migrations, and the DB is pretty new, so there are no differences in the contained data either.

**Context:**

- The call is made from an `async` function (but there are no other concurrent calls to `_read_sql`).
- The process hangs inside `_read_sql` (while memory usage stays the same).
- In my `pdb` session it hangs on the last line; you can see some of the variable values I displayed.

**Discussion:**
I'd be happy to continue helping to pinpoint the root cause of this problem. I have some experience with Rust, but I've never worked on Rust+Python projects. If there's anything I can do to debug the problem in more depth, I'd be thankful for the information.