iangow / se_core

Core code for StreetEvents data
7 stars 5 forks source link

Handle connections when updating calls #10

Open iangow opened 4 years ago

iangow commented 4 years ago

https://github.com/iangow/se_core/blob/92d0fc1fe90001eb21dc31d32e20cd417f791180/update_se.sh#L7

Need to disconnect users who are connected to streetevents.calls before DROP of the table.

bdcallen commented 4 years ago

@iangow Just been playing around with the code in create_call.R. Have you got these warning messages before?

> original_names <-
+     calls_raw %>%
+     inner_join(call_files, by = c("file_path", "sha1", "file_name")) %>%
+     group_by(file_name, last_update) %>%
+     filter(mtime == min(mtime, na.rm = TRUE)) %>%
+     select(sha1, file_path, file_name, last_update, company_name, call_desc, event_title, city) %>%
+     ungroup() %>%
+     distinct() %>%
+     compute()
Warning messages:
1: `lang_name()` is deprecated as of rlang 0.2.0.
Please use `call_name()` instead.
This warning is displayed once per session. 
2: `lang()` is deprecated as of rlang 0.2.0.
Please use `call2()` instead.
This warning is displayed once per session. 
iangow commented 4 years ago

Yes. This is harmless. I will update packages to see if that makes it go away.

bdcallen commented 4 years ago

@iangow Yeah, I thought it was probably harmless. Just making sure

bdcallen commented 4 years ago

@iangow Do you know what type of user queries are putting locks on calls? If they're just using SELECT commands, I'm under the impression they simply get a snapshot of table at the current moment, and an UPDATE, ALTER TABLE or DROP TABLE can proceed while this is happening. Though if someone were to do a query selecting from calls in order to update another table, then perhaps the commands to update calls could be locked. I'm currently reading through this, this, this and this.

bdcallen commented 4 years ago

@iangow I tried create_calls_test.R without the lines I had to kick users off the database, and it worked fine

bdcallen@igow-z640:~/se_core$ Rscript create_calls_test.R
Warning messages:
1: `lang_name()` is deprecated as of rlang 0.2.0.
Please use `call_name()` instead.
This warning is displayed once per session. 
2: `lang()` is deprecated as of rlang 0.2.0.
Please use `call2()` instead.
This warning is displayed once per session. 

So I'm a bit confused now. If there is indeed a lock being produced by the code, shouldn't we be running into this problem every time? Or perhaps cursors aren't automatically freed by R after each use of the program?

iangow commented 4 years ago

As discussed, I think the first step is to create the problem. It seems that

> Sys.setenv(PGHOST="10.101.13.99", PGDATABASE="crsp")
> library(dplyr, warn.conflicts = FALSE)
> library(DBI)
> 
> pg <- dbConnect(RPostgres::Postgres())
> test271107 <- tbl(pg, "test271107")
> test271107
# Source:   table<test271107> [?? x 9]
# Database: postgres [igow@10.101.13.99:5432/crsp]
  file_name  last_update         speaker_name  employer   role  speaker_number speaker_text               context section
  <chr>      <dttm>              <chr>         <chr>      <chr>          <int> <chr>                      <chr>     <int>
1 10889324_T 2017-09-13 21:45:49 Glenn R. Lan… Internati… CFO …             48 "Well, it's a good questi… qa            1
> test271107
Error in result_create(conn@ptr, statement) : 
  Failed to prepare query: ERROR:  relation "test271107" does not exist
LINE 2: FROM "test271107"

does not prevent me from running DROP TABLE test271107 ; (I did this after the first invocation of test271107 in R, hence the error.

iangow commented 4 years ago

Maybe RPostgreSQL does it differently.

iangow commented 4 years ago

Maybe RPostgreSQL does it differently.

Nope. Maybe viewing the data in pgAdmin creates a lock.

bdcallen commented 4 years ago

"Nope. Maybe viewing the data in pgAdmin creates a lock."

@iangow I concur. Just ran the whole code using run in Rstudio, then did

bdcallen@igow-z640:~/se_core$ Rscript create_calls_test.R
Warning messages:
1: `lang_name()` is deprecated as of rlang 0.2.0.
Please use `call_name()` instead.
This warning is displayed once per session. 
2: `lang()` is deprecated as of rlang 0.2.0.
Please use `call2()` instead.
This warning is displayed once per session. 

So running the code in RStudio console didn't prevent me running in the script in an independent terminal shell.

bdcallen commented 4 years ago

Maybe RPostgreSQL does it differently.

Nope. Maybe viewing the data in pgAdmin creates a lock.

Let me try that by using the other desktop on my desk

bdcallen commented 4 years ago

Maybe RPostgreSQL does it differently.

Nope. Maybe viewing the data in pgAdmin creates a lock.

Let me try that by using the other desktop on my desk

Nope, not that either. Did

SELECT * FROM streetevents.calls_test LIMIT 2000

in pgAdmin on the other desktop, then did

bdcallen@igow-z640:~/se_core$ Rscript create_calls_test.R
Warning messages:
1: `lang_name()` is deprecated as of rlang 0.2.0.
Please use `call_name()` instead.
This warning is displayed once per session. 
2: `lang()` is deprecated as of rlang 0.2.0.
Please use `call2()` instead.
This warning is displayed once per session. 
bdcallen@igow-z640:~/se_core$ 

The program still ran.

bdcallen commented 4 years ago

@iangow Perhaps we can put cronjob back on for a few days, and then see the output to pg_stat_activity and pg_locks?

iangow commented 4 years ago

I think that's a good idea. Perhaps we could just run it manually on days when you're in and then monitor it. Once we have an issue, we could try to figure it out.