transientskp / tkp

A transients-discovery pipeline for astronomical image-based surveys
http://docs.transientskp.org/
BSD 2-Clause "Simplified" License

TKP_DBHOST parameter not passed to the tests #600

Closed by AntoniaR 1 year ago

AntoniaR commented 1 year ago

Many tests are failing because they try to connect to the wrong database host. The host should be passed with export TKP_DBHOST='path_to_host'. However, the tests connect to 'localhost' instead, so this setting is being ignored, and password authentication fails as a result. When running TraP normally (e.g. trap-manage.py run testdata), it connects to the database correctly.

connection to server at "localhost" (::1), port 5432 failed: FATAL:  password authentication failed for user "antoniar"

I will try digging further into this issue now
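For context, one general way a "localhost" connection can appear despite an export is a default that kicks in when the variable is not visible to the process running the tests. A minimal sketch of that fallback pattern follows; resolve_db_host is a hypothetical helper written for illustration and is not TraP's actual configuration code.

```python
import os


def resolve_db_host(environ=None, default="localhost"):
    """Return the host named by TKP_DBHOST, or the default if unset or empty.

    Hypothetical helper for illustration only; TraP's own lookup may differ.
    """
    env = os.environ if environ is None else environ
    return env.get("TKP_DBHOST") or default


# With the variable present, the configured host is used.
print(resolve_db_host({"TKP_DBHOST": "db.example.org"}))
# Without it (for instance if it was never exported), the default applies.
print(resolve_db_host({}))
```

Printing the resolved value at the start of a test run is a cheap way to confirm what the test process actually sees.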

HannoSpreeuw commented 1 year ago

Okay, which branch is this?

AntoniaR commented 1 year ago

I'm currently using converted_to_python3 but just noticed there is another converted_to_python3_with_sourcefinder_from_pyse_repo - should I be using that branch instead?

HannoSpreeuw commented 1 year ago

Yes, that is the branch we have been discussing for PR #596.

AntoniaR commented 1 year ago

Ah, sorry! I somehow missed that.

I will switch branches and make my changes in the cfg files again. Then I'll retry the tests - hopefully this fixes some of the issues I'm having :)

AntoniaR commented 1 year ago

Ok, switched branches but still getting this issue.

AntoniaR commented 1 year ago

I'm trying out an older version of TraP and running the tests again - just in case I'm doing something wrong!

AntoniaR commented 1 year ago

Ok... so I now remember how I did the tests. I followed the instructions in the docs rather than using runtests.sh. Specifically, I used the database section of the testing docs (https://tkp.readthedocs.io/en/latest/devref/procedures/testing.html#database) to define and create the database. I got this working with the r5.0 tkp release (apart from known errors).

I then repeated this with the converted_to_python3_with_sourcefinder_from_pyse_repo branch. TKP_DBHOST was not passed, so authentication failed because the tests were trying to use localhost.

AntoniaR commented 1 year ago

This is still failing with python 3.9 and conda. As the database password and host are not being passed to the tests, I cannot successfully run the tests on Struis.

HannoSpreeuw commented 1 year ago

Okay, I am trying to reproduce this on my laptop by setting TKP_DBHOST to something other than localhost. If I set a dummy value like export TKP_DBHOST=remote_host, where remote_host is, unfortunately, nonexistent, and then run trap-manage.py initdb in a Python 3.10 environment, I get:

conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) could not translate host name "remote_host" to address: Name or service not known

I suppose you would get the same message if remote_host were nonexistent. I guess I need to do this on a network machine to mimic your situation.

AntoniaR commented 1 year ago

Would it be helpful for me to give you a log in on our machine to try it there?

HannoSpreeuw commented 1 year ago

Thanks for that. Let me first try setting up something simple locally; perhaps a virtual machine could serve as a remote host. I guess lots of options are posted on the web, so I'll dive into this.

HannoSpreeuw commented 1 year ago

Just trying to establish some common ground first....

I can successfully run trap-manage.py initdb -y for local databases. So this works for me:

# rm -rf tkp_data_cluster
initdb -D tkp_data_cluster
pg_ctl -D tkp_data_cluster -l logfile start
createdb tkp_test_db
export TKP_DBNAME=tkp_test_db
trap-manage.py initdb -y

in a shell script run as . ./shell_script.sh. Mind the . and the space before ./shell_script.sh. Please confirm that you can run this successfully for this locally created database, using the current converted_to_python3_with_sourcefinder_from_pyse_repo branch installed with pip install -e ".[pixelstore]" using Python 3.10.
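The leading . matters because a script executed as ./shell_script.sh runs in a child process, and anything exported there is gone once that process exits, whereas a sourced script runs in the current shell. The same rule holds for any child process. A minimal Python sketch of it follows; the child is a throwaway one-liner written for illustration, not part of TraP.

```python
import os
import subprocess
import sys

# Code for a child process that sets TKP_DBNAME in its own environment.
child_code = (
    "import os; "
    "os.environ['TKP_DBNAME'] = 'tkp_test_db'; "
    "print(os.environ['TKP_DBNAME'])"
)

# Start from a clean state in this (parent) process.
os.environ.pop("TKP_DBNAME", None)

child = subprocess.run(
    [sys.executable, "-c", child_code],
    capture_output=True,
    text=True,
    check=True,
)

child_value = child.stdout.strip()
parent_value = os.environ.get("TKP_DBNAME")

print("child saw:", child_value)     # the value set inside the child
print("parent sees:", parent_value)  # unchanged: the parent never gets it
```

Inheritance only goes one way: a child receives a copy of the parent's exported variables, so the export has to happen in the shell that later launches the tests.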

Now the remote part. I launched an Apache2 webserver in a VirtualBox VM running Ubuntu. I could access it from the host at a local IP address through port 80; it displays the "Apache2 Ubuntu Default Page". I set

export TKP_DBHOST=some_local_IP_address
export TKP_DBPORT=80

This is not what we want, because there is no database at that local IP address and the port is an HTTP port, but at least I can check whether setting a different TKP_DBHOST environment variable has any effect. It does: when I re-execute . ./shell_script.sh I first see the same output as before, but then, after "server started" and some warnings, it hangs for a while. After some time, it terminates with psycopg2.OperationalError: server closed the connection unexpectedly. This is to be expected, since it is probably trying to access a database that does not exist.

So it does not seem to overwrite the value of TKP_DBHOST with localhost in my case.

It is not clear to me yet why it does overwrite the value of TKP_DBHOST in your case.
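When a connection attempt hangs like this, a plain TCP probe with a timeout can separate name-resolution failures, refused connections and open ports before psycopg2 gets involved. A minimal sketch follows; probe is a hypothetical helper, and the host and port in the example call are placeholders. An open port says nothing about whether a database is listening on it, as the Apache experiment shows.

```python
import socket


def probe(host, port, timeout=3.0):
    """Classify the result of a bare TCP connection attempt to host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "reachable"
    except socket.gaierror:
        # The host name could not be translated to an address.
        return "name not resolved"
    except ConnectionRefusedError:
        # The host exists but nothing is listening on that port.
        return "connection refused"
    except OSError as exc:
        # Timeouts and other network errors end up here.
        return f"failed: {exc}"


# Placeholder target: the default PostgreSQL port on the local machine.
print(probe("localhost", 5432, timeout=1.0))
```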

jdswinbank commented 1 year ago

Hi folks,

I was curious about this, but too lazy to actually go as far as setting up a database. I realised that that isn't really important, though; all that's necessary is to see which database the tests are trying to connect to. So I did the following:

$ git log -1
commit 683081f866b48e9408b336420e448e2bcd49114d (HEAD -> converted_to_python3_with_sourcefinder_from_pyse_repo, origin/converted_to_python3_with_sourcefinder_from_pyse_repo)
Author: Hanno Spreeuw <h.spreeuw@esciencecenter.nl>
Date:   Thu Dec 15 15:17:57 2022 +0100

    14 unit tests were skipped because the data was sought in the wrong folder. This commit fixes that.

$ python --version
Python 3.10.8

$ echo $TKP_DBHOST  #  it's not set

$ python runtests.py -v | tail
ERROR test_database/test_alchemy.py::TestApi::test_calculate_varmetric - sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) connection to server at "localhost" (127.0.0.1), port 5432 failed: Connection refused
ERROR test_database/test_alchemy.py::TestApi::test_calculate_varmetric_cutoff - sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) connection to server at "localhost" (127.0.0.1), port 5432 failed: Connection refused
ERROR test_database/test_alchemy.py::TestApi::test_calculate_varmetric_newsource - sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) connection to server at "localhost" (127.0.0.1), port 5432 failed: Connection refused
ERROR test_database/test_alchemy.py::TestApi::test_calculate_varmetric_region - sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) connection to server at "localhost" (127.0.0.1), port 5432 failed: Connection refused
ERROR test_database/test_alchemy.py::TestApi::test_combined - sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) connection to server at "localhost" (127.0.0.1), port 5432 failed: Connection refused
ERROR test_database/test_alchemy.py::TestApi::test_last_assoc_per_band - sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) connection to server at "localhost" (127.0.0.1), port 5432 failed: Connection refused
ERROR test_database/test_alchemy.py::TestApi::test_last_assoc_timestamps - sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) connection to server at "localhost" (127.0.0.1), port 5432 failed: Connection refused
ERROR test_database/test_alchemy.py::TestApi::test_last_ts_fmax - sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) connection to server at "localhost" (127.0.0.1), port 5432 failed: Connection refused
ERROR test_database/test_alchemy.py::TestApi::test_newsrc_trigger - sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) connection to server at "localhost" (127.0.0.1), port 5432 failed: Connection refused
ERROR test_database/test_alchemy.py::TestApi::test_store_varmetric - sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) connection to server at "localhost" (127.0.0.1), port 5432 failed: Connection refused
=============================================================================================================== 4 warnings, 10 errors in 6.08s ==========================================================================

$ export TKP_DBHOST=a_dummy_host_not_localhost

$ python runtests.py -v | tail
ERROR test_database/test_alchemy.py::TestApi::test_calculate_varmetric - sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) could not translate host name "a_dummy_host_not_localhost" to address: Name or service not known
ERROR test_database/test_alchemy.py::TestApi::test_calculate_varmetric_cutoff - sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) could not translate host name "a_dummy_host_not_localhost" to address: Name or service not known
ERROR test_database/test_alchemy.py::TestApi::test_calculate_varmetric_newsource - sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) could not translate host name "a_dummy_host_not_localhost" to address: Name or service not known
ERROR test_database/test_alchemy.py::TestApi::test_calculate_varmetric_region - sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) could not translate host name "a_dummy_host_not_localhost" to address: Name or service not known
ERROR test_database/test_alchemy.py::TestApi::test_combined - sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) could not translate host name "a_dummy_host_not_localhost" to address: Name or service not known
ERROR test_database/test_alchemy.py::TestApi::test_last_assoc_per_band - sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) could not translate host name "a_dummy_host_not_localhost" to address: Name or service not known
ERROR test_database/test_alchemy.py::TestApi::test_last_assoc_timestamps - sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) could not translate host name "a_dummy_host_not_localhost" to address: Name or service not known
ERROR test_database/test_alchemy.py::TestApi::test_last_ts_fmax - sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) could not translate host name "a_dummy_host_not_localhost" to address: Name or service not known
ERROR test_database/test_alchemy.py::TestApi::test_newsrc_trigger - sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) could not translate host name "a_dummy_host_not_localhost" to address: Name or service not known
ERROR test_database/test_alchemy.py::TestApi::test_store_varmetric - sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) could not translate host name "a_dummy_host_not_localhost" to address: Name or service not known
=============================================================================================================== 4 warnings, 10 errors in 5.76s ===============================================================================================================

In short: the change to TKP_DBHOST does seem to be propagating right through to the tests, using Python 3.10.8 on the latest version of Hanno's branch. There's no evidence of the problem Antonia reports.
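With many failing tests, the same check can be automated by pulling the host names out of the error lines. A minimal sketch follows, assuming the two psycopg2 message forms seen in the output above; hosts_in_log is a hypothetical helper and the sample lines are abbreviated from that output.

```python
import re

# The two message forms that appear in the test output above.
HOST_PATTERNS = [
    re.compile(r'connection to server at "([^"]+)"'),
    re.compile(r'could not translate host name "([^"]+)"'),
]


def hosts_in_log(lines):
    """Return the set of database hosts mentioned in psycopg2 error lines."""
    hosts = set()
    for line in lines:
        for pattern in HOST_PATTERNS:
            match = pattern.search(line)
            if match:
                hosts.add(match.group(1))
    return hosts


sample = [
    '(psycopg2.OperationalError) connection to server at "localhost" '
    "(127.0.0.1), port 5432 failed: Connection refused",
    '(psycopg2.OperationalError) could not translate host name '
    '"a_dummy_host_not_localhost" to address: Name or service not known',
]
found = hosts_in_log(sample)
print(sorted(found))
```

If the resulting set contains more than one host for a single run, some tests are not picking up the same setting as the others.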

Antonia, is it all database-backed tests that are failing, or just some of them? I'm wondering if there's a localhost hard-coded in one particular test file, or something like that.

AntoniaR commented 1 year ago

So... I tried this again today with a new database and the error has vanished! My hypothesis is that this was caused by a problem in the old operating system on Struis that was fixed when John upgraded it.