kaizen-ai / kaizenflow

KaizenFlow is a framework for Bayesian reasoning and AI/ML stream computing
GNU General Public License v3.0

Create test list to run with Sorrentum #283

Closed gpsaggese closed 11 months ago

gpsaggese commented 1 year ago

From https://github.com/sorrentum/sorrentum/issues/189#issuecomment-1563426753

Contributors can use repos outside our infra on their laptops, so some tests might not work (e.g., if there is a dependency on AWS).

We want to mark unit tests based on what kind of support is needed, so that contributors can run our pytest flow, skipping all the tests that are not expected to work outside our infra.
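
For example, the skeleton could look like this (a minimal sketch; the marker name requires_ck_infra and the import path are illustrative, not a final convention):

import pytest

import helpers.hunitest as hunitest  # assumed import path for the repo's TestCase helper


@pytest.mark.requires_ck_infra  # illustrative marker: test depends on our AWS infra
class TestS3Download1(hunitest.TestCase):
    def test_read1(self) -> None:
        ...

Contributors would then run the flow deselecting those tests with a marker expression like pytest -m "not requires_ck_infra".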

Assigning to @samarth9008 as current master of outsourcing. We can do a quick PR to get the skeleton in place. @PomazkinG and @jsmerix can help.

alejandroBallesterosC commented 1 year ago

https://github.com/sorrentum/sorrentum/pull/401

gpsaggese commented 1 year ago

FYI @gpsaggese I edited the comment because interns do not have access to cmamp

samarth9008 commented 1 year ago

@dchoi127 @Ro0k1e, you two will work together on this task. It is big: you have to go through all the failing tests and mark them accordingly. @Ro0k1e is working on his first task, so help him get used to our procedure. If needed, I can assign one more person to it.

samarth9008 commented 1 year ago

Go through the whole conversation to get a clearer idea.

samarth9008 commented 1 year ago

PR #401 has already been created for this. If you want, you can create a different branch and a new PR, or work on the existing one. If you create a new one, there is no need to do the first step mentioned in https://github.com/sorrentum/sorrentum/issues/283#issuecomment-1629476544

dchoi127 commented 1 year ago

When I run `i run_fast_tests`, the output just hangs. I've waited about 20 minutes; the output is below. Am I doing something wrong? I've started up the amp.client_venv (but not the Docker container) and ran `i run_fast_tests`. Is this correct?

(amp.client_venv) (base) davidchoi@wireless-10-104-183-87 sorrentum1 % i run_fast_tests
INFO: > cmd='/Users/davidchoi/src/venv/amp.client_venv/bin/invoke run_fast_tests'
## run_fast_tests: 
15:31:52 - INFO  lib_tasks_pytest.py _run_test_cmd:219                  cmd=IMAGE=sorrentum/cmamp:dev \
NETWORK_MODE=bridge \
        docker-compose \
        --file /Users/davidchoi/src/sorrentum1/devops/compose/docker-compose.yml \
        --env-file devops/env/default.env \
        run \
        --rm \
        --name davidchoi.cmamp.app.sorrentum1.20230711_153152 \
        --user $(id -u):$(id -g) \
        app \
        'pytest -m "not slow and not superslow" . -o timeout_func_only=true --timeout 5 --reruns 2 --only-rerun "Failed: Timeout"' 
15:31:52 - INFO  lib_tasks_docker.py _docker_cmd:1252                   Pulling the latest version of Docker
## docker_pull: 
## docker_login: 
  ... 
  ... The config profile (ck) could not be found
15:31:53 - INFO  lib_tasks_docker.py _docker_pull:226                   image='sorrentum/cmamp:dev'
docker pull sorrentum/cmamp:dev
dev: Pulling from sorrentum/cmamp
Digest: sha256:7d9ee52407e426c8d0c6611bebb3a5e76bf05d504122aaa50bf6765dc500a2f7
Status: Image is up to date for sorrentum/cmamp:dev
docker.io/sorrentum/cmamp:dev
IMAGE=sorrentum/cmamp:dev \
NETWORK_MODE=bridge \
        docker-compose \
        --file /Users/davidchoi/src/sorrentum1/devops/compose/docker-compose.yml \
        --env-file devops/env/default.env \
        run \
        --rm \
        --name davidchoi.cmamp.app.sorrentum1.20230711_153152 \
        --user $(id -u):$(id -g) \
        app \
        'pytest -m "not slow and not superslow" . -o timeout_func_only=true --timeout 5 --reruns 2 --only-rerun "Failed: Timeout"' 
WARNING: The AM_AWS_ACCESS_KEY_ID variable is not set. Defaulting to a blank string.
WARNING: The AM_AWS_DEFAULT_REGION variable is not set. Defaulting to a blank string.
WARNING: The AM_AWS_SECRET_ACCESS_KEY variable is not set. Defaulting to a blank string.
WARNING: The AM_FORCE_TEST_FAIL variable is not set. Defaulting to a blank string.
WARNING: The AM_TELEGRAM_TOKEN variable is not set. Defaulting to a blank string.
WARNING: The CK_AWS_ACCESS_KEY_ID variable is not set. Defaulting to a blank string.
WARNING: The CK_AWS_DEFAULT_REGION variable is not set. Defaulting to a blank string.
WARNING: The CK_AWS_SECRET_ACCESS_KEY variable is not set. Defaulting to a blank string.
WARNING: The CK_TELEGRAM_TOKEN variable is not set. Defaulting to a blank string.
Creating compose_app_run ... done
##> devops/docker_run/entrypoint.sh
UID=501
GID=20
# Activate environment
##> devops/docker_run/setenv.sh
# Set PATH
PATH=/app/documentation/scripts:/app/dev_scripts/testing:/app/dev_scripts/notebooks:/app/dev_scripts/install:/app/dev_scripts/infra:/app/dev_scripts/git:/app/dev_scripts/aws:/app/dev_scripts:/app:.:/venv/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
# Set PYTHONPATH
PYTHONPATH=/app:
# Configure env
git --version: git version 2.25.1
/app
WARNING: AWS credential check failed: can't find /home/.aws/credentials file.
WARNING: AWS credential check failed: can't find /home/.aws/config file.
# Check AWS authentication setup
      Name                    Value             Type    Location
      ----                    -----             ----    --------
   profile                       am           manual    --profile

The config profile (am) could not be found
AM_CONTAINER_VERSION='1.4.0'
which python: /venv/bin/python
python -V: Python 3.8.10
helpers: <module 'helpers' from '/app/helpers/__init__.py'>
PATH=/app/documentation/scripts:/app/dev_scripts/testing:/app/dev_scripts/notebooks:/app/dev_scripts/install:/app/dev_scripts/infra:/app/dev_scripts/git:/app/dev_scripts/aws:/app/dev_scripts:/app:.:/venv/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
PYTHONPATH=/app:
entrypoint.sh: 'pytest -m "not slow and not superslow" . -o timeout_func_only=true --timeout 5 --reruns 2 --only-rerun "Failed: Timeout"'
<jemalloc>: MADV_DONTNEED does not work (memset will be used instead)
<jemalloc>: (This is the expected behaviour if you are running under QEMU)
INFO: > cmd='/venv/bin/invoke print_env'
-----------------------------------------------------------------------------
This code is not in sync with the container:
code_version='1.4.3' != container_version='1.4.0'
-----------------------------------------------------------------------------
You need to:
- merge origin/master into your branch with `invoke git_merge_master`
- pull the latest container with `invoke docker_pull`
# Repo config:
  # repo_config.config
    enable_privileged_mode='False'
    get_docker_base_image_name='cmamp'
    get_docker_shared_group=''
    get_docker_user=''
    get_host_name='github.com'
    get_html_dir_to_url_mapping='{'s3://cryptokaizen-html': 'http://172.30.2.44'}'
    get_invalid_words='[]'
    get_name='//sorr'
    get_repo_map='{'sorr': 'sorrentum/sorrentum'}'
    get_shared_data_dirs='None'
    has_dind_support='False'
    has_docker_sudo='True'
    is_CK_S3_available='True'
    run_docker_as_root='False'
    skip_submodules_test='False'
    use_docker_db_container_name_to_connect='True'
    use_docker_network_mode_host='False'
    use_docker_sibling_containers='True'
    # hserver.config
      is_AM_S3_available()='True'
      is_dev4()='False'
      is_dev_ck()='False'
      is_inside_ci()='False'
      is_inside_docker()='True'
      is_mac(version='Catalina')='False'
      is_mac(version='Monterey')='False'
      is_mac(version='Ventura')='True'
# System signature:
  # Git
    branch_name='CmTask283_Create_test_list_to_run_with_Sorrentum'
    hash='2f2b7daf5'
    # Last commits:
      * 2f2b7daf5 dchoi127 Gallery Notebook and Test File (#393)                             (    4 days ago) Fri Jul 7 12:35:49 2023  (HEAD -> CmTask283_Create_test_list_to_run_with_Sorrentum, origin/CmTask283_Create_test_list_to_run_with_Sorrentum)
      * efb7fdeb7 zli00185 Unit test get_universe_versions() for CCXT (#387)                 (    5 days ago) Fri Jul 7 00:11:06 2023           
      * 7d5810984 hhxjqm   SorrTask390_Unit_test_get_run_date (#391)                         (    5 days ago) Fri Jul 7 00:10:32 2023           
  # Machine info
    system=Linux
    node name=347078c60199
    release=5.10.76-linuxkit
    version=#1 SMP PREEMPT Mon Nov 8 11:22:26 UTC 2021
    machine=x86_64
    processor=x86_64
    cpu count=4
    cpu freq=None
    memory=svmem(total=2085294080, available=648499200, percent=68.9, used=1246044160, free=48664576, active=642686976, inactive=1206915072, buffers=58896384, cached=731688960, shared=100716544, slab=137158656)
    disk usage=sdiskusage(total=62725623808, used=28147331072, free=31361576960, percent=47.3)
  # Packages
    python: 3.8.10
    cvxopt: 1.3.0
    cvxpy: 1.2.2
    gluonnlp: ?
    gluonts: 0.6.7
    joblib: 1.2.0
    mxnet: 1.9.1
    numpy: 1.23.4
    pandas: 1.5.1
    pyarrow: 10.0.0
    scipy: 1.9.3
    seaborn: 0.12.1
    sklearn: 1.1.3
    statsmodels: 0.13.5
# Env vars:
  AM_AWS_ACCESS_KEY_ID=undef
  AM_AWS_DEFAULT_REGION=undef
  AM_AWS_PROFILE='am'
  AM_AWS_S3_BUCKET='alphamatic-data'
  AM_AWS_SECRET_ACCESS_KEY=undef
  AM_ECR_BASE_PATH='665840871993.dkr.ecr.us-east-1.amazonaws.com'
  AM_ENABLE_DIND='0'
  AM_FORCE_TEST_FAIL=''
  AM_HOST_NAME='wireless-10-104-183-87.umd.edu'
  AM_HOST_OS_NAME='Darwin'
  AM_HOST_USER_NAME='davidchoi'
  AM_HOST_VERSION='22.1.0'
  AM_REPO_CONFIG_CHECK='True'
  AM_REPO_CONFIG_PATH=''
  AM_TELEGRAM_TOKEN=empty
  CI=''
  CK_AWS_ACCESS_KEY_ID=empty
  CK_AWS_DEFAULT_REGION=''
  CK_AWS_S3_BUCKET='cryptokaizen-data'
  CK_AWS_SECRET_ACCESS_KEY=empty
  CK_ECR_BASE_PATH='sorrentum'
  GH_ACTION_ACCESS_TOKEN=empty

15:32:18 - INFO  hcache.py clear_global_cache:292                       Before clear_global_cache: 'global mem' cache: path='/mnt/tmpfs/tmp.cache.mem', size=32.0 KB
15:32:18 - WARN  hcache.py clear_global_cache:293                       Resetting 'global mem' cache '/mnt/tmpfs/tmp.cache.mem'
15:32:18 - WARN  hcache.py clear_global_cache:303                       Destroying '/mnt/tmpfs/tmp.cache.mem' ...
15:32:18 - INFO  hcache.py clear_global_cache:319                       After clear_global_cache: 'global mem' cache: path='/mnt/tmpfs/tmp.cache.mem', size=nan
====================================================================== test session starts =======================================================================
platform linux -- Python 3.8.10, pytest-6.2.5, py-1.11.0, pluggy-1.0.0 -- /venv/bin/python
cachedir: .pytest_cache
rootdir: /app, configfile: pytest.ini
plugins: rerunfailures-10.2, cov-4.0.0, xdist-3.0.2, anyio-3.6.2, instafail-0.4.2, timeout-2.1.0
timeout: 5.0s
timeout method: signal
timeout func_only: True
collecting 1437 items                                                                                                                                            

samarth9008 commented 1 year ago

Does the number in "collecting 1437 items" stop at 1437? It should not get stuck there; it should collect around 2616 items and then start the tests, although it takes 2-3 minutes for the tests to start.

dchoi127 commented 1 year ago

> Does the number in "collecting 1437 items" stop at 1437? It should not get stuck there; it should collect around 2616 items and then start the tests, although it takes 2-3 minutes for the tests to start.

Yes, it just stops at 1437 and hangs there indefinitely.

samarth9008 commented 1 year ago

Try pulling the latest master or building your environment again, as I cannot reproduce this on my side.

dchoi127 commented 1 year ago

I am working inside the branch related to PR #401. Does that have any effect?

samarth9008 commented 1 year ago

Probably. Try on a new branch.

dchoi127 commented 1 year ago

I tried it on the master branch, which is up to date, and rebuilt the environment. Testing finally started but hangs on the second test. I waited for about 10 minutes. Is this the expected behavior?

(amp.client_venv) (base) davidchoi@wireless-10-104-183-87 sorrentum1 % i run_fast_tests
INFO: > cmd='/Users/davidchoi/src/venv/amp.client_venv/bin/invoke run_fast_tests'
## run_fast_tests: 
16:24:48 - INFO  lib_tasks_pytest.py _run_test_cmd:219                  cmd=IMAGE=sorrentum/cmamp:dev \
NETWORK_MODE=bridge \
        docker-compose \
        --file /Users/davidchoi/src/sorrentum1/devops/compose/docker-compose.yml \
        --env-file devops/env/default.env \
        run \
        --rm \
        --name davidchoi.cmamp.app.sorrentum1.20230711_162448 \
        --user $(id -u):$(id -g) \
        app \
        'pytest -m "not slow and not superslow" . -o timeout_func_only=true --timeout 5 --reruns 2 --only-rerun "Failed: Timeout"' 
16:24:48 - INFO  lib_tasks_docker.py _docker_cmd:1252                   Pulling the latest version of Docker
## docker_pull: 
## docker_login: 
  ... 
  ... The config profile (ck) could not be found
16:24:48 - INFO  lib_tasks_docker.py _docker_pull:226                   image='sorrentum/cmamp:dev'
docker pull sorrentum/cmamp:dev
dev: Pulling from sorrentum/cmamp
Digest: sha256:7d9ee52407e426c8d0c6611bebb3a5e76bf05d504122aaa50bf6765dc500a2f7
Status: Image is up to date for sorrentum/cmamp:dev
docker.io/sorrentum/cmamp:dev
IMAGE=sorrentum/cmamp:dev \
NETWORK_MODE=bridge \
        docker-compose \
        --file /Users/davidchoi/src/sorrentum1/devops/compose/docker-compose.yml \
        --env-file devops/env/default.env \
        run \
        --rm \
        --name davidchoi.cmamp.app.sorrentum1.20230711_162448 \
        --user $(id -u):$(id -g) \
        app \
        'pytest -m "not slow and not superslow" . -o timeout_func_only=true --timeout 5 --reruns 2 --only-rerun "Failed: Timeout"' 
WARNING: The AM_AWS_ACCESS_KEY_ID variable is not set. Defaulting to a blank string.
WARNING: The AM_AWS_DEFAULT_REGION variable is not set. Defaulting to a blank string.
WARNING: The AM_AWS_SECRET_ACCESS_KEY variable is not set. Defaulting to a blank string.
WARNING: The AM_FORCE_TEST_FAIL variable is not set. Defaulting to a blank string.
WARNING: The AM_TELEGRAM_TOKEN variable is not set. Defaulting to a blank string.
WARNING: The CK_AWS_ACCESS_KEY_ID variable is not set. Defaulting to a blank string.
WARNING: The CK_AWS_DEFAULT_REGION variable is not set. Defaulting to a blank string.
WARNING: The CK_AWS_SECRET_ACCESS_KEY variable is not set. Defaulting to a blank string.
WARNING: The CK_TELEGRAM_TOKEN variable is not set. Defaulting to a blank string.
Creating compose_app_run ... done
##> devops/docker_run/entrypoint.sh
UID=501
GID=20
# Activate environment
##> devops/docker_run/setenv.sh
# Set PATH
PATH=/app/documentation/scripts:/app/dev_scripts/testing:/app/dev_scripts/notebooks:/app/dev_scripts/install:/app/dev_scripts/infra:/app/dev_scripts/git:/app/dev_scripts/aws:/app/dev_scripts:/app:.:/venv/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
# Set PYTHONPATH
PYTHONPATH=/app:
# Configure env
git --version: git version 2.25.1
/app
WARNING: AWS credential check failed: can't find /home/.aws/credentials file.
WARNING: AWS credential check failed: can't find /home/.aws/config file.
# Check AWS authentication setup
      Name                    Value             Type    Location
      ----                    -----             ----    --------
   profile                       am           manual    --profile

The config profile (am) could not be found
AM_CONTAINER_VERSION='1.4.0'
which python: /venv/bin/python
python -V: Python 3.8.10
helpers: <module 'helpers' from '/app/helpers/__init__.py'>
PATH=/app/documentation/scripts:/app/dev_scripts/testing:/app/dev_scripts/notebooks:/app/dev_scripts/install:/app/dev_scripts/infra:/app/dev_scripts/git:/app/dev_scripts/aws:/app/dev_scripts:/app:.:/venv/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
PYTHONPATH=/app:
entrypoint.sh: 'pytest -m "not slow and not superslow" . -o timeout_func_only=true --timeout 5 --reruns 2 --only-rerun "Failed: Timeout"'
<jemalloc>: MADV_DONTNEED does not work (memset will be used instead)
<jemalloc>: (This is the expected behaviour if you are running under QEMU)
INFO: > cmd='/venv/bin/invoke print_env'
-----------------------------------------------------------------------------
This code is not in sync with the container:
code_version='1.4.3' != container_version='1.4.0'
-----------------------------------------------------------------------------
You need to:
- merge origin/master into your branch with `invoke git_merge_master`
- pull the latest container with `invoke docker_pull`
# Repo config:
  # repo_config.config
    enable_privileged_mode='False'
    get_docker_base_image_name='cmamp'
    get_docker_shared_group=''
    get_docker_user=''
    get_host_name='github.com'
    get_html_dir_to_url_mapping='{'s3://cryptokaizen-html': 'http://172.30.2.44'}'
    get_invalid_words='[]'
    get_name='//sorr'
    get_repo_map='{'sorr': 'sorrentum/sorrentum'}'
    get_shared_data_dirs='None'
    has_dind_support='False'
    has_docker_sudo='True'
    is_CK_S3_available='True'
    run_docker_as_root='False'
    skip_submodules_test='False'
    use_docker_db_container_name_to_connect='True'
    use_docker_network_mode_host='False'
    use_docker_sibling_containers='True'
    # hserver.config
      is_AM_S3_available()='True'
      is_dev4()='False'
      is_dev_ck()='False'
      is_inside_ci()='False'
      is_inside_docker()='True'
      is_mac(version='Catalina')='False'
      is_mac(version='Monterey')='False'
      is_mac(version='Ventura')='True'
# System signature:
  # Git
    branch_name='master'
    hash='f18adfb30'
    # Last commits:
      * f18adfb30 GP Saggese Delete UMD_CS_Phonebook_Graduate_Students.csv                     (  24 hours ago) Mon Jul 10 19:57:27 2023  (HEAD -> master, origin/master, origin/SorrTask409_Unit_test_plot_timeseries_distribution, origin/HEAD)
      * c7cb051c1 Yiyun Lei SorrTask396 Download info for students (#400)                     (  32 hours ago) Mon Jul 10 11:56:11 2023           
      * 20162d520 Yiyun Lei gitignore (#404)                                                  (  33 hours ago) Mon Jul 10 11:53:02 2023           
  # Machine info
    system=Linux
    node name=8d2140fb4311
    release=5.10.76-linuxkit
    version=#1 SMP PREEMPT Mon Nov 8 11:22:26 UTC 2021
    machine=x86_64
    processor=x86_64
    cpu count=4
    cpu freq=None
    memory=svmem(total=2085294080, available=763457536, percent=63.4, used=1130868736, free=325763072, active=596193280, inactive=1001213952, buffers=37384192, cached=591278080, shared=100589568, slab=115052544)
    disk usage=sdiskusage(total=62725623808, used=28147331072, free=31361576960, percent=47.3)
  # Packages
    python: 3.8.10
    cvxopt: 1.3.0
    cvxpy: 1.2.2
    gluonnlp: ?
    gluonts: 0.6.7
    joblib: 1.2.0
    mxnet: 1.9.1
    numpy: 1.23.4
    pandas: 1.5.1
    pyarrow: 10.0.0
    scipy: 1.9.3
    seaborn: 0.12.1
    sklearn: 1.1.3
    statsmodels: 0.13.5
# Env vars:
  AM_AWS_ACCESS_KEY_ID=undef
  AM_AWS_DEFAULT_REGION=undef
  AM_AWS_PROFILE='am'
  AM_AWS_S3_BUCKET='alphamatic-data'
  AM_AWS_SECRET_ACCESS_KEY=undef
  AM_ECR_BASE_PATH='665840871993.dkr.ecr.us-east-1.amazonaws.com'
  AM_ENABLE_DIND='0'
  AM_FORCE_TEST_FAIL=''
  AM_HOST_NAME='wireless-10-104-183-87.umd.edu'
  AM_HOST_OS_NAME='Darwin'
  AM_HOST_USER_NAME='davidchoi'
  AM_HOST_VERSION='22.1.0'
  AM_REPO_CONFIG_CHECK='True'
  AM_REPO_CONFIG_PATH=''
  AM_TELEGRAM_TOKEN=empty
  CI=''
  CK_AWS_ACCESS_KEY_ID=empty
  CK_AWS_DEFAULT_REGION=''
  CK_AWS_S3_BUCKET='cryptokaizen-data'
  CK_AWS_SECRET_ACCESS_KEY=empty
  CK_ECR_BASE_PATH='sorrentum'
  GH_ACTION_ACCESS_TOKEN=empty

16:25:14 - INFO  hcache.py clear_global_cache:292                       Before clear_global_cache: 'global mem' cache: path='/mnt/tmpfs/tmp.cache.mem', size=32.0 KB
16:25:14 - WARN  hcache.py clear_global_cache:293                       Resetting 'global mem' cache '/mnt/tmpfs/tmp.cache.mem'
16:25:14 - WARN  hcache.py clear_global_cache:303                       Destroying '/mnt/tmpfs/tmp.cache.mem' ...
16:25:14 - INFO  hcache.py clear_global_cache:319                       After clear_global_cache: 'global mem' cache: path='/mnt/tmpfs/tmp.cache.mem', size=nan
====================================================================== test session starts =======================================================================
platform linux -- Python 3.8.10, pytest-6.2.5, py-1.11.0, pluggy-1.0.0 -- /venv/bin/python
cachedir: .pytest_cache
rootdir: /app, configfile: pytest.ini
plugins: rerunfailures-10.2, cov-4.0.0, xdist-3.0.2, anyio-3.6.2, instafail-0.4.2, timeout-2.1.0
timeout: 5.0s
timeout method: signal
timeout func_only: True
collecting 2598 items                                                                                                                                            -----------------------------------------------------------------------------
This code is not in sync with the container:
code_version='1.4.3' != container_version='1.4.0'
-----------------------------------------------------------------------------
You need to:
- merge origin/master into your branch with `invoke git_merge_master`
- pull the latest container with `invoke docker_pull`
# Git
  branch_name='master'
  hash='f18adfb30'
  # Last commits:
    * f18adfb30 GP Saggese Delete UMD_CS_Phonebook_Graduate_Students.csv                     (  25 hours ago) Mon Jul 10 19:57:27 2023  (HEAD -> master, origin/master, origin/SorrTask409_Unit_test_plot_timeseries_distribution, origin/HEAD)
    * c7cb051c1 Yiyun Lei SorrTask396 Download info for students (#400)                     (  33 hours ago) Mon Jul 10 11:56:11 2023           
    * 20162d520 Yiyun Lei gitignore (#404)                                                  (  33 hours ago) Mon Jul 10 11:53:02 2023           
# Machine info
  system=Linux
  node name=8d2140fb4311
  release=5.10.76-linuxkit
  version=#1 SMP PREEMPT Mon Nov 8 11:22:26 UTC 2021
  machine=x86_64
  processor=x86_64
  cpu count=4
  cpu freq=None
  memory=svmem(total=2085294080, available=673005568, percent=67.7, used=1221328896, free=206532608, active=646692864, inactive=1062277120, buffers=37646336, cached=619786240, shared=100589568, slab=122363904)
  disk usage=sdiskusage(total=62725623808, used=28147384320, free=31361523712, percent=47.3)
# Packages
  python: 3.8.10
  cvxopt: 1.3.0
  cvxpy: 1.2.2
  gluonnlp: ?
  gluonts: 0.6.7
  joblib: 1.2.0
  mxnet: 1.9.1
  numpy: 1.23.4
  pandas: 1.5.1
  pyarrow: 10.0.0
  scipy: 1.9.3
  seaborn: 0.12.1
  sklearn: 1.1.3
  statsmodels: 0.13.5
INFO: > cmd='/venv/bin/pytest -m not slow and not superslow . -o timeout_func_only=true --timeout 5 --reruns 2 --only-rerun Failed: Timeout'
INFO: Saving log to file 'tmp.pytest.log'
collected 2616 items / 190 deselected / 2426 selected                                                                                                            

core/plotting/test/test_plots.py::Test_plots::test_plot_histograms_and_lagged_scatterplot1 PASSED                                                          [  0%]
market_data/test/test_real_time_market_data.py::TestRealTimeMarketData2::test_get_data_at_timestamp1 

samarth9008 commented 1 year ago

Nope. Fast tests usually take less than 5 seconds to execute, so something is probably wrong with your system. Try to coordinate with @Ro0k1e. If it works on his system, then you can move forward, because this step is only required to recognize the failed tests. Once you know which tests are failing, you can inspect them and add markers accordingly, which can be done on anyone's system. You can then test on @Ro0k1e's system once the markers are applied.

dchoi127 commented 1 year ago

> Nope. Fast tests usually take less than 5 seconds to execute, so something is probably wrong with your system. Try to coordinate with @Ro0k1e. If it works on his system, then you can move forward, because this step is only required to recognize the failed tests. Once you know which tests are failing, you can inspect them and add markers accordingly, which can be done on anyone's system. You can then test on @Ro0k1e's system once the markers are applied.

Sounds good.

@Ro0k1e, could you run the command `i run_fast_tests` after setting up the venv? Mine doesn't seem to work. Hopefully things work better on your system. Thanks!

If you have any questions, I'll try to answer them to the best of my knowledge.

Ro0k1e commented 1 year ago

Sorry for the late reply. I was commuting. I will try it now and post my result once I have something.

gpsaggese commented 1 year ago

0) You want to read some docs about pytest: we have docs/Unit_tests.md, and there is also the official documentation: https://docs.pytest.org/en/7.1.x/contents.html

1) You can run the pytest command directly in docker

> i docker_bash
docker> pytest -m "not slow and not superslow" . -o timeout_func_only=true --timeout 5 --reruns 2 --only-rerun "Failed: Timeout" -s --dbg

to run with more debugging output.

2) You can also run a single test that is hanging

> pytest market_data/test/test_real_time_market_data.py::TestRealTimeMarketData2::test_get_data_at_timestamp1

This test uses imvcddbut.TestImDbHelper, which requires Docker-in-docker (aka dind) or sibling-docker. I would mark it as needs_dind, together with all the unit tests that use imvcddbut.TestImDbHelper.
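
A minimal sketch of that marking (the marker name and pytest.ini wording are proposals, and the imvcddbut import path is an assumption based on the alias above):

import pytest

import im_v2.common.db.db_utils as imvcddbut  # assumed module behind the imvcddbut alias


@pytest.mark.needs_dind  # proposed marker: needs Docker-in-docker or sibling-docker
class TestRealTimeMarketData2(imvcddbut.TestImDbHelper):
    ...

together with the corresponding registration in pytest.ini:

[pytest]
markers =
    needs_dind: tests that require Docker-in-docker or sibling-docker support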

Ro0k1e commented 1 year ago

I activated the thin environment and pulled the latest dev_tools image. Then I ran `i run_fast_tests`. Mine stopped at 1468 items. Trying the `i docker_bash` approach now.

gpsaggese commented 1 year ago

1) What do you mean by "Mine stopped at 1468 items"? Can you report the output you see?

2) Note that we are fixing a problem with Docker that makes everything slow when running on Arm, since we have only x86 images, so we go through the emulator.

<jemalloc>: MADV_DONTNEED does not work (memset will be used instead)
<jemalloc>: (This is the expected behaviour if you are running under QEMU)

What processor are you using?

3) On my M2 Mac I get

collecting 1354 items / 1 skipped / 1353 selected
collecting 1424 items / 1 skipped / 1423 selected

Here it hangs; then, if I Ctrl-C, it starts again:

collecting 1449 items / 1 error / 1 skipped / 1447 selected
collecting 1454 items / 1 error / 1 skipped / 1452 selected
...
collecting 2485 items / 1 error / 1 skipped / 2483 selected

IMO there is some test that locks up.

gpsaggese commented 1 year ago

I've run it on one of our x86 servers, and pytest started immediately:

collected 2616 items / 190 deselected / 2426 selected

@Ro0k1e and @dchoi127, are you both using Arm-based Macs? If so, I think the problem is due to running the Docker image through the emulator.

gpsaggese commented 1 year ago

Another piece of info: if I run my Mac in the VPC it doesn't lock up. I think some test is trying to reach some service in our VPC that is not available and so it gets stuck.

I would:
1) fix the Docker issue so we can run fast
2) run the tests by directory so that we can "bisect" where the problem is coming from, e.g., pytest <dir>
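
For 2), a rough bisection sketch in Python (the directory list is illustrative; running pytest <dir> by hand per directory works just as well):

import subprocess

# Run each top-level test tree separately; the directory whose run never
# returns within the allotted time contains the hanging test.
for test_dir in ["core", "helpers", "market_data", "oms"]:
    print(f"##### {test_dir}")
    try:
        subprocess.run(
            ["pytest", test_dir, "-m", "not slow and not superslow", "--timeout", "5"],
            timeout=600,
            check=False,
        )
    except subprocess.TimeoutExpired:
        print(f"{test_dir} hangs: recurse into its subdirectories")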

Ro0k1e commented 1 year ago

I apologize for the confusion. I will remember to paste the code and the result in the future. The following is what I had yesterday, and I am using an M1 Mac.

INFO: > cmd='/Users/ywang/src/venv/amp.client_venv/bin/invoke run_fast_tests'
## run_fast_tests: 
00:04:29 - INFO  lib_tasks_pytest.py _run_test_cmd:219                  cmd=IMAGE=sorrentum/cmamp:dev \
NETWORK_MODE=bridge \
        docker-compose \
        --file /Users/ywang/src/sorrentum1/devops/compose/docker-compose.yml \
        --env-file devops/env/default.env \
        run \
        --rm \
        --name ywang.cmamp.app.sorrentum1.20230711_210429 \
        --user $(id -u):$(id -g) \
        app \
        'pytest -m "not slow and not superslow" . -o timeout_func_only=true --timeout 5 --reruns 2 --only-rerun "Failed: Timeout"' 
00:04:29 - INFO  lib_tasks_docker.py _docker_cmd:1252                   Pulling the latest version of Docker
## docker_pull: 
## docker_login: 
  ... 
  ... The config profile (ck) could not be found
00:04:29 - INFO  lib_tasks_docker.py _docker_pull:226                   image='sorrentum/cmamp:dev'
docker pull sorrentum/cmamp:dev
dev: Pulling from sorrentum/cmamp
Digest: sha256:7d9ee52407e426c8d0c6611bebb3a5e76bf05d504122aaa50bf6765dc500a2f7
Status: Image is up to date for sorrentum/cmamp:dev
docker.io/sorrentum/cmamp:dev
IMAGE=sorrentum/cmamp:dev \
NETWORK_MODE=bridge \
        docker-compose \
        --file /Users/ywang/src/sorrentum1/devops/compose/docker-compose.yml \
        --env-file devops/env/default.env \
        run \
        --rm \
        --name ywang.cmamp.app.sorrentum1.20230711_210429 \
        --user $(id -u):$(id -g) \
        app \
        'pytest -m "not slow and not superslow" . -o timeout_func_only=true --timeout 5 --reruns 2 --only-rerun "Failed: Timeout"' 
WARNING: The AM_AWS_ACCESS_KEY_ID variable is not set. Defaulting to a blank string.
WARNING: The AM_AWS_DEFAULT_REGION variable is not set. Defaulting to a blank string.
WARNING: The AM_AWS_SECRET_ACCESS_KEY variable is not set. Defaulting to a blank string.
WARNING: The AM_FORCE_TEST_FAIL variable is not set. Defaulting to a blank string.
WARNING: The AM_TELEGRAM_TOKEN variable is not set. Defaulting to a blank string.
WARNING: The CK_AWS_ACCESS_KEY_ID variable is not set. Defaulting to a blank string.
WARNING: The CK_AWS_DEFAULT_REGION variable is not set. Defaulting to a blank string.
WARNING: The CK_AWS_SECRET_ACCESS_KEY variable is not set. Defaulting to a blank string.
WARNING: The CK_TELEGRAM_TOKEN variable is not set. Defaulting to a blank string.
WARNING: Found orphan containers (compose-oms_postgres7855-1, compose-oms_postgres9256-1, compose-im_postgres2707-1, compose-oms_postgres3920-1, compose-im_postgres2261-1, compose-im_postgres6826-1, compose-oms_postgres4419-1, compose-oms_postgres7941-1, compose-oms_postgres9964-1, compose-im_postgres198-1, compose-im_postgres6680-1, compose-oms_postgres3151-1, compose-oms_postgres6219-1, compose-oms_postgres3477-1, compose-oms_postgres9709-1) for this project. If you removed or renamed this service in your compose file, you can run this command with the --remove-orphans flag to clean it up.
Creating compose_app_run ... done
##> devops/docker_run/entrypoint.sh
UID=501
GID=20
# Activate environment
##> devops/docker_run/setenv.sh
# Set PATH
PATH=/app/documentation/scripts:/app/dev_scripts/testing:/app/dev_scripts/notebooks:/app/dev_scripts/install:/app/dev_scripts/infra:/app/dev_scripts/git:/app/dev_scripts/aws:/app/dev_scripts:/app:.:/venv/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
# Set PYTHONPATH
PYTHONPATH=/app:
# Configure env
git --version: git version 2.25.1
/app
WARNING: AWS credential check failed: can't find /home/.aws/credentials file.
WARNING: AWS credential check failed: can't find /home/.aws/config file.
# Check AWS authentication setup
      Name                    Value             Type    Location
      ----                    -----             ----    --------
   profile                       am           manual    --profile

The config profile (am) could not be found
AM_CONTAINER_VERSION='1.4.0'
which python: /venv/bin/python
python -V: Python 3.8.10
helpers: <module 'helpers' from '/app/helpers/__init__.py'>
PATH=/app/documentation/scripts:/app/dev_scripts/testing:/app/dev_scripts/notebooks:/app/dev_scripts/install:/app/dev_scripts/infra:/app/dev_scripts/git:/app/dev_scripts/aws:/app/dev_scripts:/app:.:/venv/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
PYTHONPATH=/app:
entrypoint.sh: 'pytest -m "not slow and not superslow" . -o timeout_func_only=true --timeout 5 --reruns 2 --only-rerun "Failed: Timeout"'
<jemalloc>: MADV_DONTNEED does not work (memset will be used instead)
<jemalloc>: (This is the expected behaviour if you are running under QEMU)
INFO: > cmd='/venv/bin/invoke print_env'
-----------------------------------------------------------------------------
This code is not in sync with the container:
code_version='1.4.3' != container_version='1.4.0'
-----------------------------------------------------------------------------
You need to:
- merge origin/master into your branch with `invoke git_merge_master`
- pull the latest container with `invoke docker_pull`
# Repo config:
  # repo_config.config
    enable_privileged_mode='False'
    get_docker_base_image_name='cmamp'
    get_docker_shared_group=''
    get_docker_user=''
    get_host_name='github.com'
    get_html_dir_to_url_mapping='{'s3://cryptokaizen-html': 'http://172.30.2.44'}'
    get_invalid_words='[]'
    get_name='//sorr'
    get_repo_map='{'sorr': 'sorrentum/sorrentum'}'
    get_shared_data_dirs='None'
    has_dind_support='False'
    has_docker_sudo='True'
    is_CK_S3_available='True'
    run_docker_as_root='False'
    skip_submodules_test='False'
    use_docker_db_container_name_to_connect='True'
    use_docker_network_mode_host='False'
    use_docker_sibling_containers='True'
    # hserver.config
      is_AM_S3_available()='True'
      is_dev4()='False'
      is_dev_ck()='False'
      is_inside_ci()='False'
      is_inside_docker()='True'
      is_mac(version='Catalina')='False'
      is_mac(version='Monterey')='False'
      is_mac(version='Ventura')='True'
# System signature:
  # Git
    branch_name='master'
    hash='f18adfb30'
    # Last commits:
      * f18adfb30 GP Saggese Delete UMD_CS_Phonebook_Graduate_Students.csv                     (  32 hours ago) Mon Jul 10 19:57:27 2023  (HEAD -> master, origin/master, origin/SorrTask409_Unit_test_plot_timeseries_distribution, origin/HEAD)
      * c7cb051c1 Yiyun Lei SorrTask396 Download info for students (#400)                     (    2 days ago) Mon Jul 10 11:56:11 2023           
      * 20162d520 Yiyun Lei gitignore (#404)                                                  (    2 days ago) Mon Jul 10 11:53:02 2023           
  # Machine info
    system=Linux
    node name=9c81753e234c
    release=5.15.49-linuxkit-pr
    version=#1 SMP PREEMPT Thu May 25 07:27:39 UTC 2023
    machine=x86_64
    processor=x86_64
    cpu count=4
    cpu freq=None
    memory=svmem(total=8232951808, available=6388469760, percent=22.4, used=1313353728, free=3372355584, active=501915648, inactive=3657728000, buffers=105328640, cached=3441913856, shared=323547136, slab=547713024)
    disk usage=sdiskusage(total=62671097856, used=10935771136, free=48518610944, percent=18.4)
  # Packages
    python: 3.8.10
    cvxopt: 1.3.0
    cvxpy: 1.2.2
    gluonnlp: ?
    gluonts: 0.6.7
    joblib: 1.2.0
    mxnet: 1.9.1
    numpy: 1.23.4
    pandas: 1.5.1
    pyarrow: 10.0.0
    scipy: 1.9.3
    seaborn: 0.12.1
    sklearn: 1.1.3
    statsmodels: 0.13.5
# Env vars:
  AM_AWS_ACCESS_KEY_ID=undef
  AM_AWS_DEFAULT_REGION=undef
  AM_AWS_PROFILE='am'
  AM_AWS_S3_BUCKET='alphamatic-data'
  AM_AWS_SECRET_ACCESS_KEY=undef
  AM_ECR_BASE_PATH='665840871993.dkr.ecr.us-east-1.amazonaws.com'
  AM_ENABLE_DIND='0'
  AM_FORCE_TEST_FAIL=''
  AM_HOST_NAME='Yuans-MacBook-Air.local'
  AM_HOST_OS_NAME='Darwin'
  AM_HOST_USER_NAME='ywang'
  AM_HOST_VERSION='22.5.0'
  AM_REPO_CONFIG_CHECK='True'
  AM_REPO_CONFIG_PATH=''
  AM_TELEGRAM_TOKEN=empty
  CI=''
  CK_AWS_ACCESS_KEY_ID=empty
  CK_AWS_DEFAULT_REGION=''
  CK_AWS_S3_BUCKET='cryptokaizen-data'
  CK_AWS_SECRET_ACCESS_KEY=empty
  CK_ECR_BASE_PATH='sorrentum'
  GH_ACTION_ACCESS_TOKEN=empty

00:04:54 - INFO  hcache.py clear_global_cache:292                       Before clear_global_cache: 'global mem' cache: path='/mnt/tmpfs/tmp.cache.mem', size=32.0 KB
00:04:54 - WARN  hcache.py clear_global_cache:293                       Resetting 'global mem' cache '/mnt/tmpfs/tmp.cache.mem'
00:04:54 - WARN  hcache.py clear_global_cache:303                       Destroying '/mnt/tmpfs/tmp.cache.mem' ...
00:04:54 - INFO  hcache.py clear_global_cache:319                       After clear_global_cache: 'global mem' cache: path='/mnt/tmpfs/tmp.cache.mem', size=nan
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-6.2.5, py-1.11.0, pluggy-1.0.0 -- /venv/bin/python
cachedir: .pytest_cache
rootdir: /app, configfile: pytest.ini
plugins: xdist-3.0.2, instafail-0.4.2, rerunfailures-10.2, timeout-2.1.0, cov-4.0.0, anyio-3.6.2
timeout: 5.0s
timeout method: signal
timeout func_only: True
collecting 1005 items                                                          

dchoi127 commented 1 year ago

I am using an M1 Mac as well.

dchoi127 commented 1 year ago

> Another piece of info: if I run my Mac in the VPC it doesn't lock up. I think some test is trying to reach some service in our VPC that is not available and so it gets stuck.
>
> I would:
>
> 1. fix the Docker issue so we can run fast
> 2. run the tests by directory so that we can "bisect" where the problem is coming from, e.g., pytest <dir>

I think I'm a bit confused about how to proceed. I would greatly appreciate it if you could point me in the right direction. I'm unsure where to start with fixing the Docker issue. Could you point me to the code?

Thanks!

Ro0k1e commented 1 year ago
> • Merge master into the branch
> • Run fast and slow tests. You can see the failures
> • The overall goal is to skip the failing tests via appropriate markers. Basically, the pass rate should be 100%, with the required tests being skipped
> • All tests require Docker, so no need for that marker
> • Take a look at the failures. Mostly, the tests that fail will require AWS authentication.
> • Figure out the "reasons" why chunks of tests fail (e.g., AWS_AM, AWS_CK, ...)
> • Create markers for each of them
> • We can tweak i pytest to run only the tests that should pass on Sorrentum (we will do this at the end; first focus on passing all the tests)
> • Feel free to ask any questions you have here.
>
> FYI @gpsaggese I edited the comment because interns do not have access to cmamp

Hi! I reran `i run_fast_tests` this morning on my M1 Mac and this time it gave me results. Still, I am a bit confused about how we should create markers for the tests that failed. Could you clarify what a marker is?

Thank you!

dchoi127 commented 1 year ago
> • Merge master into the branch
> • Run fast and slow tests. You can see the failures
> • The overall goal is to skip the failing tests via appropriate markers. Basically, the pass rate should be 100%, with the required tests being skipped
> • All tests require Docker, so no need for that marker
> • Take a look at the failures. Mostly, the tests that fail will require AWS authentication.
> • Figure out the "reasons" why chunks of tests fail (e.g., AWS_AM, AWS_CK, ...)
> • Create markers for each of them
> • We can tweak i pytest to run only the tests that should pass on Sorrentum (we will do this at the end; first focus on passing all the tests)
> • Feel free to ask any questions you have here.
>
> FYI @gpsaggese I edited the comment because interns do not have access to cmamp
>
> Hi! I reran `i run_fast_tests` this morning on my M1 Mac and this time it gave me results. Still, I am a bit confused about how we should create markers for the tests that failed. Could you clarify what a marker is?
>
> Thank you!

Here is a reference to pytest markers. Hope this helps!

https://docs.pytest.org/en/7.1.x/example/markers.html

Additionally, if you browse the files in PR #401, there are custom pytest markers there.

gpsaggese commented 1 year ago

1) @samarth9008 is fixing the Docker issue, no worries.

2) If you can run the entire `i pytest`, great; then you can upload the output. Something like:

> i run_fast_tests 2>&1 | tee out.log

Then you can upload the file so we can see which tests failed.

3) Learn how to use pytest

4) Then you need to understand why the test failed. Typically it is pretty clear from the error. As for markers, you can look at the PR that Alejandro started.

You need to add markers like:

@pytest.mark.requires_docker
class TestModelEvaluator1(hunitest.TestCase):

Then you need to add the marker to pytest.ini:

  requires_docker: tests that can only be expected to succeed when running in docker container

There are several possible reasons (e.g., off the top of my head: needs Docker-in-docker, needs AWS_S3_AM, AWS_S3_CK, ...).
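
Once such markers are registered, the local flow just deselects them with a marker expression, e.g. pytest -m "not slow and not superslow and not requires_docker" . (the same shape as the run_fast_tests command above; the extra marker names are whatever we end up defining).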

dchoi127 commented 1 year ago
> • Merge master into the branch
> • Run fast and slow tests. You can see the failures
> • The overall goal is to skip the failing tests via appropriate markers. Basically, the pass rate should be 100%, with the required tests being skipped
> • All tests require Docker, so no need for that marker
> • Take a look at the failures. Mostly, the tests that fail will require AWS authentication.
> • Figure out the "reasons" why chunks of tests fail (e.g., AWS_AM, AWS_CK, ...)
> • Create markers for each of them
> • We can tweak i pytest to run only the tests that should pass on Sorrentum (we will do this at the end; first focus on passing all the tests)
> • Feel free to ask any questions you have here.
>
> FYI @gpsaggese I edited the comment because interns do not have access to cmamp
>
> Hi! I reran `i run_fast_tests` this morning on my M1 Mac and this time it gave me results. Still, I am a bit confused about how we should create markers for the tests that failed. Could you clarify what a marker is?
>
> Thank you!

Hey Yuan. Could you send me the file with the test results? (I misread your comment and thought you still couldn't run it.)

Also just wondering, did you do anything special to get the command working? Were you inside a container, only inside the venv, etc?

Thanks

Ro0k1e commented 1 year ago
> • Merge master into the branch
> • Run fast and slow tests. You can see the failures
> • The overall goal is to skip the failing tests via appropriate markers. Basically, the pass rate should be 100%, with the required tests being skipped
> • All tests require Docker, so no need for that marker
> • Take a look at the failures. Mostly, the tests that fail will require AWS authentication.
> • Figure out the "reasons" why chunks of tests fail (e.g., AWS_AM, AWS_CK, ...)
> • Create markers for each of them
> • We can tweak i pytest to run only the tests that should pass on Sorrentum (we will do this at the end; first focus on passing all the tests)
> • Feel free to ask any questions you have here.
>
> FYI @gpsaggese I edited the comment because interns do not have access to cmamp
>
> Hi! I reran `i run_fast_tests` this morning on my M1 Mac and this time it gave me results. Still, I am a bit confused about how we should create markers for the tests that failed. Could you clarify what a marker is? Thank you!
>
> Hey Yuan. Could you send me the file with the test results? (I misread your comment and thought you still couldn't run it.)
>
> Also just wondering, did you do anything special to get the command working? Were you inside a container, only inside the venv, etc.?
>
> Thanks

Sorry for the very late reply; I did not check messages and planned to work at night. I did not do anything special: I just opened Docker and ran `source dev_scripts/setenv_amp.sh` and `i run_fast_tests` in the terminal. There is no guarantee that the program will not hang indefinitely, but this is the result: out.log.

Ro0k1e commented 1 year ago

Hi David! This is an update of my findings, and I wonder if I can double-check with you.

It seems that all the failures are due to

* Failed assertion *
File '/home/.aws/credentials' doesn't exist

which I believe indicates missing AWS credentials.

However, it is a bit weird, since I think the test that failed should have been skipped in the first place. For example, the first failure message is:

=================================== FAILURES ===================================
________ TestTalosHistoricalPqByTileClient2.test_get_end_ts_for_symbol1 ________
Traceback (most recent call last):
  File "/app/im_v2/talos/data/client/test/test_talos_clients.py", line 632, in test_get_end_ts_for_symbol1
    self._test_get_end_ts_for_symbol1(
  File "/app/im_v2/common/data/client/im_client_test_case.py", line 298, in _test_get_end_ts_for_symbol1
    actual_end_ts = im_client.get_end_ts_for_symbol(full_symbol)
  File "/app/im_v2/common/data/client/base_im_clients.py", line 300, in get_end_ts_for_symbol
    return self._get_start_end_ts_for_symbol(full_symbol, mode)
  File "/app/im_v2/common/data/client/base_im_clients.py", line 523, in _get_start_end_ts_for_symbol
    data = self.read_data(
  File "/app/im_v2/common/data/client/base_im_clients.py", line 198, in read_data
    df = self._read_data(
  File "/app/im_v2/common/data/client/base_im_clients.py", line 656, in _read_data
    df = self._read_data_for_multiple_symbols(
  File "/app/im_v2/common/data/client/historical_pq_clients.py", line 173, in _read_data_for_multiple_symbols
    root_dir_df = hparque.from_parquet(root_dir, **kwargs)
  File "/app/helpers/hparquet.py", line 115, in from_parquet
    filesystem = get_pyarrow_s3fs(aws_profile)
  File "/app/helpers/hparquet.py", line 48, in get_pyarrow_s3fs
    aws_credentials = hs3.get_aws_credentials(*args, **kwargs)
  File "/app/helpers/hs3.py", line 696, in get_aws_credentials
    config = _get_aws_config(file_name)
  File "/app/helpers/hs3.py", line 493, in _get_aws_config
    hdbg.dassert_file_exists(file_name)
  File "/app/helpers/hdbg.py", line 762, in dassert_file_exists
    _dfatal(txt, msg, *args, only_warning=only_warning)
  File "/app/helpers/hdbg.py", line 142, in _dfatal
    dfatal(dfatal_txt)
  File "/app/helpers/hdbg.py", line 71, in dfatal
    raise assertion_type(ret)
AssertionError: 
################################################################################
* Failed assertion *
File '/home/.aws/credentials' doesn't exist

When I checked the code, there is a pytest marker before the declaration of the test:

@pytest.mark.skipif(
    not henv.execute_repo_config_code("is_CK_S3_available()"),
    reason="Run only if CK S3 is available",
)

So I guess there is some problem with the function is_CK_S3_available(). I am still working on it, but would you mind taking a look at it as well?

Thanks!
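
One quick check (a sketch; helpers.henv is the module path assumed from the henv alias in the snippet above) is to print the value inside the container. Note that the print_env output earlier in this thread already reports is_CK_S3_available='True' even without credentials:

# Inside `i docker_bash`:
import helpers.henv as henv  # assumed path behind the henv alias

# If this prints True on a laptop without CK S3 credentials, the skipif
# condition is False, so the test is not skipped and then fails on the
# missing /home/.aws/credentials file.
print(henv.execute_repo_config_code("is_CK_S3_available()"))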

samarth9008 commented 1 year ago

Let us know by Wednesday how you guys are progressing with the issue. If you find it difficult and complex, no worries; we will assign you to a different issue. We understand it's a little complex.

dchoi127 commented 1 year ago

@Ro0k1e

There seem to be a couple of failure types apart from AWS credentials. After looking through the whole out file, it seems the issues are:

RuntimeError: cmd='(docker container ls --filter name=/compose-oms_postgres3221-1 -aq) 2>&1' failed with rc='1'
truncated output=
Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Get "http://%2Fvar%2Frun%2Fdocker.sock/v1.24/containers/json?all=1&filters=%7B%22name%22%3A%7B%22%2Fcompose-oms_postgres3221-1%22%3Atrue%7D%7D": dial unix /var/run/docker.sock: connect: permission denied

Thus a Docker issue (it might be because we're not running the command inside a container).

* Failed assertion *
File '/home/.aws/credentials' doesn't exist

As you mentioned, AWS credentials missing.

Failed: Timeout >5.0s

Tests running for too long and thus timing out.

Haven't looked through the files yet. Doing that now.

dchoi127 commented 1 year ago

Currently looking at the first failed test case, which is oms/test/test_restrictions.py::TestRestrictions1::test2.

However, when looking at the logs of the output file, it appears that this test is run twice, with the first run passing and the second failing with an error.

oms/test/test_restrictions.py::TestRestrictions1::test2 PASSED           [  1%]
oms/test/test_restrictions.py::TestRestrictions1::test2 ERROR            [  1%]

However, it appears that the command we are executing only reruns tests that fail with "Failed: Timeout". Can anyone tell me why this test is running twice?

Also, since this test does run successfully, is this a test I should skip, or something I should mark as well? I'm not sure how to mark this test because it seems that it does pass.

Below is the full error message corresponding to the test

_________________ ERROR at teardown of TestRestrictions1.test2 _________________
Traceback (most recent call last):
  File "/app/helpers/hsql_test.py", line 124, in tearDownClass
    hdocker.container_rm(container_name)
  File "/app/helpers/hdocker.py", line 21, in container_rm
    _, container_id = hsystem.system_to_one_line(cmd)
  File "/app/helpers/hsystem.py", line 401, in system_to_one_line
    rc, output = system_to_string(cmd, *args, **kwargs)
  File "/app/helpers/hsystem.py", line 344, in system_to_string
    rc, output = _system(
  File "/app/helpers/hsystem.py", line 277, in _system
    raise RuntimeError(
RuntimeError: cmd='(docker container ls --filter name=/compose-oms_postgres7016-1 -aq) 2>&1' failed with rc='1'
truncated output=
Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Get "http://%2Fvar%2Frun%2Fdocker.sock/v1.24/containers/json?all=1&filters=%7B%22name%22%3A%7B%22%2Fcompose-oms_postgres7016-1%22%3Atrue%7D%7D": dial unix /var/run/docker.sock: connect: permission denied

dchoi127 commented 1 year ago

Additionally, after reviewing all tests that fail with

RuntimeError: cmd='(docker container ls --filter name=/compose-im_postgres2555-1 -aq) 2>&1' failed with rc='1'
truncated output=
Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Get "http://%2Fvar%2Frun%2Fdocker.sock/v1.24/containers/json?all=1&filters=%7B%22name%22%3A%7B%22%2Fcompose-im_postgres2555-1%22%3Atrue%7D%7D": dial unix /var/run/docker.sock: connect: permission denied

They all follow the same pattern where they run twice: the first run passes and the second fails with an error.

Ro0k1e commented 1 year ago

> @Ro0k1e
>
> There seem to be a couple of failure types apart from AWS credentials. After looking through the whole out file, it seems the issues are:
>
> RuntimeError: cmd='(docker container ls --filter name=/compose-oms_postgres3221-1 -aq) 2>&1' failed with rc='1'
> truncated output=
> Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Get "http://%2Fvar%2Frun%2Fdocker.sock/v1.24/containers/json?all=1&filters=%7B%22name%22%3A%7B%22%2Fcompose-oms_postgres3221-1%22%3Atrue%7D%7D": dial unix /var/run/docker.sock: connect: permission denied
>
> Thus a Docker issue (it might be because we're not running the command inside a container).
>
> * Failed assertion *
> File '/home/.aws/credentials' doesn't exist
>
> As you mentioned, AWS credentials missing.
>
> Failed: Timeout >5.0s
>
> Tests running for too long and thus timing out.
>
> Haven't looked through the files yet. Doing that now.

Ah, my mistake. I only focused on failure messages and ignored errors. Thank you!

dchoi127 commented 1 year ago

> @Ro0k1e There seem to be a couple of failure types apart from AWS credentials. After looking through the whole out file, it seems the issues are:
>
> RuntimeError: cmd='(docker container ls --filter name=/compose-oms_postgres3221-1 -aq) 2>&1' failed with rc='1'
> truncated output=
> Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Get "http://%2Fvar%2Frun%2Fdocker.sock/v1.24/containers/json?all=1&filters=%7B%22name%22%3A%7B%22%2Fcompose-oms_postgres3221-1%22%3Atrue%7D%7D": dial unix /var/run/docker.sock: connect: permission denied
>
> Thus a Docker issue (it might be because we're not running the command inside a container).
>
> * Failed assertion *
> File '/home/.aws/credentials' doesn't exist
>
> As you mentioned, AWS credentials missing.
>
> Failed: Timeout >5.0s
>
> Tests running for too long and thus timing out. Haven't looked through the files yet. Doing that now.
>
> Ah, my mistake. I only focused on failure messages and ignored errors. Thank you!

That's a good point. @samarth9008 should we overlook errors and only focus on failures?

gpsaggese commented 1 year ago

Guys, I think this is a tricky one since it requires debugging and so on. I know you can complete the task, but it's going to be a lot of churn. Feel free to push whatever you have in a PR and I'll take care of this.

@samarth9008 and @DanilYachmenev can you pls assign something more coding-related to our 2 heroes?

DanilYachmenev commented 12 months ago

@LibertasSpZ for now, go over the discussion to get familiar with the issue and share a brief summary of your understanding afterwards. @gpsaggese will instruct you more on the next steps.

gpsaggese commented 11 months ago

@LibertasSpZ feel free to start reading this issue (several people attempted it, but it was a bit tricky) and try your hand at it. If you get stuck, you can pick a time on my calendar for the PP: https://calendly.com/gsaggese/30mins-et-afternoon?month=2023-07

If you need bugs in the meantime, just ask @DanilYachmenev for more workload.

LibertasSpZ commented 11 months ago

Thanks for the assignment, @gpsaggese and @DanilYachmenev. I ran `i run_fast_tests > output.txt 2>&1` on the new branch and indeed found:

(1) The * Failed assertion * File '/home/.aws/credentials' doesn't exist issue that @dchoi127 and @Ro0k1e mentioned (thank you for identifying these issues btw) and

(2) Some tests PASS in the first run and FAIL in the rerun, as @dchoi127 noted. But not all failed tests follow this pattern, for example on my end this one seems to fail in the first run: im_v2/ccxt/data/client/test/test_ccxt_clients.py::TestCcxtHistoricalPqByTileClient1::test_read_data1 (0.01 s) FAILED [ 24%]

Am I correct in understanding that the task is to inspect the failed and errored tests, figure out the cause of the behavior above, and fix it where possible?

gpsaggese commented 11 months ago

The problem is that the tests pass on our server, but they fail on people's laptops since they are not inside the VPN and/or don't have certain resources available.

We want to mark the tests that fail with the reason they fail (as per https://github.com/sorrentum/sorrentum/issues/283#issuecomment-1633173271) so that we can get to the point where the failing tests are skipped.

The first step is to post the entire output to see which tests are failing. Then we can do a quick PP session and I can show you what to do. Pick a time here https://calendly.com/gsaggese/30mins-et-afternoon?month=2023-07

LibertasSpZ commented 11 months ago

Thanks for the clarification, @gpsaggese. For the entire output, please see the attachment out_log.txt.

I will learn more about marking the tests from the linked comment and PR #401.

I will make the appointment in a bit.

gpsaggese commented 11 months ago

@LibertasSpZ

I've put some notes here https://docs.google.com/document/d/1Qm54LwNRlBwzYroDGviW32NevK_V8L4Sufsg5u6IKwI/edit#heading=h.6bhp1lld1a4v

We want to go in order: 1) find classes of failures and think about solutions, 2) add pytest markers, 3) re-run pytest and see what's left.

DanilYachmenev commented 11 months ago

@LibertasSpZ thx for the log file!

So let's process it a bit and make a more readable report for GP. After a brief look, I've found 4 types of errors that almost all the failing tests can be separated into:

1) Docker issue log:

RuntimeError: cmd='(docker container ls --filter name=/compose-oms_postgres7384-1 -aq) 2>&1' failed with rc='1'
truncated output=
Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Get "http://%2Fvar%2Frun%2Fdocker.sock/v1.24/containers/json?all=1&filters=%7B%22name%22%3A%7B%22%2Fcompose-oms_postgres7384-1%22%3Atrue%7D%7D": dial unix /var/run/docker.sock: connect: permission denied

We probably just need to add @pytest.mark.requires_docker on top, as suggested in https://github.com/sorrentum/sorrentum/issues/283#issuecomment-1633173271, but we need to clarify with @gpsaggese.

2) Missing AWS credentials log:

AssertionError: 
################################################################################
* Failed assertion *
File '/home/.aws/credentials' doesn't exist
################################################################################

Same as point 1.

3) Timeout log

Failed: Timeout >5.0s

These are potentially really easy to fix - just change the marker on top to make the test slow. However, it would be good to understand why so many tests exceed the time limit.

4) Changes in goldens: the log shows the actual txt output on the left and the diverging expected lines on the right.

We need to figure out what the changes are and whether they are expected; if they are OK, just re-run the tests with --update_outcomes.

In any case, could you please process all the failures and, as the first step, just provide the lists of tests for each of these error types (and add another group if one exists)? Overall it does seem big, but the errors are mostly the same, so it could go much faster than it looks.
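
As a starting point for that report, a rough sketch that buckets test names from the log by error signature (the log path and line formats are assumptions based on the out.log excerpts above, so treat this as a first pass, not exact parsing):

import re
from collections import defaultdict

# Signatures taken from the error classes listed above.
SIGNATURES = {
    "docker_permission": "permission denied while trying to connect to the Docker daemon",
    "aws_credentials": "File '/home/.aws/credentials' doesn't exist",
    "timeout": "Failed: Timeout",
}

buckets = defaultdict(set)
current_test = None
with open("out.log") as f:
    for line in f:
        # Track the most recent test id from progress lines
        # ('path.py::Class::test PASSED') or FAILURES banners
        # ('______ Class.test ______').
        m = re.match(r"(\S+\.py::\S+)", line) or re.match(r"_+ (.+?) _+\s*$", line)
        if m:
            current_test = m.group(1)
        for bucket, signature in SIGNATURES.items():
            if signature in line and current_test:
                buckets[bucket].add(current_test)

for bucket, tests in sorted(buckets.items()):
    print(f"# {bucket}: {len(tests)} tests")
    for test in sorted(tests):
        print(f"  {test}")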

gpsaggese commented 11 months ago

@DanilYachmenev good to see that you read my mind. Let's work on the gdoc together

gpsaggese commented 11 months ago

Let's skateboard @DanilYachmenev: let's start disabling all the failing tests by adding a marker requires_cmamp, and we can re-enable the tests as part of https://github.com/sorrentum/sorrentum/issues/282

We can file bugs for each "class of issues" and distribute them as outsourceable.

DanilYachmenev commented 11 months ago

> Let's skateboard @DanilYachmenev: let's start disabling all the failing tests by adding a marker requires_cmamp, and we can re-enable the tests as part of #282
>
> We can file bugs for each "class of issues" and distribute them as outsourceable.

to clarify the 1st step

gpsaggese commented 11 months ago

Close. Let's mark all the tests that are failing in Sorrentum with pytest.mark.requires_cmamp, and then we force these tests to be skipped in the invoke flow (I can work on this with @LibertasSpZ; I think we have a PP session today).
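
Concretely, the invoke side could just extend the marker expression, e.g. pytest -m "not slow and not superslow and not requires_cmamp", or auto-skip in a conftest.py hook. A minimal sketch of the latter (the environment check is an illustrative placeholder for however we detect being outside cmamp):

# conftest.py
import os

import pytest


def pytest_collection_modifyitems(config, items):
    # Outside the cmamp infra (detected here via an illustrative env var),
    # skip everything marked requires_cmamp instead of letting it fail.
    if os.environ.get("CK_AWS_ACCESS_KEY_ID"):
        return
    skip_marker = pytest.mark.skip(reason="requires cmamp infra")
    for item in items:
        if "requires_cmamp" in item.keywords:
            item.add_marker(skip_marker)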

LibertasSpZ commented 11 months ago

@gpsaggese I can put the requires_cmamp markers in during or after our PP session today. And thank you very much @DanilYachmenev for classifying the 4 common types of failures.

By the way @DanilYachmenev @gpsaggese in PR #473 I fixed a failure which does not belong to any common class. The failing function test_plots.py::test_plot_heatmap1 was:

(1) calling get_plot_heatmap1 by the wrong name get_plot_heatmap, and (2) along the trace, using the deprecated alias np.float, which I changed to np.float64, as the pytest failure suggested.

If these changes, especially (2), are okay, I believe #473 is ready to merge, and we can focus on marking the common failure types discussed above.

gpsaggese commented 11 months ago

If you have a sec, take a look at the doc below, where I also indicated solutions for the errors:

https://docs.google.com/document/d/1Qm54LwNRlBwzYroDGviW32NevK_V8L4Sufsg5u6IKwI/edit#heading=h.6bhp1lld1a4v

LibertasSpZ commented 11 months ago

Thanks. Took a glance. Maybe for the timeouts [Category 1] we can mark them as slow instead.