impresso / impresso-pycommons

Python module with bits of code (objects, functions) highly reusable within impresso.
http://impresso-pycommons.rtfd.io/
GNU Affero General Public License v3.0

Failing tests (test_s3 and test_path_s3) #75

Closed EmanuelaBoros closed 8 months ago

EmanuelaBoros commented 9 months ago

The buckets cannot be retrieved, and there is also an import error. The import error can be fixed (the name appears to have been renamed) and I will open a PR. However, it is unclear why `get_bucket` returns None (I can connect with s3cmd with no issues).

__________________________________________________________ ERROR collecting tests/utils/test_s3.py __________________________________________________________
ImportError while importing test module 'impresso-pycommons/tests/utils/test_s3.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
python3.11/importlib/__init__.py:126: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
tests/utils/test_s3.py:10: in <module>
    from impresso_commons.utils.config_loader import TextImporterConfig
E   ImportError: cannot import name 'TextImporterConfig' from 'impresso_commons.utils.config_loader' (impresso_commons/utils/config_loader.py)
===================================================================== warnings summary ======================================================================

================================================================== short test summary info ==================================================================
ERROR tests/test_path_s3.py - AttributeError: 'NoneType' object has no attribute 'name'
ERROR tests/utils/test_s3.py
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: 2 errors during collection !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
============================================================== 3 warnings, 2 errors in 28.26s ===============================================================
EmanuelaBoros commented 9 months ago

It seems that it works locally, but not on a RunAI node. I am looking into it.

EmanuelaBoros commented 9 months ago
FAILED tests/test_path_s3.py::test_s3_iter_bucket - AttributeError: 'NoneType' object has no attribute 'name'
FAILED tests/test_path_s3.py::test_s3_filter_archives - AttributeError: 'NoneType' object has no attribute 'name'
FAILED tests/test_path_s3.py::test_s3_filter_archives_timebucket - AttributeError: 'NoneType' object has no attribute 'name'
FAILED tests/utils/test_kube.py::test_dask_cluster - kubernetes.config.config_exception.ConfigException: Invalid kube-config file. No configuration found.
FAILED tests/utils/test_s3.py::test_get_s3_versions - AttributeError: 'NoneType' object has no attribute 'get_all_keys'
FAILED tests/utils/test_s3.py::test_read_jsonlines - AttributeError: 'NoneType' object has no attribute 'name'
FAILED tests/utils/test_s3.py::test_load_config - TypeError: argument of type 'NoneType' is not iterable
EmanuelaBoros commented 9 months ago

I found the issue. I do not have the rights to list all buckets.

def get_bucket(name, create=False, versioning=True):
    """Creates a boto s3 connection and returns the requested bucket.

    It is possible to ask for creating a new bucket
    with the specified name (in case it does not exist), and (optionally)
    to turn on the versioning on the newly created bucket.

    >>> b = get_bucket('testb', create=False)
    >>> b = get_bucket('testb', create=True)
    >>> b = get_bucket('testb', create=True, versioning=False)

    :param name: the bucket's name
    :type name: string
    :param create: creates the bucket if not yet existing
    :type create: boolean
    :param versioning: whether the new bucket should be versioned
    :type versioning: boolean
    :return: an s3 bucket
    :rtype: `boto.s3.bucket.Bucket`

    .. TODO:: avoid importing both `boto` and `boto3`
    """
    conn = get_s3_connection()
    # try to fetch the specified bucket -- may return an empty list
    bucket = [b for b in conn.get_all_buckets() if b.name == name]
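To make the failure mode concrete, here is a minimal sketch (with a hypothetical `lookup_bucket` helper mirroring the filter above) of how a missing list-all-buckets right turns into the `None` seen in the tracebacks:

```python
def lookup_bucket(conn, name):
    # Mirror the pattern above: filter the full bucket listing
    # for the requested name.
    matches = [b for b in conn.get_all_buckets() if b.name == name]
    # If the caller is not allowed to list buckets, `matches` stays
    # empty even when the bucket exists and is reachable, so None is
    # returned and later attribute access fails with
    # "'NoneType' object has no attribute 'name'".
    return matches[0] if matches else None
```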

This method assumes that one has that right. I would propose changing this method to connect directly to the specified bucket instead of looking the bucket up with get_all_buckets().
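A possible direct-connection variant, sketched with boto3 (the function name `get_bucket_direct`, the passed-in resource object, and the error handling are assumptions, not the repo's actual API):

```python
def get_bucket_direct(s3_resource, name, create=False, versioning=True):
    """Fetch a bucket directly, without listing all buckets.

    `s3_resource` is a boto3 S3 resource (e.g. boto3.resource("s3")).
    Sketch only: names and error handling are assumptions.
    """
    bucket = s3_resource.Bucket(name)
    client = s3_resource.meta.client
    try:
        # HeadBucket requires access only to this one bucket, not the
        # account-wide ListAllMyBuckets permission.
        client.head_bucket(Bucket=name)
    except client.exceptions.ClientError:
        if not create:
            raise
        bucket.create()
        if versioning:
            bucket.Versioning().enable()
    return bucket
```

With such a variant, the tests would only need read rights on the test bucket itself rather than listing rights on the whole account.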

This works on my side:

# Client setup (assumed here; credentials and endpoint come from the environment)
import boto3

s3 = boto3.client('s3')

# List the contents of the bucket
response = s3.list_objects(Bucket='rebuilt-data')

for content in response.get('Contents', []):
    print(content['Key'])

I propose we discuss this, in case I am wrong.