minio / minio-py

MinIO Client SDK for Python
https://docs.min.io/docs/python-client-quickstart-guide.html
Apache License 2.0
852 stars 325 forks source link

content_type isn't returned by list_objects so examples/list_objects.py incorrectly shows it doing so #1284

Closed sprior closed 1 year ago

sprior commented 1 year ago

Line 31 of examples/list_objects.py shows content_type being printed for objects returned by list_objects, but apparently content_type isn't filled in by that call so the example is misleading - it always prints None.

harshavardhana commented 1 year ago

This is normal list objects do not respond back with content type, we should update the examples and remove it from them.

balamurugana commented 1 year ago

You would need to read API documentation and respective S3 specification.

sprior commented 1 year ago

You would need to read API documentation and respective S3 specification.

You closed this but I don't see the example I listed as incorrect fixed. So are you fixing the example?

balamurugana commented 1 year ago

There is no issue in the example. ListObjects S3 APIs do not return content-type and Object class is generically available for other APIs too. If you are printing list objects output, there are many fields would be empty. You would need to know what APIs you are using to fetch information.

sprior commented 1 year ago

The code in examples/list_objects.py starting at line 27 is as follows:

`objects = client.list_objects('my-bucketname', prefix='my-prefixname',
                              recursive=True)
for obj in objects:
    print(obj.bucket_name, obj.object_name.encode('utf-8'), obj.last_modified,
          obj.etag, obj.size, obj.content_type)`

So the example is calling list_objects, iterating through the returned objects and printing obj.content_type which is always None. I don't understand how you can say that there is no issue in the example. Anyone reading this code is given the impression that content_type is filled in when it is not.

The simple fix would be at least to remove the printing of that field so the reader wouldn't think they should be able to.

balamurugana commented 1 year ago

Feel free to send a PR

sprior commented 1 year ago

I see that it's already fixed in the master branch three years ago but still incorrect in the release branch, so no PR is needed, but can't you get the release branch merged from master so people installing this library with pip will see the correction?

balamurugana commented 1 year ago

release branch is stale and should not be used. We believe in keeping master branch stable all the time. pip install minio pulls the latest release made from master branch. We do not merge or update stale release branch.

sprior commented 1 year ago

I'm seeing that pip install minio did NOT pull the master branch it pulled the release branch. Can you think of why this would be so?

I changed my requirements.txt from: minio to minio @ git+https://github.com/minio/minio-py@master

and that seems to have gotten the master branch, but that's not what people would normally know to do. Also it appears that the objects in minio/datatypes.py do not declare a str(self), so when examples/list_objects.py now do: objects = client.list_objects("my-bucket", prefix="my/prefix/") for obj in objects: print(obj)

the only output is going to be like: <minio.datatypes.Object object at 0x000002649D1A1050>

which seems less useful than the same example in the release branch with the exception that the release branch makes it appear that content_type is available when it is not.

sprior commented 1 year ago

Also, the documentation page at https://min.io/docs/minio/linux/developers/python/minio-py.html in the More References section links to the release (not the master) branch of the corresponding pages which is how I found the examples/list_objects.py file in the first place. So if you are saying that the release branch should no longer be used you should at least point to master instead of release on your docs page.

sprior commented 1 year ago

If you look at https://pypi.org/project/minio/#history you will see that the version of minio that pypi is pointing to is the release version and not master, so I believe anyone using 'pip install minio' is not going to get the version you recommend.

harshavardhana commented 1 year ago

@sprior I think you are just confused https://github.com/minio/minio-py/blob/7.1.15/examples/list_objects.py

harshavardhana commented 1 year ago

Also, the documentation page at https://min.io/docs/minio/linux/developers/python/minio-py.html in the More References section links to the release (not the master) branch of the corresponding pages which is how I found the examples/list_objects.py file in the first place. So if you are saying that the release branch should no longer be used you should at least point to master instead of release on your docs page.

Every single comment here is wrong here @sprior

sprior commented 1 year ago

Go to the link in my last comment: https://min.io/docs/minio/linux/developers/python/minio-py.html And scroll down to the bottom and look at the link in the More References section called examples. The URL is: https://github.com/minio/minio-py/tree/release/examples See the word release in there? Then click on that link and look at the branches pulldown at the top left, see how it says release instead of master??? How are you saying I'm wrong?

harshavardhana commented 1 year ago

Go to the link in my last comment: https://min.io/docs/minio/linux/developers/python/minio-py.html And scroll down to the bottom and look at the link in the More References section called examples. The URL is: https://github.com/minio/minio-py/tree/release/examples See the word release in there? Then click on that link and look at the branches pulldown at the top left, see how it says release instead of master??? How are you saying I'm wrong?

Feel free to send a fix, happy to accept a PR.

sprior commented 1 year ago

Go to the link in my last comment: https://min.io/docs/minio/linux/developers/python/minio-py.html And scroll down to the bottom and look at the link in the More References section called examples. The URL is: https://github.com/minio/minio-py/tree/release/examples See the word release in there? Then click on that link and look at the branches pulldown at the top left, see how it says release instead of master??? How are you saying I'm wrong?

Feel free to send a fix, happy to accept a PR.

Can you recommend where to change this? It seems that the main minio docs page is still pointing to your release branch instead of master. As I pointed out above if you start at the minio docs page at https://min.io/docs/minio/kubernetes/upstream/ then click on SDKs under Developers, then scroll to Python and click on MinIO Python SDK Reference then scroll to the bottom, the Examples link there points to the obsolete release branch instead of the master branch. This is how I got confused because it shows content_type.

I traced it to this file: https://github.com/minio/docs/blob/main/source/developers/python/minio-py.rst

But that doesn't show which branch it links to so I don't know how to change that for a PR.

harshavardhana commented 1 year ago

https://github.com/minio/minio-py#more-references

sprior commented 1 year ago

I created https://github.com/minio/minio-py/pull/1285

sprior commented 1 year ago

So now that it's merged into master, how long before the correction is live on the docs website?

sprior commented 1 year ago

FYI, I just figured out where I was confused above - your releases aren't from the release branch, they're tags from the master branch. So if you clone master you see the code is version 7.1.16 when the released version you get from pip install minio is version 7.1.15 and naturally assume this is from the release branch, but it's not.

paramazo commented 1 year ago

If i try to use list_object i only get a memory reference of the object and nothing helpful. It prints like

<minio.datatypes.Object object at 0x7f79fdde9e50>

I would like to list folders and files in a bucket with the minio API in Python.

    # list all objects in the bucket
    objects = client.list_objects(MLFLOW_BUCKET, prefix="artifacts/")
    for obj in objects:
        print(obj)

Seems i need to use boto3 instead.

// checked the class object in code and found the relevant parts

        self._bucket_name = bucket_name
        self._object_name = object_name
        self._last_modified = last_modified
        self._etag = etag
        self._size = size
        self._metadata = metadata
        self._version_id = version_id
        self._is_latest = is_latest
        self._storage_class = storage_class
        self._owner_id = owner_id
        self._owner_name = owner_name
        self._content_type = content_type
        self._is_delete_marker = is_delete_marker

So i need to use print(obj.object_name)