pycontribs / jira

Python Jira library. Development chat available on https://matrix.to/#/#pycontribs:matrix.org
https://jira.readthedocs.io
BSD 2-Clause "Simplified" License
1.91k stars 856 forks source link

Replace `pillow` with `filetype` #1867

Open christianarty opened 3 weeks ago

christianarty commented 3 weeks ago

When this PR was merged, https://github.com/pycontribs/jira/pull/1680, we replaced imghdr with Pillow.

Albeit Pillow is a great library, it has many conflicts with different platform architectures, and too much overhead for the use we have for it (We just use it to determine a MIME type for an image)

Looking at PEP 594, filetype is a reccommended replacement for imghdr and it is cross platform compatible.

This PR is replacing Pillow with filetype

christianarty commented 2 weeks ago

To be honest, I am not sure if filetype is much better and here is why:

Apparently pillow had releases pypi.org/project/pillow/#files

Still, a pure python library would have lots of advantages. I would support migration to a pure python alternative.

Hi @ssbarnea,

Thank you for reviewing my PR and providing feedback. I appreciate your feedback and remarks, but I believe they should not deter us from considering filetype over Pillow.

As noted in #1801, this issue is not only for me, but for other consumers as well. For our pycontrib/jira's use case, Pillow is unnecessarily large for simply detecting the mimetype based on buffer bytes. Its complex dependencies, especially on MacOS, often require additional tinkering, which can be cumbersome for the consumers. Generally they have to go and deal with linking binaries across their system so that the C library underneath the hood in Pillow can properly execute.

Regarding the points raised:

hosted under personal github account instead of using an umbrella organization, h2non/filetype.py

The hosting under a personal GitHub account shouldn't be a deterrent. Popular packages like fastapi and six are similarly hosted and widely used.

not much recent activity h2non/filetype.py/graphs/contributors

filetype is a focused utility, unlike Pillow, which has broader image processing functionalities. Its lack of frequent updates doesn't imply instability, but rather can indicate it is stable for its specific purpose.

CI currently broken and had only one PR merged in 2024: h2non/filetype.py/pulls (sort:updated-desc is:closed)

The CI breakage seems recent due to an update in their test infrastructure. This appears to be a temporary issue and doesn't reflect the library's overall stability.

not using pep-517 for packaging

While PEP 517 provides a way toward simpler packaging, the absence of it in filetype shouldn't be a major concern. Ideally, it should migrate to use PEP-517 soon, but this isn't a blocker.

Additionally, This situation is similar to an existing dependency pycontribs/jira uses: defusedxml. It is hosted under a personal account, had its last release in 2021, has low activity, and also doesn't use PEP 517 for packaging. Despite this, defusedxml serves its one purpose well within our project.

ssbarnea commented 2 weeks ago

I added my comments, a slight -1 but I will not block them this change. Maybe that is a good opportunity to review library dependencies and plan to remove or replace some of them (other ticket).

I want to make jira library slim and pure python too!

We can also raise our concerns upstream and see how current maintainers are willing to address them (could help us make a more informed decision).

christianarty commented 2 weeks ago

I added my comments, a slight -1 but I will not block them this change. Maybe that is a good opportunity to review library dependencies and plan to remove or replace some of them (other ticket).

I want to make jira library slim and pure python too!

We can also raise our concerns upstream and see how current maintainers are willing to address them (could help us make a more informed decision).

Yes, that would be ideal! This package should be slim and ideally pure python. I don't think this needs anything computationally expensive (at least right now).

So, @ssbarnea , what would be the necessary action items to get this PR through the door and merged? We can raise some issues upstream and coordinate with their maintainers, but ideally want this PR merged sooner rather than later, since it affects my workflow and a couple others.

ob commented 1 week ago

+1 to removing dependency on Pillow... I've had to wrap this library and do the following:

import sys
class Image:
    """Dummy PIL.Image module."""

    @staticmethod
    def open(file, mode="r"):
        """
        Dummy implementation of PIL.Image.open() that always raises a
        NotImplementedError due to a PIL import problem on macOS
        """
        raise NotImplementedError("This is a no-op implementation of PIL.Image.open()")

# On macOS, patch PIL.Image with the dummy module
if sys.platform == "darwin":
    sys.modules["PIL.Image"] = sys.modules[__name__]

which is gross and I'l like to avoid.

dimitarOnGithub commented 1 week ago

Is removing the auto-detection functionality entirely an option here?

I'm aware that probably a lot of projects out there rely on it to automatically populate the content-type header, so maybe a version or two deprecation warning in place would be the first move, but considering that both create_temp_project_avatar and create_temp_project_avatar offer the user to provide a str value for contentType in their signature, it would be a matter of shifting the responsibility on the end user and their preferred library (that is if they need it in the first place).

The benefit would be reducing the dependencies required by the package and allowing the users to pick their own flavour of a library, the downside is that everyone who wants to upgrade will have to handle this on their own.

ob commented 1 week ago

Removing the functionality would work and simplify the dependencies. As for a transition how requiring content-type (or MIME type or whatever) to be provided and give people a snippet of code that is equivalent to the only usage in the code. That way you can use whatever you want for guessing image types when you don't know...