kdeldycke / mail-deduplicate

📧 CLI to deduplicate mails from mail boxes
GNU General Public License v2.0

Object has no attribute '_subdir' error #191

Closed dschrempf closed 3 years ago

dschrempf commented 3 years ago

After execution of the following command, I get the mentioned error message (see logs):

 + ~/Downloads/Temp/mail-deduplicate/bin/mdedup -s select-oldest -a move-selected -t ctime -E Linux-Dedup/ -e maildir Linux/

● Phase #0 - Load mails

Opening /home/dominik/Maildir/gmail/Linux ...
maildir detected.
1063 mails found.

● Phase #1 - Compute hashes and group duplicates
Use [date, from, to, subject, mime-version, content-type, content-disposition, user-agent, x-priority, message-id] headers to compute hashes.
Hashed mails  [####################################]  1063/1063

● Phase #2 - Select mails in each group
select-oldest strategy will be applied on each duplicate set to select candidates.

● Phase #3 - Perform action on selected mails
Perform move-selected action...
1063 mails selected for action.
Creating new maildir box at /home/dominik/Maildir/gmail/Linux-Dedup ...
Traceback (most recent call last):
  File "/home/dominik/Downloads/Temp/mail-deduplicate/bin/mdedup", line 8, in <module>
  File "/home/dominik/Downloads/Temp/mail-deduplicate/lib/python3.8/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/home/dominik/Downloads/Temp/mail-deduplicate/lib/python3.8/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/home/dominik/Downloads/Temp/mail-deduplicate/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/dominik/Downloads/Temp/mail-deduplicate/lib/python3.8/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/home/dominik/Downloads/Temp/mail-deduplicate/lib/python3.8/site-packages/click/decorators.py", line 21, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/home/dominik/Downloads/Temp/mail-deduplicate/lib/python3.8/site-packages/mail_deduplicate/cli.py", line 388, in mdedup
  File "/home/dominik/Downloads/Temp/mail-deduplicate/lib/python3.8/site-packages/mail_deduplicate/action.py", line 114, in perform_action
  File "/home/dominik/Downloads/Temp/mail-deduplicate/lib/python3.8/site-packages/mail_deduplicate/action.py", line 62, in move_selected
  File "/nix/store/wkw6fsjasr7jbbrlakxxpbiapa8hws42-python3-3.8.7/lib/python3.8/mailbox.py", line 300, in add
    subdir = message.get_subdir()
  File "/nix/store/wkw6fsjasr7jbbrlakxxpbiapa8hws42-python3-3.8.7/lib/python3.8/mailbox.py", line 1537, in get_subdir
    return self._subdir
AttributeError: 'MaildirDedupMail' object has no attribute '_subdir'
dschrempf commented 3 years ago

So it turns out this was caused by me executing mdedup from outside the virtual environment. Pretty stupid that this can actually be done :).

dschrempf commented 3 years ago

Sorry, I have to reopen. This was not my fault. The error does not happen when using -n.

kaz-yos commented 3 years ago

Same as https://github.com/kdeldycke/mail-deduplicate/issues/135?

I got the same error with version 6.1.2.

mdedup 6.1.2
{'username': '-', 'guid': '7d002aa8ff457a7721f6a7ad164505f', 'hostname': '-', 'hostfqdn': '-', 'uname': {'system': 'Darwin', 'node': '-', 'release': '20.3.0', 'version': 'Darwin Kernel Version 20.3.0: Thu Jan 21 00:07:06 PST 2021; root:xnu-7195.81.3~1/RELEASE_X86_64', 'machine': 'x86_64', 'processor': 'i386'}, 'linux_dist_name': '', 'linux_dist_version': '', 'cpu_count': 12, 'fs_encoding': 'utf-8', 'ulimit_soft': 256, 'ulimit_hard': 9223372036854775807, 'cwd': '-', 'umask': '0o2', 'python': {'argv': '-', 'bin': '-', 'version': '3.7.1 (default, Oct 23 2018, 14:07:42) [Clang 4.0.1 (tags/RELEASE_401/final)]', 'compiler': 'Clang 4.0.1 (tags/RELEASE_401/final)', 'build_date': 'Oct 23 2018 14:07:42', 'version_info': [3, 7, 1, 'final', 0], 'features': {'openssl': 'OpenSSL 1.1.1d  10 Sep 2019', 'expat': 'expat_2.2.6', 'sqlite': '3.25.3', 'tkinter': '8.6', 'zlib': '1.2.11', 'unicode_wide': True, 'readline': True, '64bit': True, 'ipv6': True, 'threading': True, 'urandom': True}}, 'time_utc': '2021-02-10 23:31:27.276025', 'time_utc_offset': -5.0, '_eco_version': '1.0.1'}
alisraza commented 3 years ago

I think this is likely the same as #135 as @kaz-yos pointed out.


Running the following command:

"$basedir"/code/forks/mail-deduplicate/.venv/bin/mdedup \
    --input-format maildir \
    --size-threshold 0 \
    --content-threshold 0 \
    --strategy discard-all-but-one \
    --action move-selected \
    --export "$output_path" \
    --export-format maildir \
    --verbosity debug \
    "$mail_source_1" "$mail_source_2"

Yields (truncated output):

● Phase #3 - Perform action on selected mails
Perform move-selected action...
232 mails selected for action.
Creating new maildir box at [$output_path] ...
debug: Locking box...
debug: Move <MaildirDedupMail ["$mail_source_1"]:[NNNNNNNNNN].[NNNNN]_[NNN].["$hostname"],U=[NNN]> form ["$mail_source_1"] to ["$output_path"]...

With stacktrace:

  File "["$basedir"]/code/forks/mail-deduplicate/mail_deduplicate/cli.py", line 388, in mdedup
  File "["$basedir"]/code/forks/mail-deduplicate/mail_deduplicate/action.py", line 114, in perform_action
  File "["$basedir"]/code/forks/mail-deduplicate/mail_deduplicate/action.py", line 62, in move_selected
  File "[~]/.pyenv/versions/3.7.10/lib/python3.7/mailbox.py", line 300, in add
    subdir = message.get_subdir()
  File "[~]/.pyenv/versions/3.7.10/lib/python3.7/mailbox.py", line 1537, in get_subdir
    return self._subdir
AttributeError: 'MaildirDedupMail' object has no attribute '_subdir'

Debugging Information:

Coding is a side hobby and I haven't looked at Python code for a while, but from stepping through the code, my best guess is that when the mail object is created as a subclass, it runs the __init__ function of the Python standard library's Message class rather than that of the MaildirMessage class. For reference, the __init__ function of the MaildirMessage class is:

class MaildirMessage(Message):
    """Message with Maildir-specific properties."""

    _type_specific_attributes = ['_subdir', '_info', '_date']

    def __init__(self, message=None):
        """Initialize a MaildirMessage instance."""
        self._subdir = 'new'
        self._info = ''
        self._date = time.time()
        Message.__init__(self, message)

However, based on the stacktrace, when I look at action.py in the move_selected function:

def move_selected(dedup):
    # truncated [...]
            logger.info(f"{mail!r} copied.")
    # truncated [...]

When pausing at box.add(mail), not only does the box object have the mailbox.Maildir class, but the mail object has the MaildirDedupMail class, which appears to be correct, although it is indeed missing the mail._subdir attribute. I would need more time to look into how mail is instantiated, but I hope the information thus far is somewhat helpful. I may be slow to respond in the next few days, but I appreciate anyone who is able to look into this issue.
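The hypothesis above can be checked with the standard library alone. The following is a minimal sketch, not mdedup code: SkipsMaildirInit and try_add are hypothetical names, and the subclass deliberately runs only the base Message initializer to mimic the suspected behaviour.

```python
import mailbox
import os
import tempfile


class SkipsMaildirInit(mailbox.MaildirMessage):
    """Hypothetical subclass that only runs the base Message initializer."""

    def __init__(self, message=None):
        # Deliberately bypass MaildirMessage.__init__, so _subdir, _info
        # and _date are never set on the instance.
        mailbox.Message.__init__(self, message)


def try_add(message):
    """Add message to a throwaway Maildir; return the error text, if any."""
    with tempfile.TemporaryDirectory() as tmp:
        box = mailbox.Maildir(os.path.join(tmp, "box"), create=True)
        try:
            box.add(message)
        except AttributeError as exc:
            return str(exc)
        return None


raw = "Subject: test\n\nbody\n"
print(try_add(mailbox.MaildirMessage(raw)))  # regular message is accepted
print(try_add(SkipsMaildirInit(raw)))        # fails on the missing _subdir
```

If the second call reports the same missing '_subdir' attribute as the stacktrace, that supports the idea that MaildirMessage.__init__ is being skipped somewhere during construction.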

Additional Information:

Code running with cwd as "$basedir"/code/forks/mail-deduplicate. Virtual environment created with poetry install in .venv subdir.

poetry --version
# Poetry version 1.1.4
python --version
# Python 3.7.10
pyenv version
# 3.7.10 (set by "$basedir"/code/forks/mail-deduplicate/.python-version)
"$basedir"/code/forks/mail-deduplicate/.venv/bin/mdedup --version
# mdedup 6.1.3
# {'username': '-', 'guid': '82f4afc3ac75c9fa8c7849ab3364986', 'hostname': '-', 'hostfqdn': '-', 'uname': {'system': 'Linux', 'node': '-', 'release': '5.10.16-arch1-1', 'version': '#1 SMP PREEMPT Sat, 13 Feb 2021 20:50:18 +0000', 'machine': 'x86_64', 'processor': ''}, 'linux_dist_name': 'arch', 'linux_dist_version': 'Arch', 'cpu_count': 8, 'fs_encoding': 'utf-8', 'ulimit_soft': 8192, 'ulimit_hard': 524288, 'cwd': '-', 'umask': '0o2', 'python': {'argv': '-', 'bin': '-', 'version': '3.7.10 (default, Feb 18 2021, 17:50:07) [GCC 10.2.0]', 'compiler': 'GCC 10.2.0', 'build_date': 'Feb 18 2021 17:50:07', 'version_info': [3, 7, 10, 'final', 0], 'features': {'openssl': 'OpenSSL 1.1.1j  16 Feb 2021', 'expat': 'expat_2.2.8', 'sqlite': '3.34.1', 'tkinter': '', 'zlib': '1.2.11', 'unicode_wide': True, 'readline': True, '64bit': True, 'ipv6': True, 'threading': True, 'urandom': True}}, 'time_utc': '2021-02-19 10:10:34.969315', 'time_utc_offset': -5.0, '_eco_version': '1.0.1'}

For convenience, corresponding JSON:

{
    "username": "-",
    "guid": "82f4afc3ac75c9fa8c7849ab3364986",
    "hostname": "-",
    "hostfqdn": "-",
    "uname": {
        "system": "Linux",
        "node": "-",
        "release": "5.10.16-arch1-1",
        "version": "#1 SMP PREEMPT Sat, 13 Feb 2021 20:50:18 +0000",
        "machine": "x86_64",
        "processor": ""
    },
    "linux_dist_name": "arch",
    "linux_dist_version": "Arch",
    "cpu_count": 8,
    "fs_encoding": "utf-8",
    "ulimit_soft": 8192,
    "ulimit_hard": 524288,
    "cwd": "-",
    "umask": "0o2",
    "python": {
        "argv": "-",
        "bin": "-",
        "version": "3.7.10 (default, Feb 18 2021, 17:50:07) [GCC 10.2.0]",
        "compiler": "GCC 10.2.0",
        "build_date": "Feb 18 2021 17:50:07",
        "version_info": [3, 7, 10, "final", 0],
        "features": {
            "openssl": "OpenSSL 1.1.1j  16 Feb 2021",
            "expat": "expat_2.2.8",
            "sqlite": "3.34.1",
            "tkinter": "",
            "zlib": "1.2.11",
            "unicode_wide": true,
            "readline": true,
            "64bit": true,
            "ipv6": true,
            "threading": true,
            "urandom": true
        }
    },
    "time_utc": "2021-02-19 10:10:34.969315",
    "time_utc_offset": -5.0,
    "_eco_version": "1.0.1"
}

Thank you!

kaz-yos commented 3 years ago

@alisraza, thanks for the detailed investigation!

pechfunk commented 3 years ago

It looks like the problem is in the DedupMail constructor which tries to auto-detect which of the superclasses is the one that contributes Message-ness.

    def __init__(self, message=None):
        """Initialize a pre-parsed ``Message`` instance the same way the default
        factory in Python's ``mailbox`` module does.
        """
        # Hunt down in our parent classes (but ourselve) the first one inheriting the
        # mailbox.Message class. That way we can get to the original factory.
        orig_message_klass = None
        for klass in inspect.getmro(self.__class__)[1:]:
            if issubclass(klass, mailbox.Message):
                orig_message_klass = klass
                break
        assert orig_message_klass

        # Call original object initialization from the right message class we
        # inherits from mailbox.Message.
        super(orig_message_klass, self).__init__(message)

Now when the search finds a Message-like class orig_message_klass, the super-call ensures that the successor of orig_message_klass in the MRO is called first. For Maildir messages this means that the plain Message constructor gets called, but MaildirMessage's does not.
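The lookup order can be made visible with a small sketch. It assumes MaildirDedupMail is composed as the mixin followed by mailbox.MaildirMessage; the empty class bodies are stand-ins and init_owner_after is a hypothetical helper, not part of mdedup.

```python
import mailbox


class DedupMail:
    """Stand-in for the mixin; its real body is irrelevant here."""


class MaildirDedupMail(DedupMail, mailbox.MaildirMessage):
    """Assumed composition: mixin first, then the stdlib message class."""


def init_owner_after(klass, cls):
    """Return the class whose __init__ is used by super(klass, cls)."""
    mro = cls.__mro__
    # super(klass, ...) starts searching at the entry *after* klass.
    for candidate in mro[mro.index(klass) + 1:]:
        if "__init__" in vars(candidate):
            return candidate
    return None


# Print the full method resolution order with module-qualified names.
for k in MaildirDedupMail.__mro__:
    print(f"{k.__module__}.{k.__qualname__}")

# What the constructor effectively does: start after MaildirMessage.
skipping = init_owner_after(mailbox.MaildirMessage, MaildirDedupMail)
# What a plain super() call inside the mixin would do: start after DedupMail.
cooperative = init_owner_after(DedupMail, MaildirDedupMail)
print(skipping is mailbox.Message)
print(cooperative is mailbox.MaildirMessage)
```

Starting the search after MaildirMessage lands on mailbox.Message, so the three Maildir-specific attributes are never assigned; starting after the mixin lands on MaildirMessage itself.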

I've tried to repair the clever construction in PR #222. I'm not sure that the cleverness is necessary here, with only a handful of message classes to support, and little innovation in the field of Mbox dialects going on in general. But at least mdedup runs for me again!
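For comparison, here is a minimal sketch of a less clever constructor. It is not necessarily what PR #222 does: it assumes the mixin is always listed before the concrete mailbox message class, and the class bodies are stand-ins for the real ones.

```python
import mailbox
import os
import tempfile


class DedupMail:
    """Stand-in mixin; the real class carries the deduplication logic."""

    def __init__(self, message=None):
        # Cooperative call: continue with the next class in the MRO, which
        # is MaildirMessage when the mixin is listed first.
        super().__init__(message)


class MaildirDedupMail(DedupMail, mailbox.MaildirMessage):
    """Assumed composition: mixin first, then the stdlib message class."""


mail = MaildirDedupMail("Subject: test\n\nbody\n")
print(mail.get_subdir())

# Adding the message to a throwaway Maildir no longer raises.
with tempfile.TemporaryDirectory() as tmp:
    box = mailbox.Maildir(os.path.join(tmp, "box"), create=True)
    key = box.add(mail)
    stored = len(box)
print(stored)
```

With the plain super() call, MaildirMessage.__init__ runs and sets _subdir, so Maildir.add can read it. The trade-off is that this relies on the base-class order in every composed class rather than on detection at runtime.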

kdeldycke commented 3 years ago

little innovation in the field of Mbox dialects going on in general

Indeed! I apologize for that part being well over-engineered. I wanted that part to be future-proof, with the vague idea of extending it to other sources of mail (Gmail? S3?). But it ended up increasing complexity with little benefit.

Anyway, thanks a lot @pechfunk for diving deep into the root cause and proposing a fix! I just merged it back upstream, and will try to cut a new release.
