Closed dschrempf closed 3 years ago
So it turns out this was caused by me executing mdedup from outside the virtual environment. Pretty stupid that this can actually be done :).
Sorry, I have to reopen. This was not my fault. The error does not happen when using -n.
Same as https://github.com/kdeldycke/mail-deduplicate/issues/135?
I got the same error with version 6.1.2.
mdedup 6.1.2
{'username': '-', 'guid': '7d002aa8ff457a7721f6a7ad164505f', 'hostname': '-', 'hostfqdn': '-', 'uname': {'system': 'Darwin', 'node': '-', 'release': '20.3.0', 'version': 'Darwin Kernel Version 20.3.0: Thu Jan 21 00:07:06 PST 2021; root:xnu-7195.81.3~1/RELEASE_X86_64', 'machine': 'x86_64', 'processor': 'i386'}, 'linux_dist_name': '', 'linux_dist_version': '', 'cpu_count': 12, 'fs_encoding': 'utf-8', 'ulimit_soft': 256, 'ulimit_hard': 9223372036854775807, 'cwd': '-', 'umask': '0o2', 'python': {'argv': '-', 'bin': '-', 'version': '3.7.1 (default, Oct 23 2018, 14:07:42) [Clang 4.0.1 (tags/RELEASE_401/final)]', 'compiler': 'Clang 4.0.1 (tags/RELEASE_401/final)', 'build_date': 'Oct 23 2018 14:07:42', 'version_info': [3, 7, 1, 'final', 0], 'features': {'openssl': 'OpenSSL 1.1.1d 10 Sep 2019', 'expat': 'expat_2.2.6', 'sqlite': '3.25.3', 'tkinter': '8.6', 'zlib': '1.2.11', 'unicode_wide': True, 'readline': True, '64bit': True, 'ipv6': True, 'threading': True, 'urandom': True}}, 'time_utc': '2021-02-10 23:31:27.276025', 'time_utc_offset': -5.0, '_eco_version': '1.0.1'}
I think this is likely the same as #135 as @kaz-yos pointed out.
Running the following command:
"$basedir"/code/forks/mail-deduplicate/.venv/bin/mdedup \
--input-format maildir \
--size-threshold 0 \
--content-threshold 0 \
--strategy discard-all-but-one \
--action move-selected \
--export "$output_path" \
--export-format maildir \
--verbosity debug \
"$mail_source_1" "$mail_source_2"
Yields (truncated output):
● Phase #3 - Perform action on selected mails
Perform move-selected action...
232 mails selected for action.
Creating new maildir box at [$output_path] ...
debug: Locking box...
debug: Move <MaildirDedupMail ["$mail_source_1"]:[NNNNNNNNNN].[NNNNN]_[NNN].["$hostname"],U=[NNN]> form ["$mail_source_1"] to ["$output_path"]...
With stacktrace:
File "["$basedir"]/code/forks/mail-deduplicate/mail_deduplicate/cli.py", line 388, in mdedup
perform_action(dedup)
File "["$basedir"]/code/forks/mail-deduplicate/mail_deduplicate/action.py", line 114, in perform_action
method(dedup)
File "["$basedir"]/code/forks/mail-deduplicate/mail_deduplicate/action.py", line 62, in move_selected
box.add(mail)
File "[~]/.pyenv/versions/3.7.10/lib/python3.7/mailbox.py", line 300, in add
subdir = message.get_subdir()
File "[~]/.pyenv/versions/3.7.10/lib/python3.7/mailbox.py", line 1537, in get_subdir
return self._subdir
AttributeError: 'MaildirDedupMail' object has no attribute '_subdir'
Coding is a side-hobby and I haven't looked at Python code for a while, but from stepping through the code, my best guess is that when the mail object is created as a subclass, it may be running the __init__ function from the Python standard library's Message class rather than the MaildirMessage class, given that the __init__ function for the MaildirMessage class is:
class MaildirMessage(Message):
    """Message with Maildir-specific properties."""

    _type_specific_attributes = ['_subdir', '_info', '_date']

    def __init__(self, message=None):
        """Initialize a MaildirMessage instance."""
        self._subdir = 'new'
        self._info = ''
        self._date = time.time()
        Message.__init__(self, message)
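The guess above can be checked in isolation. This is only a minimal sketch using the standard library's mailbox module: SkipsMaildirInit is a hypothetical class (not part of mail-deduplicate) that deliberately calls Message.__init__ and bypasses MaildirMessage.__init__, to see whether that alone produces the same AttributeError.

```python
import mailbox


class SkipsMaildirInit(mailbox.MaildirMessage):
    """Hypothetical subclass that bypasses MaildirMessage.__init__ on purpose."""

    def __init__(self, message=None):
        # Only the generic mailbox.Message initializer runs here, so the
        # Maildir-specific attributes (_subdir, _info, _date) are never set.
        mailbox.Message.__init__(self, message)


# A regular MaildirMessage gets its _subdir from its own __init__.
normal_subdir = mailbox.MaildirMessage().get_subdir()

# The subclass that skipped that __init__ fails in get_subdir().
try:
    SkipsMaildirInit().get_subdir()
    error = None
except AttributeError as exc:
    error = str(exc)

print(normal_subdir)
print(error)
```

If the second print shows a complaint about a missing _subdir attribute, the symptom matches the stacktrace, which would support the idea that MaildirMessage.__init__ is being skipped somewhere.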
However, based on the stacktrace, when I look at action.py in the move_selected function:
def move_selected(dedup):
    # truncated [...]
        box.add(mail)
        dedup.sources[mail.source_path].remove(mail.mail_id)
        logger.info(f"{mail!r} copied.")
    # truncated [...]
When pausing at box.add(mail), not only does the box object have the mailbox.Maildir class, but the mail object has the MaildirDedupMail class, which appears to be correct, although it is indeed missing the mail._subdir attribute. I would need more time to look into how mail is instantiated, but I hope the information thus far is somewhat helpful. I may be slow to respond in the next few days, but I appreciate anyone who is able to look into this issue.
Code running with cwd as "$basedir"/code/forks/mail-deduplicate.
Virtual environment created with poetry install in the .venv subdir.
poetry --version
# Poetry version 1.1.4
python --version
# Python 3.7.10
pyenv version
# 3.7.10 (set by "$basedir"/code/forks/mail-deduplicate/.python-version)
"$basedir"/code/forks/mail-deduplicate/.venv/bin/mdedup --version
# mdedup 6.1.3
# {'username': '-', 'guid': '82f4afc3ac75c9fa8c7849ab3364986', 'hostname': '-', 'hostfqdn': '-', 'uname': {'system': 'Linux', 'node': '-', 'release': '5.10.16-arch1-1', 'version': '#1 SMP PREEMPT Sat, 13 Feb 2021 20:50:18 +0000', 'machine': 'x86_64', 'processor': ''}, 'linux_dist_name': 'arch', 'linux_dist_version': 'Arch', 'cpu_count': 8, 'fs_encoding': 'utf-8', 'ulimit_soft': 8192, 'ulimit_hard': 524288, 'cwd': '-', 'umask': '0o2', 'python': {'argv': '-', 'bin': '-', 'version': '3.7.10 (default, Feb 18 2021, 17:50:07) [GCC 10.2.0]', 'compiler': 'GCC 10.2.0', 'build_date': 'Feb 18 2021 17:50:07', 'version_info': [3, 7, 10, 'final', 0], 'features': {'openssl': 'OpenSSL 1.1.1j 16 Feb 2021', 'expat': 'expat_2.2.8', 'sqlite': '3.34.1', 'tkinter': '', 'zlib': '1.2.11', 'unicode_wide': True, 'readline': True, '64bit': True, 'ipv6': True, 'threading': True, 'urandom': True}}, 'time_utc': '2021-02-19 10:10:34.969315', 'time_utc_offset': -5.0, '_eco_version': '1.0.1'}
For convenience, corresponding JSON:
{
"username": "-",
"guid": "82f4afc3ac75c9fa8c7849ab3364986",
"hostname": "-",
"hostfqdn": "-",
"uname": {
"system": "Linux",
"node": "-",
"release": "5.10.16-arch1-1",
"version": "#1 SMP PREEMPT Sat, 13 Feb 2021 20:50:18 +0000",
"machine": "x86_64",
"processor": ""
},
"linux_dist_name": "arch",
"linux_dist_version": "Arch",
"cpu_count": 8,
"fs_encoding": "utf-8",
"ulimit_soft": 8192,
"ulimit_hard": 524288,
"cwd": "-",
"umask": "0o2",
"python": {
"argv": "-",
"bin": "-",
"version": "3.7.10 (default, Feb 18 2021, 17:50:07) [GCC 10.2.0]",
"compiler": "GCC 10.2.0",
"build_date": "Feb 18 2021 17:50:07",
"version_info": [3, 7, 10, "final", 0],
"features": {
"openssl": "OpenSSL 1.1.1j 16 Feb 2021",
"expat": "expat_2.2.8",
"sqlite": "3.34.1",
"tkinter": "",
"zlib": "1.2.11",
"unicode_wide": true,
"readline": true,
"64bit": true,
"ipv6": true,
"threading": true,
"urandom": true
}
},
"time_utc": "2021-02-19 10:10:34.969315",
"time_utc_offset": -5.0,
"_eco_version": "1.0.1"
}
Thank you!
@alisraza, thanks for the detailed investigation!
It looks like the problem is in the DedupMail constructor, which tries to auto-detect which of the superclasses is the one that contributes Message-ness.
def __init__(self, message=None):
    """Initialize a pre-parsed ``Message`` instance the same way the default
    factory in Python's ``mailbox`` module does.
    """
    # Hunt down in our parent classes (but ourselve) the first one inheriting the
    # mailbox.Message class. That way we can get to the original factory.
    orig_message_klass = None
    for klass in inspect.getmro(self.__class__)[1:]:
        if issubclass(klass, mailbox.Message):
            orig_message_klass = klass
            break
    assert orig_message_klass

    # Call original object initialization from the right message class we
    # inherits from mailbox.Message.
    super(orig_message_klass, self).__init__(message)
Now, when the search finds a Message-like class orig_message_klass, the super call starts the method lookup at the successor of orig_message_klass in the MRO, not at orig_message_klass itself. For Maildir messages, this means the plain Message constructor gets called, but MaildirMessage's does not, so _subdir is never set.
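To illustrate the lookup rule, here is a small sketch with toy classes (all names are made up for illustration): Base stands in for Message, Child for MaildirMessage, and Mixin for DedupMail. Calling super(Child, self) from the mixin starts searching after Child in the MRO, so Child.__init__ never runs.

```python
class Base:
    """Stands in for mailbox.Message."""

    def __init__(self):
        self.calls = ["Base"]


class Child(Base):
    """Stands in for mailbox.MaildirMessage."""

    def __init__(self):
        super().__init__()
        self.calls.append("Child")


class Mixin:
    """Stands in for DedupMail."""

    def __init__(self):
        # Same shape as super(orig_message_klass, self).__init__(message):
        # the lookup begins at the class *after* Child in the MRO.
        super(Child, self).__init__()


class Combined(Mixin, Child):
    pass


obj = Combined()
mro_names = [klass.__name__ for klass in Combined.__mro__]
print(mro_names)
print(obj.calls)
```

The recorded calls contain only "Base", which is the toy equivalent of MaildirMessage.__init__ being skipped.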
I've tried to repair the clever construction in PR #222. I'm not sure that the cleverness is necessary here, with only a handful of message classes to support and little innovation in the field of Mbox dialects going on in general. But at least mdedup runs for me again!
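For readers who want to see what a repair could look like, here is one possible sketch. It is an assumption on my part and not necessarily what PR #222 does: the detected class's own __init__ is called directly instead of going through super. DedupMailSketch and MaildirDedupMailSketch are hypothetical names used only for this example.

```python
import inspect
import mailbox


class DedupMailSketch:
    """Hypothetical stand-in for DedupMail, not the project's actual class."""

    def __init__(self, message=None):
        # Find the first parent class (excluding ourselves) that derives
        # from mailbox.Message, then run *its* initializer directly.
        for klass in inspect.getmro(self.__class__)[1:]:
            if issubclass(klass, mailbox.Message):
                klass.__init__(self, message)
                return
        raise TypeError("no mailbox.Message subclass found in the MRO")


class MaildirDedupMailSketch(DedupMailSketch, mailbox.MaildirMessage):
    pass


mail = MaildirDedupMailSketch()
subdir = mail.get_subdir()
print(subdir)
```

With this variant MaildirMessage.__init__ runs, so get_subdir() has a _subdir to return. A simpler alternative, given how few message classes there are, might be to drop the auto-detection and name the parent class explicitly.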
"little innovation in the field of Mbox dialects going on in general"
Indeed! I apologize for that part being well over-engineered. I wanted it to be future-proof, with the vague idea of extending it to other sources of mail (Gmail? S3?). But it ended up increasing complexity with little benefit.
Anyway, thanks a lot @pechfunk for diving deep into the root cause and proposing a fix! I just merged it back upstream, and will try to cut a new release.
After execution of the following command, I get the mentioned error message (see logs):