Closed whitequark closed 2 months ago
For posterity, here is the workaround I came up with for the incompatible format of the history files:
import re
import os
import errno
import logging
import readline
logger = logging.getLogger(__loader__.name)
history_filename = os.path.expanduser("~/.history")
def _is_using_libedit():
if hasattr(readline, "backend"):
return readline.backend == "editline"
else:
return "libedit" in readline.__doc__
# to load:
try:
readline.read_history_file(history_filename)
except OSError as exc:
if exc.errno == errno.EINVAL: # (screaming internally)
assert _is_using_libedit()
with open(history_filename, "r") as f:
history = f.readlines()
assert history[:1] != ["_HiStOrY_V2_"], \
"History file has already been converted"
assert not history or any(" " in line for line in history), \
"Pre-conversion history file is expected to contain space characters"
backup_filename = f"{history_filename}~"
if not os.path.exists(backup_filename):
with open(backup_filename, "w") as f:
f.writelines(history)
else:
logger.warning(f"history backup {backup_filename} exists, leaving it intact")
new_filename = f"{history_filename}.new"
with open(f"{history_filename}.new", "w") as f:
f.write("_HiStOrY_V2_\n")
f.writelines([
re.sub(r"[ \\]", lambda m: f"\\{ord(m[0]):03o}", line)
for line in history
])
os.rename(new_filename, history_filename)
logger.warning(f"this Python distribution uses libedit instead of GNU readline, "
f"and their history file formats are not compatible")
logger.warning(f"REPL history file has been converted from the GNU readline format "
f"to the libedit format; backup saved to {backup_filename}")
readline.read_history_file(history_filename)
# to save:
readline.set_history_length(1000)
if _is_using_libedit():
readline.replace_history_item(
max(0, readline.get_current_history_length() - readline.get_history_length()),
"_HiStOrY_V2_")
readline.write_history_file(history_filename)
In my view, the readline
module should not be presenting the same interface for GNU readline and libedit unless it takes measures such as above to convert persisted data. The workaround is very complex and despite my best efforts probably still fragile in some edge cases, which is why I hope that it won't live on by being copied into every codebase that attempts to preserve persisted history.
Hello @whitequark! Thanks for reporting this.
I'm not sure I know what you mean by “libedit transition”? CPython supports building with libedit, and the support was improved gradually over the years, but I'm not aware of a “transition”.
(1.) This is a tough one :(
If we go for transparently converting history files, we also need a way to get back, or at least ensure that libedit's _HiStOrY_V2_
format is compatible with GNU readline (and always will be).
The workaround looks like a pain to maintain in the long term -- poking too deep into internals. I couldn't find much documentation on the _HiStOrY_V2_
format. Reading EINVAL
as “old file format” looks pretty error-prone. The _HiStOrY_V2_
token looks like something that could change in the future -- this code could, for example, start converting a v3 file back to v2 in a few years.
IMO (based on my current understanding), I think this kind of transition code should be in libedit
. If it's not there as a function we can call, CPython shouldn't try to do more than report the error it gets from libedit
.
(2.)
Hmm. Perhaps history_truncate_file
is incompatible with the _HiStOrY_V2_
format? AFAIK libedit provides its own version; maybe it's not called? Or not called correctly?
(3.)
As you say, I'm afraid _is_using_libedit
is here to stay. Sorry for the old mistake.
Backporting it to 3.12 would speed up the transition to readline.backend
by as single year. In the long run, that's not much.
(Not to mention we could only backport it to future 3.12 releases; having a feature that works in 3.12.5 but not 3.12.4 is its own kind of pain.)
I'm not sure I know what you mean by “libedit transition”?
Oh, I see in the other issue:
It seems that several popular builds of Python (from what I can tell, at least Homebrew and python-build-standalone) have switched to libedit and projects using the builtin readline module functionality
Yeah, that sounds like a bad move :/ Perhaps CPython docs should spend more words saying that the two backends aren't really compatible.
I'm not sure I know what you mean by “libedit transition”?
So what happened is that I ran into this issue headfirst with PDM's cpython builds (which are from python-build-standalone project, where since 3.12 it uses libedit for reasons unknown to me), it had disrupted my workflow in the middle of something else, and I attempted to research something I had absolutely no context for. I found some issues where Homebrew users hit the same issue, with roughly the same version range, where also the Python build was moved to libedit.
From this, the knowledge that people sometimes avoid GNU packages due to their licensing, and the fact that it seemed unthinkable to me (after noticing how incompatible the implementations are) that someone would be willing to disrupt people's workflows this much without a good reason I had concluded that there is a concerted effort behind, and the effort is to move to libedit while doing away with GNU readline entirely.
As it's clear to me now, it is wrong, and this is just an issue that fell through the cracks that happened to be very visibly disruptive to end users, which got triggered by a few seemingly independent packager decisions. Sometimes it really is just a coincidence!
I'll edit my original post.
(1.) This is a tough one :(
If we go for transparently converting history files, we also need a way to get back, or at least ensure that libedit's
_HiStOrY_V2_
format is compatible with GNU readline (and always will be).
I could definitely implement a backward transformation, and will do it the moment I encounter the reverse issue (which is probably soon, unfortunately).
The workaround looks like a pain to maintain in the long term -- poking too deep into internals. I couldn't find much documentation on the
_HiStOrY_V2_
format. ReadingEINVAL
as “old file format” looks pretty error-prone. The_HiStOrY_V2_
token looks like something that could change in the future -- this code could, for example, start converting a v3 file back to v2 in a few years.
I completely agree. (But couldn't the same be said about the dual readline/editline build in first place?)
In any case, it seems to me to boil down to a question of who spends the effort to work around it, and it seems more useful to coordinate 'more upstream' in the CPython codebase.
IMO (based on my current understanding), I think this kind of transition code should be in
libedit
. If it's not there as a function we can call, CPython shouldn't try to do more than report the error it gets fromlibedit
.
I don't disagree but isn't libedit
an essentially opaque package from some BSD distribution where I would have to chase a patch into it for who-knows-how-long, and then spend potentially years waiting for people to update their local copies of it that they build CPython with?
(2.)
Hmm. Perhaps
history_truncate_file
is incompatible with the_HiStOrY_V2_
format? AFAIK libedit provides its own version; maybe it's not called? Or not called correctly?
I was initially under impression that history_truncate_file
was a function in CPython since it wasn't prefixed with rl_
or readline_
or something like that, but now that I see it's not (I guess it's too ancient for consistent namespacing) I'm really confused. Maybe it's a bug in libedit after all?..
(3.)
As you say, I'm afraid
_is_using_libedit
is here to stay. Sorry for the old mistake. Backporting it to 3.12 would speed up the transition toreadline.backend
by as single year. In the long run, that's not much. (Not to mention we could only backport it to future 3.12 releases; having a feature that works in 3.12.5 but not 3.12.4 is its own kind of pain.)
Yeah, such things happen. I acknowledge both of these things. I have to work around already around some issues that only affect 3.10.0, 3.10.1, 3.10.2
so I know the pain...
We're missing a test for set_history_length
; I'll add it.
If the test passes CPython CI and is merged, it should also warn any redistributors if the configuration isn't working. Especially for ones who patch Python, it's probably the best course of action.
I also sent a PR for the docs. I'm not sure what kind of support we can give here, but I'm afraid transparently converting undocumented file formats isn't it.
Isn't libedit an essentially opaque package from some BSD distribution where I would have to chase a patch into it for who-knows-how-long, and then spend potentially years waiting for people to update their local copies of it that they build CPython with?
That's ... a reason for you (or your redistributor) to avoid that library, or patch it, rather than fix its bugs elsewhere. (But note that I'm not certain if the bug is there; it could be in CPython or in build settings.)
I was initially under impression that history_truncate_file was a function in CPython
Yeah, it seems GNU history is a separate library. That itself might cause some of the trouble.
If the test passes CPython CI and is merged, it should also warn any redistributors if the configuration isn't working. Especially for ones who patch Python, it's probably the best course of action.
Oh, that's a good call. I think that would be enough from my perspective.
I also sent a PR for the docs.
Thanks for the two PRs!
Reported to python-build-standalone: https://github.com/indygreg/python-build-standalone/issues/281
This is about as much as we can do on the CPython side. If anyone has more ideas, let's reopen.
Thanks again for handling this! It wasn't my proposed solution but as a maintainer I recognize that mine wasn't really viable for CPython.
Bug report
Bug description:
Since the libedit transition(correction: If an end user switches from a GNU readline build of Python to an editline build of Python), several important functions related to history files are broken.readline
module are unreadable and result in a rather confusingOSError
withexc.errno == errno.EINVAL
(without a corresponding operating system routine that returns such an error), but a workaround intercepting thisOSError
can be written that converts the old format to the new one. This has been covered before, e.g. in https://github.com/python/cpython/issues/120766.readline
module are unreadable by itself whenset_history_length
is used, as demonstrated by the code below."libedit" in readline.__doc__
, which I expect will be what many people will leave in the codebase even oncereadline.backend
becomes available.Reproducer:
Workaround:
CPython versions tested on:
3.12
Operating systems tested on:
Linux
Linked PRs