Closed MaciejWas closed 1 month ago
Just checking:
If a type named Handle__
in a module labelled Stability: internal
is not an implementation detail that GHC can freely change, what would GHC need to do to be able have implementation details that it can freely change?
@nh2 any part of the public interface of base
needs a CLC proposal to be changed.
@Bodigrim My question was what would count as not "part of the public interface".
What is the criterion for that? How would I have to name or place a data type in order for it to count as "internal implementation detail, free to change"?
In the future, we may want to create such places so that one can make changes to syscall implementation behaviour without a proposal.
I'm not sure Handle is the right abstraction for it.
FD
FD
can't benefit from thisFor reference, the implementation is this:
data FD = FD {
fdFD :: {-# UNPACK #-} !CInt,
#if defined(mingw32_HOST_OS)
-- | On Windows, a socket file descriptor needs to be read and written
-- using different functions (send/recv).
fdIsSocket_ :: {-# UNPACK #-} !Int
#else
-- | On Unix we need to know whether this 'FD' has @O_NONBLOCK@ set.
-- If it has, then we can use more efficient routines (namely, unsafe FFI)
-- to read/write to it. Otherwise safe FFI is used.
--
-- @O_NONBLOCK@ has no effect on regular files and block devices at the moment,
-- thus this flag should be off for them. While reading from a file cannot
-- block indefinitely (as opposed to reading from a socket or a pipe), it can block
-- the entire runtime for a "brief" moment of time: you cannot read a file from
-- a floppy drive or network share without delay.
fdIsNonBlocking :: {-# UNPACK #-} !Int
#endif
}
isSeekable :: FD -> IO Bool
isSeekable fd = do
t <- devType fd
return (t == RegularFile || t == RawDevice)
So an alternative not mentioned here is:
devType
in FD
isSeekable
look up the recordThat seems like a more generic solution and everyone relying on device type doesn't need to use stat
anymore.
I'm not sure if there are obscure edge cases where people construct the FD
type manually (without opening one).
@Bodigrim My question was what would count as not "part of the public interface".
What is the criterion for that? How would I have to name or place a data type in order for it to count as "internal implementation detail, free to change"?
Anything which is exposed from a package constitutes its public interface.
In the future, we may want to create such places so that one can make changes to syscall implementation behaviour without a proposal.
You can put stuff into ghc-internal
package and do not re-export it from or use in base
.
@nh2 please raise a separate issue, if you have further questions, let's not digress here.
Intuitively I agree with @hasufell.
Unfortunately I couldn't find any design documentation that says where the boundary between Handle
and FD
should be.
A Handle
carries a hadDevice :: dev
, which is an instance of IODevice
.
This looks like Handle
s were designed to contain, and abstract over, multiple types of backend devices, one of them being FD
.
However, FD
is indeed the only IODevice
instance, so we don't get much info on what other devices could exist, to determine where seekability should live.
Also note that Handle
docs say
A handle has at least the following properties:
... whether the object is seekable;
but those docs are wrong:
The word "seek" appears nowhere else on that page, and seekability is a property of the IODevice
, not a Handle, as @hasufell showed above.
Overall I like @hasufell's suggested implementation. Here's another slight alternative:
devType
in FD, but as Maybe IODeviceType
. This allows constructing FD
manually from an int
FD we got from the system, without knowing anything about it.
FD
above are {-# UNPACK #-}
), add another constructor to [data IODeviceType = Directory | Stream | RegularFile | RawDevice
] that's | DeviceTypeNotDetermined
isSeekable
look up the record; if it's Nothing
/DeviceTypeNotDetermined
, use the existing code path of devType :: a -> IO IODeviceType
calling fdStat
hereIODeviceType
doesn't include SymbolicLink
even though that's also a stat.st_mode
.fdStat
follow symbolic links, so any symbolic link is always resolved by it to one of the other device types.devType
of an FD
can change externally, e.g. when the FD
at hand is a symlink mylink -> mytarget
and somebody replaces mytarget
by somehting else, e.g. changing it from a regular file to a FIFO.devType
and seekability are racy. Not olny after our proposed implementation to cache them; they are already racy now (by the nature of devType :: a -> IO IODeviceType
being in IO
).Store the devType in FD, but as Maybe IODeviceType. This allows constructing FD manually from an int FD we got from the system, without knowing anything about it.
Yes, I thought about that. We could then rename the constructor and provide a bidirectional pattern synonym, similar to how e.g. bytestring tries to keep the internal API "stable". The compat synonym would then fill in Nothing
for the device type.
But it seemed a bit overkill. Given that you already have to use CPP to pattern match/construct, I think it is fairly obvious that these are low level internals.
My stance these days is that:
But YMMV. I think using Maybe here is the most backwards compatible way.
Thank you for sharing your thoughts!
This looks like Handles were designed to contain, and abstract over, multiple types of backend devices, one of them being FD.
This is also why I focused on Handle
(and therefore Handle__
).
While caching DeviceType
in FD
would indeed solve the issue, I don't think this is the right approach since there is nothing wrong with FD
. The problem is the unnecessary check performed by hSeek
which operates on a Handle
.
Seeking is implemented for IODevice and the only instance is FD, that also means people using plain FD can't benefit from this
This proposal (and the issue as well, if I understood it correctly) is not about caching the device type so that we can avoid a stat
syscall. This proposal is about not performing stat
every time before performing seek
, which is a behavior of Handle
. FD
doesn't have this problem.
I think it makes perfect sense that functions defined in instance IODevice
perform a syscall or other IO operation. However, I think it does not make sense for hSeek
to do unnecessary checks.
@nh2
In the future, we may want to create such places so that one can make changes to syscall implementation behaviour without a proposal.
base
; people ought to use ghc-internal
. (This will take some time to execute with all the deprecation periods. EDIT: CLC should correct me, but there's also a maybe considerable cost of patch writing requirement: migrating whole Stackage to use ghc-internal
- providing some patches).In relation to this proposal. Handle__
, IODevice
are all GHC internals. They should not be exposed through base
(anymore). (whole GHC.IO.Handle
can be considered ghc-internal
only. Most people should be fine with System.IO
worth of Handle
abstractions).
However,
FD
is indeed the onlyIODevice
instance, so we don't get much info on what other devices could exist, to determine where seekability should live.
Not true. On windows there are
-- -----------------------------------------------------------------------------
-- The Windows IO device implementation
-- | @since base-4.11.0.0
instance GHC.IO.Device.IODevice (Io NativeHandle) where
-- | @since base-4.11.0.0
instance GHC.IO.Device.IODevice (Io ConsoleHandle) where
which wrap Windows own HANDLE
(i.e. not POSIX file descriptor)
- step 1: create a proposal to remove an "internal" module from
base
; people ought to useghc-internal
. (This will take some time to execute with all the deprecation periods. EDIT: CLC should correct me, but there's also a maybe considerable cost of patch writing requirement: migrating whole Stackage to useghc-internal
- providing some patches).
Non-experimental packages are actively discouraged to use ghc-internal
, there are no stability guarantees whatsoever.
CLC retains transitive control over parts of ghc-internal
used by base
, even if not re-exported explicitly.
But please let's not digress.
This proposal (and the issue as well, if I understood it correctly) is not about caching the device type so that we can avoid a stat syscall. This proposal is about not performing stat every time before performing seek, which is a behavior of Handle. FD doesn't have this problem.
I'm a bit confused. The GHC issue you mentioned specifically talks about linux and stat
.
What behavior are you expecting on windows? Why do you touch other platforms?
Non-experimental packages are actively discouraged to use
ghc-internal
Yes, so the needed functionality, once identified, can be re-exported from ghc-experimental
, so base
can stay away from GHC specifics.
@MaciejWas is there a consensus about next steps? Are you looking for further feedback? (Sorry, at the moment I don't have time to dive into details above, just asking about the status)
Unfortunately the status is mostly as it was when I opened this proposal. I'm still committed to resolving it but I haven't had much time lately, unfortunately.
I still need to process the alternate approach proposed by @hasufell. It still seems to me that it is a much more complex change which only resolves a subset of the issue, since FD
is not the only instance of an IODevice
.
@MaciejWas I very much appreciate that this is a difficult issue to tackle, many thanks for working on it. Nevertheless we strive to keep the list of open proposals actionable, so unless you make progress one way or another (e. g., it's your right to request a vote on the MR as is), I'll close as abandoned by the end of August. This is not a sign of malice, we can reopen the proposal once there are resources to work on it again.
Closing as abandoned, feel free to reopen when there are resources to continue.
In this proposal I suggest a solution to the issue #24220 (hSeek and hTell do unnecessary stat) raised by @nh2. It has been observed that these two functions perform unnecessary
stat()
system call and slow down IO-based code.Considered solutions
FD
is exposed and we should avoid changing its interface as it is very user-facing.hSeek
/hTell
, propagate system error instead.hSeek
.IODevice.isSeekable
pointless because seeking would be performed regardless of the function's output.Handle__
is exposed, it is marked as an implementation detail and subject to change (signified by the double underscore and explained in GHC.IO.Handle.Types ).Implementation
I believe the third solution is the most viable. I implemented it in Store file seekability in Handle__
Impact assessment
Since this is a breaking change (definition of a datatype is changed), an impact assessment is due. I will perform it in the next couple of days and update this issue.