pkg / sftp

SFTP support for the go.crypto/ssh package
BSD 2-Clause "Simplified" License

Refactor/encoding ssh filexfer #423

Closed puellanivis closed 3 years ago

puellanivis commented 3 years ago

This makes an encodings/ssh/filexfer sub-package that defines the secsh-filexfer protocol encoding entirely independent of any implementation of functionality.

As this will be a public API, I would like merciless review of names, and design.

My biggest question: should we jump straight to ReadDir and FStat rather than Readdir and Fstat? Or should we strictly match the snake_case from the spec? I've kind of taken the route of "ignore the spec in favor of Go-friendly names" for a number of fields already.

greatroar commented 3 years ago

Thanks again for working on this.

Have you considered giving the MarshalInto methods value receivers? Now, to marshal an extended attribute, we do

attr := &ExtendedAttribute{Type: typ, Data: data}
attr.MarshalInto(buf)

which allocates the ExtendedAttribute on the heap if MarshalInto is not inlinable. With a value receiver, the whole packet goes on the stack, always, if we do

ExtendedAttribute{Type: typ, Data: data}.MarshalInto(buf)

Given that Go manages the stack as a big dynamic array, this can be a big win.
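The difference can be sketched with simplified stand-ins (Buffer and ExtendedAttribute below are minimal mock-ups for illustration, not the actual filexfer definitions):

```go
package main

import "fmt"

// Simplified stand-ins for the filexfer types, just to illustrate the
// value-receiver idea; the real Buffer and ExtendedAttribute differ.
type Buffer struct{ b []byte }

// AppendUint32 appends v in big-endian order.
func (buf *Buffer) AppendUint32(v uint32) {
	buf.b = append(buf.b, byte(v>>24), byte(v>>16), byte(v>>8), byte(v))
}

// AppendString appends a uint32 length prefix followed by the bytes of s.
func (buf *Buffer) AppendString(s string) {
	buf.AppendUint32(uint32(len(s)))
	buf.b = append(buf.b, s...)
}

type ExtendedAttribute struct {
	Type string
	Data string
}

// MarshalInto has a value receiver, so the composite literal in main
// can stay on the stack even when this call is not inlined.
func (e ExtendedAttribute) MarshalInto(buf *Buffer) {
	buf.AppendString(e.Type)
	buf.AppendString(e.Data)
}

func main() {
	var buf Buffer
	ExtendedAttribute{Type: "foo", Data: "bar"}.MarshalInto(&buf)
	fmt.Println(len(buf.b)) // two length-prefixed strings: 4+3 + 4+3 = 14
}
```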

greatroar commented 3 years ago

Re:

should we jump straight to ReadDir and FStat rather than Readdir and Fstat?

Fstat mirrors os.Lstat, so it's idiomatic. I think Read[Dd]ir should match whatever you do with Read[Ll]ink.

greatroar commented 3 years ago

I think this code still mixes the concerns of allocation and the wire format. There are also several packet types that have multiple ways of marshaling and unmarshaling. If all code used Buffer as a central type and type+id were moved into it, I think the design could be cleaner and more efficient to use:

type Buffer struct {
    Type      PacketType
    RequestID uint32
    // actual buffer stuff
}

func NewBuffer() *Buffer // For senders. Sets up "hidden" space for length+type+id.

func ReadPacket(io.Reader, *Buffer) error // For receivers. Parses off length+type+id.

func (b *Buffer) AppendUint32(uint32)
func (b *Buffer) ConsumeString() (string, error) // etc.

func (b *Buffer) Send(w io.Writer, payload []byte) error

Then, marshaling code can be written without using interfaces, without NewMarshalBuffer and without size computation:

type OpenPacket struct {
    Filename string
    PFlags   uint32
    Attrs    Attributes
}

// Could be a method too, but I don't really see the benefit.
func MarshalOpenPacket(b *Buffer, filename string, pflags uint32, attrs Attributes) {
    b.AppendString(filename)
    b.AppendUint32(pflags)
    MarshalAttributes(b, attrs)
}

func (p *OpenPacket) UnmarshalPacketBody(*Buffer) error // As right now.

A Client can now manage all of its memory using a single sync.Pool (global or field) of Buffers. Servers could presumably do the same.
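A rough sketch of the lifecycle that design implies, with a stand-in Buffer type and a hypothetical roundTrip helper (both are illustrative, not proposed API):

```go
package main

import (
	"fmt"
	"sync"
)

// Stand-in for the proposed central Buffer type, just to sketch the
// single-pool lifecycle described above.
type Buffer struct{ b []byte }

func (b *Buffer) Reset() { b.b = b.b[:0] }

var bufPool = sync.Pool{
	New: func() interface{} { return new(Buffer) },
}

// roundTrip sketches one request/response cycle: take a Buffer from the
// pool, marshal a request into it, reuse the same storage for the
// response, and return it to the pool when done.
func roundTrip(request []byte) int {
	buf := bufPool.Get().(*Buffer)
	defer bufPool.Put(buf)

	buf.Reset()
	buf.b = append(buf.b, request...) // marshal and send the request
	buf.Reset()                       // reuse the storage for the response
	buf.b = append(buf.b, "response"...)
	return len(buf.b)
}

func main() {
	fmt.Println(roundTrip([]byte("open /tmp/foo"))) // 8
}
```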

greatroar commented 3 years ago

As a side concern, none of the UnmarshalPacketBody functions that I've seen check whether they reach the end of the packet. This could be added to all of them, but maybe a function wrapper could do this?


type Packet interface {
    MarshalPacket(reqid uint32) (header, payload []byte, err error)

    // UnmarshalPacketBody should not be called directly. Use the function with the same name instead.
    UnmarshalPacketBody(buf *Buffer) error
}

func UnmarshalPacketBody(buf *Buffer, p Packet) error {
    err := p.UnmarshalPacketBody(buf)
    if err == nil && buf.Len() != 0 {
        err = errors.New("sftp: trailing garbage after packet body")
    }
    return err
}

puellanivis commented 3 years ago

Most of the NewBuffer usage in the library does not actually escape to the heap, and so gets allocated on the stack. Putting them into a Pool is thus actually counter-productive. Though, I see where you're coming from: they could then potentially save an allocated buffer, and we could reuse that internal buffer. I see some merit there, where we could have the packets also implement MarshalInto(), but then there is confusion from things that are not packets, like Attributes and ExtensionAttribute. Though, that doesn't mean we couldn't use MarshalPacketInto and MarshalPacketBodyFrom.

But it turns out most of the packet types are called pretty infrequently, and do not have to be optimized so heavily that they reuse buffers. The overhead of the allocations from an OpenPacket is on the whole quite minimal. So, we really only need to concern ourselves with WritePacket, DataPacket and StatusPacket, which compose the vast majority of communications. It turns out that even the allocation of the headers here is pretty trivial compared to the use of the payloads underneath them. That's probably why we haven't bothered "fixing" the sendPacket code so far: it's such a minimal pressure on memory compared to recvPacket.

That's why I focused attention on ReadFrom(io.Reader, []byte) allowing a backing-buffer input, to permit reuse. This is the primary memory hog/chokepoint right now. Running a benchmark comparison of InitPacket.MarshalInto against InitPacket.MarshalBinary, not allocating a buffer saves 1 alloc of 48 B, and runs in 17 ns instead of 71–81 ns. But now that I think about it, realistically, it's going to have to pull that buffer out of a pool, right? Well, conveniently we already have the bufPool, so making that change was easy. Now it saves 1 alloc of 48 B, and runs in ~64 ns instead of ~81 ns. Obviously it's going to have some gains there, but the question is whether those gains are at all useful. We can chase optimizations forever; what we want is data and profiling showing where real-world bottlenecks are before blindly making code harder to read, understand, and use.

Though, now that I've typed all of the above and made a whole big argument about it: rewriting the MarshalPacket calls to just take a []byte to use as an optional backing buffer for the header isn't actually all that bad of an idea, and I'm going to incorporate it.
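A hypothetical sketch of that optional backing-slice signature (ClosePacket, appendUint32, and the exact return values here are illustrative, not the final API; the type byte 4 is SSH_FXP_CLOSE per the spec):

```go
package main

import "fmt"

// ClosePacket is a sketch of one packet type whose MarshalPacket
// accepts an optional backing slice for the header.
type ClosePacket struct {
	Handle string
}

// appendUint32 appends v in big-endian order.
func appendUint32(b []byte, v uint32) []byte {
	return append(b, byte(v>>24), byte(v>>16), byte(v>>8), byte(v))
}

// MarshalPacket uses b as the backing array for the header if it has
// enough capacity; otherwise it falls back to a fresh allocation.
func (p *ClosePacket) MarshalPacket(reqid uint32, b []byte) (header, payload []byte, err error) {
	size := 4 + 1 + 4 + 4 + len(p.Handle) // length + type + reqid + handle
	if cap(b) < size {
		b = make([]byte, 0, size)
	}
	b = b[:0]
	b = appendUint32(b, uint32(size-4)) // length field excludes itself
	b = append(b, 4)                    // SSH_FXP_CLOSE
	b = appendUint32(b, reqid)
	b = appendUint32(b, uint32(len(p.Handle)))
	b = append(b, p.Handle...)
	return b, nil, nil
}

func main() {
	backing := make([]byte, 0, 64) // caller-provided, reusable storage
	hdr, _, _ := (&ClosePacket{Handle: "h"}).MarshalPacket(1, backing)
	fmt.Println(len(hdr)) // 4+1+4+4+1 = 14
}
```

Passing nil simply takes the allocating path, so simple callers never have to think about buffer ownership.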

puellanivis commented 3 years ago

I understand the concern about checking that we've consumed the whole packet, but I'm not sure that's altogether necessary. There's not really much to be gained by testing it, and if clients are sending spurious data that we would be ignoring anyway, I don't see any reason to crash the server over it.

puellanivis commented 3 years ago

I've taken a close look at making MarshalPacket a value-receiver method so the calls can potentially be devirtualized if the functions are inlined, and unfortunately, doing pretty much anything other than just MarshalPacket itself makes the function too complex to be inlined.

That includes even just getting a unique request id with atomic.AddUint32(&count, 1).

So it looks like, in the real world, that devirtualization is simply never going to happen, and I think the solution here is going to have to be a pool of request packets for the places where these allocations can happen in a tight loop, such as the SSH_FXP_WRITE, SSH_FXP_DATA and SSH_FXP_STATUS packets.

As for the ExtendedAttribute, ExtensionPair, and NameEntry calls using values, my principal concern is the performance of the append(…) functions that programmatically build lists of these structs. Their utility in isolation from their lists is pretty low, so in this case we're almost always going to have a pointer to them already. And as I mentioned, just because something is a pointer receiver does not mean it cannot be allocated on the stack.

In fact, looking at the simple example, the escape analysis shows it doesn't escape and is allocated on the stack, even though it also reports that MarshalInto is too complex to inline. So, really, the only time it is going to escape to the heap is when MarshalInto is being run as a virtualized call through an interface, but we don't actually do that with any of them.

$ cat -n escape_analysis_test.go | tail -n12
   141  func TestEscapeAnalysisExtendedAttribute(t *testing.T) {
   142      buf := NewBuffer(make([]byte, 0, defaultMaxPacketSize))
   143  
   144      attr := &ExtendedAttribute{
   145          Type: "foo",
   146          Data: "bar",
   147      }
   148      attr.MarshalInto(buf)
   149  
   150      var stackExtAttrib ExtendedAttribute
   151      println("escape_analysis_test.go: ext_attrib: heap:", heapExtAttrib, "packet:", attr, "stack:", &stackExtAttrib)
   152  }
$ do_escape_analysis
…
./escape_analysis_test.go:142:23: make([]byte, 0, defaultMaxPacketSize) escapes to heap:
./escape_analysis_test.go:142:23:   flow: buf = &{storage for make([]byte, 0, defaultMaxPacketSize)}:
./escape_analysis_test.go:142:23:     from make([]byte, 0, defaultMaxPacketSize) (spill) at ./escape_analysis_test.go:142:23
./escape_analysis_test.go:142:23:     from buf := make([]byte, 0, defaultMaxPacketSize) (assign-pair) at ./escape_analysis_test.go:142:18
./escape_analysis_test.go:142:23:   flow: {storage for &Buffer{...}} = buf:
./escape_analysis_test.go:142:23:     from Buffer{...} (struct literal element) at ./escape_analysis_test.go:142:18
./escape_analysis_test.go:142:23:   flow: ~R0 = &{storage for &Buffer{...}}:
./escape_analysis_test.go:142:23:     from &Buffer{...} (spill) at ./escape_analysis_test.go:142:18
./escape_analysis_test.go:142:23:     from ~R0 = <N> (assign-pair) at ./escape_analysis_test.go:142:18
./escape_analysis_test.go:142:23:   flow: buf = ~R0:
./escape_analysis_test.go:142:23:     from buf := (*Buffer)(~R0) (assign) at ./escape_analysis_test.go:142:6
./escape_analysis_test.go:142:23:   flow: {heap} = *buf:
./escape_analysis_test.go:142:23:     from attr.MarshalInto(buf) (call parameter) at ./escape_analysis_test.go:148:18
./escape_analysis_test.go:141:42: t does not escape
./escape_analysis_test.go:142:23: make([]byte, 0, defaultMaxPacketSize) escapes to heap
./escape_analysis_test.go:142:18: &Buffer{...} does not escape
./escape_analysis_test.go:144:10: &ExtendedAttribute{...} does not escape
escape_analysis_test.go: ext_attrib: heap: 0xc000132000 packet: 0xc000046f50 stack: 0xc000046f10
greatroar commented 3 years ago

Rewriting the MarshalPacket calls to just take a []byte to use as an optional backing buffer for the header isn't actually all that bad of an idea, and I'm going to incorporate it.

Please, stop! Let's take a step back from performance and look at the complexity of the API. There are byte slices with unclear ownership, packet types with MarshalBinary (but not fulfilling the interface's promise), packet types with MarshalPacket, packet types with both, and two types that can ReadFrom.

I have a more fully thought-out alternative design that can drastically reduce the number of methods and types. Untested code ahead.

type Buffer struct {
    b   []byte
    off int
}

// NewBuffer constructs a buffer with the given id,
// reusing the space allocated for b if not nil.
func NewBuffer(b *Buffer, requestID uint32) *Buffer {
    // TODO version and init packets.

    const headerSize = 9 // length+type+id

    if b == nil {
        b = &Buffer{}
    } else {
        b.Reset()
    }
    b.reserve(headerSize)

    b.AppendUint32(0) // Reserve space for length.
    b.AppendByte(0)   // Reserve space for type.
    b.AppendUint32(requestID)

    return b
}

func (b *Buffer) SetType(t PacketType) { b.b[4] = byte(t) }

This replaces NewBuffer, NewMarshalBuffer. The Buffer type also replaces RawPacket and RequestPacket. It doesn't get Bytes, Len, MarshalBinary, PutLength or UnmarshalBinary.

NewBuffer doesn't even need to know about defaultMaxPacketSize if Buffers are properly reused (although reserve might want to allocate some minimum number of bytes, say 1kB).
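For completeness, one plausible implementation of the unexported reserve helper used in the sketches (the Buffer definition is repeated so this compiles standalone; the 1 kB floor is the minimum suggested above, and an assumption):

```go
package main

import "fmt"

// Buffer repeated from the sketch above so this compiles on its own.
type Buffer struct {
	b   []byte
	off int
}

const minBufferCap = 1024 // assumption: the 1 kB floor suggested above

// reserve grows the backing array to at least n bytes of capacity,
// preserving existing contents, with a minimum capacity so that small
// packets don't cause repeated regrowth.
func (b *Buffer) reserve(n int) {
	if n < minBufferCap {
		n = minBufferCap
	}
	if cap(b.b) < n {
		nb := make([]byte, len(b.b), n)
		copy(nb, b.b)
		b.b = nb
	}
}

func main() {
	var b Buffer
	b.b = append(b.b, 1, 2, 3)
	b.reserve(9)
	fmt.Println(cap(b.b) >= minBufferCap, len(b.b)) // true 3
}
```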

// ReadPacket reads a full packet from r into b.
//
// The Consume methods on b start reading after the request id.
func ReadPacket(r io.Reader, b *Buffer) error {
    // TODO version and init packets.

    const headerSize = 9 // length+type+id

    b.Reset()
    b.reserve(headerSize)

    b.b = b.b[:headerSize]
    _, err := io.ReadFull(r, b.b)
    if err != nil {
        return err
    }

    length := binary.BigEndian.Uint32(b.b)
    if length == 0 {
        return errShortPacket
    }

    b.reserve(4 + int(length))
    b.b = b.b[:4+length]
    b.off = headerSize

    n, err := io.ReadFull(r, b.b[b.off:])
    b.b = b.b[:headerSize+n]
    return err
}

func (b *Buffer) Type() PacketType { return PacketType(b.b[4]) }

func (b *Buffer) RequestID() uint32 { return binary.BigEndian.Uint32(b.b[5:]) }

You can rename this Buffer.ReadFrom if you want. In any case, it no longer needs the byte slice argument so there is less to document/think about.

UnmarshalPacketBody would work exactly as before.

// Send writes the binary serialization of b to w, followed by the payload.
// It modifies b but does not retain it or the payload.
func (b *Buffer) Send(w io.Writer, payload []byte) error {
    if payload != nil {
        b.AppendUint32(uint32(len(payload)))
    }
    binary.BigEndian.PutUint32(b.b, uint32(len(b.b)+len(payload)-4))

    _, err := w.Write(b.b)
    if err != nil {
        return err
    }

    _, err = w.Write(payload)
    return err
}

Here's the only byte slice left, and it's read-only. You can rename this Buffer.WriteTo.

The predefined packet types now all get a method with signature

// Send writes p to w.
// It uses the non-nil Buffer b to perform serialization.
// The Buffer should have the request id set appropriately.
func (p T) Send(w io.Writer, b *Buffer) error

E.g.,

func (p AttrsPacket) Send(w io.Writer, b *Buffer) error {
    b.SetType(PacketTypeAttrs)
    p.Attrs.MarshalInto(b)
    return b.Send(w, nil)
}

func (p DataPacket) Send(w io.Writer, b *Buffer) error {
    b.SetType(PacketTypeData)
    return b.Send(w, p.data)
}

These replace MarshalBinary, MarshalInto, MarshalPacket. If a packet needs to be marshaled in full, just Send it to a bytes.Buffer. If MarshalBinary is still needed for whatever reason, an adapter can be written for any type that has Send.
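Such an adapter might look like the following sketch (Buffer, sender, marshalBinary, and fakePacket are stand-ins for illustration, not proposed names):

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// Buffer is a stand-in for the proposed central type.
type Buffer struct{ b []byte }

// sender matches any packet type that has the proposed Send method.
type sender interface {
	Send(w io.Writer, b *Buffer) error
}

// marshalBinary adapts Send into a MarshalBinary-style call by
// sending the packet into an in-memory bytes.Buffer.
func marshalBinary(p sender, b *Buffer) ([]byte, error) {
	var out bytes.Buffer
	if err := p.Send(&out, b); err != nil {
		return nil, err
	}
	return out.Bytes(), nil
}

// fakePacket is a hypothetical packet used only to exercise the adapter.
type fakePacket struct{}

func (fakePacket) Send(w io.Writer, b *Buffer) error {
	_, err := w.Write([]byte{0, 0, 0, 1, 3}) // a made-up 5-byte wire form
	return err
}

func main() {
	data, err := marshalBinary(fakePacket{}, &Buffer{})
	fmt.Println(len(data), err) // 5 <nil>
}
```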

A Client would normally get a Buffer from a Pool, NewBuffer to set the id, Send it off, reuse it to receive the response and Put it back in the Pool.

A Server would similarly get a Buffer from a Pool, receive into it, reuse it for the response using NewBuffer, Send it off and Put it on the Pool.

Summarizing:

puellanivis commented 3 years ago

I've changed most of the names to match up with ReadDir except:

MarshalPacket now takes an optional backing byte-slice that it can use for marshaling if it has a reasonable capacity (every packet treats cap() < 9 as the signal to ignore the backing buffer and allocate a new one).

The design is that there are four types of packets:

There is no getting around the fact that there are four different types of packets here, short of removing the RawPacket/RequestPacket and leaving us with only three types. But the RequestPacket in particular allows us to easily implement a server, while the RawPacket allows us to simplify the client response receiver.

puellanivis commented 3 years ago

Replacing the byte-slice argument with a Buffer argument is not saving anything at all: neither complexity nor arguments.

Not everything can be replaced with Send(…), particularly almost all of the MarshalInto(…) calls, which you seem to have recognized.

No more worrying about ownership semantics or about slices in sync.Pool when optimizing.

We have already solved this problem with the bufPool, which works significantly better than even a sync.Pool of bytes.Buffer.

A single Pool gets rid of nearly all the allocations, small or tiny, with no additional effort. In particular, no length calculations are needed. That means no hidden GC overhead that you don't see in a sequential benchmark.

That's great for the complex case, where we're pooling all of these buffers. But what about the simple cases, where we don't care if they allocate, because a few bytes here or there don't concern us? Now, in order to use the library at all, we have to concern ourselves with Buffer lifetimes and ownership. My solution offers a two-tier approach that can choose between the two on a case-by-case basis: either you pass nil and get the naive allocation, or you pass a backing buffer. No special data type to reach for, just any old byte-slice. And these byte-slices eliminate the allocations you're already complaining about, while still offering a one-allocation naive solution on the side.

I think your concerns are based on what the codebase looks like right now, but I've already done a big refactor that hasn't been included in this PR. It gets rid of a lot of waste and overhead in the actual caller code, in exchange for complexity here. Simplicity is complex, and presenting a simpler-to-use API is way more important to me than simplifying this implementation itself.

As for just writing another transparent layer that forces every client to write out their own marshalling code, or track their own buffer inputs: I don't think that's a good idea.

greatroar commented 3 years ago

Not everything can be replaced with Send(…), particularly almost all of the MarshalInto(…) which you seem to have recognized.

MarshalInto and Send do different things, though. One sends a packet when it's ready. The other builds a piece of a packet.

But what about the simple cases? Where we don’t care if they’re allocating, because we’re not concerned about allocating a few bytes here or there. Now in order to use the library, we have to concern ourselves with all our Buffer lifetime and ownerships.

Hardly. Just get a buffer. Add local reuse or caching as necessary. Let's assume I renamed ReadPacket to RecvPacket and gave it a *Buffer return value. First version:

// This is ugly, but can be wrapped up in a utility function
// for callers that are willing to pay interface overhead.
err := filexfer.OpenDirPacket{Path: p}.Send(conn, filexfer.NewBuffer(nil, id))
// Handle err.

b, err := filexfer.RecvPacket(conn, nil)
// Handle err, b, maybe using an exported variant of newPacketFromType.

functionThatDoesReaddir()

Second version:

b := filexfer.NewBuffer(nil, id)
err := filexfer.OpenDirPacket{Path: p}.Send(conn, b)
// Handle err.

b, err = filexfer.RecvPacket(conn, b)
// Handle err, b.

functionThatDoesReaddir(b)

Third version would cache Buffers somewhere.

I think your concerns are based around a lot of what the codebase looks like right now

Probably. I'm curious to see the next version.

puellanivis commented 3 years ago

I've already spent a whole week working on simplifying the whole code base, only to find that this sort of library was important enough to implement now, rather than later. I had a specific goal in mind, and I was weighing 15 different priorities all at the same time.

Now that I've wasted all this time and effort understanding the specs of filexfer-02 through filexfer-13, and the finer points of all of them, and how I could best represent the relevant data-types in idiomatic Go in a way that is extensible enough to expand into V4, V5, and V6 support, rather than hyper-focusing on making this and only this version of the package work as quickly as possible, I'm not going to spend yet another week rewriting the package for a whole different paradigm that drops 14 of the 15 goals I was trying to juggle in the first place.

This has been entirely demoralizing, and I'm out for at least a month. I'll enjoy the vacation, spending my spare time playing video games rather than putting my 25 years of coding experience into thankless tasks that just get attacked over details that I've already considered, weighed, and found to be fundamentally lacking. I get enough people ignoring my expertise at work… at least there, they pay me to put up with it.

drakkan commented 3 years ago

@puellanivis first of all I want to confirm that I really appreciate your work on this project and your great reviews.

I am sorry for this and for not having had enough time to review the proposed changes (I have had some personal problems in the last few weeks). In any case I would have needed some time to try the code locally and play with it before proposing any change.

In general I think we should first make the code work and then optimize it.

Enjoy this vacation, I hope you will continue to send your great contributions to this project in the future

greatroar commented 3 years ago

@puellanivis I did not intend any of my comments as attacks on you or your work. I should at least have stopped arguing earlier. I guess I was bikeshedding.

Apologies. I fully respect all the effort you put into this package. I feel that I am to blame, and I will stay clear of this project.

drakkan commented 3 years ago

@puellanivis I don't want to lose your work, I took the liberty of restoring your branch, we can resume from here in the future, thanks

puellanivis commented 3 years ago

One really should not write things while actively upset over an issue. I’ve had some time to cool off, and want to follow up on some things.

@greatroar It’s not necessary for you to “stay clear of this project”, but it was just very frustrating. I took many of your recommendations to heart, despite even at first arguing against them. I also incorporated some of them as best as it would fit in considering the many competing interests involved. But some requests were either asking me to conflate two responsibilities/purposes into a single item, or were criticisms about the integration of the very idea suggested. I’ve had time to think over the criticisms despite being so upset, and will likely address a few of them still. (i.e. making sure interface constraints are respected, even if that means cloning backing byte-slices, etc.)

I usually do not push a first, second, or even third draft out for review. There were a lot of different considerations made during these multiple drafts, and issues that arose while I was working on refactoring the main package. I can see how there might be a disconnect between seeing what is being designed here and its integration into the package as a whole. But I've been trying to address far more than just "a superficially hyper-efficient way of turning a bunch of uint32/uint64/string values into a stream of bytes." In fact, very little of the involved code is actually intended to be used by callers; far more is intended for implementers of extensions and newer versions. Now that I think about it, I could potentially split things even further to make clear how and where the responsibilities differ.