swiftlang / swift

The Swift Programming Language
https://swift.org
Apache License 2.0
67.58k stars 10.36k forks source link

[SR-381] API to lookup a Type given a top-level class name #42998

Closed lhoward closed 8 years ago

lhoward commented 8 years ago
Previous ID SR-381
Radar None
Original Reporter @lhoward
Type New Feature
Status Closed
Resolution Done
Additional Detail from JIRA | | | |------------------|-----------------| |Votes | 1 | |Component/s | Standard Library | |Labels | New Feature, AffectsABI, Runtime | |Assignee | @jckarter | |Priority | Medium | md5: 13f1c2ae7174aab8983f1dde3a976c53

blocks:

relates to:

Issue Description:

This may be useful to implement NSStringFromClass()-compatible behaviour which is necessary in order to implement NSKeyedArchiver. I am using the attached workaround at the moment but obviously it is very fragile.

lhoward commented 8 years ago

https://github.com/apple/swift-corelibs-foundation/commit/b5dc719382b9de38af0bba51b4cbf3eee233d6d4

jckarter commented 8 years ago

Is there a reason it has to be a mangled name? The qualified unmangled name produced by String(T.self) should be unique for any T. I envision us having a corresponding runtime API to parse and look up a type by (unmangled) name as well.

lhoward commented 8 years ago

We need to match NSClassFromString()/NSStringFromClass() behaviour or, more accurately, their analogues in libobjc. For Swift class names deeper than two levels (e.g. a nested class), this returns the mangled type name. See:

https://github.com/lhoward/swift-corelibs-foundation/commit/394998259009c133e94466ef7ff33875fa673011

(I did originally propose on swift-corelibs-dev what you suggested.)

jckarter commented 8 years ago

I see, is it expected that archives will be interoperable across the objc and corelibs implementations?

lhoward commented 8 years ago

I don't speak for Apple, but my understanding is yes.

cc @phausler

Alternatively we limit interoperability to Foundation and single-level non-generic Swift classes. I realise that the current implementation is broken for generic classes.

Really it would be nice to have:

phausler commented 8 years ago

Tony and I discussed early on - and cross platform archival is really important in our opinion since it is one of the defining features of Foundation.

phausler commented 8 years ago

We might be able to get away with not letting nested classes be codable due to class encoding differentials (since id guess that those are the distinct minority) but I would like to have a story for them (even if we have to modify the darwin version to be in parity) since I know that the mangling patterns are subject to change and we should try not to rely on them.

jckarter commented 8 years ago

Yeah, the mangling might be too brittle for cross-platform and cross-version mangling, especially since we've been discussing potentially using platform-dependent manglings. It might be worth changing the ObjC Foundation behavior to reliably use demangled type names for all types. There's already a swift_getTypeName API libobjc could potentially plug in to.

phausler commented 8 years ago

actually cross platform works so far for the places we do use it surprisingly enough (it is just our module name that is the rub) the version changes however are VERY brittle; we have had one break so far since the inception of this project already.

I will follow up on the Darwin version to see what we can do for that.

lhoward commented 8 years ago

FWIW @belkadan said this in a post to swift-corelibs-dev:

No, we cannot encode things "non-mangled but with the namespace". For any type other than top-level non-generic class types, using a non-mangled name is not unique. The only correct answer for arbitrary classes is to use mangled names, or something that maps one-to-one with mangled names.

This appears to contradict @jckarter's statement above that "The qualified unmangled name produced by String(T.self) should be unique for any T".

jckarter commented 8 years ago

Sorry, I meant String(reflecting: T.self), which produces the fully qualified name. That is and should remain unique per-type. There are tradeoffs: The mangled name is more compact, but less human-understandable. It's also more implementation-dependent (though that's not a problem if we successfully finalize the mangling). Being able to generate demangled names from arbitrary types is also a code size increase for the runtime; I think we need to be able to parse and print human-readable demangled type names no matter what, but we don't necessarily need to be able to generate mangled type names.

phausler commented 8 years ago

I think the goal here is to have it conform to an identity transform since NSClassFromString and NSStringFromClass can round trip (and consequently then of course things that are encoded with NSKeyedArchiver can be decoded via NSKeyedUnarchiver).

I have a hacky (but less hacky than reaching into the stdlib for it's private functions) way of dealing with the naming of Foundation (SwiftFoundation.NSString versus NSString etc) classes that was proposed in the discussion with Jordan. It will end up making the surface functionality work similarly enough for superficial needs but in the end it will be a behavioral difference that will eventually need to be contended with.

jckarter commented 8 years ago

It should be possible for the Swift runtime to support a round-tripping getTypeByName such that swift_getTypeByName(swift_getTypeName(x)) gives you back the same type, yeah.

lhoward commented 8 years ago

@phausler commit 09f143983 (I think I pushed it) removes support for looking up mangled names in NSCoding, that will still allow Foundation to work without explicitly registering the classname mappings

lhoward commented 8 years ago

Noted regarding the overhead of putting the mangling code into the runtime – however if we're going to leverage the dynamic linker for looking up classes then swift_getTypeByName() will need this. The alternative of generating a name to type dictionary at compile time is also bloat (of undefined size), although it could have the advantage of being able to lookup classes for which there are no public symbols (but I don't consider this a feature).

jckarter commented 8 years ago

We shouldn't use dlsym. That's really slow, requires the symbols to be public or at least unstripped, and doesn't work at all for JITed code. We did that as a hack to get bare-bones protocol conformance lookup working in the first Swift betas and it was awful. A better approach would be a registration table for reflectable types similar to the protocol conformance table, or perhaps just using the protocol conformance table as is, since most types will conform to at least one protocol, and we can add dummy entries for ones that don't.

lhoward commented 8 years ago

This would work for NSCoding as NSCoding/NSSecureCoding are protocols. Can you give me any hints on where to start?

jckarter commented 8 years ago

Sure, for a rough cut take a look at swift_conformsToProtocol's implementation. First, it collects the conformance tables from all the currently-loaded images and installs callbacks to pick them up from any future dynamically dlopen-ed ones. To do the lookup itself, it first checks its cache, then falls back to scanning any conformance tables that haven't yet been scanned. A similar approach should get you started with a type name lookup implementation. There are some obvious optimizations that could be added later (for instance, if each image knew its module name, we could key the conformance tables by section to reduce the amount we need to scan).

lhoward commented 8 years ago

Thanks @jckarter! I will take a look.

lhoward commented 8 years ago

OK, I have something simple working that implements:

@warn_unused_result
public // @testable
func _typeByName(name: String) -> Any.Type?

No caching yet but that should be trivial to add.

Of course it only works if the class conforms to a protocol, so it's not a general purpose solution, but it would be good enough for NSKeyedUnarchiver, or perhaps all of Foundation (given the presence of NSObjectProtocol).

Edit: saw your comment above about adding dummy conformance entries for types that do not conform to protocols.

jckarter commented 8 years ago

An initial non-caching, incomplete implementation sounds like a good starting point to me. We can build from there. Thanks for taking this on!

lhoward commented 8 years ago

A pleasure and thanks for being so responsive (particularly over the holiday break!). It'll be great for object serialization to have the dynamism of the Foundation on Darwin.

lhoward commented 8 years ago

This patch is working OK with NSKeyedUnarchiver, the only issue is that subclasses will not have a protocol conformance entry. So something like:

aClass = _typeByName("Foundation.NSMutableArray", conformingTo: NSCoding.self)

will fail. So it's possible the only approach that will work is to have dummy conformance table entries for every class, or at least duplicate ones for subclasses, at which point maybe we are better off just emitting a table of pointers to the class metadata in each object. That might be beyond me though...

lhoward commented 8 years ago

Here's where I am so far:

https://github.com/apple/swift/compare/master...lhoward:SR-381

jckarter commented 8 years ago

Cool, feel free to start a pull request. I'll be on real vacation starting tomorrow though so might not be able to comment till next week. IMO, the runtime call should just be getTypeByName without also checking the protocol; you'll have to do an as? cast from Any.Type to the existential metatype anyway, which can do the protocol check.

phausler commented 8 years ago

It is not un-reasonable for us to mark NSMutableArray to adopt NSCoding if that would work. Not sure if that would help any or not.

lhoward commented 8 years ago

Unfortunately explicitly conforming subclasses to a protocol to which their superclass already conforms triggers the following error:

$ ./swiftc -o lookup lookup.swift
lookup.swift:8:29: error: redundant conformance of 'SubClass' to protocol 'SomeProtocol'
class SubClass : SomeClass, SomeProtocol {
                            ^
lookup.swift:8:7: note: 'SubClass' inherits conformance to protocol 'SomeProtocol' from superclass here
class SubClass : SomeClass, SomeProtocol {
      ^

If we can silence the error it would work for Foundation – but it'll be confusing for end users because the failure (not being able to unarchive) will happen at runtime. But it would be a start I guess.

lhoward commented 8 years ago

I agree we ultimately want a generic getTypeByName API that does not take a protocol. However having an API whose name suggests it will work for all cases, but which only works for a subset, might be confusing.

Edited: I think I see what you mean.

jckarter commented 8 years ago

I think that's fine, since only working on types with conformances is hopefully a temporary situation. That error is intended language behavior, and I don't think we can reasonably disable it. Can Foundation encode NSMutableStrings as NSStrings? Are there other important subclasses?

lhoward commented 8 years ago

We might be able to get away with disabling subclass encoding for now, I'll look at what the other classes are. If they're just mutable subclasses, it's perhaps more an uncommon scenario than their immutable superiors, and fortunately NSNumbers (which are a subclass of NSValue \<NSCoding>) are encoded directly.

(We can't change the encoding format, e.g. to add a mutable flag, because the format needs to be identical with Foundation on Darwin.)

lhoward commented 8 years ago

With this patch and the corresponding change to Foundation (see lhoward/nscoding-SR-381 branch), the Foundation archiving tests that do not use mutable subclasses pass.

lhoward commented 8 years ago

A workaround which isn't too inelegant (at least for immediate subclasses lacking explicit protocol conformance that we need to archive) is to define a dummy protocol.

// _typeByName() only works for classes with explicit protocol conformances, so subclasses
// such as NSMutableDictionary and MSMutableArray that inherit their conformances cannot be
// looked up by name. We declare a dummy protocol here until this issue is resolved.
internal protocol __NSDummyProtocol {}

Not sure why I didn't think of this before!

lhoward commented 8 years ago

Tested on OS X as well.

belkadan commented 8 years ago

I'm not fully caught up here but I really don't want to commit to our demangling format as ABI.

private class Foo {}
debugPrint(Foo.self)

I can't currently think of a case where the demangling won't be unique, but (a) we certainly haven't guaranteed that, (b) trying to do something meaningful for generics gets nasty, and (c) even if it's unique I don't want to promise it's stable.

We have a stable way of identifying classes: the mangled name.

lhoward commented 8 years ago

Interesting.

The patch above implements lookup by demangled name only, that was the approach I was encouraged to take by @jckarter, in part because the mangling format is subject to change. (This does actually break compatibility with Foundation archives containing nested Swift classes, where the _Tt form of the mangled name is used, but @phausler intimated that this behaviour might actually have been unintended.)

We can get at the mangled type name in the nominal type descriptor (although judging by the comment header we do need to do some extra work for generics), so it should be trivial to implement this without bringing the mangling code into stdlib. If we change the API behaviour to take a mangled name then we should probably change its name of course, as the current name does imply it's the inverse of _getTypeName().

I await your input (noting that [SR-377] is blocked on this). Regarding Foundation, for now a solution that only works with top-level classes will be fine, and because we need to support archive interoperability, it then just becomes a question of what sort of name Foundation should pass the runtime.

belkadan commented 8 years ago

Having the mangling stop changing is part of ABI stability. I think it's far more likely that the demangling is subject to change than the mangling.

lhoward commented 8 years ago

What's the best approach for Foundation to use in the meantime? An SPI that takes a demangled name, or to put the mangling into Foundation and have an API (edit: in stdlib) that takes the mangled name? Just depends on where the instability should be baked.

belkadan commented 8 years ago

I'm really confused. Why would Foundation be doing mangling?

I feel like you should assume that demangling is at least as unstable as the mangling itself. I think we can promise to continue supporting top-level classes in the "Foo.Bar" syntax (@gparker42?), but other than that you should just go straight for the mangled name. I guess that part does need support from Foundation on the decoding side; for the en‌coding side I'm not sure if you want to pattern-match the mangled string or ask for a separate entry point.

lhoward commented 8 years ago

As you point out, Foundation needs to support top-level classes that are encoded without the mangled name for compatibility with existing archives. If stdlib only provides an API for looking up a class given a mangled name, then by deduction Foundation must make a mangled name when decoding these classes. (Also, for the encoding side we would also need an API to get the mangled name given an instance, something which I don't believe exists today.)

Would you suggest something like this?

func _mangledTypeName(type: Any.Type) -> String
func _typeByMangledName(name: String) -> Any.Type? 

and

func _mangleTypeName(name: String) -> String
func _demangleTypeName(name: String) -> String?

And if so, where should the latter live – I'm presuming Foundation to avoid bloating stdlib with the mangling code?

lhoward commented 8 years ago

Also, note that my current implementation of NSClassFromString/NSStringFromClass only supports top-level classes anyway and if @phausler agrees, I'd prefer to get this integrated rather than maintaining a long-running branch. However, it does need some support from stdlib and if a demangled _typeByName() would never be accepted we need to find an alternate approach.

belkadan commented 8 years ago

No, no. Either Foundation should just have a list of NSCoding-compatible classes…

[
  // ...
  "NSDate": NSDate.self,
  "NSURL": NSURL.self,
  // ...
]

…or we should add something like @objc on Darwin platforms that controls a class's "runtime name".

Either the list of classes we support here is finite or it isn't. If it isn't, there's no reason to limit this logic to Foundation.

belkadan commented 8 years ago

I'm interested in what @jckarter has to say here, too, since he seem(s|ed) to think demangled names would work.

phausler commented 8 years ago

The finite list then would basically mean that NSKeyedUnarchiver is completely not useful for anything other than Foundation classes. That is not the common usage by any means.

belkadan commented 8 years ago

No, those are just the special cases. Everything else uses either the "Foo.Bar" syntax or a proper mangled name. This is exactly what setClass(_:forClassName: ) does; it's just a prepopulated list for Foundation.

lhoward commented 8 years ago

Ah, OK. Well, a static mapping I wanted to avoid as – where's the fun in that?

Adding something like @objc is fine but requires source code changes in third-party consumers of NSCoding. In which case they could just call NSKeyedArchiver.setClassName() and NSKeyedUnarchiver.setClass() and be done with it, no point changing the compiler.

Regardless, we still need a way to lookup a type given a name, and if that name is not the same name as returned by _typeByName(), we also the inverse function.

phausler commented 8 years ago

I think we have different definitions of what is mangled and that is leading to some confusion;

Foo.Bar is a fully qualified module name from my perspective and _TMC15SwiftFoundation19_NSCFConstantString is a mangled name

import Foundation

class Bar : NSObject {
    class Baz : NSObject { }
}

print("\(NSStringFromClass(Bar.self))")
print("\(NSStringFromClass(Bar.Baz.self))")

prints Foo.Bar provided the module name is Foo which I would consider to be a fully qualified name.
prints _TtCC3Foo3Bar3Baz which I would consider to be a mangled name (and perhaps a bug on darwin; however we will need a story to deal with this for binary compatibility from the objc version)

Are there better names for what I am calling these?

lhoward commented 8 years ago

@belkadan so are you proposing that when archiving or unarchiving top-level classes, an explicit class-to-name mapping must be set on the coder?

lhoward commented 8 years ago

@phausler I think we're on the same page. I think @belkadan is proposing that top-level names (such as "Foo.bar") only be encoded when an explicit mapping is set. That would have the advantage of not requiring Foundation nor stdlib to do any mangling, however it would require source code changes for archive interoperability.

I believe he is also suggesting that otherwise the mangled name always be used, and that any API to lookup a class given a name only take the mangled type name.

But of course I could be wrong!

lhoward commented 8 years ago

I should add that my own opinion is that it would be nice not to require third-party source changes, and that even if stdlib only provides a mangled name -> Type transform, Foundation should possibly still try to do the right thing.

belkadan commented 8 years ago

Mm, that's not quite it. My proposal is that all Swift types are encoded using mangled names, except non-generic non-private top-level classes, which should use "Foo.Bar" syntax. ("Fully-qualified" is a perfectly good way to describe these names, but it doesn't cover all demangled names.) This is compatible with what we currently have on Darwin, and you've convinced me that it's useful to not require changes to existing code here.

(I thought about using the mangled name even for top-level classes on Linux, because NSClassFromString on Darwin can find classes that way too, but (a) this is one case where the demangled name probably is stable, and (b) we'd still have to handle "Foo.Bar" syntax on Linux.)

Additionally, on Darwin we have @objc(Bar), which allows you to set a custom runtime name for a class. We should continue to honor that on Darwin, and may want something similar on Linux. (Or we may not, since it's explicitly opting out of module-level namespacing, which can cause problems.)

The reason I called in @gparker42 (and @jckarter) is because this means committing to NSStringFromClass's current behavior. Once we want cross-platform archives, we need the Linux implementation to match the Darwin implementation, and the Darwin implementation is (currently) built on top of NSStringFromClass. (More discussion on this might need to happen inside the Apple Wall Of Secrecy; not sure.)

Foundation itself is special because we have classes that are implemented using Objective-C on one platform and Swift on another. No one else has that problem; if an Objective-C class is replaced by a Swift one they can and should use an explicit mapping, just like if an Objective-C class changes its name. (Alternately, they can use @objc.)

Consequently, rather than adding magic to assume that an unmangled unqualified name is in Foundation, I would just provide explicit mappings for the finite number of classes we care about. (And if we decide to expose type-from-name logic somewhere else in Swift, I would probably leave out this compatibility layer. It's pretty archiving-specific.)