Closed lhoward closed 8 years ago
Is there a reason it has to be a mangled name? The qualified unmangled name produced by String(T.self) should be unique for any T. I envision us having a corresponding runtime API to parse and look up a type by (unmangled) name as well.
We need to match NSClassFromString()/NSStringFromClass() behaviour or, more accurately, their analogues in libobjc. For Swift class names deeper than two levels (e.g. a nested class), this returns the mangled type name. See:
https://github.com/lhoward/swift-corelibs-foundation/commit/394998259009c133e94466ef7ff33875fa673011
(I did originally propose on swift-corelibs-dev what you suggested.)
I see, is it expected that archives will be interoperable across the objc and corelibs implementations?
I don't speak for Apple, but my understanding is yes.
cc @phausler
Alternatively we limit interoperability to Foundation and single-level non-generic Swift classes. I realise that the current implementation is broken for generic classes.
Really it would be nice to have:
a variant of swift_getTypeName() that either conforms to the class_getName() behaviour on Darwin, or unconditionally returns the mangled type name
a swift_get[Class]MetadataByName() or similar (using dlsym() or whatever heuristics suit)
Tony and I discussed early on - and cross platform archival is really important in our opinion since it is one of the defining features of Foundation.
We might be able to get away with not letting nested classes be codable due to class encoding differentials (since id guess that those are the distinct minority) but I would like to have a story for them (even if we have to modify the darwin version to be in parity) since I know that the mangling patterns are subject to change and we should try not to rely on them.
Yeah, the mangling might be too brittle for cross-platform and cross-version mangling, especially since we've been discussing potentially using platform-dependent manglings. It might be worth changing the ObjC Foundation behavior to reliably use demangled type names for all types. There's already a swift_getTypeName API libobjc could potentially plug in to.
actually cross platform works so far for the places we do use it surprisingly enough (it is just our module name that is the rub) the version changes however are VERY brittle; we have had one break so far since the inception of this project already.
I will follow up on the Darwin version to see what we can do for that.
FWIW @belkadan said this in a post to swift-corelibs-dev:
No, we cannot encode things "non-mangled but with the namespace". For any type other than top-level non-generic class types, using a non-mangled name is not unique. The only correct answer for arbitrary classes is to use mangled names, or something that maps one-to-one with mangled names.
This appears to contradict @jckarter's statement above that "The qualified unmangled name produced by String(T.self) should be unique for any T".
Sorry, I meant String(reflecting: T.self)
, which produces the fully qualified name. That is and should remain unique per-type. There are tradeoffs: The mangled name is more compact, but less human-understandable. It's also more implementation-dependent (though that's not a problem if we successfully finalize the mangling). Being able to generate demangled names from arbitrary types is also a code size increase for the runtime; I think we need to be able to parse and print human-readable demangled type names no matter what, but we don't necessarily need to be able to generate mangled type names.
I think the goal here is to have it conform to an identity transform since NSClassFromString and NSStringFromClass can round trip (and consequently then of course things that are encoded with NSKeyedArchiver can be decoded via NSKeyedUnarchiver).
I have a hacky (but less hacky than reaching into the stdlib for it's private functions) way of dealing with the naming of Foundation (SwiftFoundation.NSString versus NSString etc) classes that was proposed in the discussion with Jordan. It will end up making the surface functionality work similarly enough for superficial needs but in the end it will be a behavioral difference that will eventually need to be contended with.
It should be possible for the Swift runtime to support a round-tripping getTypeByName such that swift_getTypeByName(swift_getTypeName(x))
gives you back the same type, yeah.
@phausler commit 09f143983 (I think I pushed it) removes support for looking up mangled names in NSCoding, that will still allow Foundation to work without explicitly registering the classname mappings
Noted regarding the overhead of putting the mangling code into the runtime – however if we're going to leverage the dynamic linker for looking up classes then swift_getTypeByName() will need this. The alternative of generating a name to type dictionary at compile time is also bloat (of undefined size), although it could have the advantage of being able to lookup classes for which there are no public symbols (but I don't consider this a feature).
We shouldn't use dlsym. That's really slow, requires the symbols to be public or at least unstripped, and doesn't work at all for JITed code. We did that as a hack to get bare-bones protocol conformance lookup working in the first Swift betas and it was awful. A better approach would be a registration table for reflectable types similar to the protocol conformance table, or perhaps just using the protocol conformance table as is, since most types will conform to at least one protocol, and we can add dummy entries for ones that don't.
This would work for NSCoding as NSCoding/NSSecureCoding are protocols. Can you give me any hints on where to start?
Sure, for a rough cut take a look at swift_conformsToProtocol
's implementation. First, it collects the conformance tables from all the currently-loaded images and installs callbacks to pick them up from any future dynamically dlopen
-ed ones. To do the lookup itself, it first checks its cache, then falls back to scanning any conformance tables that haven't yet been scanned. A similar approach should get you started with a type name lookup implementation. There are some obvious optimizations that could be added later (for instance, if each image knew its module name, we could key the conformance tables by section to reduce the amount we need to scan).
Thanks @jckarter! I will take a look.
OK, I have something simple working that implements:
@warn_unused_result
public // @testable
func _typeByName(name: String) -> Any.Type?
No caching yet but that should be trivial to add.
Of course it only works if the class conforms to a protocol, so it's not a general purpose solution, but it would be good enough for NSKeyedUnarchiver, or perhaps all of Foundation (given the presence of NSObjectProtocol).
Edit: saw your comment above about adding dummy conformance entries for types that do not conform to protocols.
An initial non-caching, incomplete implementation sounds like a good starting point to me. We can build from there. Thanks for taking this on!
A pleasure and thanks for being so responsive (particularly over the holiday break!). It'll be great for object serialization to have the dynamism of the Foundation on Darwin.
This patch is working OK with NSKeyedUnarchiver, the only issue is that subclasses will not have a protocol conformance entry. So something like:
aClass = _typeByName("Foundation.NSMutableArray", conformingTo: NSCoding.self)
will fail. So it's possible the only approach that will work is to have dummy conformance table entries for every class, or at least duplicate ones for subclasses, at which point maybe we are better off just emitting a table of pointers to the class metadata in each object. That might be beyond me though...
Here's where I am so far:
https://github.com/apple/swift/compare/master...lhoward:SR-381
Cool, feel free to start a pull request. I'll be on real vacation starting tomorrow though so might not be able to comment till next week. IMO, the runtime call should just be getTypeByName
without also checking the protocol; you'll have to do an as?
cast from Any.Type
to the existential metatype anyway, which can do the protocol check.
It is not un-reasonable for us to mark NSMutableArray to adopt NSCoding if that would work. Not sure if that would help any or not.
Unfortunately explicitly conforming subclasses to a protocol to which their superclass already conforms triggers the following error:
$ ./swiftc -o lookup lookup.swift
lookup.swift:8:29: error: redundant conformance of 'SubClass' to protocol 'SomeProtocol'
class SubClass : SomeClass, SomeProtocol {
^
lookup.swift:8:7: note: 'SubClass' inherits conformance to protocol 'SomeProtocol' from superclass here
class SubClass : SomeClass, SomeProtocol {
^
If we can silence the error it would work for Foundation – but it'll be confusing for end users because the failure (not being able to unarchive) will happen at runtime. But it would be a start I guess.
I agree we ultimately want a generic getTypeByName
API that does not take a protocol. However having an API whose name suggests it will work for all cases, but which only works for a subset, might be confusing.
Edited: I think I see what you mean.
I think that's fine, since only working on types with conformances is hopefully a temporary situation. That error is intended language behavior, and I don't think we can reasonably disable it. Can Foundation encode NSMutableStrings as NSStrings? Are there other important subclasses?
We might be able to get away with disabling subclass encoding for now, I'll look at what the other classes are. If they're just mutable subclasses, it's perhaps more an uncommon scenario than their immutable superiors, and fortunately NSNumbers (which are a subclass of NSValue \<NSCoding>) are encoded directly.
(We can't change the encoding format, e.g. to add a mutable flag, because the format needs to be identical with Foundation on Darwin.)
With this patch and the corresponding change to Foundation (see lhoward/nscoding-SR-381 branch), the Foundation archiving tests that do not use mutable subclasses pass.
A workaround which isn't too inelegant (at least for immediate subclasses lacking explicit protocol conformance that we need to archive) is to define a dummy protocol.
// _typeByName() only works for classes with explicit protocol conformances, so subclasses
// such as NSMutableDictionary and MSMutableArray that inherit their conformances cannot be
// looked up by name. We declare a dummy protocol here until this issue is resolved.
internal protocol __NSDummyProtocol {}
Not sure why I didn't think of this before!
Tested on OS X as well.
I'm not fully caught up here but I really don't want to commit to our demangling format as ABI.
private class Foo {}
debugPrint(Foo.self)
I can't currently think of a case where the demangling won't be unique, but (a) we certainly haven't guaranteed that, (b) trying to do something meaningful for generics gets nasty, and (c) even if it's unique I don't want to promise it's stable.
We have a stable way of identifying classes: the mangled name.
Interesting.
The patch above implements lookup by demangled name only, that was the approach I was encouraged to take by @jckarter, in part because the mangling format is subject to change. (This does actually break compatibility with Foundation archives containing nested Swift classes, where the _Tt form of the mangled name is used, but @phausler intimated that this behaviour might actually have been unintended.)
We can get at the mangled type name in the nominal type descriptor (although judging by the comment header we do need to do some extra work for generics), so it should be trivial to implement this without bringing the mangling code into stdlib. If we change the API behaviour to take a mangled name then we should probably change its name of course, as the current name does imply it's the inverse of _getTypeName()
.
I await your input (noting that [SR-377] is blocked on this). Regarding Foundation, for now a solution that only works with top-level classes will be fine, and because we need to support archive interoperability, it then just becomes a question of what sort of name Foundation should pass the runtime.
Having the mangling stop changing is part of ABI stability. I think it's far more likely that the demangling is subject to change than the mangling.
What's the best approach for Foundation to use in the meantime? An SPI that takes a demangled name, or to put the mangling into Foundation and have an API (edit: in stdlib) that takes the mangled name? Just depends on where the instability should be baked.
I'm really confused. Why would Foundation be doing mangling?
I feel like you should assume that demangling is at least as unstable as the mangling itself. I think we can promise to continue supporting top-level classes in the "Foo.Bar" syntax (@gparker42?), but other than that you should just go straight for the mangled name. I guess that part does need support from Foundation on the decoding side; for the encoding side I'm not sure if you want to pattern-match the mangled string or ask for a separate entry point.
As you point out, Foundation needs to support top-level classes that are encoded without the mangled name for compatibility with existing archives. If stdlib only provides an API for looking up a class given a mangled name, then by deduction Foundation must make a mangled name when decoding these classes. (Also, for the encoding side we would also need an API to get the mangled name given an instance, something which I don't believe exists today.)
Would you suggest something like this?
func _mangledTypeName(type: Any.Type) -> String
func _typeByMangledName(name: String) -> Any.Type?
and
func _mangleTypeName(name: String) -> String
func _demangleTypeName(name: String) -> String?
And if so, where should the latter live – I'm presuming Foundation to avoid bloating stdlib with the mangling code?
Also, note that my current implementation of NSClassFromString/NSStringFromClass only supports top-level classes anyway and if @phausler agrees, I'd prefer to get this integrated rather than maintaining a long-running branch. However, it does need some support from stdlib and if a demangled _typeByName()
would never be accepted we need to find an alternate approach.
No, no. Either Foundation should just have a list of NSCoding-compatible classes…
[
// ...
"NSDate": NSDate.self,
"NSURL": NSURL.self,
// ...
]
…or we should add something like @objc
on Darwin platforms that controls a class's "runtime name".
Either the list of classes we support here is finite or it isn't. If it isn't, there's no reason to limit this logic to Foundation.
I'm interested in what @jckarter has to say here, too, since he seem(s|ed) to think demangled names would work.
The finite list then would basically mean that NSKeyedUnarchiver is completely not useful for anything other than Foundation classes. That is not the common usage by any means.
No, those are just the special cases. Everything else uses either the "Foo.Bar" syntax or a proper mangled name. This is exactly what setClass(_:forClassName: )
does; it's just a prepopulated list for Foundation.
Ah, OK. Well, a static mapping I wanted to avoid as – where's the fun in that?
Adding something like @objc is fine but requires source code changes in third-party consumers of NSCoding. In which case they could just call NSKeyedArchiver.setClassName()
and NSKeyedUnarchiver.setClass()
and be done with it, no point changing the compiler.
Regardless, we still need a way to lookup a type given a name, and if that name is not the same name as returned by _typeByName(), we also the inverse function.
I think we have different definitions of what is mangled and that is leading to some confusion;
Foo.Bar is a fully qualified module name from my perspective and _TMC15SwiftFoundation19_NSCFConstantString is a mangled name
import Foundation
class Bar : NSObject {
class Baz : NSObject { }
}
print("\(NSStringFromClass(Bar.self))")
print("\(NSStringFromClass(Bar.Baz.self))")
prints Foo.Bar provided the module name is Foo which I would consider to be a fully qualified name.
prints _TtCC3Foo3Bar3Baz which I would consider to be a mangled name (and perhaps a bug on darwin; however we will need a story to deal with this for binary compatibility from the objc version)
Are there better names for what I am calling these?
@belkadan so are you proposing that when archiving or unarchiving top-level classes, an explicit class-to-name mapping must be set on the coder?
@phausler I think we're on the same page. I think @belkadan is proposing that top-level names (such as "Foo.bar") only be encoded when an explicit mapping is set. That would have the advantage of not requiring Foundation nor stdlib to do any mangling, however it would require source code changes for archive interoperability.
I believe he is also suggesting that otherwise the mangled name always be used, and that any API to lookup a class given a name only take the mangled type name.
But of course I could be wrong!
I should add that my own opinion is that it would be nice not to require third-party source changes, and that even if stdlib only provides a mangled name -> Type transform, Foundation should possibly still try to do the right thing.
Mm, that's not quite it. My proposal is that all Swift types are encoded using mangled names, except non-generic non-private top-level classes, which should use "Foo.Bar" syntax. ("Fully-qualified" is a perfectly good way to describe these names, but it doesn't cover all demangled names.) This is compatible with what we currently have on Darwin, and you've convinced me that it's useful to not require changes to existing code here.
(I thought about using the mangled name even for top-level classes on Linux, because NSClassFromString on Darwin can find classes that way too, but (a) this is one case where the demangled name probably is stable, and (b) we'd still have to handle "Foo.Bar" syntax on Linux.)
Additionally, on Darwin we have @objc(Bar)
, which allows you to set a custom runtime name for a class. We should continue to honor that on Darwin, and may want something similar on Linux. (Or we may not, since it's explicitly opting out of module-level namespacing, which can cause problems.)
The reason I called in @gparker42 (and @jckarter) is because this means committing to NSStringFromClass's current behavior. Once we want cross-platform archives, we need the Linux implementation to match the Darwin implementation, and the Darwin implementation is (currently) built on top of NSStringFromClass. (More discussion on this might need to happen inside the Apple Wall Of Secrecy; not sure.)
Foundation itself is special because we have classes that are implemented using Objective-C on one platform and Swift on another. No one else has that problem; if an Objective-C class is replaced by a Swift one they can and should use an explicit mapping, just like if an Objective-C class changes its name. (Alternately, they can use @objc
.)
Consequently, rather than adding magic to assume that an unmangled unqualified name is in Foundation, I would just provide explicit mappings for the finite number of classes we care about. (And if we decide to expose type-from-name logic somewhere else in Swift, I would probably leave out this compatibility layer. It's pretty archiving-specific.)
Additional Detail from JIRA
| | | |------------------|-----------------| |Votes | 1 | |Component/s | Standard Library | |Labels | New Feature, AffectsABI, Runtime | |Assignee | @jckarter | |Priority | Medium | md5: 13f1c2ae7174aab8983f1dde3a976c53blocks:
relates to:
Issue Description:
This may be useful to implement NSStringFromClass()-compatible behaviour which is necessary in order to implement NSKeyedArchiver. I am using the attached workaround at the moment but obviously it is very fragile.