mamift / LinqToXsdCore

LinqToXsd ported to .NET Core (targets .NET Standard 2 for generated code and .NET Core 3.1, .NET 5+ for the code generator CLI tool).
Microsoft Public License
41 stars 15 forks source link

C# nullables #18

Closed jods4 closed 3 years ago

jods4 commented 3 years ago

It's so good that you are carrying on this project, thanks! It is so much better than xsd.exe.

Now that C# has nullable references, do you think that you could add some support?

  1. I think this would need to be behind a command-line argument, because unsupported syntax or preprocessor directives would surface in older compiler version.

  2. The easy short-term support is to add #nullable disable at the top of generated files. This instructs the compiler that code inside this file is unaware of nullable references. It avoids issues when the generated C# file is included in a project that is null-aware by default.

  3. If possible, it would be awesome to have proper nullable annotations instead. You have all the required metadata for that and already do the work for value types. Reference types that are optional or part of a choice should be nullable.

Short discussion about the correctness of those annotations

Nullable references are tricky, esp. on complex types that can be built up. A balance must be achieved between usefulness and correctness.

When an xml file conform to its xsd is read, the annotations are exact. If not, I think one can accept incorrect nullable annotations on invalid data (garbage in -> garbage out).

About invalid data: reading the docs, I believe linqtoxsd should throw when accessing missing elements, but reading the code I believe it does not? Throwing would solve any discussion regarding the correctness of nullable annotations.

LinqToXsd also allows the creation / writing of xml files. This is more complicated since during construction the xml might not be valid. A required attribute will be null until the code sets its value.

I argue that it's more useful to have the annotation:

Again, this is assuming that linqtoxsd doesn't throw when reading a value that wasn't set (which would make this discussion useless).

mamift commented 3 years ago

I've never been able to work with the nullable reference types feature of C# 8, so I have no strong opinion on it either way, but I welcome suggestions to improve the code generation so as to better keep up to date with the latest C# language versions.

I think this would need to be behind a command-line argument, because unsupported syntax or preprocessor directives would surface in older compiler version.

Yes, definitely agree here as the generated code is .NET Standard 2.0-compatible and I would like to keep this as the default option for generating code. XML data processing is itself a legacy problem that is encountered much less and less nowadays.

The easy short-term support is to add #nullable disable at the top of generated files.

Yes, this should be an easy implementation using Roslyn.

If possible, it would be awesome to have proper nullable annotations instead. You have all the required metadata for that and already do the work for value types. Reference types that are optional or part of a choice should be nullable.

Proper annotations support might be a too big an ask for a project with this much legacy code. The code generation logic uses CodeDOM, and it was only ported to .NET Core and is not actively improved with new features or bug fixes. From memory, CodeDOM only supports generating C# 3 code (pre-dates things like async and dynamic keywords).

Since nullable reference types extend the language syntax (letting ? be suffixed to reference type names, like string? or object?), it would require a Roslyn transformation on the syntax tree generated by CodeDOM as CodeDOM itself would have no awareness of what nullable ref-types are.

But using Roslyn to transform the generated syntax tree means losing out on all the internally generated metadata and property information that is used by the schema processor to create the CodeDOM graph. Putting attributes in the right place, placing the ? question mark on the right type names I imagine would be a huge undertaking to get it right. Based on my own readings and understanding of the code base as it is, I believe it would be akin to having to re-infer that metadata a second time by deconstructing schema semantics from the generated code.

Love to be proven wrong though; this is just my initial analysis on the work required for such a change.

jods4 commented 3 years ago

XML data processing is itself a legacy problem that is encountered much less and less nowadays.

Rant: It's disappointing how MS dropped XML a long time ago, e.g. they never ported the SignedXml or EncryptedXml stuff to XDocument, they don't maintain neither xsd.exe nor linqtoxsd (which is far superior). XML may seem legacy to web developers, but in other applications it's far from a legacy format. I'm working with financial data, ISO 20022 is king here and it's all XML and absolutely not legacy nor going away. /Rant

The code generation logic uses CodeDOM, and it was only ported to .NET Core and is not actively improved with new features or bug fixes.

I wasn't aware of that. It makes things much more complicated indeed. Using [Nullable] attributes directly in source code is forbidden by C#, you have to use the new ref? language syntax.

Here's a wild idea that may make this possible without too much trouble: what if we emit magic comments and we just search and replace them in the final result?

You could emit:

public string/*__NULLABLE*/  OptionalProperty { get; set; }

And do a simple replace /*__NULLABLE*/ by ? on the final string?

mamift commented 3 years ago

Rant: It's disappointing how MS dropped XML a long time ago, e.g. they never ported the SignedXml or EncryptedXml stuff to XDocument, they don't maintain neither xsd.exe nor linqtoxsd (which is far superior). XML may seem legacy to web developers, but in other applications it's far from a legacy format. I'm working with financial data, ISO 20022 is king here and it's all XML and absolutely not legacy nor going away. /Rant

I'm very sympathetic to this; and yes, while most of my work is web development related, I have come across situations where XML is unavoidable or actually the better option. For instance, JSON schema is a primitive toy compared to what you can model in XSD.

Here's a wild idea that may make this possible without too much trouble: what if we emit magic comments and we just search and replace them in the final result?

OK that might actually be worth investigating.

I'm glad too you've been able to get some use out of this tool; I began porting this over to .NET Core when I realised how unwilling Microsoft was to improve XML tooling. Also I hate how there is this fancy, fairly complex GUI in Visual Studio for browsing/reading XSD files, but you can't use the GUI to edit anything! Only browse!

jods4 commented 3 years ago

I'm still thinking about this as well.

Having played with CodeDom more since I last commented here I learnt something rather useful here: CodeDom performs almost no "validation" of the DOM you create. For the types of properties you can actually pass a plain string that will be printed verbatim in the output. I used this trick to create "readonly" fields in a previous PR.

So appending ? to types is actually very doable.

On the other hand, I neglected another aspect. If we generate "null-annotated code" then not only the public API will have to be correct, but also the implementation. If there are bugs linked to nullable dataflows (#22 would be -- correctly -- reported as such a bug) it would create warnings in the user code base.

For this I see 2 solutions:

  1. We tweak the codegen where needed to ensure everything is correct (should be minimal changes because as far as I can tell almost everything in the implementation is actually non-nullable);
  2. We generate a pragma to disable the nullable warnings inside our generated files.
mamift commented 3 years ago

This is good. This obviates the need for a second step to add nullable annotations to the output code, like using Roslyn.

jods4 commented 3 years ago

Done in #29

jods4 commented 3 years ago

@mamift I think you can close this now. We're using it in our current project. It works well and we enjoy it quite a bit. It's really nice having VS tell you immediately when you try to use something that's not guaranteed to be there.

mamift commented 3 years ago

OK sure. Thanks for your contributions on this issue.