ormsolutions / NORMA

Core code for Natural Object-Role Modeling Architect, a Visual Studio extension for ORM modeling.

Names differing only by two underscores instead of one are treated as the same name #41

Open marcnoon opened 2 years ago

marcnoon commented 2 years ago


Noticed that NORMA doesn't like a name from SQL Server that differs from another only by having two underscores instead of one. The ORM tool thinks these are the same value type even though they are slightly different. Is it possible to make these be considered different? Maybe by preventing the expanded reading signature option? Another issue is general slowness for large datasets. I wish this didn't use XML in the backend, because it takes a huge amount of time to write the nodes as XML text, and I'm wondering whether the model could be sped up somehow by using some other tactic.

The error I get is: Stg_dailydistributionlog has Stg_dailydistributionlog_Mandatorydate.

Model Error: Model 'dbo' contains multiple readings with an expanded reading signature of 'stg dailydistributionlog has stg dailydistributionlog mandatorydate'. Each reading must be unique in the model.

Each Stg_dailydistributionlog has at most one Stg_dailydistributionlog_Mandatorydate. It is possible that more than one Stg_dailydistributionlog has the same Stg_dailydistributionlog_Mandatorydate.

sam-lippert commented 9 months ago

Changing "has" on one or both of those fact types would clear up the error and help specify how those two fields are different.

mcurland commented 9 months ago

The signature validation error is not going away. Although ORM is technically case-sensitive, a large subset of the physical items it maps to are either case-insensitive, or the name generation options that provide the names in the physical layers normalize casing. This means that when we compare signatures, we treat different forms of the same name (ObjectType, objectType, object type, object_type, etc.) as equivalent names for comparison purposes. The verbalization options also allow different representations of combined names. Treating similar names as equivalent by design avoids a ton of mapping issues downstream from the editor.
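To make the comparison concrete, here's a minimal sketch of that kind of normalization; this is an illustration of the semantics described above, not NORMA's actual implementation:

```csharp
using System;
using System.Text.RegularExpressions;

static class NameSignature
{
    // Collapse casing and separator differences into one comparison key.
    // A hypothetical helper mirroring the behavior described above.
    public static string Normalize(string name)
    {
        // Insert a space at lowercase/digit-to-uppercase boundaries so
        // "ObjectType" and "objectType" both become "object Type".
        string spaced = Regex.Replace(name, "(?<=[a-z0-9])(?=[A-Z])", " ");

        // Underscores and spaces are the same separator; removing empty
        // entries means a double underscore compares like a single one.
        string[] parts = spaced.Split(new[] { '_', ' ' },
            StringSplitOptions.RemoveEmptyEntries);

        return string.Join(" ", parts).ToLowerInvariant();
    }

    static void Main()
    {
        // All four print "object type":
        Console.WriteLine(Normalize("ObjectType"));
        Console.WriteLine(Normalize("objectType"));
        Console.WriteLine(Normalize("object type"));
        Console.WriteLine(Normalize("object_type"));

        // The reported collision: one underscore vs. two compare equal.
        Console.WriteLine(Normalize("Stg_dailydistributionlog_Mandatorydate"));
        Console.WriteLine(Normalize("Stg_dailydistributionlog__Mandatorydate"));
    }
}
```

This is why adding a second underscore cannot make the names distinct for signature purposes.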

Having said that, I do not believe this is treated as a blocking error for relational mapping (so the object type and fact type will still be included in the rmap), so it isn't actually blocking anything. All errors you see in NORMA can be suppressed from the display: right click the fact type (or any shape displaying the error) and you'll see the Validation Errors/Disable Error Display submenu. You can re-enable display this way, or use the 'ErrorDisplay' property on the diagram. Note that this turns the error display off/on for all instances of that error type, not just one. The error is still in the model; you just can't see it.

Seriously, though, at some point this is a conceptual design issue in the model. An underscore is not a natural-language character, so attempting to encode additional information into the meaning of an object type by adding/removing extra underscores is well outside the focus of both NORMA and ORM.

The XML is a relatively small part of the performance issue. From what I recall, you're dealing with a huge file here (at least 25MB). This is a really big model. There are many other factors at play:

  1. VS now has auto-save on by default, so if you stop to read a verbalization VS might try to save the file in the background. I like autosave in a word processor, but hate it in a coding environment, where I frequently change things experimentally with no intention of saving them (e.g. type a variable in a code editor to get the IntelliSense list). You can turn this off on the Tools/Options/Environment/Documents page.
  2. The Visual Studio code generation process runs when a document is saved and when it is deactivated. There is no way in the system to distinguish the two events. So, for example, clicking off your model onto the generated SQL file will regenerate the SQL file. The easiest way to work around this is to simply edit the ORM file outside of the C# project environment; then, when you want to generate, open the C# project that contains the file, right click the .orm file in the Solution Explorer, and choose 'Run Custom Tool'.
  3. The relational mapping approach in the tool is not incremental, so the database is fully regenerated on any significant change. For the massive models you're dealing with, this can produce change logs (basically what becomes a single item on the undo stack) with hundreds of thousands of items. I'm curious what your VS memory usage is on these files. If you want to avoid this with a mature (or overlarge) model, then create a copy with the relational extensions turned off. Edit the turned-off model, then drag-copy into the full model and reconcile as needed. The XML for the ORM portions should look very similar in a diff editor (guids might change for new objects), whereas much of the generated portions are a sea of guids and basically incomparable.
  4. Please tell me you aren't trying a single-page Relational View on this model.
  5. Break the model into smaller pieces. I'm guessing you're in the 500-1000 table range with a file that size. There has to be some natural segmentation with that much data.

Large Model Possibilities

  1. The NORMA Pro stack has an extension called 'Absorption Grouping' that has partial model support and allows multiple hierarchies to be generated. Unlike the OIAL/DCIL stack in NORMA, this is heavily customizable and also incremental, so it is much snappier for larger models. The idea is that a base algorithm does the 'absorption' part of the model, leaving non-absorbed objects to be 'organized'. The currently implemented organization is a hierarchical mapping and can produce multiple hierarchies (and corresponding schema files) for one model. This same technology is designed to be applied to other physical mappings (class and relational, for example); I just haven't written those mappings yet (even though hierarchical mapping is much harder than relational).
  2. I've considered extending the grouping system with some group types targeted at multi-model integration. The first is a way for a group to represent 'the full orm model'. This would let you pull all non-shape ORM elements from one model to another by dragging a single node.
  3. Group types for sourcing and consuming other models could also be added. This would track the provenance of elements used in a composite model, including the source ids. Currently, cross-model drag is 100% pattern based, so if you do something to change the element signature (change a type name, add a role to a fact type, etc.) then the element cannot be matched. Tracking source model identifiers would allow drag synchronization of a full group that also supports element deletion and signature-changing events. For huge models, this would also make the process of having a 'just ORM' version much easier, with a single-node drag into a separate file with all of the generated extensions turned on.

Basically, with this approach, you could break your model into smaller pieces and then have a formal mechanism for recombining, along with incremental and highly customizable outputs at all levels. Obviously, there is a lot of additional work on the NORMA side to do this.

Again, the XML is not the fundamental problem here; it is the overall size of the model and system limitations, mostly of the underlying DSL framework (since renamed to the Modeling Framework in VS). Two decades ago, when DSL was conceived, it had both an 'in-memory store' and an associated persistent store. The persistent store never happened, so you're left with just the in-memory version. The web stack I'm working on is able to process incremental changes to the database and retrieves full models very fast, but the ORM-editing parts of this system are really just getting going, as the focus is on other areas (like validation and derivation of the represented domain, not the ORM model itself). https://youtube.com/@ORMSolutions has a video discussing the 'ORM Web Tooling'. The bottom line is that without breaking up your file into manageable pieces, you'll continue to hit the limits of the tool. Small, acceptable delays with a 3-5MB file will simply be unacceptable if you're at 10x that size. I'd also love to say that the Pro extensions currently facilitate breaking up models this size, but I haven't progressed past design discussions in this area (you can break up relational diagrams, though).

marcnoon commented 8 months ago

My use case was definitely an outlier: I just wanted to combine some databases and then generate a graphical representation of the ORM mapping. I simply wanted to pull in this complicated model and produce an ERD diagram for the team to use so they could quickly see the relationships between things.

My technique was to combine two to three separate databases using some kind of interlinking between servers in SQL Server. Once they were combined, I was eventually able to construct the diagram, but there were a lot of issues: I had to remove the data first, then remove and rename all the schema names into a single schema using underscores (achieved with a simple find-and-replace in the XML, after changing the names to something fairly unique in my SQL Server instance), and then clear all the relational errors, because a table or entity wouldn't appear in the diagram until I did all that. To clear the errors I wanted an automated solution, but the easiest way I found was to use the ORM model mapping side window to quickly find the fact types with the identifiers that had errors and set the ID type correctly so it wouldn't be an error. After a few hours, my fingers got cramps. But the ERD mapping worked after all that.
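For reference, that renaming pass could be scripted rather than done by hand. Here's a rough sketch of the kind of find-and-replace I mean; the file path and schema names are made-up placeholders, not the real ones from my model:

```csharp
using System.IO;

class FlattenSchemas
{
    static void Main()
    {
        // Hypothetical placeholders: substitute the actual .orm file and the
        // schema prefixes that were already made unique in the instance.
        const string path = @"CombinedModel.orm";
        string xml = File.ReadAllText(path);

        // Fold each source schema into a single flat namespace by turning the
        // dot qualifier into an underscore ("SalesDb.Order" -> "SalesDb_Order").
        // Plain text replacement is only safe because the names are unique.
        foreach (string schema in new[] { "SalesDb", "InventoryDb", "StagingDb" })
        {
            xml = xml.Replace(schema + ".", schema + "_");
        }

        File.WriteAllText(path, xml);
    }
}
```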

I avoided an even larger database as it would have taken weeks instead of days, and I was just complaining a little bit as there really is no other solution I could conjure up to create nice-looking ERD diagrams of the relationships.