vezaynk / Infobase

PHAC Infobase
https://infobase-staging.herokuapp.com

Data tool creation automation #14

Closed vezaynk closed 5 years ago

vezaynk commented 5 years ago

In light of the completion and launch of the PASS data tool, we are looking into ways of automatically generating future data tools in the same style.

This will be a major effort and will be guided by the creation of the CMSIF data tool. The PASS data tool has been built with rapid iteration of future tools in mind, but not to the point of automation.

This issue will serve as a general tracker for everything related to this effort, as well as my thoughts on the matter.

vezaynk commented 5 years ago

The Infobase project is effectively a loose assembly of three different systems operating under the guise of MVC.

First, the React/Redux/D3 dynamic front-end View layer is responsible for all the charting. We can assume that it will remain identical for all data tools, despite the eventual support for aggregators, which are used exclusively by the Health Inequalities Data Tool (not handled by this project as of writing). This means that no automation-related work will need to be done on the front-end.

The Razor-rendered views, on the other hand, can vary between data tools. Eventually, the goal would be to unify them under a single template, which would make them at worst automatic to generate and at best built directly into the React dynamic view. Automating them would be a stretch goal, and unifying them is a very long-term vision that I don't see materializing anytime soon.

Second, the Model layer is largely handled by Entity Framework using the code-first strategy. In effect, we only need to write a small number of files, which are then migrated into the database as tables using dotnet ef. Next, we need to write a few SQL scripts: one to import the data, one to add French translations, and another to find missing translations. In theory, these could be handled by Entity Framework, but the import logic would still need to be written. All of the files and scripts are extremely repetitive and error-prone. The entire layer is ripe for automation.
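For context, a code-first model file here is just a plain C# class plus a DbContext registration; a minimal, hypothetical example (not the actual PASS schema) looks something like this:

```csharp
using System.ComponentModel.DataAnnotations;
using Microsoft.EntityFrameworkCore;

// Hypothetical entity for illustration only; the real PASS/CMSIF models differ.
public class Measure
{
    [Key]
    public int MeasureId { get; set; }

    [Required]
    public string DescriptionEnglish { get; set; }

    public string DescriptionFrench { get; set; }
}

public class DataToolContext : DbContext
{
    public DataToolContext(DbContextOptions<DataToolContext> options) : base(options) { }

    // "dotnet ef migrations add <Name>" then turns this DbSet into a table.
    public DbSet<Measure> Measures { get; set; }
}
```

Every new dataset means another handful of files in this shape, which is exactly the repetition worth automating.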

The viability of automating the last layer, the Controllers, remains to be seen. They are not difficult to write, and they are largely driven by the Razor views, which may prove unpredictable. For the moment, automating them beyond stub method creation is not a goal.

vezaynk commented 5 years ago

All automation mentioned above will need to be seamless and robust, which is to say that it should all be done from the same interface, and a network failure should not trash the entire project (Git could be useful for this). A vision for this interface will need to be developed alongside the automation systems.

vezaynk commented 5 years ago

The automation interface development will be put on hold until I am well acquainted with how the data is being produced in the first place. The earlier in the process my system can hook in, the better.

The automation systems are not bound by this restriction and are ripe for development.

vezaynk commented 5 years ago

As it stands, and as is being implemented, my vision is that we will create our models by writing C# attributes.

My initial idea was that we create a series of stub classes, annotate them, and generate the rest of the code from those attributes. One of the things we get for free through this system is our ability to generate a Master table into which we could import our data before running the import script.
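For illustration only, a stub class under this approach could look roughly like the following; the attribute names are hypothetical, not the project's actual ones:

```csharp
using System;

// Hypothetical marker attributes describing how each property should be treated
// by the generator (filter dropdowns vs. plotted data points).
[AttributeUsage(AttributeTargets.Property)]
public class FilterAttribute : Attribute { }

[AttributeUsage(AttributeTargets.Property)]
public class DataPointAttribute : Attribute { }

// Stub class: annotate the properties, then reflect over the attributes to
// generate the Master table, the import script, and the remaining classes.
public class Strata
{
    [Filter]
    public string AgeGroup { get; set; }

    [DataPoint]
    public double? Value { get; set; }
}
```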

As I did that, I thought about the possibility of writing the Master class first, annotating it and generating all the other classes that way. It might be the better way, I'm unsure.

vezaynk commented 5 years ago

> As I did that, I thought about the possibility of writing the Master class first, annotating it and generating all the other classes that way. It might be the better way, I'm unsure.

After contemplating it, I have decided that this is the preferred way and as a result am pulling away from the previous option.

The reasoning behind this is that we should be able to derive all the metadata from the Excel file alone, without the need to manually annotate the code. With that said, the option to annotate the code will remain.

Furthermore, the experience with C# attributes was pleasant enough to consider keeping more information in them, such as French and English text variants.
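A rough sketch of what such an attribute could look like (the name and shape are made up for illustration):

```csharp
using System;

// Hypothetical attribute carrying both language variants of a label;
// the project may end up storing this metadata differently.
[AttributeUsage(AttributeTargets.Property | AttributeTargets.Class)]
public class BilingualLabelAttribute : Attribute
{
    public string English { get; }
    public string French { get; }

    public BilingualLabelAttribute(string english, string french)
    {
        English = english;
        French = french;
    }
}

public class Measure
{
    [BilingualLabel("Age group", "Groupe d'âge")]
    public string AgeGroup { get; set; }
}
```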

Last but not least, normalization can be neglected further: we can store all information, duplicated, in the lowest selector, unless it is explicitly marked to be stored in the quick stats breakdown.

vezaynk commented 5 years ago

The fundamental idea is to forgo all repetitive code in favor of reflection. The names of properties can be reflected. The properties themselves can be reflected. Everything can be done procedurally instead of through a mix of scripts.

The key here seems to be reading the headers of a CSV and generating a Master entity with each column of the file represented by a property. The property names should be converted from whatever form they have into PascalCase (upper camel case), as per the C# convention.
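A minimal sketch of that step, assuming a plain comma-separated file and a naive casing conversion (the path and class names are placeholders):

```csharp
using System;
using System.IO;
using System.Linq;

// Read the header row of a CSV and print a property stub for each column.
// The real generator would emit a full Master entity class instead.
class HeaderReader
{
    static string ToPascalCase(string header) =>
        string.Concat(header
            .Split(new[] { ' ', '_', '-', '.' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(word => char.ToUpperInvariant(word[0]) + word.Substring(1).ToLowerInvariant()));

    static void Main()
    {
        // "data.csv" is a placeholder path, not a file from the repository.
        var headers = File.ReadLines("data.csv").First().Split(',');

        foreach (var header in headers)
            Console.WriteLine($"public string {ToPascalCase(header)} {{ get; set; }}");
    }
}
```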

From there, the system should attempt to infer the meaning of the properties as much as possible. The user can then annotate whatever is missing or needs correction.

The user will need to annotate:

From there, the system should be able to properly generate all other classes, views and controllers.

vezaynk commented 5 years ago

The groundwork for run-time model loading is in place. Exciting! See the model-generator branch. It has been removed from the project and is now being pulled in as a NuGet dependency here: https://www.nuget.org/packages/CSharpLoader/

I will invest further work into it as an independent project at a later time. For now, it does everything that is needed.

vezaynk commented 5 years ago

After some reflection (pun not intended), it seems like a good idea to have the model generator as a separate tool, away from MVC, for modularity and performance reasons. .NET Core 3 will also feature first-class support for local tools. This could be a fine use of them.
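For reference, local tools in .NET Core 3 work roughly like this; the package and command names below are placeholders, not published artifacts:

```sh
# One manifest per repository, committed alongside the code.
dotnet new tool-manifest                       # creates .config/dotnet-tools.json
dotnet tool install Infobase.ModelGenerator    # hypothetical package name
dotnet tool run model-generator                # hypothetical command name
```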

The project structure is kind of weird with the nested project. I'm not sure how Microsoft would like me to architect it, but it's not that big of a deal since the tool would exclusively be used during development.

vezaynk commented 5 years ago

The architecture has been reorganized to fit a more standard "one solution, multiple projects" setup. One of the projects is a submodule; it will be consumed as a NuGet package instead once I make a stable release for it.

Everything seems good to go to begin implementing the CSV reading and code generation.

vezaynk commented 5 years ago

Generating migrations dynamically has proven to be difficult, but it's on its way. The system can now reliably generate and load migrations.

This issue was the key piece of documentation needed to achieve this.

vezaynk commented 5 years ago

I have added the capability to generically take an assembly and generate and apply migrations without needing the native types. This was the most difficult thing that needed to be done. I will need to draft some clear TODOs next week for how to proceed.
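A heavily simplified sketch of the idea, assuming the generated entities live in a separate assembly: the context registers the types by reflection, and EnsureCreated stands in for the real migration scaffolding:

```csharp
using System.Linq;
using System.Reflection;
using Microsoft.EntityFrameworkCore;

// Sketch only: build a DbContext model from the entity types of a dynamically
// loaded assembly and push the schema to the database. The actual system goes
// further and scaffolds proper EF migrations; this only shows the reflection side.
public class DynamicContext : DbContext
{
    private readonly Assembly _modelAssembly;

    public DynamicContext(DbContextOptions<DynamicContext> options, Assembly modelAssembly)
        : base(options) => _modelAssembly = modelAssembly;

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        // Register every exported, non-abstract class in the generated assembly as an entity.
        foreach (var type in _modelAssembly.GetExportedTypes().Where(t => t.IsClass && !t.IsAbstract))
            modelBuilder.Entity(type);
    }
}

class Program
{
    static void Main()
    {
        // "Generated.Models.dll" is a placeholder for the emitted model assembly.
        var assembly = Assembly.LoadFrom("Generated.Models.dll");

        var options = new DbContextOptionsBuilder<DynamicContext>()
            .UseSqlite("Data Source=infobase.db") // provider chosen for the example only
            .Options;

        using var context = new DynamicContext(options, assembly);
        context.Database.EnsureCreated(); // stand-in for generating and applying a migration
    }
}
```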

vezaynk commented 5 years ago

In order to not have to deal with reflection right off the bat, it seems sensible to manually convert the existing PASS data tool to the simplified schema and go from there. It will be annoying, but it seems like it will help in the long run.

vezaynk commented 5 years ago

The controllers can be generalized into a single abstract controller, constructed around a single DbContext, which is then manipulated using the data annotations.

The non-abstract controllers would consume their appropriate DbContext via dependency injection.
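A rough sketch of the shape this could take (type names are illustrative, not the repository's actual classes):

```csharp
using Microsoft.AspNetCore.Mvc;
using Microsoft.EntityFrameworkCore;

// Hypothetical abstract controller parameterized by the dataset's DbContext.
public abstract class DataToolController<TContext> : Controller where TContext : DbContext
{
    protected readonly TContext Context;

    protected DataToolController(TContext context) => Context = context;

    // Shared actions can inspect the context's model (and its data annotations)
    // through reflection instead of hand-written, per-dataset code.
    public IActionResult Index() => View(Context.Model.GetEntityTypes());
}

// Concrete controller: the only per-dataset code is the DbContext injected by DI.
public class PassController : DataToolController<PassContext>
{
    public PassController(PassContext context) : base(context) { }
}

public class PassContext : DbContext
{
    public PassContext(DbContextOptions<PassContext> options) : base(options) { }
}
```

Registering each concrete DbContext with services.AddDbContext&lt;T&gt;() would then be enough for the framework to construct the controllers.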

vezaynk commented 5 years ago

The model generator is now loading all the data as needed. Not all of it is used yet, however, as the classes are not fully annotated. The details concerning how it will handle things such as numeric and boolean data are left for later, as there is little use implementing them until the classes actually use them. The classes will not be worked on manually any further; the next step is finally resuming work on the file generation.

vezaynk commented 5 years ago

Storing the models and migrations inside of the main project is causing grief as the model generator is forced to load ASP.NET in order to simply enumerate the types. It would be wise to split it away into its own subproject.

vezaynk commented 5 years ago

Update: the project has been restructured per the above. The model generator no longer has to interact with ASP.NET.

vezaynk commented 5 years ago

While the model generator is fine as a standalone tool, it is worthwhile to consider allowing the web project to operate it too. This would allow for more lightweight usage during development, without requiring two projects to be downloaded for things to work. This comes at a project size cost, which I'm not sure even warrants consideration.

vezaynk commented 5 years ago

The model generator is nearly feature complete. Both the data tool and index pages are working in a dataset-agnostic manner, with only the measure description pages remaining. There are also a few missing details on the data tool (Title, axis, units).

Once the above is finished, this issue will be closed and further automation matters will have their individual issues as the model generator is now the heart of the project instead of an add-on.

After that, parity with the PASS data tool will need to be restored before the branch is merged into master.

Finally, for purposes of clarity, the project will have to rebrand away from "Infobase" as it causes confusion with the other Infobase products.

vezaynk commented 5 years ago

CMSIF2 English is good to go. A deployment to Heroku is landing shortly.

vezaynk commented 5 years ago

The Heroku deployment is broken due to the subfolder structure. I have no time to try to fix it at the moment. It will have to wait for the winter.