Set-ADDOUStructure does not take advantage of SQLlite schema

PatrickOnGit commented 1 year ago

The command Set-ADDOUStructure is simply reading the tables stored in SQLite and then does all the queries and mappings using PowerShell Where-Object and hash tables. This makes the code difficult to read and debug. The SQLite DB does use a proper schema and every data query should be done using T-SQL. This also helps to avoid and duplicates and following constraints defined in the DB schema.

The SQLite DB should provide views which would further help to understand what the tool does.

Three examples (though I'm not certain if they are build according intended design):

PropertyGroupMapping

CREATE VIEW PropertyGroupMapping
AS
SELECT
    PG.rowid as PGID,
    PG.OBJ_name as GroupName, 
    PG.OBJ_Scope as Scope, 
    PGM.OBJ_propertyname as PropertyName,
    OB.OBJ_adclass as Class,
    OB.OBJ_category as Category, 
    OB.OBJ_relatedfocus as Focus, 
    OB.OBJ_TypeOU as OUType, 
    OB.OBJ_TierAssoc as Tier,
    ADA.OBJ_Guid as AttributeGuid,
    ADA.OBJ_AdType as Type,
    ADC.OBJ_name as ClassName,
    ADC.OBJ_guid as ClassGuid
FROM
    AP_PropGroups PG 
LEFT JOIN
    AP_PropertyGroupMap PGM
ON
    PG.OBJ_name = PGM.OBJ_pgrpname
LEFT JOIN
    AP_Objects OB
ON
    PG.OBJ_refid = OB.OBJ_id
LEFT JOIN
    AD_Attributes ADA
ON
    PGM.OBJ_propertyname = ADA.OBJ_Name
LEFT JOIN
    AD_Classes ADC
ON
    OB.OBJ_adclass = ADC.OBJ_name
    WHERE PG.OBJ_enabled AND OB.OBJ_enabled

PropertyGroupMappingWithIDs

CREATE VIEW PropertyGroupMappingWithIDs
AS
SELECT
    PG.rowid as PGID,
    PG.OBJ_name as GroupName, 
    PG.OBJ_Scope as Scope,
    APR.OBJ_description as ADRightsDesc,
    APR.OBJ_value as ADRights,
    PGM.OBJ_propertyname as PropertyName,
    PG.OBJ_refid as PropertyGroupOBJ_refid,
    OBDest.OBJ_adclass as DestinationNameCHECK,
    PG.OBJ_destination as PropertyGroupOBJ_destination,
    OB.OBJ_adclass as Class,
    OB.OBJ_category as Category, 
    OB.OBJ_relatedfocus as Focus, 
    OB.OBJ_TypeOU as OUType, 
    OB.OBJ_TierAssoc as Tier,
    ADA.OBJ_Guid as AttributeGuid,
    ADA.OBJ_AdType as Type,
    ADC.OBJ_name as ClassName,
    ADC.OBJ_guid as ClassGuid
FROM
    AP_PropGroups PG 
LEFT JOIN
    AP_PropertyGroupMap PGM
ON
    PG.OBJ_name = PGM.OBJ_pgrpname
LEFT JOIN
    AP_Objects OB
ON
    PG.OBJ_refid = OB.OBJ_id
LEFT JOIN
    AP_Objects OBDest
ON
    PG.OBJ_destination = OBDest.OBJ_id
LEFT JOIN
    AD_Attributes ADA
ON
    PGM.OBJ_propertyname = ADA.OBJ_Name
LEFT JOIN
    AD_Classes ADC
ON
    OB.OBJ_adclass = ADC.OBJ_name
LEFT JOIN
    AP_Rights APR
ON PG.OBJ_rights = APR.OBJ_indicator
    WHERE PG.OBJ_enabled AND OB.OBJ_enabled AND APR.OBJ_enabled

GetPropertyGroupFromADAttribute

CREATE VIEW GetPropertyGroupFromADAttribute
AS
SELECT
    PG.OBJ_name as GroupName, PG.OBJ_Scope as Scope, PG.OBJ_enabled as Enabled,
    PGM.OBJ_propertyname as PropertyName,
    OB.OBJ_adclass as Class, OB.OBJ_category as Category, OB.OBJ_relatedfocus as Focus, OB.OBJ_TypeOU as OUType, OB.OBJ_TierAssoc as Tire,
    ADA.OBJ_Name as AttributeName, ADA.OBJ_Guid as AttributeGuid, ADA.OBJ_AdType as Type
FROM
    AD_Attributes ADA
LEFT JOIN
    AP_PropertyGroupMap PGM
ON
    ADA.OBJ_Name = PGM.OBJ_propertyname 
LEFT JOIN
    AP_Objects OB
ON
    PGM.OBJ_relatedrefid = OB.OBJ_id
LEFT JOIN
    AP_PropGroups PG
ON
    PGM.OBJ_pgrpname = PG.OBJ_name

merddyin commented 1 year ago

The original code actually read the data from the DB each time it was needed. Unfortunately this resulted in some challenges when executing in a larger environment with hundreds or thousands of locations. By executing only in a serial manner, the process would potentially take days or even weeks to complete. The fix for this was to implement the Publish cmdlet, which executes each step of the process in an orchestrated fashion that leverages runspaces via the splitpipeline plugin. When I first implemented this functionality, I was still reading everything from the DB, but noticed a lot of missing items and dropped values. When I dug into the issue, I discovered I was experiencing file lock issues. In digging into the documentation, it seems like maybe SQLite can handle some multi-access, but it isn't really built or intended for how I was leveraging it. Waiting for locks to clear didn't help either as this just bogged the deployment down again. I tested pulling all the data into memory and found that the memory footprint was relatively small. As mentioned in a response on another issue, I dealt with the discoverability in another manner, by providing a function that allows discovery of the variables and what they contain for developers to use.

That all said, I'm sure there are absolutely opportunities to optimize the SQL queries that I am performing, along with the variables and their contents. I was admittedly not much for TSQL when I wrote this...basic queries and writes are easy enough, but joins and all that are a bit more than I could do at the time.

PatrickOnGit commented 1 year ago

A solution to this challenge could be reading the database into memory:

https://learn.microsoft.com/en-us/dotnet/standard/data/sqlite/in-memory-databases

Module to create also in memory databases: https://github.com/TobiasPSP/ReallySimpleDatabase

https://stackoverflow.com/questions/5831548/how-to-save-my-in-memory-database-to-hard-disk

merddyin commented 1 year ago

So, as for reading it into memory, that is what all the hash tables and variable sets are. In my tests, it doesn't seem to add much in terms of speed...indeed, hash table lookups are an accepted and common method in PowerShell. Re-creating the DB in memory seems like quite a bit of extra work, particularly if I then later need to sync any changes back into the flat file. Do you have any data supporting an appreciable change in speed?

PatrickOnGit commented 1 year ago

No I don't. Using hash tables is great if you access the hash key, but not faster if you loop through all keys. And it gets worse when using the Where-Object looking up a single value, finding it at the very beginning but have to wait to finish the loop or loop in loops. The database is indexed and you could defined views. SQLite also supports recursive queries: SQLite | with, stackoverflow | sql-how-to-create-view-from-a-recursive-query I somewhere have such query. It allows to show what WOULD be build based on what it in the database.

Though I created a little function to visualize the OU structure. It could be improved to use different colors depending the OU type. And what would also be cool if it would visually link the delegation groups to the places where they grant access to. But just to see the structure stored in the DB is already something. And it shows if there are duplicate entries.

Import-Module PSWriteHTML -Force
Import-Module 'C:\Program Files\WindowsPowerShell\Modules\ADDeploy\1.1.4\plugins\pssqlite\PSSQLite'
$DBPath = 'C:\Program Files\WindowsPowerShell\Modules\ADDeploy\1.1.4\Resources\ADDeploySettings.sqlite'

 # connect to DB
$conn = New-SQLiteConnection -DataSource $DBPath

#TODO: How to check if ACLs are assgined to an OU. Find related information 
$IsAddedCheck = @{}
New-HTML -TitleText 'AD Deploy'  -FilePath $PSScriptRoot\ADDeployOUs.html  {
   New-HTMLSection {
        $OUTableID = Get-Random -Minimum 100000 -Maximum 2000000

        New-HTMLOrgChart -Direction LeftToRight  -ToggleSiblings -AllowExport {
            # Tier level 0|1|2
            New-OrgChartNode -Name Domain -Title Domain  {
                Invoke-SqliteQuery -Connection $conn -Query 'Select * from Cust_OU_Organization where OU_enabled = 1 and OU_Parent is null' |
                    ForEach-Object {
                        $CurrentTier = $_
                        # Focus level
                        $IsAddedCheck.Add( $_.OU_id, $_ )
                        New-OrgChartNode -Name ("{0} - {1} - {2}" -f $_.OU_Parent, $_.OU_Name, $_.OU_id) -Title $_.OU_Friendlyname {
                            Invoke-SqliteQuery -Connection $conn -Query "Select * from Cust_OU_Organization where OU_enabled = 1 and OU_Parent = $($_.OU_id)" | 
                                ForEach-Object {
                                    $TierObjInfo =  Invoke-SqliteQuery -Connection $conn -Query "Select * from AP_Objects where OBJ_enabled = 1  and (OBJ_TierAssoc in (0,1)) and  OBJ_refid = '$($_.OU_Name)'"
                                    if (  @($TierObjInfo).count -gt 1 ) {
                                        Write-Warning "$( $_ | Select * | Format-Table | Out-String)"
                                        Write-Warning "$( $TierObjInfo | Select * | Format-Table | Out-String)"
                                    }
                                    if ( $TierObjInfo.OBJ_AssignACLs ) {
                                        $BGTColor = "RedRobin"
                                    } else {
                                        $BGTColor = $null
                                    }
                                    $CurrentFocus = $_
                                    # SL - Service Line level
                                    $IsAddedCheck.Add( $_.OU_id, $_ )
                                    New-OrgChartNode -Name ("{0} - {1} - {2}" -f $_.OU_Parent, $_.OU_Name, $_.OU_id) -Title $_.OU_Friendlyname -TitleBackgroundColor $BGTColor {
                                    Invoke-SqliteQuery -Connection $conn -Query "Select * from Cust_OU_Organization where OU_enabled = 1 and OU_Parent = $($_.OU_id)" | 
                                        ForEach-Object {
                                            $FocusObjInfo =  Invoke-SqliteQuery -Connection $conn -Query "Select * from AP_Objects where OBJ_enabled = 1  and (OBJ_TierAssoc in (0,2)) and  OBJ_refid = '$($_.OU_Name)'"
                                            if (  @($FocusObjInfo).count -gt 1 ) {
                                                Write-Warning "$( $_ | Select * | Format-Table | Out-String)"
                                                Write-Warning "$( $FocusObjInfo | Select * | Format-Table | Out-String)"
                                            }
                                            if ( $FocusObjInfo.OBJ_AssignACLs ) {
                                                $BGTColor = "RedRobin"
                                            } else {
                                                $BGTColor = $null
                                            }
                                            $CurrentSL = $_
                                            # SLO level
                                            $IsAddedCheck.Add( $_.OU_id, $_ )
                                            New-OrgChartNode -Name ("{0} - {1} - {2}" -f $_.OU_Parent, $_.OU_Name, $_.OU_id) -Title $_.OU_Friendlyname {
                                            Invoke-SqliteQuery -Connection $conn -Query "Select * from Cust_OU_Organization where OU_enabled = 1 and OU_Parent = $($_.OU_id)" | 
                                                ForEach-Object {
                                                    $SLObjInfo =  Invoke-SqliteQuery -Connection $conn -Query "Select * from AP_Objects where OBJ_enabled = 1  and (OBJ_TierAssoc in (0,3)) and  OBJ_refid = '$($_.OU_Name)'"# and OBJ_relatedfocus = '$($CurrentFocus.OU_Name)'"
                                                    if (  @($SLObjInfo).count -gt 1 ) {
                                                        Write-Warning "============================="
                                                        Write-Warning "$( $_ | Select * | Format-Table | Out-String)"
                                                        Write-Warning "$( $SLObjInfo | Select * | Format-Table | Out-String)"
                                                    }
                                                    if ( $SLObjInfo.OBJ_AssignACLs ) {
                                                        $BGTColor = "RedRobin"
                                                    } else {
                                                        $BGTColor = $null
                                                    }
                                                    $CurrentSLO = $_
                                                    # Object level
                                                    $IsAddedCheck.Add( $_.OU_id, $_ )
                                                    New-OrgChartNode -Name ("{0} - {1} - {2}" -f $_.OU_Parent, $_.OU_Name, $_.OU_id) -Title $_.OU_Friendlyname {
                                                        Invoke-SqliteQuery -Connection $conn -Query "Select * from Cust_OU_Organization where OU_enabled = 1 and OU_Parent = $($_.OU_id)" | 
                                                            ForEach-Object {
                                                                $SLOObjInfo =  Invoke-SqliteQuery -Connection $conn -Query "Select * from AP_Objects where OBJ_enabled = 1  and (OBJ_TierAssoc in (0,3)) and  OBJ_refid = '$($_.OU_Name)'"# and OBJ_relatedfocus = '$($CurrentFocus.OU_Name)'"
                                                                if (  @($SLOObjInfo).count -gt 1 ) {
                                                                    Write-Warning "-----------------------"
                                                                    Write-Warning "$( $_ | Select * | Format-Table | Out-String)"
                                                                    Write-Warning "$( $SLOObjInfo | Select * | Format-Table | Out-String)"
                                                                }
                                                                if ( $SLOObjInfo.OBJ_AssignACLs ) {
                                                                    $BGTColor = "RedRobin"
                                                                } else {
                                                                    $BGTColor = $null
                                                                }
                                                                $SLOObjInfo = $_
                                                                # Object level
                                                                $IsAddedCheck.Add( $_.OU_id, $_ )
                                                                New-OrgChartNode -Name ("{0} - {1} - {2}" -f $_.OU_Parent, $_.OU_Name, $_.OU_id) -Title $_.OU_Friendlyname
                                                            }
                                                        }
                                                }
                                            }
                                        }
                                    }
                                }
                        }
                    }
            }
        }
    }
} -ShowHTML

Write-Host "Added '$($IsAddedCheck.Count)' numbers of OUs."

merddyin commented 1 year ago

So, for hash table use, I almost always use keys for lookups. The only exception to this is during the initial import, when I create some of the regular expression phrases, which grabs all of the available keys or values to output a single string, or when I create the various hashtable 'views'. As mentioned though, this is only done during the initial import. Everywhere else where I use a hashtable, I am using key or value for lookup. Now, while I am using loops during the import, I should note that this is for creation of the hashtables, not as part of filtering...I filter the larger data set, then loop through the result. Now, while I could use multiple queries to avoid the where clause, most of the data sets we are talking about are fewer than a dozen items, and I'd still have to loop to create the hash, which I would still need because it's faster to do $hash["key"] than it is to run a SQL query, even with an index, particularly since I can't just leave the DB connection open, so I have three operations for every round trip.

I am aware there there are some instances of where type lookups going on in the Publish-ADDESAEStructure cmdlet against hash tables, however that is legacy code. I know that requires fixing.

The whole reason for switching to hashtables and collections in the first place however, was that attempting to use the DB directly when running multi-threaded resulted in a lot of dropped items during provisioning. When troubleshooting this, I discovered that I was experiencing DB locks. Per the SQLite documentation, each read or write locks the entire DB to the current process. Within that single process, you can have as many simultaneous connections as you like, but other attempts to open the DB will fail. When running multi-threaded, the various runspaces are running as separate processes and therefore have to take turns accessing the DB. As I understand things, this is because SQLite wasn't designed to be threadsafe, whereas hashtables and collections are.

Now, what I could theoretically do is a bit more work to dynamically create hashtables for some things in place of some of the data collections I'm using now. For example, I could probably create a hashtable with a key of the task group name, and a value that is all of the attribute maps. This in turn might make the ACL writing process and script logic more efficient. There might be a few other places I could do something similar, but I'd have to think about it.

That all said, many of these same design flaws you are calling out in your various issues are the exact reason why this code base isn't seeing continued work, beyond those fixes I've already been forced to implement in the course of my work. I will be making some effort in this regard soon, to at least bring the dev branch into parity with the current master for most items. After that though, I don't plan to continue updating this code base outside of bug fixes and the like, or to support community requests if they make sense, though I'll happily review PRs and merge them.

The correct solution, in my opinion, isn't to continue trying to force SQLite to fill a function it wasn't designed for by trying to work around the limitations of the platform. The simple fact is that SQLight isn't the right tool for the job in this case, and the challenges with scaling it are evidence of this. This is much of what is driving the redesign for vNext (not to be confused with the codebase currently in the Dev branch of this repo). At present, the plan is to implement a full DB solution that supports multi-access, and to abstract a lot of this filtering and query work to the back end. While I'm at it, I'm also updating the DB schema to be more efficient, as well as to leverage things like views, indexing, and stored procedures where appropriate. Rather than keeping all of the business logic on the module side however, the intent is to leverage a web API to offload various elements further, and to apply some of the error handling I have to do client side now. I'm also hoping to implement some smart caching to minimize round trips to the DB. The logic for many of the current functions will be substantially simplified, as I will be able to send a value and a stage to an API and get the explicit values I need back. It won't eliminate all of the hash tables of course...it doesn't make sense to query what the OU name value for the Admin or Standard Focus containers are every time I need them, but that's part of the caching approach that has to be worked out. I also want to implement a web interface for ease of configuration instead of PowerShell functions, and to add some new features, such as Role Definitions and management. Naturally this also all requires that I implement appropriate security safeguards, to prevent tampering, and a host of other things too.

This, of course, means learning how to do all that in C# on my own at present as no one has followed through on helping with the effort...a bunch have thought it sounded interesting, but then they are too busy, so I'm on my own. It turns out that learning how to program properly is a slow process unfortunately.

merddyin / ADDeploy