keif888 / SQLServerMetadata

SQL Server Metadata Toolkit
Microsoft Public License
82 stars 32 forks

Intermediate commit on each ISPAC... #24

Closed: fpatou closed this issue 2 years ago

fpatou commented 6 years ago

Hello, great tool. I'm glad to use it and to see that some others use it too...

I use it to analyse 1200 dtsx files across 100 ispac files. I built a pivot table enabling analysis of all the material stored in the SSIS_META db by adding some objects to the database (I would be happy to share it if you want).

For the time being I'm wondering if it is possible to have an intermediate commit after each ISPAC file is analysed, as the analysis phase takes around 3 hours and I get a timeout at the final commit.

Or is it possible to add an ispac to an existing RUN ID (and replace would be the cherry on top of the cake)?

What do you think about these features? Do you think this is feasible? And can you help me locate the part of the code I would have to change?

Of course I have already looked at it myself and made some changes, but it would be great if someone could help me understand some parts of the application's architecture.

Regards

keif888 commented 6 years ago

At the moment the commit for the repository is called once the analysis is complete. It uses SqlBulkCopy to save the data, utilising the WriteToServer method. There is another overload on that method that allows specifying a DataRowState. See Repository.cs around line 261.

In theory, the commit could be modified to use the alternate WriteToServer(DataTable, DataRowState) overload and include only rows with a DataRowState of Added. Then, after each commit of a table, the rows would need to be set to a DataRowState of Unchanged via a call to AcceptChanges() against the table, because WriteToServer does not update the row state in the associated DataTable. The repository commit could then be called many times, for just the added records. Some work to handle DataRowState.Modified will be required, as comments can be updated, but the vast majority of the execution is adding new records.

If this was done, then the various loops that are scanning items could include a call to the repository commit.
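
For illustration, here is a minimal sketch of that incremental commit idea, assuming an open SqlConnection; it is not the repository's actual code, and the "Objects" table name and objectTable variable are placeholders:

    using System.Data;
    using System.Data.SqlClient;

    public static void CommitAddedRows(SqlConnection connection, DataTable objectTable)
    {
        using (SqlBulkCopy bulkCopy = new SqlBulkCopy(connection))
        {
            bulkCopy.DestinationTableName = "Objects";
            // Only rows whose RowState is Added are sent to the server.
            bulkCopy.WriteToServer(objectTable, DataRowState.Added);
        }
        // WriteToServer does not change the row states, so mark everything
        // Unchanged here; otherwise the next commit would re-insert the same rows.
        objectTable.AcceptChanges();
    }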

fpatou commented 6 years ago

Hello... I made some modifications (see below) and it works in the context of ispac files.

in Program.cs:

    public static void Commit_Intermediate(Repository repository)
    {
        Console.Write("Committing ISPAC analysis information to database...");
        repository.Commit_Intermediate();
        Console.WriteLine("Done for ISPAC File.");
    }

in Repository.cs:

    public void Commit_Intermediate()
    {
        // write each data table to the database
        SqlBulkCopy bulkCopy = new SqlBulkCopy(repositoryConnection);

        bulkCopy.DestinationTableName = "RunScan";
        bulkCopy.WriteToServer(runScanTable);
        runScanTable.Clear(); // cleared so the next intermediate (or final) commit has nothing left to add

        bulkCopy.DestinationTableName = "Objects";
        bulkCopy.WriteToServer(objectTable);
        objectTable.Clear();

        bulkCopy.DestinationTableName = "ObjectDependencies";
        bulkCopy.WriteToServer(objectDependenciesTable);
        objectDependenciesTable.Clear();

        bulkCopy.DestinationTableName = "ObjectAttributes";
        bulkCopy.WriteToServer(objectAttributesTable);
        objectAttributesTable.Clear();

        bulkCopy.DestinationTableName = "ObjectTypes";
        bulkCopy.WriteToServer(objectTypesTable);
        objectTypesTable.Clear();

        bulkCopy.Close();
    }

and in SsisEnumerator.cs, ONLY for ISPAC files:

    private void EnumerateProjects(string rootFolder, string pattern, bool recurseSubFolders, string locationName)
    {
        string[] filesToInspect;
        Console.Write("Enumerating projects...");
        filesToInspect = System.IO.Directory.GetFiles(rootFolder, pattern, (recurseSubFolders) ? System.IO.SearchOption.AllDirectories : System.IO.SearchOption.TopDirectoryOnly);
        Console.WriteLine("done.");

        foreach (string projectFileName in filesToInspect)
        {
            EnumerateIntegrationServicePack(projectFileName, locationName);
            Program.Commit_Intermediate(repository);
        }
    }

keif888 commented 6 years ago

Your code will work. It's just not quite the way I have implemented it. (I hadn't added a comment that I'd made changes, as I haven't finished regression testing everything.)

If you grab the latest commit you can see the way that I implemented it, which is slightly different, as I changed the main commit routine to handle both bulk inserts and updates for changed records. I've also got significantly more commits happening, on a "per folder basis", including where you have yours above.
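
As a rough sketch only (not the repository's actual code), a commit routine that handles both cases might bulk-insert the added rows and then push UPDATEs for the modified ones; the "Objects" table, the ObjectID/ObjectComments columns and the UPDATE statement below are purely illustrative, and the connection is assumed to be already open:

    using System.Data;
    using System.Data.SqlClient;

    public static void CommitTable(SqlConnection connection, DataTable objectTable)
    {
        // 1. Bulk insert only the newly added rows.
        using (SqlBulkCopy bulkCopy = new SqlBulkCopy(connection))
        {
            bulkCopy.DestinationTableName = "Objects";
            bulkCopy.WriteToServer(objectTable, DataRowState.Added);
        }

        // 2. Push updates for modified rows (e.g. edited comments) one by one.
        DataRow[] modifiedRows = objectTable.Select("", "", DataViewRowState.ModifiedCurrent);
        using (SqlCommand update = new SqlCommand(
            "UPDATE Objects SET ObjectComments = @Comments WHERE ObjectID = @ObjectID", connection))
        {
            update.Parameters.Add("@Comments", SqlDbType.NVarChar, -1);
            update.Parameters.Add("@ObjectID", SqlDbType.Int);
            foreach (DataRow row in modifiedRows)
            {
                update.Parameters["@Comments"].Value = row["ObjectComments"];
                update.Parameters["@ObjectID"].Value = row["ObjectID"];
                update.ExecuteNonQuery();
            }
        }

        // 3. Reset row states so the next per-folder commit sends nothing twice.
        objectTable.AcceptChanges();
    }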

fpatou commented 6 years ago

Hello,

Ok, I can see your change :-) Glad to see you already worked on it :-)

As I said, for a large SSIS estate it can be very interesting to avoid 2 hours of processing going in the trash...

Of course the next step after this one would be to allow a new analysis to be run with the same run id, avoiding re-analysing ISPACs that are already loaded.

And the cherry on the cake would be to store the ISPAC filename + last-updated timestamp, so a run over an ispac estate would re-analyse only the ISPAC files that have changed rather than the whole set... This is the first milestone of the project for me to implement, in order to have daily updated SSIS metadata over our 1200 dtsx files in 100 ispac files.

Hope you share the same roadmap :-) Anyway, thank you for your work.

keif888 commented 6 years ago

If you could raise each of those ideas as a separate issue, that way I can tag them as enhancements and at least provide ideas on how they could be implemented.

fpatou commented 6 years ago

No probs, thank you again.

fpatou commented 6 years ago

Hello... we tried this and found that if a project throws an exception, the commit still happens... this corrupts the run because the analysis is incomplete. This commit should only happen if there is no exception, I guess...

Anyway, we are now looping on each ispac... meaning this intermediate commit may not be needed anymore.

Regards

keif888 commented 6 years ago

I have added Rollback capability, which removes any added records when an exception happens whilst scanning an SSIS package. Each SSIS package is now committed individually.
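
For readers following along, a minimal sketch of that commit-per-package-with-rollback pattern might look like the loop below; the EnumeratePackage, repository.Commit() and repository.Rollback() names are hypothetical, not the repository's actual methods:

    // Sketch only: commit each package individually, roll back on exception.
    private void EnumeratePackages(string[] filesToInspect, string locationName)
    {
        foreach (string packageFileName in filesToInspect)
        {
            try
            {
                EnumeratePackage(packageFileName, locationName);
                repository.Commit();   // persist only this package's rows
            }
            catch (Exception ex)
            {
                Console.WriteLine("Skipping {0}: {1}", packageFileName, ex.Message);
                repository.Rollback(); // e.g. DataTable.RejectChanges() discards rows
                                       // added since the last successful commit
            }
        }
    }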