microsoft / AttackSurfaceAnalyzer

Attack Surface Analyzer can help you analyze your operating system's security configuration for changes during software installation.
MIT License
2.68k stars 271 forks source link

EF Core Database Backend #519

Open gfs opened 4 years ago

gfs commented 4 years ago

Experiment with EF Core for access instead of SQlite directly: https://docs.microsoft.com/en-us/ef/core/

ghost commented 1 year ago

@gfs Hello, I was looking at this tool a bit. Looks like the issue has been open for a couple years. Would this be something I could contribute too? I will have to look into the tool a bit more.

gfs commented 1 year ago

Hi @mtmulch,

If you're interested we are open to contributions.

You can start by looking at the DatabaseManager abstract class. The rest of the code is backend agnostic because of this abstraction:

https://github.com/microsoft/AttackSurfaceAnalyzer/blob/release/v2.3/Lib/Utils/DatabaseManager.cs

You should also take a look at the SqliteDatabaseManager class which is the only shipped implementation which inherits from it. This implementation actually is backed by multiple sqlite database "shards" to improve performance.

https://github.com/microsoft/AttackSurfaceAnalyzer/blob/release/v2.3/Lib/Utils/SqliteDatabaseManager.cs

And then creating a new EntityFrameworkDatabaseManager that inherits from DatabaseManager.

I don't believe you need to modify the SqliteDatabaseManager, the DatabaseManager class or much of the rest of the code.

The database manager itself gets instantiated here:

https://github.com/microsoft/AttackSurfaceAnalyzer/blob/1e7d026c1429a6486099d7a274e1268c414f6605/Cli/AttackSurfaceAnalyzerClient.cs#L421

I'd go with adding an argument (or multiple if needed) to the CommandOptions class to specify which database backend to use - leaving the default behavior of using the SQLite backend the same for now. This should end up getting consumed by the line above where the specific manager is instantiated.

https://github.com/microsoft/AttackSurfaceAnalyzer/blob/1e7d026c1429a6486099d7a274e1268c414f6605/Lib/Objects/CommandOptions.cs#L136

You can test your implementation against the direct SQLite implementation with the Tests and Benchmarks in the repo.

gfs commented 1 year ago

Also to note, you should work from The release/v2.3 branch and not main.

ghost commented 1 year ago

@gfs Thanks for the guidance. I just wanted to confirm the entire project is targeting .NET 6.0 and EF Core with the latest stable version of 6.0.7 would be a good overall fit before proceeding further with the implementation.

gfs commented 1 year ago

Yes. Attack Surface Analyzer only targets the latest .NET, currently 6.0 so you should be able to use the latest EF.

ghost commented 1 year ago

Is their any plans to introduce another ORM such as NHibernate for Data Access? It may be thinking a little to far ahead and introducing more code than you would like for this issue but it almost seemed logical to use a factory.

gfs commented 1 year ago

I've experimented a bit with LiteDB but couldn't get the query speed competitive (you can see the partial implementation in the benchmarks).

I don't currently have any plans to develop an alternate backend right now.

Do you mean a factory to produce the correct database manager? I think that would be fine.

My understanding of EF is that it makes it easier to swap out where the data gets stored. Does it make sense to develop the EF manager in such a way that it itself is configurable with alternate storages?

ghost commented 1 year ago

Yes, I meant where the SqliteDatabaseManager is originally instantiated a factory can be instantiated and the method called to return the correct DatabaseManager basically following the same convention that you specified above. The CommandOptions default value can be tied to an enum which would by default point to SqliteDatabaseManager, allowing the user to just change the command argument to instantiate the EFDatabaseManager.

That's my understanding for ORM's such as EF to allow alternative data sources to be swapped in and out with ease without altering the SQLCommands associated with the specific database.

I have the project up and running. This is something I will work on in my free time and take my time with it, great project!

ghost commented 1 year ago

I've made some progress on this is in my own branch. Manual testing seems to have went well by just altering the default value of the type of DatabaseManager the user wants in the CommandOptions class. DbSettings I plan on keeping that the same as the EF Core ORM is being used as a replacement for the ADO.NET SqlCommands.

Sorry for the delay.

gfs commented 1 year ago

Thanks for the update. Look forward to reviewing.

ghost commented 1 year ago

@gfs sorry to leave you hanging on this I will get around to it, account problems. :)

gfs commented 1 year ago

No problem at all. Looks like good progress so far.

ghost commented 1 year ago

[@gfs]These five statements can be seeded with EF on application load. However I cannot fully remember or state for sure if the same tables will be used if the tables are seeded prior to loading with EF using the SQL statement due to the camel_case table naming convention. Will have to test.

image

private const string SQL_CREATE_COLLECT_RESULTS = "create table if not exists collect (run_id text, result_type text, identity text, row_key blob, timestamp text, serialized blob, UNIQUE(run_id, identity))"; private const string SQL_CREATE_FINDINGS_RESULTS = "create table if not exists findings (first_run_id text, second_run_id text, analyses_hash text, level int, result_type int, identity text, first_serialized text, second_serialized text, meta_serialized text)"; private const string SQL_CREATE_PERSISTED_SETTINGS = "create table if not exists persisted_settings (id text, serialized text, unique(id))"; private const string SQL_CREATE_RESULTS = "create table if not exists results (base_run_id text, compare_run_id text, analyses_hash text, status text);"; private const string SQL_CREATE_RUNS = "create table if not exists runs (run_id text, type string, serialized blob, unique(run_id))";

gfs commented 1 year ago

I took a quick look at the EF DB Manager. Looks like it's coming along well! Code seems pretty clear with the linq style queries.

ghost commented 1 year ago

Great I actually have sometime this week to get more done. Looking forward to PRing that!

gfs commented 1 year ago

Hi @WingZer0o are you still working on this?

I'm working right now on some changes to the SQLite manager to add sharding to the storage of compareresults to improve performance of comparison saving and use in the GUI.

ghost commented 1 year ago

Hi @gfs, honestly I totally forgot about this I got wrapped up in my other projects and work.

gfs commented 1 year ago

@WingZer0o no worries. If you'd like to contribute the work in progress, if you feel it would be helpful, I think it would be nice to eventually have this. I could make a EF Core dev branch for you to PR to and I may be able to continue the work from there - though I'm not sure of the timeline when I would get to it.

ghost commented 1 year ago

If I remember correctly I was pretty close to being done but I hadn't really tested it all that much. Let me know the branch and I can submit the PR to it.

gfs commented 1 year ago

Sounds great.

I branched off the head of release/v2.3 here: https://github.com/microsoft/AttackSurfaceAnalyzer/tree/proposal/EfCoreDbManager

ghost commented 1 year ago

It seems that I deleted my fork. I sincerely apologize.

gfs commented 1 year ago

Alas. Thanks for checking!

ghost commented 8 months ago

Still need some help with this? I've been working EF a lot and experiment with some Rust. I thought this was an interesting project.

gfs commented 8 months ago

@mtmulch Yeah, definitely still open to collaboration on this. Let me know if you have any questions, and just a reminder to fork off the release/2.3 branch rather than main.

ghost commented 8 months ago

Sounds good. I’ll make a branch sometime this week.

Get Outlook for iOShttps://aka.ms/o0ukef


From: Gabe Stocco @.> Sent: Tuesday, October 31, 2023 11:54:08 AM To: microsoft/AttackSurfaceAnalyzer @.> Cc: Mike Mulchrone @.>; Mention @.> Subject: Re: [microsoft/AttackSurfaceAnalyzer] EF Core Database Backend (#519)

@mtmulchhttps://github.com/mtmulch Yeah, definitely still open to collaboration on this. Let me know if you have any questions, and just a reminder to fork off the release/2.3 branch rather than main.

— Reply to this email directly, view it on GitHubhttps://github.com/microsoft/AttackSurfaceAnalyzer/issues/519#issuecomment-1787502307, or unsubscribehttps://github.com/notifications/unsubscribe-auth/BDIABV4QOBB2DPKBB6IFL6LYCENKBAVCNFSM4OWZX4M2U5DIOJSWCZC7NNSXTN2JONZXKZKDN5WW2ZLOOQ5TCNZYG42TAMRTGA3Q. You are receiving this because you were mentioned.Message ID: @.***>

WingZer0o commented 7 months ago

Sorry about the delay and small commits, I'll keep those a little more isolated so you don't get spammed. I have the factory I was originally talking about implemented. Back up and running.

WingZer0o commented 7 months ago

Some of these SQL consts will need to be utilized within the EntityFrameworkDatabaseManager class as well using, do you have a preference of putting all of these in a separate consts class or within the DatabaseManager abstract class?

gfs commented 7 months ago

At first blush, I think a shared consts class makes sense - or alternately an inheritance structure with an intermediate abstract sqldatabasemanager that's shared. 🤔

WingZer0o commented 7 months ago

I think the inheritance structure for DatabaseManager for the consts is a good location. I'll keep along that path this week. I have plenty of time these days.

Apologize about the deleted accounts and what not, show shy and embarrassing mistakes in an organization experiment, guess it don't even matter. 🤷

gfs commented 7 months ago

@WingZer0o sounds good!

No problem on the second aspect. It can certainly be intimidating to work publicly, my view is there is no shame in mistakes, we all make them and its a great way to learn! In any case, happy to have you revisiting this and thanks again for your collaboration and contributions here.

I have taken a quick look over some of the commits so far and its looking pretty clean so far, appreciating the small commits - easy to figure out what's done in each - exciting stuff! If you want me to take a closer look at anything in particular or have any other clarifying questions do feel free to reach out. If there's anything requiring a more detailed discussion we can switch to email or a teams call as well.

WingZer0o commented 7 months ago

Thanks for the feedback. I will certainly reach out with any clarifying questions should they arise. I pushed a few more things this evening. Converting these queries over is my first priority, I've been running the test suite as I go and at this current point in time only have 5 failing, so that's quite sound.

gfs commented 7 months ago

Awesome! Sounds like great progress!

WingZer0o commented 6 months ago

Ah shoot I forgot to ask an important EF question, when you are executing parallel queries are you expecting EF to operate in the same parallel fashion? Or, at least expecting operations to be asynchronous?

EF doesn't really support parallel operations on the same DbContext, its something that can be accounted before by generating additional context and maybe could turn into something like you have for SqlConnectionHolder but I imagine the point of introducing EF as an option was to escape the handling of such things

gfs commented 6 months ago

Background on how the current db manager works:

Indeed because sqlite doesn't support multi thread we instead create multiple databases, each with it own thread, and then the write method distributes work to the threads (the "shards" setting from the cli perspective). Each collected object's id is hashed and the modulus of the hash determines which shard the record is placed in. Then querying has to also take place across all shards. You can check the benchmarks section of the repo if interested but in my testing this was a significant collection speed speedup until about 7 databases.

I think it's safe for you to ignore the shards mechanic for now. The purpose is to improve collection and analysis speed, and it should be possible to retrofit later if necessary as it was to the initial implementatiom. Definitely simpler to not have to worry about the sharding at all, I think a large factor is just the number of bytes being written which I'm hoping efcore can reduce.

gfs commented 6 months ago

The sharding should only be an issue for sqlite as well - as if you're contacting a full sql database it'll handle threading on its own.

WingZer0o commented 6 months ago

Ah, excellent, I will proceed as is then. Seems I was able to get the remaining 5 test cases to run with Administrator mode.

WingZer0o commented 1 month ago

I'll make some time to finish this up. I still have some code on my local machine.