Open gfs opened 4 years ago
@gfs Hello, I was looking at this tool a bit. Looks like the issue has been open for a couple years. Would this be something I could contribute too? I will have to look into the tool a bit more.
Hi @mtmulch,
If you're interested we are open to contributions.
You can start by looking at the DatabaseManager abstract class. The rest of the code is backend agnostic because of this abstraction:
https://github.com/microsoft/AttackSurfaceAnalyzer/blob/release/v2.3/Lib/Utils/DatabaseManager.cs
You should also take a look at the SqliteDatabaseManager class which is the only shipped implementation which inherits from it. This implementation actually is backed by multiple sqlite database "shards" to improve performance.
And then creating a new EntityFrameworkDatabaseManager that inherits from DatabaseManager.
I don't believe you need to modify the SqliteDatabaseManager, the DatabaseManager class or much of the rest of the code.
The database manager itself gets instantiated here:
I'd go with adding an argument (or multiple if needed) to the CommandOptions class to specify which database backend to use - leaving the default behavior of using the SQLite backend the same for now. This should end up getting consumed by the line above where the specific manager is instantiated.
You can test your implementation against the direct SQLite implementation with the Tests and Benchmarks in the repo.
Also to note, you should work from The release/v2.3 branch and not main.
@gfs Thanks for the guidance. I just wanted to confirm the entire project is targeting .NET 6.0 and EF Core with the latest stable version of 6.0.7 would be a good overall fit before proceeding further with the implementation.
Yes. Attack Surface Analyzer only targets the latest .NET, currently 6.0 so you should be able to use the latest EF.
Is their any plans to introduce another ORM such as NHibernate for Data Access? It may be thinking a little to far ahead and introducing more code than you would like for this issue but it almost seemed logical to use a factory.
I've experimented a bit with LiteDB but couldn't get the query speed competitive (you can see the partial implementation in the benchmarks).
I don't currently have any plans to develop an alternate backend right now.
Do you mean a factory to produce the correct database manager? I think that would be fine.
My understanding of EF is that it makes it easier to swap out where the data gets stored. Does it make sense to develop the EF manager in such a way that it itself is configurable with alternate storages?
Yes, I meant where the SqliteDatabaseManager is originally instantiated a factory can be instantiated and the method called to return the correct DatabaseManager basically following the same convention that you specified above. The CommandOptions default value can be tied to an enum which would by default point to SqliteDatabaseManager, allowing the user to just change the command argument to instantiate the EFDatabaseManager.
That's my understanding for ORM's such as EF to allow alternative data sources to be swapped in and out with ease without altering the SQLCommands associated with the specific database.
I have the project up and running. This is something I will work on in my free time and take my time with it, great project!
I've made some progress on this is in my own branch. Manual testing seems to have went well by just altering the default value of the type of DatabaseManager the user wants in the CommandOptions class. DbSettings I plan on keeping that the same as the EF Core ORM is being used as a replacement for the ADO.NET SqlCommands.
Sorry for the delay.
Thanks for the update. Look forward to reviewing.
@gfs sorry to leave you hanging on this I will get around to it, account problems. :)
No problem at all. Looks like good progress so far.
[@gfs]These five statements can be seeded with EF on application load. However I cannot fully remember or state for sure if the same tables will be used if the tables are seeded prior to loading with EF using the SQL statement due to the camel_case table naming convention. Will have to test.
private const string SQL_CREATE_COLLECT_RESULTS = "create table if not exists collect (run_id text, result_type text, identity text, row_key blob, timestamp text, serialized blob, UNIQUE(run_id, identity))"; private const string SQL_CREATE_FINDINGS_RESULTS = "create table if not exists findings (first_run_id text, second_run_id text, analyses_hash text, level int, result_type int, identity text, first_serialized text, second_serialized text, meta_serialized text)"; private const string SQL_CREATE_PERSISTED_SETTINGS = "create table if not exists persisted_settings (id text, serialized text, unique(id))"; private const string SQL_CREATE_RESULTS = "create table if not exists results (base_run_id text, compare_run_id text, analyses_hash text, status text);"; private const string SQL_CREATE_RUNS = "create table if not exists runs (run_id text, type string, serialized blob, unique(run_id))";
I took a quick look at the EF DB Manager. Looks like it's coming along well! Code seems pretty clear with the linq style queries.
Great I actually have sometime this week to get more done. Looking forward to PRing that!
Hi @WingZer0o are you still working on this?
I'm working right now on some changes to the SQLite manager to add sharding to the storage of compareresults to improve performance of comparison saving and use in the GUI.
Hi @gfs, honestly I totally forgot about this I got wrapped up in my other projects and work.
@WingZer0o no worries. If you'd like to contribute the work in progress, if you feel it would be helpful, I think it would be nice to eventually have this. I could make a EF Core dev branch for you to PR to and I may be able to continue the work from there - though I'm not sure of the timeline when I would get to it.
If I remember correctly I was pretty close to being done but I hadn't really tested it all that much. Let me know the branch and I can submit the PR to it.
Sounds great.
I branched off the head of release/v2.3 here: https://github.com/microsoft/AttackSurfaceAnalyzer/tree/proposal/EfCoreDbManager
It seems that I deleted my fork. I sincerely apologize.
Alas. Thanks for checking!
Still need some help with this? I've been working EF a lot and experiment with some Rust. I thought this was an interesting project.
@mtmulch Yeah, definitely still open to collaboration on this. Let me know if you have any questions, and just a reminder to fork off the release/2.3 branch rather than main.
Sounds good. I’ll make a branch sometime this week.
Get Outlook for iOShttps://aka.ms/o0ukef
From: Gabe Stocco @.> Sent: Tuesday, October 31, 2023 11:54:08 AM To: microsoft/AttackSurfaceAnalyzer @.> Cc: Mike Mulchrone @.>; Mention @.> Subject: Re: [microsoft/AttackSurfaceAnalyzer] EF Core Database Backend (#519)
@mtmulchhttps://github.com/mtmulch Yeah, definitely still open to collaboration on this. Let me know if you have any questions, and just a reminder to fork off the release/2.3 branch rather than main.
— Reply to this email directly, view it on GitHubhttps://github.com/microsoft/AttackSurfaceAnalyzer/issues/519#issuecomment-1787502307, or unsubscribehttps://github.com/notifications/unsubscribe-auth/BDIABV4QOBB2DPKBB6IFL6LYCENKBAVCNFSM4OWZX4M2U5DIOJSWCZC7NNSXTN2JONZXKZKDN5WW2ZLOOQ5TCNZYG42TAMRTGA3Q. You are receiving this because you were mentioned.Message ID: @.***>
Sorry about the delay and small commits, I'll keep those a little more isolated so you don't get spammed. I have the factory I was originally talking about implemented. Back up and running.
Some of these SQL consts will need to be utilized within the EntityFrameworkDatabaseManager
class as well using, do you have a preference of putting all of these in a separate consts class or within the DatabaseManager abstract class?
At first blush, I think a shared consts class makes sense - or alternately an inheritance structure with an intermediate abstract sqldatabasemanager that's shared. 🤔
I think the inheritance structure for DatabaseManager
for the consts is a good location. I'll keep along that path this week. I have plenty of time these days.
Apologize about the deleted accounts and what not, show shy and embarrassing mistakes in an organization experiment, guess it don't even matter. 🤷
@WingZer0o sounds good!
No problem on the second aspect. It can certainly be intimidating to work publicly, my view is there is no shame in mistakes, we all make them and its a great way to learn! In any case, happy to have you revisiting this and thanks again for your collaboration and contributions here.
I have taken a quick look over some of the commits so far and its looking pretty clean so far, appreciating the small commits - easy to figure out what's done in each - exciting stuff! If you want me to take a closer look at anything in particular or have any other clarifying questions do feel free to reach out. If there's anything requiring a more detailed discussion we can switch to email or a teams call as well.
Thanks for the feedback. I will certainly reach out with any clarifying questions should they arise. I pushed a few more things this evening. Converting these queries over is my first priority, I've been running the test suite as I go and at this current point in time only have 5 failing, so that's quite sound.
Awesome! Sounds like great progress!
Ah shoot I forgot to ask an important EF question, when you are executing parallel queries are you expecting EF to operate in the same parallel fashion? Or, at least expecting operations to be asynchronous?
EF doesn't really support parallel operations on the same DbContext, its something that can be accounted before by generating additional context and maybe could turn into something like you have for SqlConnectionHolder
but I imagine the point of introducing EF as an option was to escape the handling of such things
Background on how the current db manager works:
Indeed because sqlite doesn't support multi thread we instead create multiple databases, each with it own thread, and then the write method distributes work to the threads (the "shards" setting from the cli perspective). Each collected object's id is hashed and the modulus of the hash determines which shard the record is placed in. Then querying has to also take place across all shards. You can check the benchmarks section of the repo if interested but in my testing this was a significant collection speed speedup until about 7 databases.
I think it's safe for you to ignore the shards mechanic for now. The purpose is to improve collection and analysis speed, and it should be possible to retrofit later if necessary as it was to the initial implementatiom. Definitely simpler to not have to worry about the sharding at all, I think a large factor is just the number of bytes being written which I'm hoping efcore can reduce.
The sharding should only be an issue for sqlite as well - as if you're contacting a full sql database it'll handle threading on its own.
Ah, excellent, I will proceed as is then. Seems I was able to get the remaining 5 test cases to run with Administrator mode.
I'll make some time to finish this up. I still have some code on my local machine.
Experiment with EF Core for access instead of SQlite directly: https://docs.microsoft.com/en-us/ef/core/