naim94a / lumen

A private Lumina server for IDA Pro
https://lumen.abda.nl/
MIT License
899 stars 102 forks

Separate databases per library and program. #15

Open ThisIsMyAltAccount opened 3 years ago

ThisIsMyAltAccount commented 3 years ago

Depending on how much metadata you have pushed into the database, you can get the wrong results when pulling. For example, when pulling metadata for IDA's QT5Gui.dll, I get metadata for CryptoPP and 7-zip that I uploaded to my database over the past week.

If each library and program has its own database, I could compile QT5 and upload it into the QT5 database. Since I'm decompiling IDA's QT5Gui.dll I could select the QT5 database and pull metadata from it, without the possibility of getting metadata from unrelated programs and libraries.

Maybe even create separate databases per OS/Architecture, maybe even compiler versions:

windows/x86/qt5.sql
windows/x64/qt5.sql
linux/arm64/qt5.sql
linux/x86/gcc-6.4.0/qt5.sql
linux/x86/gcc-6.5.0/qt5.sql

As for switching between databases, I have no idea how it would work.

naim94a commented 3 years ago

The protocol doesn't specify the file hash when pulling metadata, so you would have to switch databases manually to accomplish that. You could add a dbname=x86_qt5 parameter to the connection string (connection_info) in order to select a different database. That database should have schema.sql applied to it too...

Note that when Lumina identifies functions from multiple files, it's because they have the same functions. The whole point of Lumina is to make detection faster while reversing. The optimal solution would be to select a more general name for the function, or maybe to increase IDA's LUMINA_MIN_FUNC_SIZE.
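
To make the dbname suggestion above concrete, here is a minimal Python sketch that derives a database name from the OS/architecture/compiler/library layout proposed earlier in the thread and swaps it into a key=value connection string. The helper names (database_name, connection_info) and the sample host and user values are hypothetical, not part of Lumen. The sketch also assumes the connection string contains no quoted values with spaces.

```python
import re


def database_name(library, os_name=None, arch=None, compiler=None):
    """Build a database name such as 'x86_qt5' from the given parts.

    Hypothetical naming scheme: parts are lowercased and any run of
    characters other than a-z and 0-9 is collapsed into one underscore.
    """
    parts = [p for p in (os_name, arch, compiler, library) if p]
    cleaned = [re.sub(r"[^a-z0-9]+", "_", p.lower()).strip("_") for p in parts]
    return "_".join(cleaned)


def connection_info(base, dbname):
    """Return `base` with its dbname= parameter replaced (or appended).

    Assumes a simple space-separated key=value string without quoting.
    """
    kept = [kv for kv in base.split() if not kv.startswith("dbname=")]
    kept.append(f"dbname={dbname}")
    return " ".join(kept)


if __name__ == "__main__":
    name = database_name("qt5", os_name="linux", arch="x86", compiler="gcc-6.4.0")
    print(connection_info("host=localhost user=lumen dbname=lumen", name))
```

Each database named this way would still need schema.sql applied before the server is pointed at it, and the server would need to be restarted or reconfigured to pick up the new connection string.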

AGG2017 commented 3 years ago

I think it would not be hard to do with a simple Python plugin. When activated, after loading a new IDA database or at any later time, it would read ida.cfg to get the Lumina server information and then communicate with the Lumina server. Using custom commands, it could list all available databases and show a menu to select the one you want to use from now on, or to create a new database. You could reactivate the plugin at any time to switch to another Lumina database. I did something similar, replicating all of IDA's Lumina functions but with support for my custom processor modules, which IDA's built-in Lumina does not support.

naim94a commented 3 years ago

I think that it would require hooking a few IDA functions... What if someone hits "pull metadata", how would you select the correct database on the server without modifying the protocol on IDA's side?

AGG2017 commented 3 years ago

I'm only talking about one private server for just one user. Different users can be distinguished by their current IP and their last selected database. If there is no information for a specific IP, the unknown users would all use one default database, or something like that.
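
The per-IP selection described above could be sketched roughly as follows. This is an illustration only: DatabaseSelector and its methods are hypothetical names, the default database name "lumen" is an assumption, and nothing like this exists in the Lumina protocol itself. The client would have to register its choice through some side channel, such as a plugin or an HTTP endpoint.

```python
class DatabaseSelector:
    """Remember the last database each client IP selected.

    Hypothetical helper: clients that never selected anything fall back
    to a single shared default database.
    """

    def __init__(self, default="lumen"):
        self.default = default
        self._by_ip = {}

    def select(self, ip, dbname):
        # Record the most recent choice made from this IP address.
        self._by_ip[ip] = dbname

    def resolve(self, ip):
        # Unknown IPs all share the default database.
        return self._by_ip.get(ip, self.default)


if __name__ == "__main__":
    selector = DatabaseSelector()
    selector.select("192.0.2.10", "x86_qt5")
    print(selector.resolve("192.0.2.10"))
    print(selector.resolve("192.0.2.99"))
```

Keying on the IP alone means every IDA instance behind the same address shares one selection, so it does not cover several databases open at once on the same machine.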

naim94a commented 3 years ago

I personally have multiple databases open simultaneously on the same PC... But for private use, I guess locking an IP to a file MD5 could be added to the HTTP API.

AGG2017 commented 3 years ago

During the hello exchange I see there is license information with a user name and email, which is enough to identify the user properly. Leaked licenses can be entered in a database and served by IP. The Lumina server is a great idea, but at this stage of implementation it is just the beginning, and I'm still not interested in the original one. It is restricted to a few processor modules with no options to extend anything, so the only option I had was to replicate everything from scratch. Now I can do everything, and adding something like users and databases can be done in minutes. I found that having different databases for the versions of each project is very useful. Even if a function is the same across several of them, the comments can be very specific to each one.

naim94a commented 3 years ago

The license isn't enough unfortunately... A company using floating licenses would result in identical hello messages from all clients (IP is more unique). If you're interested in sharing databases with version control, and not sharing function signatures across databases, why not use something like https://github.com/idarlingteam/idarling ? It sounds like they accomplish something similar to what you're describing...

AGG2017 commented 3 years ago

I know about this project, but it is not what I needed. In fact, I started from the Diaphora project and replaced the local database file with a remote connection to a server running a MySQL database. Diaphora keeps the clean assembly of each function and many other unique function parameters (not only hashes), so that it can find not only the best match but also functions that are close enough to the unknown one (in my case, a modified function from the previous version of the same firmware). So now I have access to the exact match for all known functions and the best guess for the rest, with the ability to choose manually if more than one is close enough. For just one user I can collect everything I need. Public servers have many limits on what they can collect, and that restricts the end result a lot. Thanks for your efforts to make a private Lumina server publicly available. It is a great project. I installed one and everything is working fine, but I needed much more than the original idea behind it. Unfortunately, I don't have enough time to make my work universal enough for a public release.
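
The "exact match first, best guess otherwise" workflow described above can be illustrated with a small Python sketch. It is not Diaphora's algorithm, which uses many more heuristics. Here a SHA-256 digest of normalized assembly stands in for the exact match, and difflib.SequenceMatcher stands in for the similarity score. The function names, the 0.8 threshold, and the sample listings are all assumptions made for illustration.

```python
import difflib
import hashlib


def normalize(asm_lines):
    """Lowercase each line and collapse whitespace; drop blank lines."""
    return [" ".join(line.lower().split()) for line in asm_lines if line.strip()]


def digest(asm_lines):
    """Hash of the normalized listing, used for exact matching."""
    return hashlib.sha256("\n".join(normalize(asm_lines)).encode()).hexdigest()


def find_matches(unknown, known, threshold=0.8):
    """Return (exact_name, candidates) for an unknown function.

    `known` maps function names to assembly listings. When an exact
    match exists, candidates is empty; otherwise candidates holds
    (name, ratio) pairs at or above `threshold`, best first.
    """
    target = normalize(unknown)
    target_digest = digest(unknown)
    candidates = []
    for name, asm in known.items():
        if digest(asm) == target_digest:
            return name, []
        ratio = difflib.SequenceMatcher(None, target, normalize(asm)).ratio()
        if ratio >= threshold:
            candidates.append((name, ratio))
    candidates.sort(key=lambda item: item[1], reverse=True)
    return None, candidates


if __name__ == "__main__":
    known = {
        "checksum_v1": ["push ebp", "mov ebp, esp", "xor eax, eax", "pop ebp", "ret"],
        "stub": ["nop", "ret"],
    }
    modified = ["push ebp", "mov ebp, esp", "mov eax, 1", "pop ebp", "ret"]
    print(find_matches(modified, known, threshold=0.7))
```

When several candidates clear the threshold, the sorted list is what a user would choose from manually, as the comment describes.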

naim94a commented 3 years ago

Using Diaphora seems like a really cool idea! I'd like to see your project if you ever decide to publicly release it. Hopefully HR will add more features to the protocol that would help resolve these issues (or we can define our own extensions to the protocol and hook parts of IDA).

Thanks for using Lumen. It's nice to see people use your work :)