Closed IzzySoft closed 7 years ago
One more candidate: AndroidProxySetter uses android-proxy which is not detected.
If you could let me know (or include with the README) how to obtain the values for dn, bh, btc and btn, I (and others) could try to add them to data/tgst5.dat. If additional steps are required for other files (e.g. data/new_dict.dat or permission/tagged_dict.txt), it would be nice to know as well.
Sorry for the late reply, I was very busy these days. new_dict.dat is the dictionary of APIs. The calculation is simple and stupid, but easy to use.
for each api:
b_hash = (b_hash + API_ID * API_NUM) % 999983
999983 is a large prime number. LibRadar uses (b_hash, b_total_num, b_total_call) as an identifier. I am not sure this can be proven mathematically correct, but it works in most cases.
Sorry for the late reply, I was very busy this days.
Yupp, thought so. We're all having more than one task at hand, so no worries if it takes a few days (a short note when estimates say it might take longer is welcome anytime – but I never expect immediate response for things that are not urgent. It's a hobby – and none of my recorded issues was about a "show breaker bug" ;)
Not being deeply involved with the technical details behind the library definitions, I unfortunately didn't understand your pointers here. Is there a short step-by-step instruction? Something along the lines of:
1. apktool d foo.apk
2. cd to the directory the supposed lib is found in (for com/some/lib that's the directory where com resides)
3. foo com/some/lib to calculate bar
I take it in your code snippet, b_hash is what becomes bh. But it's unclear to me where API_ID and API_NUM come from, how b_hash must be initialized before the loop – and what happened to dn, btc and btn (or rather how to obtain their values).
Of course, I could always report my findings on "app uses lib " and have you do the work. But sometimes, I'd rather have that finding listed with the app before I forget what I need to rescan :)
LibRadar use (b_hash, b_total_num, b_total_call) as an identifier.
Ah yes, that's what bh, btn and btc stand for :) Still would need to know how to find those values, given a single APK using such an "unidentified library" :)
All right. Here are the specific instructions; I want to keep them simple but clear.
1. apktool d foo.apk
2. cd to the directory the supposed lib is found in (for com/some/lib that's the directory where com resides)
com/some/lib. In this case, btn is 3 because three API types appear, and btc is 7 because 2+4+1=7.
Use "for each api: b_hash = (b_hash + API_ID * API_NUM) % 999983" to calculate b_hash.
This is the way to calculate bh, btc and btn. dn stands for repetitions, so it is not calculated at this stage, as we don't yet know whether a sub-package is a library. I've got a lot of (bh, btc, btn) tuples; after that, we can cluster them into groups. dn stands for the size of a group.
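To make the arithmetic above concrete, here is a minimal Python sketch, not LibRadar's actual code. It assumes the per-package API counts are already available as a dict mapping API_ID to call count, and that b_hash starts at 0 (as the worked example later in this thread does). The function name `identifier` and the API IDs 5, 40 and 300 are hypothetical, chosen only for illustration.

```python
PRIME = 999983  # the modulus quoted in this thread


def identifier(api_counts):
    """Return (bh, btc, btn) for a dict mapping API_ID -> number of calls."""
    b_hash = 0  # assumed starting value
    for api_id, api_num in api_counts.items():
        b_hash = (b_hash + api_id * api_num) % PRIME
    btc = sum(api_counts.values())  # total number of API calls
    btn = len(api_counts)           # number of distinct API types
    return b_hash, btc, btn


# Three hypothetical API IDs called 2, 4 and 1 times: btn is 3, btc is 2+4+1 = 7.
print(identifier({5: 2, 40: 4, 300: 1}))  # (470, 7, 3)
```

Because the sum is taken modulo a prime, the order in which the APIs are visited does not change the result.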
If you want to add a new library to the database, that means you've made sure that com/some/lib is a library. In this case, we don't need to get dn; just put bh, btc, btn, the library name and other information into the database.
Thanks! But sorry if I might sound dumb: I get it as far as step 3 (which is, if I understand correctly, what's the first part of permission/tag_dict.py). But I'm lost with step 4: I assume "every corner" applies to the unpacked foo.apk. "for every API" applies to what? To the hash map? And while in step 5 I now have the base for b_hash, I'm still confused concerning API_ID * API_NUM.
As I doubt you've been doing those steps manually for the many libraries already listed: Don't you have some script you've used for that, which I'd run at step 3 and it does 3-5 and then spits out the line to be added (to new_dict.dat I assume)? And in the end, don't I have to run permission/tag_dict.py to tag the new entries? And then, how to get them to data/tgst5.dat?
Search the bytecode in the smali files for strings that match an API.
For example, say you find the string "Landroid/widget/OverScroller;->getCurrVelocity(" in a smali file, and this string matches
{"key": "Landroid/widget/OverScroller;->getCurrVelocity(", "value": 12}
in new_dict.dat. This means we need to add 1 to the count for API_ID 12.
If this is the first time the string "Landroid/widget/OverScroller;->getCurrVelocity(" is matched, the dict in step 4 should be {12:1}.
If we find the string "Landroid/view/ViewGroup;->onKeyDown(" later, the dict in step 4 should be {12:1, 258:1}.
If we find the string "Landroid/widget/OverScroller;->getCurrVelocity(" again, the dict in step 4 should be {12:2, 258:1}.
After the whole package has been scanned, we have the complete dict.
For the dict {12:2, 258:1}, b_hash should equal (0+12*2+258*1)%999983 = 282.
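The counting described above could be sketched in Python roughly as follows. This is an illustration under assumptions, not the script actually used: it assumes new_dict.dat holds one JSON object per line in the format shown above, and it extracts call strings with a regular expression instead of searching for every dictionary key. The helper names and the three smali lines are made up for the example.

```python
import json
import re
from collections import Counter

# Matches strings such as "Landroid/widget/OverScroller;->getCurrVelocity("
CALL_RE = re.compile(r"L[\w/$]+;->[\w$<>]+\(")


def load_api_dict(lines):
    """Parse dictionary lines (assumed: one JSON object per line) into {key: API_ID}."""
    api_ids = {}
    for line in lines:
        line = line.strip()
        if line:
            entry = json.loads(line)
            api_ids[entry["key"]] = entry["value"]
    return api_ids


def count_apis(smali_lines, api_ids):
    """Build the {API_ID: API_NUM} dict for the scanned smali lines."""
    counts = Counter()
    for line in smali_lines:
        for call in CALL_RE.findall(line):
            if call in api_ids:
                counts[api_ids[call]] += 1
    return dict(counts)


def b_hash(counts, prime=999983):
    h = 0
    for api_id, api_num in counts.items():
        h = (h + api_id * api_num) % prime
    return h


dict_lines = [
    '{"key": "Landroid/widget/OverScroller;->getCurrVelocity(", "value": 12}',
    '{"key": "Landroid/view/ViewGroup;->onKeyDown(", "value": 258}',
]
# Illustrative smali lines, not taken from a real APK
smali = [
    "invoke-virtual {v0}, Landroid/widget/OverScroller;->getCurrVelocity()F",
    "invoke-super {p0, p1, p2}, Landroid/view/ViewGroup;->onKeyDown(ILandroid/view/KeyEvent;)Z",
    "invoke-virtual {v1}, Landroid/widget/OverScroller;->getCurrVelocity()F",
]

counts = count_apis(smali, load_api_dict(dict_lines))
print(counts)          # {12: 2, 258: 1}
print(b_hash(counts))  # 282
```

Strings that are not in the dictionary are simply ignored, so calls into other packages do not affect the result.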
PS:
There's no doubt that I did those steps with scripts, but that was research code, patched over and over.
The scripts also need access to the database, which makes them very hard to use. As adding a new library does not need steps like clustering, there's no need to use those chaotic scripts.
In fact, many of my API candidates turned out not to be actual Android APIs, so I deleted the wrong ones and recalculated the hash numbers again and again.
Woah. That would mean scanning everything manually, transmitting findings manually by copy-pasting (error prone!), calculating manually (hoping not to have missed an entry)... I'd really like to add my findings – but sorry, that's much too time consuming – especially since results would have to be checked multiple times while still leaving doubt whether one got it right. Some script would be highly appreciated here.
Also a bit unclear is which objects/lines should be counted. E.g. for the proxy example, in Smali I find a lot of strings like
sget-object v0, Lbe/shouldit/proxy/lib/reflection/android/ProxySetting;->NONE:Lbe/shouldit/proxy/lib/reflection/android/ProxySetting;
No opening parenthesis – so not to be matched? But
invoke-virtual {v0, v1}, Lbe/shouldit/proxy/lib/WiFiApConfig;->setProxyHost(Ljava/lang/String;)V
would be a valid candidate? So to find all possible candidates for my library "pn": "be/shouldit/proxy/lib", I'd cd into the app's Smali directory (here: tk.elevenk.proxysetter_0.2/smali/tk) and run
grep -hRE "Lbe/shouldit/proxy/lib.+;-.+\(" *
resulting (in my case) in
invoke-virtual {v0, v1}, Lbe/shouldit/proxy/lib/WiFiApConfig;->setProxyHost(Ljava/lang/String;)V
invoke-virtual {v0, v1}, Lbe/shouldit/proxy/lib/WiFiApConfig;->setProxyPort(Ljava/lang/Integer;)V
invoke-virtual {v0, v1}, Lbe/shouldit/proxy/lib/WiFiApConfig;->setProxyExclusionString(Ljava/lang/String;)V
invoke-virtual {v0, v1}, Lbe/shouldit/proxy/lib/WiFiApConfig;->setProxySetting(Lbe/shouldit/proxy/lib/reflection/android/ProxySetting;)V
…
Then I'd need to strip off everything following the opening parenthesis plus everything before the Lbe/ and sort the output, which makes the command
grep -hRE "Lbe/shouldit/proxy/lib.+;-.+\(" * |awk -F "(" '{print $1 "("}' |awk -F "}," '{print $2}' | sort
Resulting lines now look like
Lbe/shouldit/proxy/lib/APL;->disableWifi(
Lbe/shouldit/proxy/lib/APL;->enableWifi(
Lbe/shouldit/proxy/lib/APL;->enableWifi(
Lbe/shouldit/proxy/lib/APL;->getConfiguredNetwork(
Lbe/shouldit/proxy/lib/APL;->getConfiguredNetwork(
Lbe/shouldit/proxy/lib/APL;->getConfiguredNetworks(
Lbe/shouldit/proxy/lib/APL;->getWiFiAPConfiguration(
Lbe/shouldit/proxy/lib/APL;->getWiFiAPConfiguration(
Lbe/shouldit/proxy/lib/APL;->getWifiManager(
Lbe/shouldit/proxy/lib/APL;->getWifiManager(
Lbe/shouldit/proxy/lib/APL;->setup(
Lbe/shouldit/proxy/lib/APL;->writeWifiAPConfig(
Lbe/shouldit/proxy/lib/enums/SecurityType;->equals(
Lbe/shouldit/proxy/lib/enums/SecurityType;->equals(
Lbe/shouldit/proxy/lib/enums/SecurityType;->name(
Lbe/shouldit/proxy/lib/enums/SecurityType;->toString(
Lbe/shouldit/proxy/lib/reflection/android/ProxySetting;->equals(
Lbe/shouldit/proxy/lib/reflection/android/ProxySetting;->equals(
Lbe/shouldit/proxy/lib/WiFiApConfig;->getProxyExclusionList(
Lbe/shouldit/proxy/lib/WiFiApConfig;->getProxyExclusionList(
Lbe/shouldit/proxy/lib/WiFiApConfig;->getProxyHost(
Lbe/shouldit/proxy/lib/WiFiApConfig;->getProxyPort(
Lbe/shouldit/proxy/lib/WiFiApConfig;->getProxySetting(
Lbe/shouldit/proxy/lib/WiFiApConfig;->getSecurityType(
Lbe/shouldit/proxy/lib/WiFiApConfig;->getSSID(
Lbe/shouldit/proxy/lib/WiFiApConfig;->setProxyExclusionString(
Lbe/shouldit/proxy/lib/WiFiApConfig;->setProxyHost(
Lbe/shouldit/proxy/lib/WiFiApConfig;->setProxyPort(
Lbe/shouldit/proxy/lib/WiFiApConfig;->setProxySetting(
which fits your description and already makes counting easier. I've already checked with new_dict.dat that this library isn't yet there. So again I'm stuck as to where to get the value from: take the last line of new_dict.dat and increase its value by 1? So after the first 3 lines above, my dict would look like {99850:1, 99851:2} – and what then? What needs to be added to new_dict.dat (most likely {"key": "Lbe/shouldit/proxy/lib/APL;->disableWifi(", "value": 99850} etc), where does the value behind the colon go, and how does the entire match get into tgst5.dat?
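For the counting itself, the shell pipeline above could also be sketched in Python so that each call string comes out with its number of occurrences. This is only an illustration of the extraction and counting, with a hypothetical helper name `count_calls`; it makes no claim about whether such strings belong in new_dict.dat.

```python
import re
from collections import Counter


def count_calls(smali_lines, package):
    """Count method-call strings of the form L<package>/...;->name( in smali lines."""
    pattern = re.compile(r"L" + re.escape(package) + r"[\w/$]*;->[\w$<>]+\(")
    counts = Counter()
    for line in smali_lines:
        counts.update(pattern.findall(line))
    return counts


lines = [
    "invoke-virtual {v0, v1}, Lbe/shouldit/proxy/lib/WiFiApConfig;->setProxyHost(Ljava/lang/String;)V",
    "invoke-virtual {v0, v1}, Lbe/shouldit/proxy/lib/WiFiApConfig;->setProxySetting(Lbe/shouldit/proxy/lib/reflection/android/ProxySetting;)V",
    "invoke-virtual {v0, v1}, Lbe/shouldit/proxy/lib/WiFiApConfig;->setProxyHost(Ljava/lang/String;)V",
    # Field access without an opening parenthesis: not counted
    "sget-object v0, Lbe/shouldit/proxy/lib/reflection/android/ProxySetting;->NONE:Lbe/shouldit/proxy/lib/reflection/android/ProxySetting;",
]

calls = count_calls(lines, "be/shouldit/proxy/lib")
for call, num in sorted(calls.items()):
    print(num, call)
```

In a real run the lines would come from the .smali files under the app's own package directory, read for example with `pathlib.Path.rglob("*.smali")`.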
Maybe it's easier if I instead submit the results of the last mentioned command (with some describing details), and you continue from there (as you've got routine to do that)? Once it is in tgst5.dat, I could proceed adding the missing details (as I did with all the other libs).
Yes. I did this in a Python script and used regex too, which is somewhat automatic. However, adding a new library is still difficult. I would appreciate it if you could give me a list of libraries that do not appear in my database.
"Lbe/shouldit/proxy/lib/APL;->disableWifi(" is a method that the code uses, but that does not mean this string is an Android system API. In my opinion, it comes from the code of another package and should not be recognized as an API. Such method names are very easily obfuscated, so I didn't put methods like this into new_dict.dat. Most Android system APIs begin with "Landroid".
By the way, I will update the whole project and plan to add automatic database updates later this year (for my graduation project). More detailed information and functionality will be added.
However adding a new library is still difficult.
I definitely agree :)
I would appreciate that if you could give me a list of libraries that do not appear in my database.
Whenever I find any. Until now, that's the two mentioned above:
be/shouldit/proxy/lib
qutelauncher uses Firebase Analytics (pn: com/google/firebase). Corresponding findings from Smali:
Lcom/google/firebase/analytics/FirebaseAnalytics;->getInstance(
Lcom/google/firebase/messaging/FirebaseMessagingService;-><init>(
Lcom/google/firebase/messaging/RemoteMessage;->getData(
Lcom/google/firebase/messaging/RemoteMessage;->getData(
Lcom/google/firebase/messaging/RemoteMessage;->getFrom(
Lcom/google/firebase/messaging/RemoteMessage;->getNotification(
Lcom/google/firebase/messaging/RemoteMessage;->getNotification(
Lcom/google/firebase/messaging/RemoteMessage$Notification;->getBody(
com/google/android/gms), which is not reported (seems to be used by Firebase (my guess would be those "RemoteMessage" calls), not by the app directly – and the resulting list would be pretty long; too long to be included here). Strange that it's not reported, as it is already known to LibRadar. If you want to check, grab the *full.apk from behind the link.
In my opinion, it comes from the code from another package
I doubt that, but I might be wrong: I've limited my grep to just the application package directory itself (i.e. I did a cd tk.elevenk.proxysetter_0.2/smali/tk first), to avoid having the library's own "inner calls" recorded along.
By the way, I will update the whole project and prepare to add automatically updating to database later this year (for my graduation project). More detailed information and functionality will be added.
That sounds great! Fingers already crossed for your graduation!
I will try to add this functionality before 4th March. After implementation, I'll send a notification to your Twitter. ^_^
Uh. Had you not closed this, I'd have said you could simply close it when done, so I get a notification from GitHub…
All right, issue reopened. ╮(╯▽╰)╭
I've added the function. This week I will write some test cases to make sure it works and add documentation on how to add a new lib.
Thanks! Looks like soon it's time I try the new version then. OTOH, seeing the install instructions, I'm afraid it won't be that soon; it got too many dependencies. I prefer if things either come straight from the repositories, or run "out of their directory". Having to install self-compiled stuff via "make install" (Redis 3.2, as the repos only hold Redis 3.0) plus things via pip/pypi (which isn't installed itself even on my machine) is not my first choice ;)
Does it already return JSON the way it did before? Or did the format change?
I have an .apk where some library is not detected, though I know (from the project, and from decoding it with Apktool) it is there: version 0.04 of qutelauncher added Firebase Analytics – and accordingly, there's com/google/firebase/* in the Smali. I cannot find a match on that in data/tgst5.dat, so it's not reported.
So what would be the necessary steps in such a case? Plus, along those lines, is there a way to scan for undetected libraries one might not be aware of (one does not always know what to expect in some .apk – which is why one uses LibRadar in the first place :)?