pkumza / LibRadar

LibRadar - A detecting tool for 3rd-party libraries in Android apps.
Apache License 2.0

Adjusting library definitions #30

Closed (IzzySoft closed 7 years ago)

IzzySoft commented 7 years ago

Hi pkumza,

two commits currently in this PR, more to come if you confirm what I'm doing.

e412320 is pretty clear: with my last PR it seems I forgot to also set pn on two of the entries. Guess that doesn't require any confirmation :smile_cat:

db8ed56 is the task I'm currently working on: retagging all those generic "da" entries (and only those) to something more descriptive. Referring to your comment in #18, this is how I'm interpreting things:

Does that sound fine to you? Shall I go on?

To make things easier to identify, here are the libraries I've retagged from da to ut with db8ed56:

da;Appinventor;http://appinventor.mit.edu/
da;httpclientandroidlib;https://github.com/surespot/httpclientandroidlib
da;Logback;http://logback.qos.ch/
da;Bolts Base Library;https://github.com/BoltsFramework/Bolts-Android
da;Opencsv;https://github.com/EmergentOrder/opencsv
da;AppMachine App Generator;http://www.appmachine.com/
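
For readers following along, the entries in this thread use a `tag;name;url` shape. Here is a minimal sketch of parsing and retagging one such line. It assumes exactly the three semicolon-separated fields shown in this thread (not necessarily LibRadar's on-disk format), and the names `LibEntry`, `parse_entry`, `retag` and `format_entry` are hypothetical helpers, not part of LibRadar.

```python
from typing import NamedTuple


class LibEntry(NamedTuple):
    """One listing line as written in this thread (assumed shape: tag;name;url)."""
    tag: str   # two-letter category such as "da", "ut", "ui", "ma"
    name: str  # human-readable library name
    url: str   # project URL; empty for entries listed without one


def parse_entry(line: str) -> LibEntry:
    # maxsplit=2 keeps any ';' inside the URL part intact.
    tag, name, url = line.strip().split(";", 2)
    return LibEntry(tag, name, url)


def retag(entry: LibEntry, new_tag: str) -> LibEntry:
    # NamedTuple is immutable, so return a modified copy.
    return entry._replace(tag=new_tag)


def format_entry(entry: LibEntry) -> str:
    return ";".join(entry)


entry = parse_entry("da;Logback;http://logback.qos.ch/")
print(format_entry(retag(entry, "ut")))
```

Entries without a URL, such as `da;Adobe FlashPlayer;`, parse to an empty `url` field under this assumption.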

One I've retagged from da to ui: da;ActionBarSherlock;http://actionbarsherlock.com/. Other ones already in the queue but not yet committed I will list in another comment here – when I've done their tagging but before I commit and add them to the PR (so we easily can adjust them if you see an issue or feel I retagged them wrongly).

IzzySoft commented 7 years ago

Some more planned candidates:

da -> ut:

da;ActiveAndroid;https://github.com/pardom/ActiveAndroid
da;Adobe Air;http://www.adobe.com/cn/devnet/air/air_for_android.html
da;Adobe FlashPlayer;
da;Adobe Air FRE;http://www.adobe.com/cn/devnet/air/air_for_android.html
da;Adobe plugin;
da;Adobe XMP;http://www.adobe.com/products/xmp.html
da;Volley HTTP library;https://developer.android.com/training/volley/index.html
da;Androidplot;http://androidplot.com/
da;AndroidQuery;https://github.com/androidquery/androidquery
da;Corona;https://docs.coronalabs.com/native/android/html/com/ansca/corona/package-summary.html
da;Appmakr;https://www.appmakr.com/
da;Appyet;http://www.appyet.com/
da;Baidu APP SDK;https://developer.baidu.com/
da;baidu push service;https://developer.baidu.com/
da;StAX API;https://jcp.org/en/jsr/detail?id=173
da;biznessapps;https://www.biznessapps.com/
da;BugSense;https://github.com/wrapp/bugsense-android
da;Glide;https://github.com/bumptech/glide
da;CommonsWare MergeAdapter;https://github.com/commonsguy/cwac-merge
da;CommonsWare SackOfViewsAdapter;https://github.com/commonsguy/cwac-sacklist
da;com/deploygate;https://deploygate.com/
da;doapps;http://www.doapps.com/
da;Google GCM;https://developers.google.com/cloud-messaging/
da;Google Mobile Services;https://developers.google.com/android/reference/com/google/android/gms/package-summary
da;Google Android Net;http://developer.android.com/reference/android/net/package-summary.html
da;Google Appinventor;https://code.google.com/p/app-inventor-for-android/
da;Google Core Libraries for Java 6+;https://github.com/google/guava
da;Google GData;https://developers.google.com/gdata/

da -> ui:

da;AndroidAnimations Library;https://github.com/daimajia/AndroidViewAnimations

da -> ma:

da;comscore;https://www.comscore.com/ => ma;Comscore Analytics;https://www.comscore.com/

IzzySoft commented 7 years ago

OK, since I'm in a hurry to consolidate things on my end, I've completed the "retagging" (not yet committed). I need to keep results from two tools in sync: for the libraries LibRadar doesn't identify, I've created a simple scanner based on the smali structures it temporarily creates via APKTool. That reminds me: that JAR urgently needs updating in LibRadar, as the old one now crashes on some .apk files. This leaves some questions open, as I stumbled upon a few things, among others:

Finally, there are some das I didn't retag. You might wish to cross-check if some rather should. For things I have retagged, I've created separate pastebins as well. May I suggest you "clone" each, remove the lines you agree on, leave the ones I should revert back in, and then post the link to the clone in a comment (that is, if there's something in you don't agree with – might not be the case for the first 3)? Should be easy to revert back then with RegEx search and replace :smile:
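
A regex-based revert like the one mentioned above could look roughly like this. It is only a sketch, assuming the entries are available as text with one `tag;name;url` line each, as listed in this thread. `revert_tags` and its parameters are hypothetical and not part of LibRadar.

```python
import re
from typing import Iterable


def revert_tags(text: str, names: Iterable[str],
                from_tag: str = "ut", to_tag: str = "da") -> str:
    """Set the tag back to `to_tag` for the named libraries only."""
    escaped = [re.escape(name) for name in names]
    if not escaped:
        return text  # nothing to revert
    # Match the tag at the start of a line, but only when the library
    # name that follows is one of the names to revert.
    pattern = re.compile(
        rf"^{re.escape(from_tag)};(?=(?:{'|'.join(escaped)});)",
        re.MULTILINE,
    )
    return pattern.sub(lambda match: f"{to_tag};", text)


listing = (
    "ut;Logback;http://logback.qos.ch/\n"
    "ut;Glide;https://github.com/bumptech/glide\n"
)
print(revert_tags(listing, ["Glide"]))
```

The lookahead keeps the name and URL untouched, so only the tag prefix of the selected lines changes.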

If possible, please let me know your decision near-term. As I wrote, reverting is not too difficult ATM – but I then need to adjust "the other end" accordingly, to have the two in sync. With a bunch of other things in the queue here, I'd be happy to "commit" some pending stuff :stuck_out_tongue_winking_eye:

Thanks in advance!

pkumza commented 7 years ago

Woo, great job!

If you come across a website like http://www.yixia.com/, you can notify me, since I can read Chinese websites. 😄

As I need "V2" branch's tagging rules urgently, could you please help me tag those libraries? I've tagged about 600 libs so far. The rules map package names to library names. I can offer you a list of potential library package names, and you can tag those in the following format: tag_rules.csv
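
To illustrate the idea of rules that map a package name to a library name, here is a small sketch using longest-prefix matching. This is an assumption for illustration only: the actual layout of tag_rules.csv and the matching logic in LibRadar are not shown in this thread, and `RULES` and `lookup_library` are hypothetical names with made-up sample data.

```python
from typing import Dict, Optional

# Illustrative rules only: package prefix -> library name.
RULES: Dict[str, str] = {
    "javax/jmdns": "jmDNS",
    "com/umeng": "Umeng",
}


def lookup_library(package: str, rules: Dict[str, str] = RULES) -> Optional[str]:
    """Return the library name for the longest matching package prefix."""
    parts = package.strip("/").split("/")
    # Try the full path first, then drop one trailing segment at a time.
    for end in range(len(parts), 0, -1):
        prefix = "/".join(parts[:end])
        if prefix in rules:
            return rules[prefix]
    return None  # no rule covers this package


print(lookup_library("javax/jmdns/impl"))
print(lookup_library("javax/swing"))
```

With rules defined on specific sub-packages, a broad prefix such as `javax` needs no rule of its own, and unrelated packages under it simply stay unmatched.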

pkumza commented 7 years ago

I have created a new tool for V2 tagging. https://github.com/pkumza/LR_tagger

IzzySoft commented 7 years ago

If you come across a website like http://www.yixia.com/, you can notify me, since I can read Chinese websites.

That was the reason I've explicitly mentioned it :innocent:

As I need "V2" branch's tagging rules urgently, could you please help me tag those libraries?

I might have a look into that – but not before this one is decided upon. Might sound like blackmail, but isn't: it won't be a great help if I tag them wrong, so I'd first need to know if I got it right. And apart from that, I'd prefer to have one task closed before starting at the next :stuck_out_tongue_winking_eye:

pkumza commented 7 years ago

I might have a look into that – but not before this one is decided upon. Might sound like blackmail, but isn't: it won't be a great help if I tag them wrong, so I'd first need to know if I got it right. And apart from that, I'd prefer to have one task closed before starting at the next

Exactly~ I'm working on it today, Monday is the first weekday for hard work. haha~

pkumza commented 7 years ago

https://www.umeng.com/component_update no longer seems to exist. The information about Umeng update has disappeared. I prefer to keep the entry as long as I'm not sure.

However Umeng, a mobile analytics provider, is becoming more and more complex. It has many tools beyond analysis, like statistics, report, SSO, social network share, notification push, advertisement and many other functionalities. Therefore, I don't think it is suitable to take com/umeng as a single ma library. It should be divided into many libraries in future.

yixia is a video provider, so the library is probably a video player for Android. It can be taken as a ut lib.

da;javax;: Uh-Oh. Multiple projects here, e.g. javax/jmdns (jmDNS library)


Javax should never be a single lib. Too broad.

da;ISO Parser; should probably be ut;mp4parser;https://github.com/copiousfreetime/mp4parser
da;Mozilla;https://www.mozilla.org/: quite a bit broad, isn't it: "org/mozilla"? they have e.g. org/mozilla/universalchardet
da;kobjects;: that's a sub of da;KObjects;http://kobjects.org, should have the same pn
da;Polidea;https://www.polidea.com/: Uh, careful! Multiple different projects, see https://github.com/Polidea (some pl.polidea., others com.polidea.)

Can't agree more.

these should be ad instead of da – definitely ad networks.
this I'm sure is ma
these are my candidates for UI elements

Nothing to say to these three links.

these I'm still thinking of da

It is so hard to tell the diff between ut and ma, so I want to leave that in the future. *_*

I'd like to add a new type, Development Framework. Libs such as http://www.appmachine.com, https://www.appmakr.com, Google Firebase, Cordova should belong to this type. Some apps are not just using the lib, they are totally based on the framework.

IzzySoft commented 7 years ago

Therefore, I don't think it is suitable to take com/umeng as a single ma library.

I've checked the untagged "umeng" entries, and also my local definitions: Seems to cover what you wrote, so full ack on this and I leave it untouched :smile_cat:

yixia is a video provider, so the library is probably a video player for Android. It can be taken as a ut lib.

Done.

Javax should never be a single lib. Too broad.

Can't agree more. So in a separate run, should we first "untag" it (i.e. removing the "pn" and emptying the "lib") and then retag later when the "sub trees" can be identified correctly? And the same for "da;Mozilla" as well as "da;Polidea;"?

da;kobjects;: that's a sub of da;KObjects;http://kobjects.org, should have the same pn

Done. Still thinking whether that shouldn't be "ut"?

It is so hard to tell the diff between ut and ma

Oh, that one is easy. But you probably meant "ut and da", right? Agreed then. My suggestion on that is to leave "da" as the "broader term for things not yet categorized". In a sense, one could argue that even ma is a sub of da. So if a lib was identified and one's unsure where to put it (e.g. ui versus ut), da could be the fall-back covering both.

I'd like to add a new type, Development Framework. Libs such as http://www.appmachine.com, https://www.appmakr.com, Google Firebase, Cordova should belong to this type. Some apps are not just using the lib, they are totally based on the framework.

Sounds reasonable, and I'd support that (though things like Firebase are becoming a mixed pickle as well, being a DF at the same time as becoming MA and more). Still, would that be available in V1 as well? Shouldn't be much more than an additional entry in main/detect.py's definition array, right?

So now, you've been a bit too fast with already merging this – as most of the changes discussed have not yet been part of this PR. As I initially wrote, I just pushed an initial sample for you to investigate, and leave the remains pending your decision. So I will have to create another PR for the major bunch now :wink:

IzzySoft commented 7 years ago

I've added the relevant details (our 3 "too broad" candidates, the new DF tag) to the new PR as "things still open" so we don't lose track of them. All my currently "pending" changes have been committed to that PR, so you can go ahead and merge it. Be welcome to create 2 issues for the listed open points and assign them to me: One for the "too broad" (which we agreed on – pending your agreement on my proposed approach), and one for the DF as soon as you've implemented that code-wise.

pkumza commented 7 years ago

Oh, that one is easy. But you probably meant "ut and da", right? Agreed then. My suggestion on that is to leave "da" as the "broader term for things not yet categorized". In a sense, one could argue that even ma is a sub of da. So if a lib was identified and one's unsure where to put it (e.g. ui versus ut), da could be the fall-back covering both.

Reasonable. Seems I should convert so many libraries in V2 from da to ut. da is just a fall-back set.

pkumza commented 7 years ago

I've added an additional entry in main/detect.py in the V1 branch.

IzzySoft commented 7 years ago

Reasonable. Seems I should convert so many libraries in V2 from da to ut. da is just a fall-back set.

Sounds like a good plan – and basically is what I did to V1.

I've added an additional entry in main/detect.py in the V1 branch.

Cool. Will merge that back to my local repo ASAP and start retagging again – mostly from ut to df then, but also from the remaining das. Expect a new PR within, say, 48h 😸 (hopefully)

In that context, should the "basic" libraries like "Android Support v[4|7|13]" go to df as well then? After all, even the "ground framework" is a "development framework" in a sense. And I'd say df matches them better than da.