lapolis / LazyOSINT

Yet another OSINT automator.
GNU General Public License v3.0

Idea: pause/resume ability #2

Closed mcsJDF01 closed 3 years ago

mcsJDF01 commented 3 years ago

Love this tool!

Occasionally I hit the Google search captcha. Rather than carrying on and erroring on Google lookups, the script could pause at this point so I can either solve the captcha manually or change IPs, and then resume.

Or even have a resume function built into the tool, so one can cancel it and then resume again at a later time.

lapolis commented 3 years ago

Thanks, I really appreciate it.

The play/pause should not be a problem at all, maybe with an extra flag so that anyone who does not care about the Google search can just let it run. In a testing version I tried using SOCKS5 to change IP every time the reCAPTCHA was hit; however, Google hates VPN providers, so I got the reCAPTCHA on the first search each time. I think that stopping to solve the reCAPTCHA manually would require doing the Google search with Selenium. It will probably be even slower, but I might try.

I am not sure about the checkpoint/resume function. I need to confirm whether LinkedIn always keeps the same order in the employee list (it does, if I remember correctly); if it does, it should not be a problem to checkpoint the current page and add a flag to resume from it.

mcsJDF01 commented 3 years ago

Good point, I don't know whether LinkedIn uses the same order or not. I found I could change IPs (through a VPN provider), and running the tool again appeared to search fine for a while, but it ran into the same issue after triggering the reCAPTCHA. The problem was that it was running the whole search again, so I wasn't getting any new data. (Maybe that does imply LinkedIn uses the same order, or at least for that session.)

lapolis commented 3 years ago

Well, if using the VPN did not trigger the reCAPTCHA for you, I will be more than happy to implement the pause/restart. Also, may I ask which VPN provider you use?

mcsJDF01 commented 3 years ago

I was using ProtonVPN, just the free plan, more as a test really. It did eventually retrigger, but only after it had run through a number of other lookups.

lapolis commented 3 years ago

OK, so Google probably did not have that IP blacklisted yet, makes sense. Thanks.

lapolis commented 3 years ago

Hi there, I had a bit of spare time so I implemented the "pause" feature, which can be used just by adding the flag -b. Before resuming the scan, the script will ask whether you want to retry the same Google query or discard it and go to the next LinkedIn member. Also, if you want to get a beep every time the script pauses, just use -B instead :). I thought that with the sound you do not need to keep staring at the screen for the whole scan. To use the beep you need to install sox on your Kali.

sudo apt install sox
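
For readers who want a picture of the pause behaviour described above, here is a minimal Python sketch. It is not LazyOSINT's actual code: `CaptchaHit`, `search_with_pause`, and the `search` and `ask` parameters are hypothetical names chosen for illustration, and the terminal bell stands in for the sox-based beep.

```python
class CaptchaHit(Exception):
    """Hypothetical error raised by a search function when Google shows a reCAPTCHA."""


def search_with_pause(query, search, ask=input, beep=False):
    """Run search(query); on a captcha, pause and let the user retry or skip.

    Returns the search results, or None if the user discards the query.
    """
    while True:
        try:
            return search(query)
        except CaptchaHit:
            if beep:
                # Terminal bell as a stand-in; the real tool relies on sox.
                print("\a", end="", flush=True)
            answer = ""
            # Re-ask until the user gives a clear y/n (plain Enter repeats the prompt).
            while answer not in ("y", "n"):
                answer = ask("reCAPTCHA hit. Retry the same query? [y/n] ").strip().lower()
            if answer == "n":
                return None  # discard this query and move on to the next member
            # answer == "y": loop around and run the same query again
```

The pause gives you time to solve the captcha or switch IP before answering `y`; answering `n` drops the current query only, not the whole scan.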

Since I have not implemented the "resume" function yet (I need to reformat more code than I thought), I have not merged the new version into main, so if you want to use the new feature, for now just use the "develop_feature" branch like so:

git clone https://github.com/lapolis/LazyOSINT.git
cd LazyOSINT
git checkout develop_feature
./main -h

mcsJDF01 commented 3 years ago

That's awesome! Thanks, will test and let you know how I get on.

lapolis commented 3 years ago

Hold on, I am almost done with the "resume" function. :D I will merge it into main in a few hours.

mcsJDF01 commented 3 years ago

OK will do :)

lapolis commented 3 years ago

Taaa daaa! Done, hopefully it will even work :) I attach some notes here; I will update the README anyway before the end of the day.

LazyOSINT, while scraping LinkedIn, can now be brutally interrupted ( ^C ) at any time and then resumed by using the flag -r. The file containing the information necessary to resume the scan on the same page is NOT saved in the system /tmp, so you can shut down your PC and carry on with the scan whenever you feel like it. Note that when using the resume function, you need to specify the exact same LinkedIn URL in the exact same format ( with -u ), otherwise it will not find the file.

The flag -S can be used if you want to skip the Google search for hidden profiles. The flag -b pauses the scan whenever Google hits a reCAPTCHA (a bit messy to resume if doing a full scan), while the flag -B also makes a sound before pausing. If -b or -B is used together with -d, it is very likely that the log line telling you the LinkedIn scraper has paused at the reCAPTCHA will scroll off the screen. In that case you just need to press Enter to get it printed again, or simply answer the y/n question once you get used to it. I will probably find a better way to do it one day.
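
To illustrate why the URL has to match exactly, here is a rough Python sketch of a page-level checkpoint keyed on the URL string. It is only an assumption about how such a scheme could work, not the code in the repository: the function names, the JSON layout, the `base_dir` and `total_pages` parameters, and the `fetch_page` callback are all hypothetical.

```python
import hashlib
import json
from pathlib import Path


def checkpoint_path(url, base_dir):
    # The file name derives from the exact URL string, so a different
    # format (e.g. a missing trailing slash) maps to a different file.
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()[:16]
    return Path(base_dir) / f"checkpoint_{digest}.json"


def save_checkpoint(url, page, base_dir):
    path = checkpoint_path(url, base_dir)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps({"url": url, "page": page}))


def load_checkpoint(url, base_dir):
    """Return the saved page number, or None if no checkpoint exists for this URL."""
    path = checkpoint_path(url, base_dir)
    if not path.exists():
        return None
    return json.loads(path.read_text())["page"]


def scrape(url, fetch_page, total_pages, base_dir, resume=False):
    """Fetch pages 1..total_pages, restarting from the checkpoint when resume=True."""
    start = (load_checkpoint(url, base_dir) if resume else None) or 1
    results = []
    for page in range(start, total_pages + 1):
        # Save before fetching: an interrupt (^C) during this page
        # makes the next run resume on the same page.
        save_checkpoint(url, page, base_dir)
        results.extend(fetch_page(url, page))
    # Scan finished: the checkpoint is no longer needed.
    checkpoint_path(url, base_dir).unlink(missing_ok=True)
    return results
```

Keeping `base_dir` outside /tmp is what lets the checkpoint survive a reboot. This sketch also assumes the member list keeps the same order between runs, as discussed earlier in the thread.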

mcsJDF01 commented 3 years ago

This works a treat. All I did was switch IPs to a new VPN endpoint and it resumes. Great work. Thank you so much!

lapolis commented 3 years ago

Glad to hear that. :)