johncoleman83 / domain_scraper

Scrapes domains for broken links, emails & social media links (uses beautifulsoup)
MIT License

create one entry point for all scripts #1

Closed johncoleman83 closed 5 years ago

johncoleman83 commented 5 years ago

All these scripts share functions. Can we create one entry point for all of them?

edvein-rin commented 5 years ago

I can try to do something.

johncoleman83 commented 5 years ago

Thanks @edikxl, let me know if you need help. The docs for each script are pretty good, though.

johncoleman83 commented 5 years ago

You can separate this into different PRs. The first step is easy; the second step will require more work, and you can either do that separately or leave it for someone else.

edvein-rin commented 5 years ago

OK

mrvnmchm commented 5 years ago

I'll take a crack at the second step. :+1:

johncoleman83 commented 5 years ago

Hi @mrvnmchm, if you still want a crack at this, I just merged @edikxl's updates and am going to clean it up a bit. There is still room for some major reorganizing: the system works fine, it just could be organized better and split into functions as we have been discussing.

johncoleman83 commented 5 years ago

@edikxl, feel free to add your name to the README.md and any usage details you think will help. I did add a large comment with some usage info in the file you made, though.

mrvnmchm commented 5 years ago

Thanks @johncoleman83, I was waiting for that merge.

johncoleman83 commented 5 years ago

Thanks @mrvnmchm, I started modularizing the main app, but stopped after building the module for the error check and storage write.

mrvnmchm commented 5 years ago

Completed building the modules and forming the arguments and help text. Working on execution and testing. Here's the help so far:

(domainScraper-cQAT3a2w) mrvnmchm@M3-Q-X70-A:/some_folder/domain_scraper$ python domain_scraper.py -h
usage: domain_scraper [-h] [--check [CHECK]] [--extract [EXTRACT]]
                      [--scrape [SCRAPE]] [--scrape-n [SCRAPE_N]]
                      [--all [ALL]]
                      [input_file]

Scrapes domains from one input URL or from a file list of domains for broken
links, valid emails, and valid social media links.

positional arguments:
  input_file            Indicate the input file to scrape.

optional arguments:
  -h, --help            show this help message and exit
  --check [CHECK]       Find broken links from urls in file.
  --extract [EXTRACT]   Extract name from emails in file.
  --scrape [SCRAPE]     Scrape emails and social media urls from file.
  --scrape-n [SCRAPE_N]
                        Scrape emails and social media urls from file with new
                        links.
  --all [ALL]           Perform all actions on urls from file with links.
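The help output above could be produced by an `argparse` setup along these lines. This is a minimal sketch, not the actual PR code: the flag names and descriptions are taken from the help text, while the parser-building function name is a hypothetical placeholder.

```python
#!/usr/bin/env python3
"""Sketch of a single argparse entry point matching the help output above.

Flag names and help strings come from the printed usage text; the
build_parser() function name is a hypothetical placeholder.
"""
import argparse


def build_parser():
    parser = argparse.ArgumentParser(
        prog="domain_scraper",
        description=(
            "Scrapes domains from one input URL or from a file list of "
            "domains for broken links, valid emails, and valid social "
            "media links."
        ),
    )
    # Optional positional argument, as shown by [input_file] in the usage line.
    parser.add_argument("input_file", nargs="?",
                        help="Indicate the input file to scrape.")
    # Each action flag takes an optional value ([CHECK], [EXTRACT], ...);
    # when passed bare, it stores the const True.
    parser.add_argument("--check", nargs="?", const=True,
                        help="Find broken links from urls in file.")
    parser.add_argument("--extract", nargs="?", const=True,
                        help="Extract name from emails in file.")
    parser.add_argument("--scrape", nargs="?", const=True,
                        help="Scrape emails and social media urls from file.")
    parser.add_argument("--scrape-n", nargs="?", const=True,
                        help="Scrape emails and social media urls from file "
                             "with new links.")
    parser.add_argument("--all", nargs="?", const=True,
                        help="Perform all actions on urls from file with links.")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

With this shape, `python domain_scraper.py urls.txt --check` would set `input_file` to `urls.txt` and `check` to `True`, and `--scrape-n` is exposed as the `scrape_n` attribute on the parsed namespace.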

johncoleman83 commented 5 years ago

Looks super nice, don't forget to pull latest master, I fixed a slight bug in the original entry point file in the latest commit.

mrvnmchm commented 5 years ago

@johncoleman83, PR #5 set for review. Thanks for letting me help, let me know if there is anything I need to change.

johncoleman83 commented 5 years ago

This was completed by @mrvnmchm.