Anime scrapers is a collection of scrapers that have all been unified behind shared handlers.
brew install python3
git clone https://github.com/jQwotos/anime_scrapers
cd anime_scrapers
pip install -r requirements.txt
anime_scrapers is a backend intended to be used by other applications. You can also use it directly from the Python shell, but using an application built on top of it is recommended.
Handlers are all classes, but each handler module also provides a premade instance, so you don't need to create a new object each time.
For example, scraper_handler.py has class ScraperHandler() and the variable scraper_handler.
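A minimal sketch of this pattern, assuming a heavily simplified handler. The class body, the `modules` attribute, and the `register` method are hypothetical and only illustrate how one file can expose both the class and a ready-made instance.

```python
class ScraperHandler():
    """Hypothetical, stripped-down stand-in for the real handler class."""

    def __init__(self):
        # Assumption: the handler keeps a list of loaded scraper modules.
        self.modules = []

    def register(self, module):
        # Hypothetical helper used only in this sketch.
        self.modules.append(module)
        return len(self.modules)


# Premade module-level instance, so callers can import and use it directly
# instead of constructing a new ScraperHandler each time.
scraper_handler = ScraperHandler()
```

With this layout, other code can import the `scraper_handler` variable and call its methods without instantiating the class itself.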
search(query, limited_modules[])
resolve(link)
resolve(link)
{
    'epNum': 'name of file',
    'sources': [
        {
            'link': 'link',
            'type': 'typically mp4 or iframe',
        },
    ]
}
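To illustrate how a caller might consume a result like this, the sketch below picks the direct mp4 links out of it. The `pick_mp4_links` helper and the sample values are hypothetical; only the dictionary layout (a list of sources, each with a link and a type) is taken from this document.

```python
def pick_mp4_links(resolved):
    """Return the links of all sources whose type is 'mp4'."""
    return [
        source['link']
        for source in resolved.get('sources', [])
        if source.get('type') == 'mp4'
    ]


# Sample data in the documented layout (values are made up).
example = {
    'epNum': 'episode-1',
    'sources': [
        {'link': 'https://example.com/video.mp4', 'type': 'mp4'},
        {'link': 'https://example.com/embed/1', 'type': 'iframe'},
    ],
}

print(pick_mp4_links(example))
```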
For information gathering, use info_handler.py. The functions are:
# strict is a boolean; if True, search for the exact query only.
search(query, strict):
    return [
        {
            'id': 'id of the show (int)',
            'titles': 'Other names of the show (str)',
        }
    ]
getDetailedInfo(id):
    return [
        {
            'id': 'return the id from the parameter (int)',
            'other-show-stuff': 'Other info related to show. See anidb.py in info_collectors for example',
            ...
        }
    ]
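The following sketch shows one way the two functions and the `strict` flag could behave. It assumes a hypothetical in-memory catalogue (`_SHOWS`) in place of a real information source, and the matching rules (case-insensitive, substring match when `strict` is False) are an illustration rather than the behaviour of any existing collector.

```python
# Hypothetical in-memory catalogue standing in for a real info source.
_SHOWS = [
    {'id': 1, 'titles': 'Cowboy Bebop'},
    {'id': 2, 'titles': 'Cowboy Bebop: The Movie'},
]


def search(query, strict):
    """Return matching shows; only exact title matches when strict is True."""
    wanted = query.lower()
    results = []
    for show in _SHOWS:
        title = show['titles'].lower()
        matched = (title == wanted) if strict else (wanted in title)
        if matched:
            results.append({'id': show['id'], 'titles': show['titles']})
    return results


def getDetailedInfo(id):
    """Return a list holding the details of the show with the given id.

    The parameter is named id to mirror the documented signature.
    """
    return [dict(show) for show in _SHOWS if show['id'] == id]
```

In this sketch, a strict search for "cowboy bebop" matches only the first entry, while a non-strict search for "cowboy" matches both.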
scrape_all_show_sources(link):
    return {
        'episodes': [
            {
                'epNumber': 'number as a string',
                'sources': sourceType
            }
        ],
        'title': 'title of the show',
        'status': 'status of the show',
        'host': 'host such as gogoanime',
        'released': 'year released as a string',
    }
search(query):
    return [
        {
            'link': 'link to the show',
            'title': 'title of the show',
            'language': 'sub or dub',
        },
    ]
_scrape_video_sources(link):
    return {
        'epNum': 'episode number as a string',
        'sources': sourceType
    }
SourceTypes are in the following format.
[
    {
        'link': 'link to the mp4 or iframe embed',
        'type': 'mp4 or iframe',
    }
]
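As a sketch of how the show structure and the source format fit together, the helper below flattens a scrape_all_show_sources result into one row per source. The `list_episode_links` function and all sample values are hypothetical; only the key names come from the formats above.

```python
def list_episode_links(show):
    """Return (epNumber, link, type) tuples for every source of every episode."""
    rows = []
    for episode in show['episodes']:
        for source in episode['sources']:
            rows.append((episode['epNumber'], source['link'], source['type']))
    return rows


# Made-up sample in the scrape_all_show_sources layout.
show = {
    'episodes': [
        {
            'epNumber': '1',
            'sources': [
                {'link': 'https://example.com/1.mp4', 'type': 'mp4'},
                {'link': 'https://example.com/embed/1', 'type': 'iframe'},
            ],
        },
        {
            'epNumber': '2',
            'sources': [{'link': 'https://example.com/2.mp4', 'type': 'mp4'}],
        },
    ],
    'title': 'Example Show',
    'status': 'finished',
    'host': 'example',
    'released': '2000',
}

for ep_number, link, kind in list_episode_links(show):
    print(ep_number, kind, link)
```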
Want to add a downloader, scraper, or info collector? Most handler functions go through the available modules until one has a matching URL schema. Each scraper contains the following variable, which the handler uses to identify the correct module when resolving links.
matching_urls = [
    {
        'urls': ['regex match expression'],
        'function': function_to_call,  # the function that should be called
    },
]
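The sketch below shows one way a handler could use this variable to route a link. It is an assumed dispatch strategy, not the project's actual handler code: the `resolve` helper, the `scrape_example_host` function, and the example.com pattern are all hypothetical.

```python
import re


def scrape_example_host(link):
    # Hypothetical scraper function; a real one would fetch and parse the page.
    return {'epNum': '1', 'sources': []}


# Same layout as the matching_urls variable described above.
matching_urls = [
    {
        'urls': [r'https?://(www\.)?example\.com/watch/.+'],
        'function': scrape_example_host,
    },
]


def resolve(link, entries):
    """Call the function of the first entry with a pattern matching link."""
    for entry in entries:
        for pattern in entry['urls']:
            if re.match(pattern, link):
                return entry['function'](link)
    return None  # no module recognised the link


print(resolve('https://example.com/watch/ep-1', matching_urls))
print(resolve('https://other.org/watch/ep-1', matching_urls))
```

Returning None for an unmatched link is a choice made for this sketch; a handler could equally raise an error or try a fallback module.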
Scrapers handle search queries, scraping episodes from hosts and scraping sources from those episodes.
Refer to Functions for data formatting.
Scrapers should have the following functions:
search
scrape_all_show_sources
Optionally there can also be
_scrape_episode_source
Scrapers should be put into the scrapers folder.
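A skeleton for a new scraper might look like the sketch below. It assumes a made-up host (example.com) and returns placeholder data instead of making network requests, so the URL layout, the episode list, and the 'unknown' values are hypothetical. Only the function names and return formats follow this document.

```python
# Hypothetical scraper module, e.g. scrapers/example_host.py.
# No network access: the functions return placeholder data in the documented formats.

BASE_URL = 'https://example.com'  # assumed host, for illustration only


def search(query):
    """Return matching shows as a list of link/title/language dicts."""
    slug = query.lower().replace(' ', '-')
    return [
        {
            'link': f'{BASE_URL}/show/{slug}',
            'title': query,
            'language': 'sub',
        },
    ]


def _scrape_episode_source(link):
    """Optional helper: return the sources of a single episode page."""
    return {
        'epNum': link.rstrip('/').split('/')[-1],
        'sources': [{'link': f'{link}/video.mp4', 'type': 'mp4'}],
    }


def scrape_all_show_sources(link):
    """Return show details plus the sources of every episode."""
    episode_links = [f'{link}/1', f'{link}/2']  # placeholder episode list
    episodes = []
    for ep_link in episode_links:
        scraped = _scrape_episode_source(ep_link)
        episodes.append({'epNumber': scraped['epNum'], 'sources': scraped['sources']})
    return {
        'episodes': episodes,
        'title': link.rstrip('/').split('/')[-1],
        'status': 'unknown',
        'host': 'example',
        'released': 'unknown',
    }


# Lets a handler route example.com show links to this module.
matching_urls = [
    {'urls': [r'https://example\.com/show/.+'], 'function': scrape_all_show_sources},
]
```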
Downloaders extract the direct link to the video file, or download the file to a given filename.
Downloaders need this function:
download(link, filename)
Downloaders should be put into the downloaders folder.
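A minimal downloader could be sketched as follows, using only the Python standard library. The boolean return value and the error handling are assumptions made for this sketch; the document only specifies the download(link, filename) signature.

```python
import shutil
import urllib.request


def download(link, filename):
    """Stream the resource at link into filename; return True on success."""
    try:
        # urlopen runs first, so the target file is only created if the
        # link could actually be opened.
        with urllib.request.urlopen(link) as response, open(filename, 'wb') as out:
            shutil.copyfileobj(response, out)
    except OSError:
        # urllib.error.URLError is a subclass of OSError, so network failures
        # and local file errors are both reported as False here.
        return False
    return True
```

Streaming with `shutil.copyfileobj` avoids loading the whole video into memory, which matters for large files.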
Information collectors collect various information about a particular anime series/movie.
They need the functions described in detail above.
Info collectors should also have the following variable:
matching_urls
Put them in the info_collectors folder.
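Before contributing a module, it can help to check that it defines the expected names. The sketch below is a hypothetical helper, not part of the project: the `REQUIRED` table reflects one reading of the requirements in this document, and `fake_module` stands in for an imported module.

```python
from types import SimpleNamespace

# Required attribute names per module kind, as read from the sections above.
REQUIRED = {
    'scraper': ['search', 'scrape_all_show_sources', 'matching_urls'],
    'downloader': ['download'],
    'info_collector': ['search', 'getDetailedInfo', 'matching_urls'],
}


def missing_attributes(module, kind):
    """Return the required names that module does not define."""
    return [name for name in REQUIRED[kind] if not hasattr(module, name)]


# A stand-in for an imported module that only defines search.
fake_module = SimpleNamespace(search=lambda query, strict: [])

print(missing_attributes(fake_module, 'info_collector'))
```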