omkarcloud / botasaurus

The All in One Framework to build Awesome Scrapers.
https://www.omkar.cloud/botasaurus/
MIT License

Integrating with flask and handling routes #62

Closed (gbopola closed this 7 months ago)

gbopola commented 7 months ago

Hi, I want to build a full-stack application that scrapes Yellow Pages: the user enters the Yellow Pages URL they want to scrape and gets the scraped data back. I'm having trouble integrating Flask API routes with Botasaurus's @request decorator. There is a conflict because the @request decorator has the same name as Flask's request object. Do you have any examples of how this can be done? Thanks in advance.

from botasaurus import *
from flask import Flask, jsonify, request
from scraper.yp_usa_scraper import *
from flask_cors import CORS

app = Flask(__name__)
CORS(app)

@app.route('/scrape/yp-usa', methods=["POST"])
@request(use_stealth=True)
def scrape_heading_task(request: AntiDetectRequests, data):
    data = request.get_json()
    response = request.get('https://www.yell.com/ucs/UcsSearchAction.do?scrambleSeed=1475848896&keywords=hairdressers&location=hatfield%2C+hertfordshire')
    return response.text

if __name__ == "__main__":
    # Run the Flask development server
    app.run(debug=True)

AttributeError: 'AntiDetectRequests' object has no attribute 'get_json'

Chetan11-dev commented 7 months ago

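I recommend splitting it into two functions, a scraping function decorated with @request and a Flask view that reads the POST body and calls it, as follows: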

from botasaurus import *
# Alias Flask's request object so it cannot shadow botasaurus's @request decorator
from flask import Flask, jsonify, request as flask_request
from scraper.yp_usa_scraper import *
from flask_cors import CORS

app = Flask(__name__)
CORS(app)

# Scraping task: the decorator supplies `request`; `data` is whatever the caller passes in
@request(use_stealth=True)
def scrape_heading_task(request: AntiDetectRequests, data):
    response = request.get('https://www.yell.com/ucs/UcsSearchAction.do?scrambleSeed=1475848896&keywords=hairdressers&location=hatfield%2C+hertfordshire')
    return response.text

# Flask view: takes no arguments; the POST body is read from Flask's request object
@app.route('/scrape/yp-usa', methods=["POST"])
def scrape_yp():
    data = flask_request.get_json()
    return scrape_heading_task(data)

if __name__ == "__main__":
    # Run the Flask development server
    app.run(debug=True)
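
For anyone hitting the same AttributeError: the root cause looks like plain Python name shadowing rather than anything specific to either library. Below is a minimal, stdlib-only sketch of that reasoning. FakeFlaskRequest and FakeScraperSession are hypothetical stand-ins (not Flask or botasaurus classes), and the example.com URL is a placeholder; they only mimic the fact that one object has get_json and the other has get.

```python
# Stand-ins only: these classes are hypothetical and merely mimic the two
# unrelated objects that both end up being called `request` in this thread.

class FakeFlaskRequest:
    """Plays the role of the web framework's request object (has get_json)."""

    def __init__(self, payload):
        self._payload = payload

    def get_json(self):
        return self._payload


class FakeScraperSession:
    """Plays the role of the scraper's HTTP session (has get, no get_json)."""

    def get(self, url):
        return f"<html>fetched {url}</html>"


# Module-level name, analogous to `from flask import request`.
request = FakeFlaskRequest({"url": "https://example.com/search?q=hairdressers"})


def broken_task(request, data=None):
    # The parameter `request` shadows the module-level `request`, so this
    # looks up get_json on the scraper session and raises AttributeError.
    return request.get_json()


def scrape_task(request, data):
    # Scraping only: `request` is the scraper session, `data` is plain input.
    return request.get(data["url"])


def view():
    # Web layer only: there is no parameter named `request` here, so the
    # module-level object is the one that gets used.
    data = request.get_json()
    return scrape_task(FakeScraperSession(), data)


if __name__ == "__main__":
    try:
        broken_task(FakeScraperSession())
    except AttributeError as exc:
        print("broken:", exc)
    print("fixed:", view())
```

Keeping the web layer and the scraping layer in separate functions means each function only ever sees one object called request, and the URL posted by the user travels between them as ordinary data.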

gbopola commented 7 months ago

thank you :)