joaomdmoura / crewAI

Framework for orchestrating role-playing, autonomous AI agents. By fostering collaborative intelligence, CrewAI empowers agents to work together seamlessly, tackling complex tasks.
https://crewai.com
MIT License

Enable JavaScript and cookies to continue #672

Open gkzsolt opened 1 month ago

gkzsolt commented 1 month ago

Hi,

Yesterday I (almost) finished your course on DeepLearning.AI and I was impressed ;) My first try did not succeed, though.

I am just trying to get an agent to analyze a job posting and produce a structured output of the requirements, like in L7_job_application_crew.ipynb from the course. I just copied the agent and task:

from crewai import Agent, Task, Crew
from crewai_tools import ScrapeWebsiteTool, SerperDevTool

# Tools used by the agent, as in the course notebook
scrape_tool = ScrapeWebsiteTool()
search_tool = SerperDevTool()

researcher = Agent(
    role="Tech Job Researcher",
    goal="Make sure to do amazing analysis on "
         "job posting to help job applicants",
    tools=[scrape_tool, search_tool],
    verbose=True,
    backstory=(
        "As a Job Researcher, your prowess in "
        "navigating and extracting critical "
        "information from job postings is unmatched."
        "Your skills help pinpoint the necessary "
        "qualifications and skills sought "
        "by employers, forming the foundation for "
        "effective application tailoring."
    )
)

analyze_task = Task(
    description=(
        "Analyze the job posting URL provided ({job_posting_url}) "
        "to extract key skills, experiences, and qualifications "
        "required. Use the tools to gather content and identify "
        "and categorize the requirements."
    ),
    expected_output=(
        "A structured list of job requirements, including necessary "
        "skills, qualifications, and experiences."
    ),
    agent=researcher,
    # async_execution=True
)

req_crew = Crew(
    agents=[researcher],
    tasks=[analyze_task],
    verbose=True,
    full_output=True
)

inputs = {
    'job_posting_url': 'https://hu.indeed.com/viewjob?jk=44678430abbc6f69&tk=1hufoopq6ojdt85p&from=serp&vjs=3',
}

# Run the crew with the job posting URL
result = req_crew.kickoff(inputs=inputs)

But when running the crew, the output is:

> Entering new CrewAgentExecutor chain...
I should start by extracting the content of the job posting from the provided URL to analyze the key skills, experiences, and qualifications required.

Action: Read website content
Action Input: {"website_url": "https://hu.indeed.com/viewjob?jk=44678430abbc6f69&tk=1hufoopq6ojdt85p&from=serp&vjs=3"} 

Just a moment...Enable JavaScript and cookies to continue

Final Answer: Just a moment...Enable JavaScript and cookies to continue

> Finished chain.

Did it get stuck when it was asked to enable JavaScript and cookies?

gkzsolt commented 1 month ago

Looking at the code of ScrapeWebsiteTool, it does indeed get stuck: the tool is just a simple requests.get call. By the way, the content of the site in question can be fetched even without enabling cookies, but there are other problems: the URL redirects (301), and the site has some primitive but effective scraping protection.
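
A quick way to see this outside CrewAI is to do roughly what the tool does internally, a plain requests.get (the exact status codes and response body will of course vary from run to run):

import requests

# The job posting URL from my crew inputs above
url = "https://hu.indeed.com/viewjob?jk=44678430abbc6f69&tk=1hufoopq6ojdt85p&from=serp&vjs=3"

# Roughly what ScrapeWebsiteTool does: a plain GET with no JavaScript execution
resp = requests.get(url, allow_redirects=False, timeout=15)

print(resp.status_code)              # e.g. a 301 redirect or a blocked response
print(resp.headers.get("Location"))  # target of the redirect, if any
print(resp.text[:200])               # typically the "Just a moment..." challenge page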

I tried scraping with Selenium. This worked in my home setup, but I was unable to make it work in a custom tool (the built-in SeleniumScrapingTool failed as well). Installing a webdriver compatible with your browser version seems to be very challenging. A few years ago it was easy to download the webdriver for the most recent Chrome releases (I am using Chrome), but starting with version 115 the old standalone ChromeDriver downloads were discontinued. Now there is a webdriver manager that is supposed to detect and download the right driver for you, but I have never seen it work.
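
For reference, the kind of custom tool I have been trying looks roughly like this. It assumes selenium >= 4.6, whose built-in Selenium Manager is supposed to download a matching chromedriver by itself, and it uses the BaseTool class from crewai_tools. Treat it as a sketch, not a verified solution:

# Sketch of a Selenium-based scraping tool (not yet working end-to-end for me).
# Assumes selenium >= 4.6 so Selenium Manager can resolve the chromedriver.
from crewai_tools import BaseTool
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By

class SeleniumScrapeTool(BaseTool):
    name: str = "Selenium website scraper"
    description: str = "Load a URL in headless Chrome and return the rendered page text."

    def _run(self, website_url: str) -> str:
        options = Options()
        options.add_argument("--headless=new")        # no visible browser window
        options.add_argument("--no-sandbox")          # commonly needed on Ubuntu/CI
        options.add_argument("--disable-dev-shm-usage")
        driver = webdriver.Chrome(options=options)    # Selenium Manager picks the driver
        try:
            driver.get(website_url)
            return driver.find_element(By.TAG_NAME, "body").text
        finally:
            driver.quit()

If a class like this actually instantiates, wiring it in would just mean replacing scrape_tool in the agent definition, e.g. tools=[SeleniumScrapeTool(), search_tool].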

Has anybody managed to run the SeleniumScrapingTool successfully? If so, could you share your setup with me? I would be very grateful. I like the agent crew idea and I'd like to contribute to it as well. I am on Ubuntu 22.04. Many thanks!