proegssilb / ferris-elf

GNU Affero General Public License v3.0

Automatically get correct answers from AOC website #41

Open proegssilb opened 11 months ago

proegssilb commented 11 months ago

The bot should have some way to automatically figure out what the correct answers are for a given day. Otherwise, the owner will have to do daily manual maintenance.

proegssilb commented 11 months ago

This might have to wait until we can integrate https://github.com/scarvalhojr/aoc-cli/ or similar.

proegssilb commented 11 months ago

Sample post to https://adventofcode.com/2023/day/24/answer :

Request:

level=1&answer=42

Response:

<!DOCTYPE html>
<html lang="en-us">
<head>
<meta charset="utf-8"/>
<title>Day 24 - Advent of Code 2023</title>
<link rel="stylesheet" type="text/css" href="/static/style.css?31"/>
<link rel="stylesheet alternate" type="text/css" href="/static/highcontrast.css?1" title="High Contrast"/>
<link rel="shortcut icon" href="/favicon.png"/>
<script>window.addEventListener('click', function(e,s,r){if(e.target.nodeName==='CODE'&&e.detail===3){s=window.getSelection();s.removeAllRanges();r=document.createRange();r.selectNodeContents(e.target);s.addRange(r);}});</script>
</head><!-- snip, much gratitude to Eric, but that's entirely too many newlines -->
<body>
<!-- snip navbar-->

<!-- snip sidebar-->

<main>
<article><p>That's not the right answer.  If you're stuck, make sure you're using the full input data; there are also some general tips on the <a href="/2023/about">about page</a>, or you can ask for hints on the <a href="https://www.reddit.com/r/adventofcode/" target="_blank">subreddit</a>.  Please wait one minute before trying again. <a href="/2023/day/24">[Return to Day 24]</a></p></article>
</main>

<!-- snip ga -->
</body>
</html>

We can search for "That's not the right answer." inside the <article> element under <main>, and the POST request is easy enough to build.
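A minimal sketch of building that POST with only the standard library. The function name `build_answer_request` is hypothetical, and passing the login as a `session` cookie is an assumption that is not shown anywhere in this thread. The request is only constructed here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

AOC_BASE = "https://adventofcode.com"


def build_answer_request(
    year: int, day: int, level: int, answer: str, session_token: str
) -> Request:
    """Build (but do not send) the answer-submission POST request."""
    # Same form body as the sample above: level=1&answer=42
    body = urlencode({"level": level, "answer": answer}).encode("ascii")
    return Request(
        f"{AOC_BASE}/{year}/day/{day}/answer",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            # Assumption: the user's login is sent as a `session` cookie.
            "Cookie": f"session={session_token}",
        },
    )


def is_wrong_answer(response_html: str) -> bool:
    """Check the response for the phrase seen in the sample above."""
    return "That's not the right answer." in response_html
```

Sending would be `urllib.request.urlopen(req)`, but that should sit behind whatever throttling we settle on.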

ultrabear commented 11 months ago

I've been thinking of making a PyPI package that calls into the AOC API and handles their rate-limit requirements. The basic model is that it would take a storage class that implements a trait (here in Python land we call that a Protocol or an Abstract Base Class, depending on semantics) to manage information about rate limiting and submitted answers. It would then fetch and store the data we want asynchronously, because the AOC rate limit is incredibly slow and there is no way we can request stuff and get it immediately.
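To make the storage idea concrete, here is a rough sketch of what such a Protocol could look like. Every name in it (`AnswerStore`, `InMemoryStore`, `check_candidate`, the method names) is hypothetical; no such package exists yet, and a real backend would more likely be a database than a dict.

```python
import asyncio
from typing import Optional, Protocol


class AnswerStore(Protocol):
    """Hypothetical storage interface: rate-limit state plus known answers."""

    async def last_request_time(self) -> Optional[float]: ...
    async def record_request(self, when: float) -> None: ...
    async def get_answer(self, year: int, day: int, level: int) -> Optional[str]: ...
    async def save_answer(self, year: int, day: int, level: int, answer: str) -> None: ...


class InMemoryStore:
    """Toy implementation; satisfies AnswerStore structurally, no inheritance."""

    def __init__(self) -> None:
        self._last = None  # timestamp of the most recent request, if any
        self._answers = {}  # (year, day, level) -> confirmed answer

    async def last_request_time(self) -> Optional[float]:
        return self._last

    async def record_request(self, when: float) -> None:
        self._last = when

    async def get_answer(self, year: int, day: int, level: int) -> Optional[str]:
        return self._answers.get((year, day, level))

    async def save_answer(self, year: int, day: int, level: int, answer: str) -> None:
        self._answers[(year, day, level)] = answer


async def check_candidate(
    store: AnswerStore, year: int, day: int, level: int, candidate: str
) -> Optional[bool]:
    """True/False if the answer is already cached, None if we would have to ask AOC."""
    known = await store.get_answer(year, day, level)
    return None if known is None else known == candidate
```

The point of the Protocol is that the bot can pass in its own database-backed class without the package knowing anything about it.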

proegssilb commented 11 months ago

I'm currently seeing a minute or two per submission, so I'm not all that worried about the rate limiting, but maybe I should be? We'll definitely want to cache as much data as we can, but let's get a PR up for my first attempt, and then figure out what the shortcomings are.

proegssilb commented 11 months ago

Additional request:

level=2&answer=23903579137

Additional response:

<!DOCTYPE html>
<html lang="en-us">
<head>
<meta charset="utf-8"/>
<title>Day 12 - Advent of Code 2023</title>
<link rel="stylesheet" type="text/css" href="/static/style.css?31"/>
<link rel="stylesheet alternate" type="text/css" href="/static/highcontrast.css?1" title="High Contrast"/>
<link rel="shortcut icon" href="/favicon.png"/>
<script>window.addEventListener('click', function(e,s,r){if(e.target.nodeName==='CODE'&&e.detail===3){s=window.getSelection();s.removeAllRanges();r=document.createRange();r.selectNodeContents(e.target);s.addRange(r);}});</script>
</head><!-- snip, much gratitude to Eric, but that's entirely too many newlines -->
<body>
<!-- snip navbar-->

<!-- snip sidebar-->

<main>
<article><p>That's the right answer!  You are <span class="day-success">one gold star</span> closer to restoring snow operations.</p><p>You have completed Day 12! You can <span class="share">[Share<span class="share-content">on
  <a href="https://twitter.com/intent/tweet?text=I+just+completed+%22Hot+Springs%22+%2D+Day+12+%2D+Advent+of+Code+2023&amp;url=https%3A%2F%2Fadventofcode%2Ecom%2F2023%2Fday%2F12&amp;related=ericwastl&amp;hashtags=AdventOfCode" target="_blank">Twitter</a>
  <a href="javascript:void(0);" onclick="var ms; try{ms=localStorage.getItem('mastodon.server')}finally{} if(typeof ms!=='string')ms=''; ms=prompt('Mastodon Server?',ms); if(typeof ms==='string' && ms.length){this.href='https://'+ms+'/share?text=I+just+completed+%22Hot+Springs%22+%2D+Day+12+%2D+Advent+of+Code+2023+%23AdventOfCode+https%3A%2F%2Fadventofcode%2Ecom%2F2023%2Fday%2F12';try{localStorage.setItem('mastodon.server',ms);}finally{}}else{return false;}" target="_blank">Mastodon</a
></span>]</span> this victory or <a href="/2023">[Return to Your Advent Calendar]</a>.</p></article>
</main>

<!-- snip ga -->
</body>
</html>

Looks like the magic search phrase is "That's the right answer!".
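A sketch of how the two phrases could be checked, using the standard library's `html.parser` to look only at text inside `<main>` / `<article>`. The names `ArticleText` and `classify_response` are made up for illustration, and anything that matches neither phrase is reported as "unknown" rather than guessed at.

```python
from html.parser import HTMLParser

RIGHT = "That's the right answer!"
WRONG = "That's not the right answer."


class ArticleText(HTMLParser):
    """Collects the text found inside <article> elements nested under <main>."""

    def __init__(self) -> None:
        super().__init__()
        self._in_main = 0
        self._in_article = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "main":
            self._in_main += 1
        elif tag == "article" and self._in_main:
            self._in_article += 1

    def handle_endtag(self, tag):
        if tag == "article" and self._in_article:
            self._in_article -= 1
        elif tag == "main" and self._in_main:
            self._in_main -= 1

    def handle_data(self, data):
        if self._in_article:
            self.chunks.append(data)


def classify_response(html: str) -> str:
    """Return "right", "wrong", or "unknown" for an answer-submission response."""
    parser = ArticleText()
    parser.feed(html)
    parser.close()
    text = "".join(parser.chunks)
    if RIGHT in text:
        return "right"
    if WRONG in text:
        return "wrong"
    return "unknown"
```

Keeping an explicit "unknown" bucket means we never cache an answer as correct or incorrect based on a page we did not recognize.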

ultrabear commented 11 months ago

> I'm currently seeing a minute or two per submission, so I'm not all that worried about the rate limiting, but maybe I should be? We'll definitely want to cache as much data as we can, but let's get a PR up for my first attempt, and then figure out what the shortcomings are.

So, the thing is, AOC sources say any non-user activity should limit itself to a single request of any kind every 15 minutes. I assume this means per session at least, but it's still a big limit.

ultrabear commented 11 months ago

This post is from a mod on the official subreddit 2 years back. They might have removed this from the website itself, but I haven't seen any updated guidance: https://www.reddit.com/r/adventofcode/comments/ra741z/comment/hniznot/

proegssilb commented 11 months ago

> This post is from a mod on the official subreddit 2 years back, they might have removed this from the website itself but i haven't seen any updated guidance https://www.reddit.com/r/adventofcode/comments/ra741z/comment/hniznot/

Google can't find that quote any more.

https://www.google.com/search?q=site%3Aadventofcode.com+avoid+sending+requests+more+often+than&sca_esv=594558718&hl=en&ei=_MyPZfL1J5KbptQP_dqViAw&ved=0ahUKEwjy4-jM17aDAxWSjYkEHX1tBcEQ4dUDCA8&uact=5&oq=site%3Aadventofcode.com+avoid+sending+requests+more+often+than&gs_lp=Egxnd3Mtd2l6LXNlcnAiPHNpdGU6YWR2ZW50b2Zjb2RlLmNvbSBhdm9pZCBzZW5kaW5nIHJlcXVlc3RzIG1vcmUgb2Z0ZW4gdGhhbkjRI1CVH1iVH3ACeACQAQCYAVKgAVKqAQExuAEDyAEA-AEC-AEB4gMEGAEgQYgGAQ&sclient=gws-wiz-serp

proegssilb commented 11 months ago

We should definitely rate limit and take steps to avoid hammering AOC with requests, but I'm not certain there's still a hard rule on request rates documented on the AOC website, which is where it should be written down.

proegssilb commented 11 months ago

The automated submission in https://github.com/wimglenn/advent-of-code-data looks promising.

There are probably others. That's just what I found first.

ultrabear commented 11 months ago

Found up-to-date guidance: https://old.reddit.com/r/adventofcode/wiki/faqs/automation tl;dr:

If you absolutely must use an automated tool (e.g. web scraper) to acquire statistics from adventofcode.com, throttle it as per Eric's request:
    Please throttle your requests to at least once every few minutes.

As per the private leaderboard JSON API on adventofcode.com:
    Please don't make frequent automated requests to this service - avoid sending requests more often than once every 15 minutes (900 seconds).

ultrabear commented 11 months ago

We can interpret this as 3 minutes between requests for normal endpoints, since we don't touch private leaderboards. They also say to set a clear User-Agent that identifies our tool with a URL and contact info.
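A rough sketch of what that safety could look like: a small throttle that refuses requests closer together than 3 minutes, plus a User-Agent string. The 180-second figure is just our reading of "every few minutes", the `Throttle` class is hypothetical, and the User-Agent value is a placeholder that needs real contact details.

```python
import time
from typing import Callable, Optional

# Our interpretation of "at least once every few minutes"; not an official number.
MIN_INTERVAL_SECONDS = 180.0

# Placeholder: fill in real contact info before using this anywhere.
USER_AGENT = "ferris-elf (https://github.com/proegssilb/ferris-elf; contact: <maintainer contact here>)"


class Throttle:
    """Allows at most one request per `min_interval` seconds."""

    def __init__(
        self,
        min_interval: float = MIN_INTERVAL_SECONDS,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._min_interval = min_interval
        self._clock = clock  # injectable so tests do not have to sleep
        self._last: Optional[float] = None

    def seconds_until_allowed(self) -> float:
        """How long the caller still has to wait; 0.0 if a request may go out now."""
        if self._last is None:
            return 0.0
        elapsed = self._clock() - self._last
        return max(0.0, self._min_interval - elapsed)

    def try_acquire(self) -> bool:
        """Record a request and return True if allowed, otherwise return False."""
        if self.seconds_until_allowed() > 0:
            return False
        self._last = self._clock()
        return True
```

This only tracks state in memory, so a bot restart resets it; persisting the last request time would need wall-clock timestamps rather than `time.monotonic`.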

proegssilb commented 11 months ago

Outside of the fetch script (which runs once a day shortly after midnight, and makes one request per token), I think we're compliant with that by default for now simply by virtue of each benchmark run taking that long to run. That said, I'm sure we'll speed up the benchmarking runs in the future, and I'm not against adding a few safeties. Especially not if they're reasonably straightforward.