Closed dawnerd closed 1 year ago
Switched to running directly on my server and it seems to only double post when it thinks it's running for the first time, which on the server it was. Guessing that even with caching, something on the GitHub Actions side is making it think it hasn't run before. Where does it store info (if at all) on what the last post was? Or is it just inferring it from the API?
Ah, running the bot through GitHub Actions didn't even cross my mind, nice!
I see you were running it with --skipChecks, so it should skip asking for an initial date and directly use the date of the last post of the Fedi account as the start date for gathering tweets.
If no posts are found in the Fedi account, it will get the last 2 days of tweets as a fallback:
https://github.com/robertoszek/pleroma-bot/blob/248f65d79cb11b10df96dbcec12dc46b6c6b2020/pleroma_bot/_pleroma.py#L49
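To make the fallback described above concrete, here is a minimal sketch of the idea. It is not the code from _pleroma.py: the function name `pick_start_date` and its parameters are hypothetical and only illustrate the "last post date, else the last 2 days" rule.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def pick_start_date(last_post_date: Optional[datetime],
                    now: Optional[datetime] = None,
                    fallback_days: int = 2) -> datetime:
    """Return the date from which tweets should be gathered.

    Hypothetical helper: uses the date of the last Fedi post when there
    is one, otherwise falls back to `fallback_days` days before `now`.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    if last_post_date is not None:
        # A previous post exists, so resume from it.
        return last_post_date
    # No posts found on the Fedi account: gather the last couple of days.
    return now - timedelta(days=fallback_days)
```

If the lookup of the last post comes back empty for any reason, a rule like this would re-gather the last 2 days of tweets, which is one way a double post could appear.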
The bot also creates some folders (users/<twitter_username>) to keep track of whether it has been run before for that Twitter user.
So I'm thinking perhaps you ran it too often the first time? (your cron seems to be configured to run every minute: '0 * * * *')
Maybe your previous run didn't publish a post in time for the next run triggered by cron to get the date from the last published post on your Fediverse instance.
Cron should be for every hour, at least that's how it ran on GitHub, though I do wonder if it's just a timezone issue on the GitHub runners, since they are distributed after all. It does look like the users directory was cached, but there's no telling if it was cached correctly or fully.
I'll try to dig in some more when I get time next week
Ah, right! My bad, I read the cron expression wrong (as usual 😅). It's actually as you said: it runs every hour (at minute 0).
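For anyone else who misreads these, a rough sketch that counts how often an expression fires in one day. It is a deliberately simplified matcher (only `*` and plain integers are handled, no ranges, lists or steps), and the helper names are made up for illustration.

```python
from datetime import datetime, timedelta


def field_matches(field: str, value: int) -> bool:
    """Match one cron field; only '*' and plain integers are supported."""
    return field == "*" or int(field) == value


def cron_matches(expr: str, when: datetime) -> bool:
    """Check a five-field cron expression against a point in time.

    Simplification: all five fields must match. Real cron treats
    day-of-month and day-of-week specially when both are restricted.
    """
    minute, hour, dom, month, dow = expr.split()
    # cron counts Sunday as 0, while isoweekday() counts Sunday as 7.
    cron_dow = when.isoweekday() % 7
    return (field_matches(minute, when.minute)
            and field_matches(hour, when.hour)
            and field_matches(dom, when.day)
            and field_matches(month, when.month)
            and field_matches(dow, cron_dow))


def runs_per_day(expr: str, day: datetime) -> int:
    """Count the minutes of `day` on which `expr` would fire."""
    start = day.replace(hour=0, minute=0, second=0, microsecond=0)
    return sum(cron_matches(expr, start + timedelta(minutes=m))
               for m in range(24 * 60))
```

With this, `runs_per_day("0 * * * *", some_day)` gives 24 (once per hour, at minute 0), while `"* * * * *"` gives 1440 (every minute).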
So I'm wondering if the caching of the users folders is the culprit then. I haven't delved too deep into GitHub Actions, so I can't really tell if the caching is set up as it should be. Perhaps listing the contents of a cached folder in the workflow to check what's inside is worth trying, just to double check it's doing what you expect it to.
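Something along these lines could be run as a step right after the cache is restored. It is only a sketch: the helper names are hypothetical, and "users" is assumed to be the cached directory relative to wherever the bot runs.

```python
import os


def list_tree(root: str) -> list:
    """Return the sorted relative paths of every file under `root`."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            found.append(os.path.relpath(full, root))
    return sorted(found)


def print_tree(root: str) -> None:
    """Print each file under `root` with its size, or a hint if missing."""
    if not os.path.isdir(root):
        print(f"{root!r} does not exist, so the cache may not have been restored")
        return
    for path in list_tree(root):
        size = os.path.getsize(os.path.join(root, path))
        print(f"{size:>10}  {path}")
```

Calling `print_tree("users")` before the bot starts would show whether the per-user folders and posts.json actually survived between runs.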
In any case, posting your config (with any data you deem sensitive removed) and the log of the bot in verbose mode wouldn't hurt, to see what's happening.
Oh, by the way. I've added some more checks that verify if a tweet is already mirrored on the Fediverse instance: https://github.com/robertoszek/pleroma-bot/commit/2681d0311e20d356eb2fc7244ee76a437e6b25da
They are included in 1.1.1rc42:
pip install -i https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple pleroma-bot==1.1.1rc42
But it also relies on the posts.json file to do it, so if the caching on GitHub Actions was your issue it won't help.
I forgot to ask, what version are you running?
Oh excellent, I'll give that a shot. I'm running 1.1.1rc40 btw
Haven't noticed anything double post so far so I'd say your fix worked. Thanks again for looking into it.
I'm running this in a weird roundabout way, but it seems to forget it's processed some accounts and reports:
It seems like pleroma-bot is running for the first time for this Twitter user: Dollywood
I'm using GitHub Actions and caching posts.json and the users directory between runs.
In this example Dollywood had the same status posted before https://opencoaster.net/@Dollywood
Would love to help debug further, and if needed I can provide you access to the full config. Would verbose output help?