kensanata / mastodon-archive

Archive your statuses, favorites and media using the Mastodon API (i.e. login required)
https://alexschroeder.ch/software/Mastodon_Archive
GNU General Public License v3.0

incremental runs of 'mastodon-archive' giving 404s on deleted mentions/replies on STDERR #92

Closed IzzySoft closed 1 year ago

IzzySoft commented 1 year ago

I'm running mastodon-archive archive --with-mentions --with-following "$myacc"¹ in a daily Cron job. Looks like 2 days ago someone deleted some "mentions". Since then, on every run, I get the following error:

Get user info
Indexing 868 statuses...
Indexing 419 favourites...
Indexing 36 bookmarks...
Indexing 871 mentions...
No replies in this archive...
Indexed 1908 statuses...
Counting missing replies...
Missing 120 originals...
Fetching |###############                 | 59/120('Mastodon API returned error', 404, 'Not Found', 'Not Found')
Fetching |################################| 120/120
96 urls in your backup (47 are previews)
Downloading |################################| 96/96
22 urls in your backup (10 are previews)
Downloading |################################| 22/22

The ('Mastodon API returned error', 404, 'Not Found', 'Not Found') is logged on STDERR and thus results in a mail from the Cron job.

Is there any way to clean that up, so the error goes away (otherwise I expect that the error line will grow with time) – without "losing data" (e.g. just "breaking the connection" to those 2 missing toots)?


¹ basically, contrib/mastoarch without archiving followers, so the error most likely comes from mastodon-archive replies "$myacc"

kensanata commented 1 year ago

I wonder. Looking at the code in question:

    if len(missing) > 0:
        if not "replies" in data:
            replies = []
        else:
            replies = data["replies"]

        bar = Bar('Fetching', max = len(missing))

        for id in missing:
            try:
                status = mastodon.status(id)
                replies.append(status)
            except Exception as e:
                if  "not found" in str(e):
                    pass
                else:
                    print(e, file=sys.stderr)

            bar.next()

        bar.finish()

        data["replies"] = replies
        core.save(status_file, data)

Is the problem that the case of the exception changed? The code ignores "not found" and in your example it says "Not Found".

>>> "not found" in "Not Found"
False

What happens if we change that:

            except Exception as e:
                if  "not found" in str(e) or "Not Found" in str(e):
                    pass
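A case-insensitive check would cover both spellings without listing each one. The following is only a sketch of that idea, not what the project does: `is_not_found` and `fetch_all` are hypothetical helper names, and `fetch` stands in for `mastodon.status` so the example runs without a server.

```python
def is_not_found(exc: Exception) -> bool:
    """Return True if the exception text mentions 'not found' in any letter case."""
    # casefold() normalises case, so "Not Found" and "not found" both match.
    return "not found" in str(exc).casefold()


def fetch_all(ids, fetch, log):
    """Fetch each id, skipping 'not found' errors and logging everything else.

    `fetch` is a stand-in for mastodon.status (assumption for illustration);
    `log` is any callable that receives the unexpected exception.
    """
    fetched = []
    for status_id in ids:
        try:
            fetched.append(fetch(status_id))
        except Exception as e:
            if not is_not_found(e):
                log(e)
    return fetched
```

Matching on the message text remains fragile either way; if the client library offers a dedicated exception class for 404 responses, catching that would probably be more robust.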

IzzySoft commented 1 year ago

That seems to do the trick:

Counting missing replies...
Missing 133 originals...
Fetching |################################| 133/133

kensanata commented 1 year ago

Fixed in 0f88336.

IzzySoft commented 1 year ago

Thanks!

lapineige commented 1 year ago

Strangely, this still isn't fixed for me, and I can't use pip to upgrade to version 1.4.3, which seems to contain the fix.

lapineige commented 1 year ago

Oh wait, I don't get the error while downloading, only when expiring!

kensanata commented 1 year ago

@lapineige How about you create a new issue? I'll take a look.