hrenfroe / yahoo-groups-backup

A python script to backup the contents of private Yahoo! groups.
The Unlicense

KeyError ygData #6

Open mrmabs opened 4 years ago

mrmabs commented 4 years ago

I'm getting a strange, repeatable error while archiving one particular group (the same message every time).

I can navigate to the message on the web and read it.

Error message:

```
Detecting the log-in page...
Traceback (most recent call last):
  File "./yahoo-groups-backup.py", line 129, in <module>
    main()
  File "./yahoo-groups-backup.py", line 125, in main
    arguments, cfg_args)
  File "./yahoo-groups-backup.py", line 103, in invoke_subcommand
    return module.command(args)
  File "/home/mabs/src/yahoo-groups-backup/yahoo_groups_backup/subcommands/scrape_messages.py", line 50, in command
    msg = scraper.get_message(cur_message)
  File "/home/mabs/src/yahoo-groups-backup/yahoo_groups_backup/scraper.py", line 190, in get_message
    data['rawEmail'] = raw['ygData']['rawEmail']
KeyError: 'ygData'
```

JSON source of the page causing the error:

```json
{
  "ygPerms": {
    "resourceCapabilityList": [
      {"resourceType":"GROUP","capabilities":[{"name":"READ"},{"name":"WELCOME_MSG"}]},
      {"resourceType":"PHOTO","capabilities":[{"name":"READ"},{"name":"UPLOAD"},{"name":"UPLOADTEMP"}]},
      {"resourceType":"FILE","capabilities":[{"name":"READ"},{"name":"CREATE"}]},
      {"resourceType":"MEMBER","capabilities":[{"name":"READ"}]},
      {"resourceType":"LINK","capabilities":[{"name":"CREATE"},{"name":"READ"}]},
      {"resourceType":"CALENDAR","capabilities":[{"name":"READ"}]},
      {"resourceType":"DATABASE","capabilities":[{"name":"READ"},{"name":"CREATE"},{"name":"READ_DATA"}]},
      {"resourceType":"POLL","capabilities":[{"name":"READ"},{"name":"VOTE"},{"name":"CREATE"}]},
      {"resourceType":"MESSAGE","capabilities":[{"name":"CREATE"},{"name":"READ"}]},
      {"resourceType":"PENDING_MESSAGE","capabilities":[]},
      {"resourceType":"ATTACHMENTS","capabilities":[{"name":"READ"}]},
      {"resourceType":"PHOTOMATIC_ALBUMS","capabilities":[{"name":"READ"},{"name":"UPLOAD"}]},
      {"resourceType":"MEMBERSHIP_TYPE","capabilities":[{"name":"READ"}]},
      {"resourceType":"POST","capabilities":[{"name":"READ"},{"name":"CREATE"}]},
      {"resourceType":"PIN","capabilities":[{"name":"DELETE"},{"name":"UPDATE"},{"name":"READ"},{"name":"CREATE"}]}
    ],
    "subStatus": "NORMAL",
    "groupUrl": "groups.yahoo.com",
    "intlCode": "us"
  },
  "ygError": {
    "hostname": "gapi3.grp.bf1.yahoo.com",
    "httpStatus": 500,
    "errorMessage": "Internal error: Error during message fetch",
    "errorCode": 1001,
    "sid": "SID:YHOO:groups.yahoo.com:97349925365de61fafa749bb3f9f9418:0"
  }
}
```
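This response contains a `ygError` object and no `ygData` key, which explains why `raw['ygData']['rawEmail']` in `scraper.py` raises `KeyError`. The sketch below shows one way to make that failure explicit. It assumes the response has already been decoded into a plain dict. `MessageUnavailableError` and `extract_raw_email` are hypothetical names and are not part of the script.

```python
class MessageUnavailableError(Exception):
    """Hypothetical exception: Yahoo returned ygError instead of ygData."""

    def __init__(self, message_number, error):
        self.message_number = message_number
        self.error = error
        super().__init__(
            f"message {message_number} unavailable: "
            f"{error.get('errorMessage', 'unknown error')} "
            f"(HTTP {error.get('httpStatus')}, code {error.get('errorCode')})"
        )


def extract_raw_email(message_number, raw):
    """Return raw['ygData']['rawEmail'], or raise MessageUnavailableError.

    `raw` is assumed to be the decoded JSON object that scraper.py indexes
    directly, i.e. a dict with either a 'ygData' or a 'ygError' key.
    """
    if "ygData" in raw:
        return raw["ygData"]["rawEmail"]
    # No ygData: surface Yahoo's own error details instead of a bare KeyError.
    raise MessageUnavailableError(message_number, raw.get("ygError", {}))


if __name__ == "__main__":
    response = {"ygError": {"httpStatus": 500, "errorCode": 1001,
                            "errorMessage": "Internal error: Error during message fetch"}}
    try:
        extract_raw_email(42, response)
    except MessageUnavailableError as exc:
        print(exc)
```

Raising a dedicated exception, rather than catching `KeyError` broadly, lets the caller distinguish a Yahoo-side fetch failure from a genuine bug in the parsing code.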

hrenfroe commented 4 years ago

Unfortunately, this looks like an internal Yahoo issue that keeps that particular message from being fetched. I'd try building a message record for it directly in the Mongo database so that the script can continue and you can keep the message. I can't give specific guidance on that right now because I'm unfamiliar with the schema the script expects. If you work it out, can you post back here?

mrmabs commented 4 years ago

A similar error happened on another mailing list. I forked the code and simply made it ignore the error, but that's not the right way to handle it, because it marks the message as missing in the database rather than unavailable.

The error seems to be thrown when an email is "unavailable". The message from this issue has since been fixed and is now available, but the other list had about 20 unavailable messages in a row. I've manually deleted those entries from MongoDB so that the script can try to retrieve them later, and I hope they come back at some point. Otherwise, I'll have another go at making the code skip unavailable messages entirely (writing nothing to the database), so that running the archiver again later will simply retry them.
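Here is a minimal sketch of the "skip and retry later" approach described above. A plain dict stands in for the MongoDB collection, and `fetch_message` is a hypothetical callable standing in for `scraper.get_message`. Neither reflects the script's real schema or API. Unavailable messages are not written at all, so a later run picks up only the messages that are still missing.

```python
class MessageUnavailableError(Exception):
    """Hypothetical: raised when Yahoo answers with ygError instead of ygData."""


def archive_messages(message_numbers, fetch_message, store):
    """Fetch every message not yet in `store`, skipping unavailable ones.

    `fetch_message(n)` and the dict `store` are hypothetical stand-ins for
    the script's scraper and its MongoDB collection. Returns the message
    numbers that were skipped on this run.
    """
    skipped = []
    for number in message_numbers:
        if number in store:
            continue  # already archived on an earlier run
        try:
            store[number] = fetch_message(number)
        except MessageUnavailableError:
            skipped.append(number)  # write nothing, so the next run retries it
    return skipped


if __name__ == "__main__":
    unavailable = {3, 4}

    def fake_fetch(n):
        if n in unavailable:
            raise MessageUnavailableError(n)
        return {"number": n}

    store = {}
    print(archive_messages(range(1, 6), fake_fetch, store))  # first pass skips 3 and 4
    unavailable.clear()  # simulate Yahoo fixing the messages later
    print(archive_messages(range(1, 6), fake_fetch, store))  # retry fills the gaps
```

In the real script, the equivalent change would presumably sit around the `scraper.get_message(cur_message)` call in `scrape_messages.py`, but the exact edit depends on how that loop records progress.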

JustinCEO commented 4 years ago

@mrmabs, can you say exactly what you changed in the code? I'm a newbie and am running into KeyError: 'ygData' as well. Thanks!

SarahNathanson commented 4 years ago

Another MongoDB newbie here, looking for advice on skipping a message!