micahflee / semiphemeral

Automatically delete your old tweets, except for the ones you want to keep
MIT License
884 stars, 85 forks

Archive import #23

Closed: russss closed this 3 years ago

russss commented 5 years ago

This PR is a bit messy for reasons mentioned below. It's working fine for me so I'm going to raise this PR mostly so I don't forget about it.

This PR adds a feature to use a Twitter archive export to initially populate the semiphemeral DB. This has the following advantages:

You run it by requesting your archive, unzipping it, and then running semiphemeral import path/to/archive_dir.

Issues

This depends on #22.

KonradIT commented 5 years ago

This works beautifully, thanks @russss.

To get your Twitter archive zip, go here: https://help.twitter.com/en/managing-your-account/how-to-download-your-twitter-archive

wohali commented 5 years ago

Thanks for this PR. With this branch I was able to successfully import all of my tweets in, even those that radiergummi wasn't able to delete. semiphemeral delete (master branch) is now working its way through removing all those older tweets.

This adds more evidence for the theory that if you have a large 'hole' in your tweets, Twitter's API is not able to see past that hole to older tweets that may still be undeleted, meaning semiphemeral can't access them without this kind of import.

wneuheisel commented 4 years ago

Tried this to import from archive, but got an error:

  File "/home/debian/.local/lib/python3.7/site-packages/semiphemeral/db.py", line 52, in __init__
    self.created_at = status.created_at
AttributeError: 'Status' object has no attribute 'created_at'

tobiasziegler commented 4 years ago

I just tried importing from archive and got the same error. It looks like an issue with parsing the JSON object for each tweet. In my tweet.js file, each object in the array has a 'tweet' attribute and all of the metadata is nested inside it.

I've modified line 502 in twitter.py to:

                t = Status.parse(self.api, tweet['tweet'])

That has got it working. I'm not sure whether there has been a recent change in the JSON structure for the tweet archives or if something else is going on.
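For anyone hitting the same error, here is a minimal sketch of how an importer could accept both layouts (metadata nested under 'tweet', or metadata at the top level). This is not the PR's code: `parse_tweet_js` and `unwrap_tweet` are hypothetical helper names, the sample data is made up, and it assumes tweet.js is a JavaScript assignment whose JSON array starts at the first '[' in the file.

```python
import json


def parse_tweet_js(text):
    """Parse the contents of an archive tweet.js file into a list of dicts.

    Assumption: the file is a JavaScript assignment of the form
    `<something> = [ ... ]`, so everything before the first '[' is
    skipped and the remainder is parsed as plain JSON.
    """
    start = text.index("[")
    return json.loads(text[start:])


def unwrap_tweet(entry):
    """Return the tweet metadata dict for either layout.

    Nested layout: {"tweet": {...metadata...}}
    Flat layout:   {...metadata...}
    """
    if isinstance(entry.get("tweet"), dict):
        return entry["tweet"]
    return entry


# Made-up sample containing one nested entry and one flat entry.
sample = (
    'window.YTD.tweet.part0 = ['
    '{"tweet": {"id_str": "1", "created_at": "Mon Jan 01 00:00:00 +0000 2018"}}, '
    '{"id_str": "2", "created_at": "Tue Jan 02 00:00:00 +0000 2018"}'
    ']'
)

tweets = [unwrap_tweet(entry) for entry in parse_tweet_js(sample)]
print([t["id_str"] for t in tweets])
```

With something like this, the unwrapped dict could be handed to Status.parse in place of tweet['tweet'], so the import would not depend on which layout a given archive uses.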

Thanks for the work on this PR and for the tool itself.

wneuheisel commented 4 years ago

That worked for me. Thank you!

mattnworb commented 3 years ago

This is really great, thank you for the branch (and thanks for the semiphemeral project as well!).

For anyone trying to run this with recent commits, I resolved a few merge conflicts in this branch: https://github.com/mattnworb/semiphemeral/tree/archive-import-merged

kees commented 3 years ago

Can this PR get rebased please? It'd be nice to get this landed. :)

russss commented 3 years ago

Due to the nature of this feature, I don't really need it now, but I'm happy to rebase it and apply the above changes if I get some indication from @micahflee that it'll be merged.

kees commented 3 years ago

I can do the merge, but there's been a lot of code changes in the last few years ;)

russss commented 3 years ago

Oops, I missed that you're a committer! I've just pushed a rebase onto the latest HEAD (also including encoding="UTF-8") but I'm waiting on Twitter to finish an export so I can actually test it.
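As a rough illustration of why the explicit encoding matters: without an `encoding` argument, Python's `open()` falls back to the platform's default encoding, which is not always UTF-8. The sketch below assumes the archive files are UTF-8 and uses a made-up payload and a temporary file rather than a real archive; it is not the code from this branch.

```python
import json
import os
import tempfile

# Made-up archive entry containing non-ASCII text.
payload = [{"tweet": {"id_str": "1", "full_text": "café ☕"}}]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "tweet.json")

    # Write the file as UTF-8, keeping the non-ASCII characters as-is.
    with open(path, "w", encoding="UTF-8") as f:
        json.dump(payload, f, ensure_ascii=False)

    # Read it back with an explicit encoding so the result does not
    # depend on the platform's default encoding.
    with open(path, encoding="UTF-8") as f:
        loaded = json.load(f)

print(loaded[0]["tweet"]["full_text"])
```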

I also note there's a related feature for DMs now. I guess it would be ideal to combine these features but I'm not sure I have the time/inclination to do so.

ctrlBIRDdelete commented 2 years ago
PS C:\Users\Alex\semiphemeral> python app.py import C:\Users\Alex\Downloads\archive\data
semiphemeral 0.7
Importing 35600 tweets from C:\Users\Alex\Downloads\archive\data
Traceback (most recent call last):
  File "C:\Users\Alex\semiphemeral\app.py", line 4, in <module>
    semiphemeral.main()
  File "C:\Users\Alex\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\click\core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "C:\Users\Alex\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\click\core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "C:\Users\Alex\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\click\core.py", line 1659, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "C:\Users\Alex\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\click\core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "C:\Users\Alex\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\click\core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "C:\Users\Alex\semiphemeral\semiphemeral\__init__.py", line 81, in archive_import
    t.import_dump(path)
  File "C:\Users\Alex\semiphemeral\semiphemeral\twitter.py", line 800, in import_dump
    self.import_tweets(
AttributeError: 'Twitter' object has no attribute 'import_tweets'

Any idea why this happens whenever I'm trying to import the archive?

moffat commented 2 years ago

I see the same output as @ctrlBIRDdelete (although in macOS). Any ideas on how to fix?

ChawiBiker commented 2 years ago

I imported my archive and ran delete. Then I downloaded a new version of my archive, tried to import it again, and got exactly your error. I simply restarted my MacBook and now it's running fine. Give it a try maybe :)