perkeep / perkeep

Perkeep (née Camlistore) is your personal storage system for life: a way of storing, syncing, sharing, modelling and backing up content.
https://perkeep.org/
Apache License 2.0

importer: add Instagram importer #1126

Open bradfitz opened 6 years ago

bradfitz commented 6 years ago

Tracking bug for instagram importer.

plural commented 6 years ago

Is anyone already working on the Instagram importer? I'd like to help if I can.

bradfitz commented 6 years ago

@plural, nope. Please do!

Note that using the new-ish pk-devimport makes importer development a lot more fun & iterative. (faster hack/save/compile/run cycle)

plural commented 6 years ago

Fantastic. I wrote some stuff against the Instagram API in Python a while back. The Instagram API docs talk about non-automated uses, so I am a bit worried we will need to prevent it from running on a schedule, but I'll dig in more and see what's up.

robertgzr commented 6 years ago

@plural not sure how far you got, but it seems like Instagram has seriously limited their public API (https://www.instagram.com/developer/changelog), and they plan on deprecating reading the authenticated user's profile and media by 2020 :/

robertgzr commented 6 years ago

@bradfitz @mpl I have a working Instagram importer on top of goinsta, but I'm a bit stuck on how to test it... Should I just submit what I have for review? :p

I also haven't really looked into the ui part too much...

mpl commented 6 years ago

@robertgzr sure, if it seems to be working, that's a good enough starting point for a CL. Send away please.

For an integration test, you can look at what's already been done in other importers, such as the pinboard one.

mpl commented 6 years ago

@robertgzr I've started reviewing, but before I go further, can you please explain better why we need to use goinsta at all? In particular, it bothers me that we would have to ask the user's credentials (username and password) while the API seems to be offering good old OAuth2: https://www.instagram.com/developer/authentication/

Feel free to point at parts of the code directly in the CL (even the vendored parts) to explain better.

robertgzr commented 6 years ago

@mpl as I commented here, the Instagram API is being folded into Facebook's Graph API (which is restricted to business accounts only), and if you have a look at the endpoint documentation here, all we have left to work with is an endpoint for getting "recent" posts. I tried using that at first but found that I could only get 20 entries. That is because of Instagram's sandbox mode: you have to submit the app for review to be approved for further access.

All this, together with the fact that even the most basic access to the user's own profile and media is scheduled to be disabled by early 2020, made me choose goinsta instead.

Now regarding the issue with user credentials... I left a comment in the ServeCallback function about using their session config import/export functionality once it's implemented the way we would want it. Then, from what I can see, we wouldn't need to keep the password...

mpl commented 6 years ago

@robertgzr yes, I had a look at the endpoints, and I feared the 20 entries limitation, but good to know you've actually tried.

But then comes the next question (because I'm hoping you've had a look at the goinsta code, I haven't yet): the goinsta devs do not have access to any broader Instagram API than we do, so how do they do it? And is the way they do it complicated enough that we have to import all of goinsta instead of redoing it ourselves for our (I assume) simpler needs?

robertgzr commented 6 years ago

@mpl They reverse-engineered the "non-public" instagram api (what the app is using I assume), probably similar to https://github.com/mgp25/Instagram-API (which has quite elaborate technical documentation) by basically emulating the app's behavior.

I wouldn't want to replicate all that stuff... but I see what you mean. The alternative would be to have this be its own app instead of an integrated importer.

mpl commented 6 years ago

So they're basically using undocumented endpoints?

mpl commented 6 years ago

https://perkeep-review.googlesource.com/c/perkeep/+/17606