postaddictme / instagram-java-scraper

Instagram Java Scraper. Get account information, photos, videos and comments.
449 stars 150 forks source link

Fix getMedias() method #51

Closed raiym closed 6 years ago

raiym commented 7 years ago

Fixed issue with getMedias() but need to fix parsing methods. https://github.com/postaddictme/instagram-java-scraper/commit/283b1b7d29d0f27f4db4f87d6dbc506523e7d8e6

Will fix it in 24 hours

fesad commented 7 years ago

Is this issue due to a modification at instagram side? Will it affect media fields or other access methods?

Mehrshadow commented 7 years ago

is it fixed now?

raiym commented 7 years ago

Not yet

Mehrshadow commented 7 years ago

ok, looking forward to hearing good news from u

igor-suhorukov commented 7 years ago

Hi team! Instagram data format changes fixed in pull request https://github.com/postaddictme/instagram-java-scraper/issues/51 but I think that media model should be changed to reflect fields and structure of graphql .

raiym commented 7 years ago

@igor-suhorukov could you please explaing more? What do you mean under: "reflect fields and structure of graphql"? Why do we need it?

raiym commented 7 years ago

@Mehrshadow it fixed by @igor-suhorukov

Mehrshadow commented 7 years ago

tnx 👍

igor-suhorukov commented 7 years ago

What do you mean under: "reflect fields and structure of graphql"? Why do we need it?

@raiym current data model do not reflect all graphql fields. As example edge_media_preview_like collection in https://www.instagram.com/p/{post}/?__a=1 (preview list of like) also in graphql only single video quality/resolution are present and so on.

Also current data model not useful in terms of data analysis with java 8 stream API (fields getter are absent, so method references not available). Current gson to domain object mapping are boilerplate, error prone and hard to support - as reported by SonarQube metrics and based on my experience.

User to persist data model must copy object to another, but very similar data model. As an example https://github.com/igor-suhorukov/instagram-statistics/tree/master/src/main/java/com.github.suhorukov.instagram.statistics/model

It not so comfortable and looks like boilerplate code in case of similar existing instagram-java-scraper POJO.

I can help project to rewrite model/data mapping layer in more supportable code style and convenience to use case.

raiym commented 7 years ago

@igor-suhorukov thank you for your detailed response.

I support your opinion and I think that model data structure must be as close as possible to official instagram data structure (we plan to support official instagram api within this library as well). And at the same time note that in different queries the same object may be in different json formats and with different field names (look at this parse method https://github.com/postaddictme/instagram-php-scraper/blob/980b1ce6b0c263fe754de03e325fbf17ac0057e7/src/InstagramScraper/Model/Media.php#L359)

Why is gson boilerplate? Do you have other options to consider?

About getters and setters yes, better to add.

If you have time to start refactoring I am happy with it.

I am planning to contribute on weekends

igor-suhorukov commented 7 years ago

@raiym thank you for response.

Why is gson boilerplate? Do you have other options to consider? Because of too many explicit unsafe type conversion in instagram-java-scraper code for each properties.

There are different options to map json to POJO objects in java without boilerplate mapping code:

  1. jsonschema2pojo - json schema to POJO mapping (backed by Jackson 2.x/Jackson 1.x/Gson or Moshi) - pros: very easy to start development; cons: too many "node","nodes", "edges" etc intermediate element/objects reflecting 1:1 all source structure of json.
  2. map json to POJO object by XPath (JaxbUnmarshaller from DOM model) pros: most mature and flexible approach - minimum java code by using mapping described by xpath expressions, pojo objects code are clear and under full control. cons: first time mapping description is difficult to implement
  3. EclipseLink MOXy the same as 2, but new framework for me. I'll investigate this option

About getters and setters yes, better to add.

Lombok help us to get rid boilerplate code for Getters/Setters/ToString/Equals/HashCode etc

I will provide java object model to you for review and feedback

Regards, Igor

igor-suhorukov commented 7 years ago

And at the same time note that in different queries the same object may be in different json formats and with different field names

@raiym we just need to support different mapping for the same object model

  1. jsonschema2pojo copy from temporary typed object model to universal model
  2. by using different xpath for different sources
  3. may be by using bindings.json
igor-suhorukov commented 7 years ago

Hello @raiym

Please review new data model. Based on graphql as close as possible: data model

To match json data format: json dumped by current test pack me.postaddict.instagram.scraper.AnonymousInstaTest

raiym commented 6 years ago

Dear @igor-suhorukov thank you for your hard work.

I will review it and respond in two days (or sooner).

igor-suhorukov commented 6 years ago

more details for you @raiym

graphql Instagram data model rendered as relational DB structure:

raiym commented 6 years ago

@igor-suhorukov thank you for valuable contribution. I have reviewed new data model and I am OK with the most part.

Here the things that I want to point out before the merge:

  1. I think that Java persistence addition is extra (optional) and I don't think that it should be added to the library, because it is up to user how store the data.
  2. When we merge we need to keep me.postaddict.instagram.scraper.domain package untouched until we implement all the methods that current version has.
  3. Next step is implementing conversion methods that will convert instagram raw data in objects. (I can make that part). To do it we need to create new package that will store pojo representations of raw json from Instagram. I agree that we should use this generator for this purpose: https://github.com/joelittlejohn/jsonschema2pojo
  4. Media class misses tagged users (edge_media_to_tagged_user)
  5. Following fields confuses me: private List<Account> likes; private List<Comment> comments; Sincecomments and likes contains only first ~10-20 nodes developers. I think we need to rename it like: lastComments and lastLikes or firstComments and firstLikes. We have special methods to get comments (and likes) (https://github.com/igor-suhorukov/instagram-java-scraper/blob/79527dea0c857834ed37cb1ab4c1f39da7a0889d/src/main/java/me/postaddict/instagram/scraper/Instagram.java#L267)

So in short I am ready to merge your pull request after 1. and 5. will be changed. 2. 3. and 4. we can change after PR merged. We can discuss any point in details if you disagree or have something to say.

igor-suhorukov commented 6 years ago

Hi @raiym thank you for review and feedback!

I will map json to new data model and integrate it into existing solution.

My comments regarding your review:

  1. It just ORM metadata annotation library without any other JPA dependencies. User still have freedom to use any persistence technology: mongodb, elasticsearch, relational database, CouchDB etc. Just mapping metadata, but instagram-java-scraper data model help user to store it with 0 development cost.
  2. OK
  3. I think JSON to POJO library choose should be as a result of development to minimize dev effort and support different mapping for the same object model
  4. DONE
  5. DONE
kovalchuk11 commented 6 years ago

Hi @raiym and @igor-suhorukov! Try Gson from Google. this library has the plus in that it does not require 100% json to the pojo class. Some fields may be missing, and extra ones may be present, this does not prevent you from calling objects. and this will greatly help, since in different queries the same object may be in different json formats and with different names. plus if an extra field suddenly appears, this will not lead to errors.

igor-suhorukov commented 6 years ago

Hi @raiym We in two steps from pull request - code refactoring and tests polishing still needed. Could you please look at https://github.com/igor-suhorukov/instagram-java-scraper and give your feedback.

Json to domain mapping are pluggable and all mapping described as moxy *,json rules src/main/resources/me/postaddict/instagram/scraper/model and ModelMapper class.

igor-suhorukov commented 6 years ago

@kovalchuk11 thank you for information. Gson are already used as mapper in instagram-java-scraper, I hope in future json to model mapping will be pluggable...