jsonMartin / readwise-mirror

MIT License
53 stars 12 forks source link

feat: :sparkles: migrate to `export` API endpoint and integrate Reader features #41

Open johannrichard opened 8 months ago

johannrichard commented 8 months ago

This change moves the API from the books and highlights endpoints to the newish export endpoint which is made for that kind of solution. It also implements the new document_note and summary fields introduced with Reader. This should also implement #39.

Due to the differences in information returned from the API endpoint, some information, namely the total count of books, is missing. On the other hand, I added some additional status updates about processing (# and %) as feedback for processing of larger libraries.

The changes to the API are minimal, I didn't change books in the Library interface. Since the API returns items like user_book_id for all content types, keeping books seemed appropriate to me, but that could be changed if deemed necessary.

Not sure if its best to integrate this PR first before moving on with #40.

jsonMartin commented 8 months ago

Hey @johannrichard , just had a chance to review. Appreciate you keeping the interface changes minimal while implementing the new API endpoint. Other than a couple of small style suggestions for consistency, LGTM! 👍🏼

I'm also guessing that the new endpoint with the nextPageCursor implementation might fix that intermittent rate limiting issue that was popping up a while back?

And I agree, I think we should merge this first before #40, since that would be a breaking change.

johannrichard commented 7 months ago

@jsonMartin thanks for the review. Will have a look at the comments and come back to you.

I'm also guessing that the new endpoint with the nextPageCursor implementation might fix that intermittent rate limiting issue that was popping up a while back?

I have been running the changes quite heavily over the past days and so far have not encountered any rate limitings. So at the very least it's not getting worse.

Also, I have used that export API in another project of mine (private) for quite some time and there never had any rate limit issues.

johannrichard commented 7 months ago

@jsonMartin either I'd missed that completely during testing or there was a change in the export API (but most probably the former 🙈 ): it appears the endpoint really only returns the changed highlights in the call when sending along a last updated date. Which of course doesn't work with the way the plugin is built. Apologies for that!

Can you wait with integrating this PR for the moment?

johannrichard commented 7 months ago

As I see it right now, there are two ways forward:

Without prejudice to the decision, here's the status for the second option:

I've pushed 37e03ef which implements a local cache as one (of at least two) ways to deal with the fact that export only delivers the delta with updatedAfter as an argument, which could then be used to call export again with the list of books that changed to retrieve all highlights.

The second option is a local cache that would be used to create the current picture of an article and its highlight by merging the information from the update into the cache. This introduces new code which might fail in new ways. Yet it could also open the door for things like three-way comparisons of the notes in the Vault since we have original, local change and remote change available to merge. As you can guess, being able to change notes without losing them dring the next update is something I think would really be worthwhile.

There are few caveats related to the local cache approach though:

Happy to hear your thoughts and also happy to abandon this in case you think this is taking a bad turn.

PS: the other recent PRs are independent from this one and are based on the current master.

[^1]: Deleted highlights are in any case treated special in the API and are not considered a change. So unless another highlight is added / changed, no change is triggered (neither with the books and highlights nor the export API. I thought about asking Readwise if there are plans to implement a full changelog in the export API, i.e. to also include deleted highlights, but I doubt so and think this is by design (or due to the way the data is stored).

johannrichard commented 7 months ago

@jsonMartin Now that I have sorted myself and the reading of the Export API documentation properly out (apologies again for that extra round 🙈), I am happy to have you review the update which implements the existing functionality with the export API. This also adds a few new fields like summary and note, as well as the unique_url which points to Reader in case that's the source of the highlights.

Should all be documented in the Readme, too.

jsonMartin commented 7 months ago

@jsonMartin either I'd missed that completely during testing or there was a change in the export API (but most probably the former 🙈 ): it appears the endpoint really only returns the changed highlights in the call when sending along a last updated date. Which of course doesn't work with the way the plugin is built. Apologies for that!

Can you wait with integrating this PR for the moment?

Hey @johannrichard, appreciate your continued work on this!

Yeah, your above findings are why I originally built the plugin with the books/highlights endpoint, because the export API only synced incremental changes. I didn't test this new branch yet with my library, so was just assuming there was some sort of API change (after the release of their new Reader software) to allow for this now; it's been a while since I've done a deep dive into the Readwise API docs.

The main issue I remember with the way the Export API syncs vs books/highlights is that if users edit or delete notes, those aren't reflected. Also, if I remember correctly the chronology gets messed up - if later a highlight is added, let's say in the second chapter, it is appended to the end of the document rather than being sorted by location. Were you able to figure out how to provide this functionality with the exports endpoint?

I'm traveling right now so won't have a chance to look at this for a few days, but I'll try to review as soon as I can. Thanks!

johannrichard commented 7 months ago

The main issue I remember with the way the Export API syncs vs books/highlights is that if users edit or delete notes, those aren't reflected. Also, if I remember correctly the chronology gets messed up - if later a highlight is added, let's say in the second chapter, it is appended to the end of the document rather than being sorted by location. Were you able to figure out how to provide this functionality with the exports endpoint?

Thanks for sharing these points – now I have something to look for in detail. I'll do some more tests here and will try to figure out what I get from the API. What I can confirm: using export with ids (as you did with book / highlight) will return the complete current set of highlights.

Safe travels!