javacafe01 / PdfViewer

A simple Pdf document viewer 💼
MIT License
385 stars 65 forks source link

Remember the pages you were on in a PDF even if the app is closed. #133

Open derekelkins opened 3 years ago

derekelkins commented 3 years ago

This change persists the page in a PDF to a database and then restores it when reopening a PDF. In other words, even if you completely close the app/restart your phone/etc. it will still remember where you were in a PDF. It remembers forever. It would probably take on the order of opening 10,000 distinct PDFs to have the database grow to 1 megabyte.

Some implementation details:

I didn't want to pull in something like RxJava and RxAndroid just for this, so I do the asynchronous handling using java.util.concurrent directly as the deprecation message for AsyncTask suggests. It sounds like you are going to rewrite to Kotlin, so then using Kotlin's coroutines would be the recommended pattern.

I didn't do much testing on older versions. I ran it in an emulator using API Level 27 and my phone is API Level 29 or 30. I think Android Room will work a decent ways back.

Writing the page to the DB every time it changes seems fine, performance-wise, in practice for me.

derekelkins commented 3 years ago

Using the Uri as the "identifier" of for the PDF is less than ideal as two different ways of accessing the same file will lead to different Uris. It would be easy to instead use the PDF "title" but that isn't unique. Arguably the best thing to use would be a hash (e.g. MD5) but that's relatively expensive to compute.

Fs00 commented 3 years ago

Thanks @derekelkins, interesting work! I appreciate that you tried to keep it as minimal as possible, although I find that using a database (and Room) just to store simple key-value pairs is still a bit overkill.

What about using a separate SharedPreferences file for saved locations? It would also have the advantage that the concurrency would be managed internally by SharedPreferences instead of manually by us. Atm I can't remember if you can define an arbitrary path for a SharedPreferences file. If you could, it would be nice to store locations in cache folder, so that the user can reset them without losing app settings.

Fs00 commented 3 years ago

Using the Uri as the "identifier" of for the PDF is less than ideal

I agree with you, currently this feature wouldn't work if the user opens the same document via different apps. As you said, the only meaningful unique identifier I can think of is a hash of the document. Maybe the hash could be computed in a background thread and then start saving the location when the computation is done? MD5 hashes should be very fast to compute for small files anyway.

derekelkins commented 3 years ago

I considered just using the preferences system, but this seems like an abuse of it and usage patterns (e.g. relatively heavy writes) seem like exactly the thing the Android docs warn against. That said, it may still work fine in practice, possibly with the addition of a (fairly generous, say 1,000 entry) limit on the number of saved locations and/or throttling of the writes.

For MD5 hashing, I agree that computing the hash is probably fast enough to not be noticeable for most files. Unfortunately, I have several PDFs which are over 100MB, and, of course, larger PDFs (page count-wise at least) are the ones where this feature most comes in handy. If the hash is computed in the background, this is much less of an issue, though it might still be a bit problematic for PDFs from the internet. A mapping from Uri to hash could be maintained so we don't need to recalculate the hash every time we open a file. The current mapping would then be from hash to saved location rather than from Uri to saved location. (It may makes sense to store a date last viewed as well and to pop up a toast if when opening a Uri that hashes to a PDF that hasn't been viewed in, say, more than a week. It might be surprising to open a seemingly new file and get scrolled halfway through it because you've forgotten that you've read it before. I fairly often open papers that I last looked at well over a year ago.)

I may play around with doing the hashing.

Fs00 commented 3 years ago

If the hash is computed in the background

I thought more about this and probably is not a very good idea UX-wise when you have very big files. If the hash computation took for example 5 secs, the user would start reading/scrolling through the document and then suddenly the viewer would jump to the previously saved location. Caching can mitigate this problem but it wouldn't solve it completely. Probably it's better to avoid introducing additional complexity (especially if it involves concurrency) and just use the URI.

Fs00 commented 3 years ago

It may makes sense to store a date last viewed as well and to pop up a toast if when opening a Uri that hasn't been viewed in, say, more than a week. It might be surprising to open a seemingly new file and get scrolled halfway through it because you've forgotten that you've read it before. I fairly often open papers that I last looked at well over a year ago.

This seems like a good idea! In those cases you could prompt the user to go to the previously saved location using a snackbar (I don't know if that's what you meant with "popup a toast").

Also, to prevent storing useless entries in the database you could avoid saving the location (and deleting the previously saved location if present) when the current page is 1.

derekelkins commented 3 years ago

I updated the pull request to identify the PDF based on a hash, however, it computes the hash synchronously so it can find the saved page before showing the PDF. To avoid this causing a large delay (which was on the order of 7 seconds when I opened a ~200MB file on a real phone), the code only hashes the first megabyte. This should be unique in practice. The delay for this is unnoticeable on my device, but the size could easily be reduced if it's too slow on other devices.

Fs00 commented 3 years ago

Good idea @derekelkins! I have some more lower-level suggestions on your PR:

derekelkins commented 3 years ago

MainActivity is already very big, could you try to extract the logic responsible for hashing and storing/retrieving the saved location in a background thread in a separate class? (a fitting name could be LocationSaver)

The Executor/Handler stuff should go away in a Kotlin rewrite, so that doesn't bother me. Frankly, the whole MainActivity, and probably app, could be improved with a refactor, and I imagine this is one of the motivations behind JavaCafe01's desire to rewrite to Kotlin. I was ambivalent between doing something "clean" and fitting the current style. Usually I will try to stick with the style of the project I'm contributing to unless I think it will be very problematic. In this case, assuming that a rewrite will be happening pretty soon, there's not a lot of benefit in doing it one way or the other.

currently we are saving the location every time the page changes. If I scroll quickly through a document that has hundreds of pages, setCurrentPage callback will be called a lot of times in a very short timespan causing a tremendous amount of concurrent DB writes that may very well lead to race conditions. To avoid this, you could save the location in the onPageScroll handler, which gets run at the end of any scroll event.

I don't think that this triggers less often than the onPageChange event. Maybe in some cases. Certainly a scroll event doesn't need to change the page. Either way, race conditions (due to this at least) should not be an issue because the writes are all being sent to a single thread, so they will happen sequentially. I did do fast scrolls over large PDFs, and it didn't seem to cause any noticeable issue on my device at least.

you can avoid saving the location for documents that have only 1 page, those entries would be just useless :)

I don't think the (admittedly fairly small) code complexity is worth this fairly small savings in disk space. As mentioned before, we're looking at something like 1MB of disk space per 10,000 distinct PDFs opened.

I suggest storing the database inside cache folder (you can pass a path instead of just the DB name to Room.databaseBuilder), so that the user has the ability to clear all saved locations

I made this change, though I personally see the saved locations as more valuable than any of the other stuff you'd lose by clearing data (versus cache). I'm also not sure there will be much of a reason to blow away just the saved locations. But it doesn't really impact me one way or another. I don't know if a typical user would be surprised that clearing cache blew away the saved locations, but I don't think a typical user would do that.

Fs00 commented 3 years ago

Usually I will try to stick with the style of the project I'm contributing to unless I think it will be very problematic. In this case, assuming that a rewrite will be happening pretty soon, there's not a lot of benefit in doing it one way or the other.

This is a mindful consideration, and I'm fine with that :+1:

I don't think the (admittedly fairly small) code complexity is worth this fairly small savings in disk space.

This is really a one-line if condition :rofl:. It's just the fact of saving unnecessary (even if small) data that bothers me (but maybe it's just me)

I don't think that this triggers less often than the onPageChange event. Maybe in some cases. Certainly a scroll event doesn't need to change the page. Either way, race conditions (due to this at least) should not be an issue because the writes are all being sent to a single thread [...]

Didn't notice you were using a single thread executor, so OK for the race conditions. As you rightly pointed out, none of those solution is perfect since both may save the location more often than needed. But what worries me most about saving on page change is the potentially huge amount of DB writes queued in a very short timespan when scrolling fast through many pages (I see this particularly with the scrollbar). It would be just a lot of I/O work that could be easily avoided and that may have an impact on lower-end phones (I might try this on my old crappy phone if I find the time these days). If we saved location on page scroll, scroll events that do not change the page would trigger unneeded writes, but not nearly as many as the case described above.

I made this change, though I personally see the saved locations as more valuable than any of the other stuff you'd lose by clearing data (versus cache).

This is a fair point, mine was just a mere suggestion because I'm not too sure on how much this data could be valuable from a user's standpoint. I personally didn't like much the idea of not being able to clear saved location data without clearing also all app data (particularly settings). Maybe we could just keep it as it is and change that if someone raises a complaint. I would like to know @JavaCafe01's opinion on this too.

Btw, thanks @derekelkins for keeping up! :)