Closed: JMicheli closed this pull request 3 weeks ago
@JMicheli The compression scheme you outlined is simple and smart! I'm excited to give this a review. I'll try to make time after work sometime this week, but it might land on the weekend. As always, thanks for your time and contributions - I really appreciate it!
I'm working on getting things ready to merge - as is tradition, this part is often the most difficult for me.
No rush on my part! Reach out if you need anything, but I would guess all you need to do is update your feature branch with this repo's experimental branch - either directly, if you have a remote set up, or by updating your fork's experimental branch and then merging that in.
Alright, I think that should do it - you were right, it just needed a little `cargo clean` to get things playing nice again.
I fixed a couple of clippy lints and a missing `Err`, but this should be good to go once the experimental branch finishes building. I'll merge it shortly, thanks again!
This pull request aims to close #180.
I've added a new type of task, `AnalyzePageDimensions`, to the media analysis job created in #307. This task attempts to determine the dimensions of each page in a media item and then stores its findings in the database so that they can be accessed through the server's API.

The database structure for storing the dimensions data is shown below:
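The schema referenced above isn't reproduced in this excerpt; as a loose sketch, a row in that table might map to a Rust type along these lines, where the field names and the single serialized-string column are assumptions rather than the actual Stump schema:

```rust
// Hypothetical shape of a row in the dimensions table, for illustration
// only; field names are guesses, not the actual Stump schema.
pub struct PageDimensionsRow {
    pub id: String,
    /// Serialized (and compressed, see below) form of Vec<PageDimension>.
    pub dimensions: String,
    /// Links the stored dimensions back to the media item's metadata row.
    pub metadata_id: String,
}
```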
An individual height, width pair is represented in Rust by `PageDimension`.

One of the concerns discussed when planning this addition was the possibility that a very long book would have a very long list of dimensions, which is why the dimensions live in a separate table and are optional when loading metadata. But for the same reason, the string encoding a vector of `PageDimension`s could potentially be very large. To avoid database bloat for users with large libraries or long books, I've implemented a simple compression scheme that serializes `Vec<PageDimension>` as follows:
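(The encoding itself isn't reproduced in this excerpt, so the following is only a minimal sketch of a run-length style scheme consistent with the reasoning above; the field types, function names, and separator characters are assumptions, not the PR's actual format.)

```rust
/// Minimal stand-in for the PR's PageDimension type (a height, width pair).
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub struct PageDimension {
    pub height: u32,
    pub width: u32,
}

/// Serialize dimensions as runs of identical consecutive pages, e.g.
/// "120>1200,800;1>1200,1600" for 120 consecutive pages of height 1200
/// and width 800 followed by a single 1200x1600 spread. The "count>h,w"
/// string format here is an assumption, not the PR's actual encoding.
pub fn serialize_dimensions(dims: &[PageDimension]) -> String {
    let mut runs: Vec<(usize, PageDimension)> = Vec::new();
    for &d in dims {
        match runs.last_mut() {
            // Extend the current run while the dimensions repeat.
            Some((count, last)) if *last == d => *count += 1,
            // Otherwise start a new run.
            _ => runs.push((1, d)),
        }
    }
    runs.iter()
        .map(|(count, d)| format!("{}>{},{}", count, d.height, d.width))
        .collect::<Vec<_>>()
        .join(";")
}

/// Inverse of the above: expand the run-length encoded string back into
/// one PageDimension per page. Returns None on malformed input.
pub fn deserialize_dimensions(s: &str) -> Option<Vec<PageDimension>> {
    let mut out = Vec::new();
    for run in s.split(';').filter(|r| !r.is_empty()) {
        let (count, rest) = run.split_once('>')?;
        let (height, width) = rest.split_once(',')?;
        let dim = PageDimension {
            height: height.parse().ok()?,
            width: width.parse().ok()?,
        };
        out.extend(std::iter::repeat(dim).take(count.parse().ok()?));
    }
    Some(out)
}
```

A book with uniform page sizes collapses to a single run under this kind of scheme, which matches the reasoning below. Note that the PR describes its deserializer as copy-free; this sketch allocates a new `Vec` for readability, so it illustrates the run-length idea rather than the actual implementation's performance characteristics.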
I think that this should be effective, since many books have uniform page sizes (excepting the cover and the occasional two-page spread), so many books won't actually need to store much data in the database. The deserialization function I wrote is copy-free and fast, so it should not introduce any unwanted overhead.
Finally, the API matches the first option discussed on the issue:
```sh
curl -X GET http://stump.cloud/api/v1/media/:id/page/:page/dimensions
```

gives a single dimension for a single page, and

```sh
curl -X GET http://stump.cloud/api/v1/media/:id/dimensions
```

gives a list of dimensions for each page (0-indexed).
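As a hedged illustration of consuming the per-page endpoint from a Rust client, assuming reqwest (with its json feature), serde, and tokio as dependencies; the response field names, the example media id, and the absence of authentication are all assumptions for the sketch, not documented behavior:

```rust
use serde::Deserialize;

// Assumed response shape for a single page's dimensions; the real field
// names and casing may differ.
#[derive(Deserialize, Debug)]
struct PageDimension {
    height: u32,
    width: u32,
}

#[tokio::main]
async fn main() -> Result<(), reqwest::Error> {
    // ":id" and ":page" are placeholders in the routes above; "abc123" and
    // "1" are made-up values for illustration only.
    let url = "http://stump.cloud/api/v1/media/abc123/page/1/dimensions";
    let dim: PageDimension = reqwest::get(url).await?.json().await?;
    println!("page 1 is {} x {} (width x height)", dim.width, dim.height);
    Ok(())
}
```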