stumpapp / stump

A free and open source comics, manga and digital book server with OPDS support (WIP)
https://stumpapp.dev
MIT License

[FEATURE] API for getting dimensions of each page in a book #180

Closed: aaronleopold closed this issue 2 months ago

aaronleopold commented 11 months ago

Is your feature request related to a problem? Please describe.

No

Describe the solution you'd like

For all image based media, add an API endpoint that returns a list of dimensions for each page in the media file. If this information is not available via metadata, either:

  1. generate it on the fly, update the internal metadata (this kinda goes against REST patterns, unless I do separate POST and GET options for fetching and potentially updating vs just fetching)
  2. generate it on the fly, return results
  3. return an empty list

Not sure what the ideal route is at this point
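
To make the trade-off concrete, here is a minimal sketch of the three options as a single lookup function. Everything in it is hypothetical and only for illustration: `MissingPolicy`, `BookMetadata`, and `page_dimensions` are not Stump types, and the `analyze` closure stands in for whatever actually reads the page images.

```rust
/// Width and height of a single page, in pixels.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Dimension {
    pub width: u32,
    pub height: u32,
}

/// Hypothetical policy for what to do when no dimensions are stored.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MissingPolicy {
    /// Option 1: generate on the fly, update the stored metadata, return.
    GenerateAndStore,
    /// Option 2: generate on the fly and return without storing.
    GenerateOnly,
    /// Option 3: return an empty list.
    ReturnEmpty,
}

/// Hypothetical in-memory stand-in for one book's stored metadata.
#[derive(Debug, Default)]
pub struct BookMetadata {
    pub page_dimensions: Option<Vec<Dimension>>,
}

/// Returns the stored dimensions if present, otherwise applies `policy`.
/// `analyze` is only called when dimensions actually have to be generated.
pub fn page_dimensions<F>(
    meta: &mut BookMetadata,
    policy: MissingPolicy,
    analyze: F,
) -> Vec<Dimension>
where
    F: FnOnce() -> Vec<Dimension>,
{
    if let Some(stored) = &meta.page_dimensions {
        return stored.clone();
    }
    match policy {
        MissingPolicy::GenerateAndStore => {
            let dims = analyze();
            // This write is the side effect that sits awkwardly behind a GET.
            meta.page_dimensions = Some(dims.clone());
            dims
        }
        MissingPolicy::GenerateOnly => analyze(),
        MissingPolicy::ReturnEmpty => Vec::new(),
    }
}
```

Only `GenerateAndStore` mutates state, which is why option 1 is the one that would need a separate POST to stay within REST conventions.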

Additional context

https://discord.com/channels/972593831172272148/972595078554079232/1166115940539322438

JMicheli commented 4 months ago

I'm going to work on this issue, tacking onto the work I've done in experimental for #181.

My plan is to upgrade the media analysis job to additionally check each page's dimensions and record that information in the database. I'll then add a server endpoint to make this information available.

A few initial questions that we should run to ground (I can get started before these are resolved, but they're necessary to finish):

  1. How should the resolution information be stored in the database?
  2. What should the API for this information look like?

For the first question, it could be as simple as storing an array of tuples in the database for each media item on the metadata, but there are also approaches with a smaller storage footprint.
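
As one illustration of a smaller footprint (an assumption on my part, not a settled design): if most pages in a book share the same dimensions, the per-page list can be run-length encoded before it is stored. `DimensionRun`, `encode`, and `decode` below are hypothetical names.

```rust
/// Width and height of a single page, in pixels.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Dimension {
    pub width: u32,
    pub height: u32,
}

/// A run of `count` consecutive pages that share one dimension.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct DimensionRun {
    pub dimension: Dimension,
    pub count: u32,
}

/// Collapses consecutive identical dimensions into runs.
pub fn encode(pages: &[Dimension]) -> Vec<DimensionRun> {
    let mut runs: Vec<DimensionRun> = Vec::new();
    for &page in pages {
        if let Some(run) = runs.last_mut() {
            if run.dimension == page {
                run.count += 1;
                continue;
            }
        }
        runs.push(DimensionRun { dimension: page, count: 1 });
    }
    runs
}

/// Expands runs back into one dimension per page, in page order.
pub fn decode(runs: &[DimensionRun]) -> Vec<Dimension> {
    runs.iter()
        .flat_map(|run| std::iter::repeat(run.dimension).take(run.count as usize))
        .collect()
}
```

A book whose pages are all the same size collapses to a single run, while a book that alternates sizes on every page gains nothing, so whether this is worth it depends on the library's content.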

For the second question, I'm thinking of a GET request at /api/v1/media/:id/page/:page/dimensions, with a JSON response of the structure { width: integer, height: integer }.

aaronleopold commented 4 months ago

We chatted briefly on Discord about this, but I'll reiterate and continue that discussion here:

  1. How should the resolution information be stored in the database?
  2. What should the API for this information look like?

I think these should be the same, so that there isn't a 'translation' step between pulling it out of the DB and sending it as a response. I think an ordered array that uses the structure you suggested for each element, e.g.:

[
    Dimension {
        height: 300,
        width: 250,
    },
    // etc
]

More specific to your second question, I think we should support two main API functions for retrieval:

  1. GET for a specific page
  2. GET for all pages

I don't have too strong a preference between having one endpoint vs two. The latter has better type resolution, but otherwise feel free to choose either. In case what I mean is unclear:

curl -X GET http://stump.cloud/api/v1/media/:id/page/:page/dimensions # Dimension
curl -X GET http://stump.cloud/api/v1/media/:id/dimensions # Vec<Dimension>

# or

curl -X GET http://stump.cloud/api/v1/media/:id/dimensions # Vec<Dimension>
curl -X GET http://stump.cloud/api/v1/media/:id/dimensions?page=x # Vec<Dimension>
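
The type-resolution difference can be sketched in Rust, leaving out the web framework entirely. The function names are hypothetical, and the sketch assumes `:page` is 1-indexed, which has not been decided in this thread.

```rust
/// Width and height of a single page, in pixels.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Dimension {
    pub width: u32,
    pub height: u32,
}

/// Two-endpoint shape: `.../page/:page/dimensions` yields one `Dimension`.
/// Assumes 1-indexed pages; returns `None` when the page is out of range.
pub fn dimension_for_page(all: &[Dimension], page: usize) -> Option<Dimension> {
    page.checked_sub(1).and_then(|index| all.get(index)).copied()
}

/// One-endpoint shape: `.../dimensions?page=x` always yields a list.
/// Without `page` it holds every page; with `page` it holds at most one entry.
pub fn dimensions(all: &[Dimension], page: Option<usize>) -> Vec<Dimension> {
    match page {
        None => all.to_vec(),
        Some(p) => dimension_for_page(all, p).into_iter().collect(),
    }
}
```

With two endpoints the single-page caller gets a `Dimension` directly, whereas the one-endpoint variant makes every caller unwrap a list even when only one page was requested.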

One item I'd like us to consider is where to store these dimensions. In most instances, I feel that the overhead of storing this information in the media metadata is insignificant. The main thing to consider is large compendium-like books with 2000+ pages: if we store this in metadata, we would (almost always) be loading that large dimension list along with it. The workaround would have to be more widespread usage of select macros with Prisma throughout the media API, only selecting it when explicitly requested. The other option would be to create a small relational table just to hold this information. Let me know if you have any thoughts.

aaronleopold commented 2 months ago

Implemented by https://github.com/stumpapp/stump/pull/349