sm13 / rePocket

Another reMarkable Pocket client
GNU General Public License v3.0

Allow retrieving and storing PDFs #4

Closed sm13 closed 1 week ago

sm13 commented 3 weeks ago

PDFs can be saved directly to the device. Add support for recognizing them as PDFs and saving them without modification.

The Pocket API does not have an easy way to recognize or automatically mark PDFs as PDFs. Here's a response:

{
    "list":
    {
        "3203965016": {
            "domain_metadata": {
                "name": "course.ccs.neu.edu",
            },
            "excerpt": "",
            "favorite": "0",
            "given_title": "vhdl-tutorial.book - vhdl-tutorial.pdf",
            "given_url": "https://course.ccs.neu.edu/cs3650/ssl/TEXT-CD/Content/Tutorials/VHDL/vhdl-tutorial.pdf",
            "has_image": "0",
            "has_video": "0",
            "is_article": "0",
            "is_index": "0",
            "item_id": "3203965016",
            "lang": "",
            "listen_duration_estimate": 0,
            "resolved_id": "3203965016",
            "resolved_title": "https://course.ccs.neu.edu/cs3650/ssl/TEXT-CD/Content/Tutorials/VHDL/vhdl-tutorial.pdf",
            "resolved_url": "https://course.ccs.neu.edu/cs3650/ssl/TEXT-CD/Content/Tutorials/VHDL/vhdl-tutorial.pdf",
            "sort_id": 1,
            "status": "0",
            "tags":  {
                "pdf": {
                    "item_id": "3203965016",
                    "tag": "pdf",
                },
                "vhdl": {
                    "item_id": "3203965016",
                    "tag": "vhdl",
                },
            },
            "time_added": "1612095562",
            "time_favorited": "0",
            "time_read": "0",
            "time_to_read": 0,
            "time_updated": "1612095567",
            "word_count": "0",
        }
    }
}

Perhaps parsing the resolved_url is the only way.
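As a rough sketch of what parsing the resolved_url could look like, the helper below checks whether the path part of a URL ends in ".pdf". The name `url_looks_like_pdf` is hypothetical and not part of rePocket, and the check is only a heuristic: it says nothing about what the server actually returns, and a URL with no path whose host happens to end in ".pdf" would also match.

```rust
/// Heuristic: returns true if the URL, with query string and fragment removed,
/// ends in ".pdf" (case-insensitive).
/// Hypothetical helper for illustration; not taken from rePocket.
fn url_looks_like_pdf(url: &str) -> bool {
    // Drop the fragment first, so "file.pdf#page=2" still matches.
    let without_fragment = url.split('#').next().unwrap_or(url);
    // Then drop the query string, so "file.pdf?dl=1" still matches.
    let path = without_fragment.split('?').next().unwrap_or(without_fragment);
    path.to_ascii_lowercase().ends_with(".pdf")
}
```

Calling it with the resolved_url from the response above would return true, while a page such as "https://example.com/pdf-guide.html" would not match.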

Note! is_article is set to "0" for this item, which tells us that the query needs to change in order to retrieve PDFs.

Update! Actually, the reqwest::Response can help! There is no need to parse the URL, because the response headers provide the info:

{
    "url": "https://terathon.com/binary_fund.pdf",
    "status": 200,
    "headers": {
        "date": "Sat, 09 Nov 2024 15:27:14 GMT",
        "server": "Apache",
        "upgrade": "h2,h2c",
        "connection": "Upgrade",
        "last-modified": "Fri, 14 Jul 2023 20:55:08 GMT",
        "accept-ranges": "bytes",
        "content-length": "820560",
        "content-type": "application/pdf",
    },
}

Checking that the content-type header is "application/pdf" should be enough.
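A minimal sketch of that check follows. The function name `is_pdf_content_type` is hypothetical, and it takes the header value as a plain string so it does not depend on reqwest. Ignoring parameters after a ";" and comparing case-insensitively are extra precautions assumed here, not something the response above requires.

```rust
/// Returns true when a Content-Type header value denotes a PDF.
/// Hypothetical helper for illustration; not taken from rePocket.
///
/// With reqwest, the value could be obtained along these lines:
///   response.headers()
///       .get(reqwest::header::CONTENT_TYPE)
///       .and_then(|v| v.to_str().ok())
fn is_pdf_content_type(content_type: Option<&str>) -> bool {
    match content_type {
        Some(value) => {
            // Keep only the media type, dropping parameters such as "; charset=binary".
            let media_type = value.split(';').next().unwrap_or("").trim();
            media_type.eq_ignore_ascii_case("application/pdf")
        }
        // No Content-Type header at all: do not treat the body as a PDF.
        None => false,
    }
}
```

If the check passes, the response body can be written to the device as-is; otherwise the item would go through the usual article handling.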