pr0fsmith opened this issue 7 months ago
+1 I spent a lot of time trying to do it, without success.
I'm very close to finishing a bash script to accomplish this. I'm editing a script that I found on Reddit.
@pr0fsmith you may want to edit your first post to wrap the code with ``` so that it's properly formatted as code.
Done
Here's my finished code:
```bash
#!/bin/bash
# getPDFHighlights.sh
# fetches Remarkable PDF text highlights, stores all data in the directory it is called from.
# 2022-09-18, NickK: some clean ups
# 2022-09-17, NickK: initial version
# Dear gods, this is one horrible, horrible result of bored saturday night
# hacking, more to prove a point and to keep me away from rage-buying another
# device that would allow me not only to do PDF text highlights, but also to extract them.
# use at your own risk, it might fry your flux compensators.
# Works under Windows WSL Ubuntu, device software 2.14. Let me know about your setup.
# dependencies:
# jq
# sshpass
# scp
# ssh
# ls
# find
# cat
# rsync
# reminder: sudo apt-get install <dependency>
# ---------------------------------------------------------------------
# edit this section for your device
USERNAME="root"
HOST="enter ip of your rM here"
PASS="enter the password of your rM here"
PATH_XOCHITL='/home/root/.local/share/remarkable/xochitl/' # you may or may not need to change the directory path here
# ---------------------------------------------------------------------
# nothing to do below here
SCRIPT="
cd ${PATH_XOCHITL}
ls -1d *.highlights"
# get a list of all existing highlight directories (i.e. for all documents that contain any highlights)
echo "logging in to device in order to fetch a list of all highlighted files (PDF and non-PDF alike)"
filesHighlighted=$(sshpass -p "${PASS}" ssh -l "${USERNAME}" "${HOST}" "${SCRIPT}")
# narrow down to highlight directories for PDF files only and fetch their .content file, containing meta data
myHighlights=($filesHighlighted)
for (( i=0; i<${#myHighlights[@]}; i++ )); do
    bareName=${myHighlights[$i]%.highlights}
    contentName="${bareName}.content"
    echo "  ${contentName}"
    # non-PDF documents leave an empty local file here; those are removed below
    sshpass -p "${PASS}" ssh -l "${USERNAME}" "${HOST}" "grep -q \"pdf\" ${PATH_XOCHITL}${contentName} && cat ${PATH_XOCHITL}${contentName}" > "$contentName"
done
echo -e "\nremoving non pdf-associated .content files\n"
# they are all size 0; tests come before -delete so only empty .content files here are removed
find . -maxdepth 1 -type f -name '*.content' -size 0 -delete
# the last couple of steps could really need some refactoring love
# walk through local pdf content files
filesPdfHighlighted=$(ls -1 *.content)
myPdfHighlights=($filesPdfHighlighted)
# fetch all directories for PDFs highlighted
echo -e "copying PDF highlight directories and metadata files\n"
for (( i=0; i<${#myPdfHighlights[@]}; i++ )); do
    bareName=${myPdfHighlights[$i]%.content}
    sshpass -p "${PASS}" rsync -a "${USERNAME}@${HOST}:${PATH_XOCHITL}${bareName}.highlights" . --info=progress2
    sshpass -p "${PASS}" rsync "${USERNAME}@${HOST}:${PATH_XOCHITL}${bareName}.metadata" . --info=progress2
done
echo -e "walking through meta data for PDF files, getting highlight data files for individual pages\n"
pdfFileCount=0
# Create the 'md' output directory if it does not exist
mkdir -p md
for (( i=0; i<${#myPdfHighlights[@]}; i++ )); do
    pdfFileCount=$((pdfFileCount + 1))
    echo "File: ${myPdfHighlights[$i]}"
    bareName=${myPdfHighlights[$i]%.content}
    # Check the structure of the .content file
    if jq -e '.cPages.pages[].id' "${myPdfHighlights[$i]}" >/dev/null 2>&1; then
        # Newer layout: page IDs are nested under .cPages.pages[].id
        pages=$(jq -r '.cPages.pages[].id' "${myPdfHighlights[$i]}")
    else
        # Older layout: page IDs are listed directly in .pages[]
        pages=$(jq -r '.pages[]' "${myPdfHighlights[$i]}")
    fi
    myPages=($pages)
    # Extract visibleName from metadata file
    visibleName=$(jq -r '.visibleName' "${bareName}.metadata")
    # Walk through the pages
    pageCount=0
    for (( j=0; j<${#myPages[@]}; j++ )); do
        # Check if there is a highlight (JSON file) for a given page ID
        if test -f "${bareName}.highlights/${myPages[$j]}.json"; then
            pageCount=$((pageCount + 1))
            # Label the highlights with the original document's page number
            # - First page in Remarkable is 0, so we +1 to have the right offset
            # - Left-pad page number with 0. 1 becomes 001. 53 becomes 053.
            k=$(printf "%03d" $((j + 1)))
            echo "  Extracting page ${k}"
            # Heading: the PDF name wrapped in 2 square brackets on each side, plus the page number
            echo -e "\n [[${visibleName}]] Highlights on page ${k}" >> "md/${visibleName}.md"
            # Extract the text with jq. This does NOT sort by position: Remarkable's devices
            # store highlights in the order you actually made them, and that order is kept.
            # ATT: You can highlight the same text more than once, and consequently it will be extracted more than once
            jq -r '.highlights[] | .[].text' "${bareName}.highlights/${myPages[$j]}.json" | while IFS= read -r line; do
                echo -n "$line " >> "md/${visibleName}.md"
            done
            # Add an empty line to separate contents of different .json files
            echo "" >> "md/${visibleName}.md"
            # Delete local highlight JSON
            rm "${bareName}.highlights/${myPages[$j]}.json"
        fi
    done
done
echo -e "There were ${pdfFileCount} PDF file(s) with highlights.\n"
```
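Both versions of the script tell the two `.content` layouts apart with a `jq -e` test (`.pages[]` on older firmware, `.cPages.pages[].id` on newer firmware). Here is a minimal sketch of the same idea as a single jq filter, assuming only those two layouts. `list_page_ids` is a hypothetical helper name, not part of the original script.

```bash
#!/bin/bash
# list_page_ids: hypothetical helper that prints one page ID per line from a
# .content file, using whichever of the two layouts the script handles.
list_page_ids() {
    local content_file="$1"
    # Collect IDs from the newer nested layout; "[]?" yields nothing (no error)
    # when .cPages.pages is missing. If that list is empty, fall back to the
    # older flat .pages array. A file with neither prints nothing.
    jq -r '
        [ .cPages.pages[]? | .id ] as $nested
        | if ($nested | length) > 0 then $nested[] else (.pages // [])[] end
    ' "$content_file"
}
```

In the script, `mapfile -t myPages < <(list_page_ids "${myPdfHighlights[$i]}")` could stand in for the `if`/`else` block and the `myPages=($pages)` line.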
Updated the script again.
```bash
#!/bin/bash
# getPDFHighlights.sh
# fetches Remarkable PDF text highlights, stores all data in the directory it is called from.
# 2022-09-18, NickK: some clean ups
# 2022-09-17, NickK: initial version
# Dear gods, this is one horrible, horrible result of bored saturday night
# hacking, more to prove a point and to keep me away from rage-buying another
# device that would allow me not only to do PDF text highlights, but also to extract them.
# use at your own risk, it might fry your flux compensators.
# Works under Windows WSL Ubuntu, device software 2.14. Let me know about your setup.
# dependencies:
# jq
# rsync
# ssh
# ls
# find
# cat
# reminder: sudo apt-get install <dependency>
# It is recommended you set up an SSH key https://remarkable.guide/guide/access/ssh.html#setting-up-a-ssh-key
# nothing to do below here
echo "Enter your tablet's user name. This is usually 'root'"
read -r USERNAME
echo "Enter your tablet's IP address"
read -r HOST
PATH_XOCHITL='/home/root/.local/share/remarkable/xochitl/'

# Function to fetch a document's visible name from its metadata file on the tablet
get_visible_name() {
    local filename="$1"
    local metadata_file="${filename%.highlights}.metadata"
    ssh -l "${USERNAME}" "${HOST}" "cat ${PATH_XOCHITL}${metadata_file}" | jq -r '.visibleName'
}

# Function to convert highlight data to markdown files (one per document)
highlights_to_markdown() {
    echo -e "walking through meta data for PDF files, getting highlight data files for individual pages\n"
    pdfFileCount=0
    for (( i=0; i<${#myHighlights[@]}; i++ )); do
        pdfFileCount=$((pdfFileCount + 1))
        echo "File: ${myHighlights[$i]}"
        bareName=${myHighlights[$i]%.content}
        # Check the structure of the .content file
        if jq -e '.cPages.pages[].id' "${myHighlights[$i]}" >/dev/null 2>&1; then
            # Newer layout: page IDs are nested under .cPages.pages[].id
            pages=$(jq -r '.cPages.pages[].id' "${myHighlights[$i]}")
        else
            # Older layout: page IDs are listed directly in .pages[]
            pages=$(jq -r '.pages[]' "${myHighlights[$i]}")
        fi
        myPages=($pages)
        # Walk through the pages
        pageCount=0
        for (( j=0; j<${#myPages[@]}; j++ )); do
            # Check if there is a highlight (JSON file) for a given page ID
            if test -f "${bareName}.highlights/${myPages[$j]}.json"; then
                pageCount=$((pageCount + 1))
                # Extract visibleName from metadata file
                visibleName=$(jq -r '.visibleName' "${bareName}.metadata")
                # Page number: first page in Remarkable is 0, so +1; left-padded to 3 digits (1 -> 001)
                k=$(printf "%03d" $((j + 1)))
                echo "  Extracting page ${k}"
                # Heading: the PDF name wrapped in 2 square brackets on each side, plus the page number
                echo -e "\n [[${visibleName}]] Highlights on page ${k}" >> "${visibleName}.md"
                # Highlights come out in the order they were made, not sorted by position.
                # ATT: You can highlight the same text more than once, and consequently it will be extracted more than once
                jq -r '.highlights[] | .[].text' "${bareName}.highlights/${myPages[$j]}.json" | while IFS= read -r line; do
                    echo -n "$line " >> "${visibleName}.md"
                done
                # Add an empty line to separate contents of different .json files
                echo "" >> "${visibleName}.md"
                # Delete local highlight JSON
                rm "${bareName}.highlights/${myPages[$j]}.json"
            fi
        done
    done
}

# Fetch list of highlighted files
echo "Choose an option:"
i=1
filesHighlighted=$(ssh -l "${USERNAME}" "${HOST}" "cd ${PATH_XOCHITL} && ls -1d *.highlights")
for option in $filesHighlighted; do
    visible_name=$(get_visible_name "$option")
    echo "$i. $option ($visible_name)"
    ((i++))
done
# Prompt user for selection
read -r -p "Enter the number of the option you want to choose (or 'a' to choose all): " choice_num
# Check if the choice is 'a'
if [ "$choice_num" = "a" ]; then
    echo "You have chosen to copy all files."
    rsync -a -v "${USERNAME}@${HOST}:${PATH_XOCHITL}*.metadata" "${USERNAME}@${HOST}:${PATH_XOCHITL}*.content" "${USERNAME}@${HOST}:${PATH_XOCHITL}*.highlights" . --progress
    filesHighlighted=$(ls -1 *.content)
    myHighlights=($filesHighlighted)
    highlights_to_markdown
    echo "Deleting source files from computer leaving only the markdown file(s)"
    rm -R *.metadata *.content *.highlights
    exit 0
fi
# Validate user input
if ! [[ "$choice_num" =~ ^[0-9]+$ ]]; then
    echo "Error: Please enter a valid number or 'a' for all."
    exit 1
fi
# Check if the selected number is within the valid range
# (after the listing loop, i is one past the last option number)
if [ "$choice_num" -ge 1 ] && [ "$choice_num" -lt "$i" ]; then
    # Get the corresponding option and its visible name
    chosen_option=$(echo "$filesHighlighted" | sed -n "${choice_num}p")
    visible_name=$(get_visible_name "$chosen_option")
    echo "You have chosen: $visible_name"
    # Name of the chosen .highlights directory without the extension (the document's UUID)
    chosenfile=$(basename "$chosen_option" .highlights)
    echo "Copying files"
    rsync -a -v "${USERNAME}@${HOST}:${PATH_XOCHITL}${chosenfile}.metadata" "${USERNAME}@${HOST}:${PATH_XOCHITL}${chosenfile}.highlights" "${USERNAME}@${HOST}:${PATH_XOCHITL}${chosenfile}.content" . --progress
    filesHighlighted=$(ls -1 *.content)
    myHighlights=($filesHighlighted)
    highlights_to_markdown
    echo "Deleting source files from computer leaving only the markdown file(s)"
    rm -R "$chosenfile"*
else
    echo "Error: Please enter a number within the valid range or 'a' for all."
    exit 1
fi
```
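The script's comment warns that highlighting the same text twice extracts it twice. Here is a minimal sketch of dropping exact repeats while keeping first-seen order. It reuses the script's jq filter and assumes the `{"highlights": [[{"text": ...}]]}` shape that filter implies; `page_highlight_text` is a hypothetical helper name.

```bash
#!/bin/bash
# page_highlight_text: hypothetical helper that prints the highlighted text of
# one page's JSON file, one highlight per line, skipping exact duplicates.
page_highlight_text() {
    local json_file="$1"
    # Same jq filter as the script. awk prints a line only the first time it
    # is seen, so the stored order of the highlights is preserved.
    jq -r '.highlights[] | .[].text' "$json_file" | awk '!seen[$0]++'
}
```

It could replace the `jq ... |` part of the extraction loop, for example `page_highlight_text "$json" | while IFS= read -r line; do ...`. Only identical lines are merged, so overlapping but different highlights are still written separately.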
Pretty cool script, but if it's run remotely (from a computer), should this really be within rm-hacks? If yes, how do you see this being implemented? Some button that extracts the highlights in the current document?
I think there's benefit to it being an rm-hack for less advanced users. I imagine an implementation on the rM itself would be a button that extracts the highlights and either emails them to you or creates a page at the end of the document with the highlights on it. In either case, the user would have a way to import the highlights into a program of their choosing.
Please see remarks
I was using it a lot on 2.x. I pushed some updates to handle the way reMarkable changed things. The problem is that it is always a moving target, since rM changes how they handle highlights quite a bit.
It isn't updated to 3.x yet (see here).
Here is something that parses the new format: parser
Hi, I was looking for this and was thinking about an integrated button that would export the highlighted lines into a new notebook (page). Each highlighted section would be annotated with the paper's name and page number, with expandable space to write more annotations. It would be incredibly helpful for academic work. Cheers!
That would be incredible, although I think it would be difficult. The only way I see that happening is if we can convert the highlights into typed text on the notebook page. Your idea would indeed be very helpful.
Hello! I'm trying to run your script, but apparently no .highlights files are found on my device. Does this work on versions later than 3?
Double check that the path in the script matches your device. PATH_XOCHITL value may need to be changed.
Thanks for your answer!
Actually the path is correct.
This is the error:
ls: *.highlights: No such file or directory
There is no .highlights folder in my directory, only these files:
ecf7a0cd-753f-4505-b2ae-3b5534b6bbeb ecf7a0cd-753f-4505-b2ae-3b5534b6bbeb.content ecf7a0cd-753f-4505-b2ae-3b5534b6bbeb.metadata ecf7a0cd-753f-4505-b2ae-3b5534b6bbeb.pagedata ecf7a0cd-753f-4505-b2ae-3b5534b6bbeb.pdf ecf7a0cd-753f-4505-b2ae-3b5534b6bbeb.thumbnails
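To check a single document like this one, here is a minimal diagnostic sketch. It only reports whether the `<uuid>.highlights/` folder the script expects exists and, if it doesn't, how many `.rm` page files are in the document's `<uuid>/` folder. The thread suggests newer firmware may store highlights differently, but this sketch does not assume where they live and parses neither format. `describe_doc_storage` is a hypothetical helper name. On the tablet it would be called with the xochitl path and a UUID such as the one above.

```bash
#!/bin/bash
# describe_doc_storage: hypothetical diagnostic. Reports whether a document has
# the <uuid>.highlights/ folder of per-page JSON files used by the script, and
# otherwise counts the .rm page files in <uuid>/. It does not parse either one.
describe_doc_storage() {
    local dir="$1" uuid="$2"
    local hl_dir="$dir/$uuid.highlights"
    if [ -d "$hl_dir" ]; then
        local n
        n=$(find "$hl_dir" -maxdepth 1 -name '*.json' -type f | wc -l)
        # $((n)) drops the padding some wc implementations add
        echo "$uuid: .highlights folder with $((n)) page JSON file(s)"
    else
        local rm_count=0
        if [ -d "$dir/$uuid" ]; then
            rm_count=$(find "$dir/$uuid" -maxdepth 1 -name '*.rm' -type f | wc -l)
        fi
        echo "$uuid: no .highlights folder; $((rm_count)) .rm page file(s) in $uuid/"
    fi
}
```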
I would recommend using `find . -maxdepth 1 -name '*.highlights' -type d` instead of `ls -1d *.highlights`. This will not fail if there are no highlights folders.
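Here is a minimal sketch of that suggestion wrapped as a function. `find` prints nothing and still succeeds when no `.highlights` directories exist, whereas `ls` on an unmatched glob reports an error like the one below. `list_highlight_dirs` is a hypothetical name, and the sketch avoids GNU-only options such as `-printf` in case the tablet's `find` is a smaller implementation.

```bash
#!/bin/bash
# list_highlight_dirs: hypothetical helper that prints the names of all
# *.highlights directories directly inside the given xochitl directory.
# Prints nothing, and still succeeds, when there are none.
list_highlight_dirs() {
    local dir="$1"
    # -maxdepth 1 stays out of the per-document folders;
    # -type d skips any regular file that happens to match the pattern.
    find "$dir" -maxdepth 1 -name '*.highlights' -type d | while IFS= read -r path; do
        printf '%s\n' "${path##*/}"   # keep only the directory's own name
    done
}
```

Over SSH, the same `find` command can take the place of `ls -1d *.highlights` in the remote command string.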
Still facing issues. I've attached the output from running the original code: logs.txt
The .highlights folders are missing from my directory. I searched the whole device but found nothing. I'm running version 3.11.2.5.
I'm running 3.9 right now. I wonder if rM changed the way they save highlights. I don't think I'm going to be able to help since I'm not running the same software version.
Oh. Yeah, there might be something different. I'll try to upgrade to your version, or just wait for an official solution. Thanks a lot!
I've tried the script and it doesn't seem to work. Trying on macOS.
It always creates several .content files and several folders with the same names, but nothing else. Where could the error be?
Edit 1: FW 3.11.2.5
Edit 2: OK, I searched the folder '/home/root/.local/share/remarkable/xochitl/' on the reMarkable, and the necessary files are there (.highlights, .epubindex, .epub, .content).
But when the script finishes, only the .content files have been downloaded, and there is no PDF with the highlighted text.
It may only work on version 3.9 and below. I'm running 3.9 right now.
It would be great to be able to extract highlights along with the document title and page number to a text file that I can then copy to my computer. I found the script below, in case it helps. I just can't figure out how to get the page number.
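The scripts earlier in this thread derive the page number from the position of a page's ID in the document's `.content` file: index 0 becomes page 001. Here is a minimal sketch of that lookup for one page ID, assuming the same two `.content` layouts the scripts handle. `page_number` is a hypothetical name. This is the page's position in the tablet's page list, which can differ from the page labels printed in the PDF.

```bash
#!/bin/bash
# page_number: hypothetical helper that prints the 1-based, zero-padded page
# number of a page ID, based on its position in the .content page list.
# Prints nothing (and returns non-zero) if the ID is not found.
page_number() {
    local content_file="$1" page_id="$2"
    local idx
    # Build the ordered list of page IDs (newer nested layout first, else the
    # older flat .pages array), then find the 0-based position of the ID.
    idx=$(jq -r --arg id "$page_id" '
        ([ .cPages.pages[]? | .id ] as $nested
         | if ($nested | length) > 0 then $nested else (.pages // []) end)
        | index($id) // empty
    ' "$content_file")
    # jq counts from 0; the scripts number pages from 1 and pad to 3 digits.
    [ -n "$idx" ] && printf '%03d\n' $((idx + 1))
}
```

For a highlight file named `<page-id>.json`, `page_number "<uuid>.content" "<page-id>"` gives the label the scripts write in their "Highlights on page" headings.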