lancachenet / monolithic

A monolithic lancache service capable of caching all CDNs in a single instance
https://hub.docker.com/r/lancachenet/monolithic

Added scripts to handle chunk processing #176

Closed: VibroAxe closed this 6 months ago

VibroAxe commented 11 months ago

Scripts added to /scripts to allow chunk processing

findchunk.sh: docker exec <container> /scripts/findchunk.sh <chunk> returns the internal paths of the files related to that chunk.

deletechunk.sh: docker exec <container> /scripts/deletechunk.sh <chunk> deletes the files related to that chunk.

Closes #170

Lepidopterist commented 11 months ago

awk expects the parameter to have its slashes escaped and to be wrapped in /s.

If a user passes in:

/a/b/c/009182347098798243 as the filename, awk needs this to be prepared as:

/a\/b\/c\/009182347098798243/

Note the start/end of the parameter are both /, and the remaining / characters have been escaped as \/

awk MAY also accept the leading / of the filename being escaped as well, eg:

/\/a\/b\/c\/009182347098798243/

which would simplify processing it, but this needs to be tested.
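
As a concrete (untested) sketch of that escaping step, assuming the chunk path arrives in a shell variable, here called CHUNK, before it is handed to awk:

# escape every / in the chunk path and wrap it in / delimiters for awk
# (chunk names here are only hex digits, so / is the only character that needs escaping)
CHUNK="/a/b/c/009182347098798243"
ESCAPED=$(printf '%s' "$CHUNK" | sed 's|/|\\/|g')
awk "/${ESCAPED}/ { print }" file_list.txt   # file_list.txt is a placeholder input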

VibroAxe commented 11 months ago

I feel like there should be a way to handle this escaping automatically, currently it's eluding me
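
Maybe one way around it entirely (just a sketch, untested): skip the regex and let awk do a literal substring match with index(), so nothing in the path ever needs escaping:

CHUNK="/a/b/c/009182347098798243"
# index() does a plain substring comparison, so no characters in the path need escaping
awk -v chunk="$CHUNK" 'index($0, chunk) { print }' file_list.txt   # file_list.txt is a placeholder input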

sfinke0 commented 11 months ago

Hi @VibroAxe,

we may have found a way to reverse engineer the path to the cached file, given that you know the cache_identifier and the URL path of the requested file. This avoids putting a high load on the filesystem when you have a huge amount of cached data.

Let me try to explain: with blizzard as the cache identifier and /tpr/catalogs/data/0d/66/0d662405275758d1adb94042625d1b13 as the requested file, we can determine the actual path of the cached file in the nginx cache directory:

root@fw-iub:~# ./find_cached_file.sh  blizzard /tpr/catalogs/data/0d/66/0d662405275758d1adb94042625d1b13
# gives the following file in the nginx cache folder
14/d9/226439623e0900f43cc457d6a6a6d914

# check if this is the correct file
root@fw-iub:~# head -n 2 14/d9/226439623e0900f43cc457d6a6a6d914
e��d�/de�d�$�p��""0d662405275758d1adb94042625d1b13"
KEY: blizzard/tpr/catalogs/data/0d/66/0d662405275758d1adb94042625d1b13bytes=0-1048575

# looks good, output of the script could be used in another script / bash one-liner / ...

The little script find_cached_file.sh itself:

#!/usr/bin/env bash

CACHE_IDENTIFIER=$1
URL_PATH=$2

# convert CACHE_SLICE_SIZE (e.g. 1m) to its byte equivalent
CACHE_SLICE_SIZE_BYTES=$(echo "${CACHE_SLICE_SIZE}" | tr '[:lower:]' '[:upper:]' | numfmt --from-unit=1 --from=iec)
BYTE_RANGE_END=$((CACHE_SLICE_SIZE_BYTES - 1))
BYTE_RANGE="0-${BYTE_RANGE_END}"

PROXY_CACHE_KEY="${CACHE_IDENTIFIER}${URL_PATH}bytes=${BYTE_RANGE}"
PROXY_CACHE_KEY_MD5=$(printf '%s' "$PROXY_CACHE_KEY" | md5sum | awk '{print $1;}')

# nginx uses levels=2:2, so the level-1 folder is the last two hex characters
# of the key's md5 and the level-2 folder is the two characters before that
L1_FOLDER=$(echo $PROXY_CACHE_KEY_MD5 | cut -c31-32)
L2_FOLDER=$(echo $PROXY_CACHE_KEY_MD5 | cut -c29-30)

echo "${L1_FOLDER}/${L2_FOLDER}/${PROXY_CACHE_KEY_MD5}"

Maybe you can check for yourself whether this finds the correct files. Please let me know.

Thanks Sebastian

VibroAxe commented 11 months ago

@sfinke0 thanks for the input. The reason we haven't done forward-pass lookups in the past is that you can't guarantee what the byte range request is. Consider two other requests for the above example: one where the file is smaller than 1MB and one where it is a 45GB file (I'm looking at you, Origin). The average user doesn't know which chunk of a file is broken, only that the whole file is broken, in which case clearing all references is more complete. We could perhaps provide a quick delete which assumes a 1MB chunk, but I'm not sure how best to cover this.
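
For the "quick delete which assumes a 1MB chunk" idea, a rough sketch of what enumerating every slice of a file of known total size could look like, reusing the key construction from Sebastian's script. TOTAL_SIZE is a hypothetical extra input, and this still assumes every cached entry was stored with a slice-aligned bytes= range, which is exactly the guarantee we don't have:

#!/usr/bin/env bash
# sketch only: print the cache-relative path for every slice of a file of known total size
CACHE_IDENTIFIER=$1
URL_PATH=$2
TOTAL_SIZE=$3   # total file size in bytes (hypothetical input)

# default of 1m assumed if CACHE_SLICE_SIZE is not set in the environment
SLICE_BYTES=$(echo "${CACHE_SLICE_SIZE:-1m}" | tr '[:lower:]' '[:upper:]' | numfmt --from-unit=1 --from=iec)

START=0
while [ "$START" -lt "$TOTAL_SIZE" ]; do
    END=$((START + SLICE_BYTES - 1))
    KEY="${CACHE_IDENTIFIER}${URL_PATH}bytes=${START}-${END}"
    MD5=$(printf '%s' "$KEY" | md5sum | awk '{print $1}')
    echo "${MD5:30:2}/${MD5:28:2}/${MD5}"   # same levels=2:2 layout as above
    START=$((START + SLICE_BYTES))
done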

@Lepidopterist I guess if we knew the file, could we pull range information from the access.log to find all queried ranges? (Feels like we're going to hit a sub-request issue somewhere here.)
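
A rough sketch of that, assuming the requested path appears verbatim in the access log and that any range information is logged as a bytes=... token (both assumptions about the log format, and the log path is assumed too):

URL_PATH="/tpr/catalogs/data/0d/66/0d662405275758d1adb94042625d1b13"
# list every distinct bytes=... range logged for requests to this path
grep -F "$URL_PATH" /data/logs/access.log | grep -o 'bytes=[0-9]*-[0-9]*' | sort -u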

stale[bot] commented 8 months ago

This issue has been automatically marked as inactive because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] commented 6 months ago

This issue has been automatically closed after being inactive for 30 days. If you require further assistance, please reopen the issue with more details or talk to us on Discord.