Open mikez opened 6 months ago
macOS with Python 3.9, 3.10, 3.11, and 3.12 works here and verified folder structure. 👍
Seems to be totally broken for me on Windows. With the new update, I'm only getting an error message: "No data found in the Spotify cache. ...". My client, at least, doesn't seem to be using the root level log or lbd file. I'll investigate a bit more hopefully next week.
@Nitemice If it's using the old format, we could fall back to that. If it helps, what's the folder structure of your windows_appdata_path or windows_store_path?
On macOS, the structure is:
I've looked into it a bit more, and it seems like Spotify no longer uses the custom cache directory for storing the ldb files. It's in the windows_store_path for me now, and the structure matches what you described.
However, even when I run the script without the custom cache path, it fails. I've tried following what exactly it's looking for, but I don't quite understand the pattern it's trying to match in the file. Either the files on Windows are a bit different, or there's something going wrong with the pattern string. From just looking through the files naively, I've found a few that contain strings that look close to what we seem to be looking for, but they're incomplete compared to what the script is looking for.
Could you provide a bit of clarification on how the pattern string is constructed, if you know? Otherwise, I'm happy to provide some sample files if that would be helpful.
Thanks, @Nitemice!
If I understand you correctly, it sounds like the windows_store_path structure matches the macOS path structure. In that case, you should be able to run:
spotifyfolders --cache CACHE_DIR
where you replace CACHE_DIR with the windows_store_path.
If that doesn't work, then can you provide any sample files or directory structures you see? That would be most helpful. I'll look into it. If you want to search for some pattern, look for "rootlist".
Hey @mikez,
Yes, the structure under windows_store_path matches what you indicated on macOS, and the script already looks in that location, so I don't need to specify a CACHE_DIR. That part is working fine now.
The problem now is that the SpotifyLevelDB.get() function is failing to find a log or ldb file that contains a match for the LEVELDB_ROOTLIST_KEY. Looking through the files, and stepping through manually, it looks like it should find a match, but it fails on the bytestring_less_or_equal function (L417). I'm not sure why yet.
@Nitemice Ah, I see. The LevelDB files are there, but something seems to fail: it can't seem to find the key.
I can think of 2 ways to debug this:
windows_store_path and I'll take a look.
leveldbutil command. With that one you can dump the contents of levelDB into plaintext and seek the contents of the key that way.
As an aside: Spotify uses a greenbase.KeyComparator to decide the order of keys in the table files. The contents of this function are not public AFAIK, so I made some educated guesses. The assumptions may be wrong here.
I've stepped through the code and I've managed to get it working. However, I had to make two major (probably breaking) changes.
At first, I thought the issue was that the LEVELDB_ROOTLIST_KEY was being evaluated wrong, and the group separator symbol wasn't being encoded correctly, but it turned out that it's actually the opposite. It seems that there is no group separator in the rootlist key in my levelDB. Instead it's just a space char. So I've changed L24 to:
LEVELDB_ROOTLIST_KEY = b"!pl#slc# spotify:user:{}:rootlist#"
That's odd and all, but after I made that change, I found that still didn't fix it. Instead now the code wasn't even reaching the files that contained the rootlist key. bytestring_less_or_equal returns True in a bunch of cases where the keys don't match, which caused it to break the loop and just give up searching.
My initial fix was to edit bytestring_less_or_equal as below, because while I understand this behaviour when the function is used as a comparator, we're not really using it as one here:
if byte1 == group_separator and byte2 != group_separator:
    return False
if byte1 != group_separator and byte2 == group_separator:
    return False
if byte1 < byte2:
    return False
elif byte1 > byte2:
    return False
But on further review of the code, a simpler fix is to simply remove the break on L422.
I have pushed a branch with those changes for you to have a proper look: https://github.com/Nitemice/spotify-folders/commit/63a40c789de273dd7a95997816c8c2d81ccce194
@Nitemice Nice debugging there! :)
Change 1 is very interesting, and I'd like to learn more about this: is this Windows specific, or is there something else going on? I wonder if we could get others to give us data here.
Change 2 makes it so that every key in every table is traversed. This takes longer. However, it could be a fall-back option, in case no key is found.
Here's what I propose in terms of changes (for now):
I'm assuming the greenbase.KeyComparator may need fixing. What would be really helpful is if you can check all the keys in your tables, and see how they're ordered; I'm curious if you can find any patterns. For me, it seems to be an alphanumeric regular ordering, except for the group_separators. If you don't know how to output the keys in a ldb file (in their stored order), I can write you a script.
@Nitemice Digging some more, I've noticed the pattern of the keys here seems to be:
PREFIX {SPACER} SUFFIX
where PREFIX can be something like:
!pl#add#
!pl#changes#
!cit#cit#
etc.
and the spacers are
$
%
&
\x1d
etc.
Does the spacer encode some number, is there some logic behind it? I don't understand. Also, the ordering of the keys becomes more confusing the closer I look at it.
@Nitemice FYI, pushed a new version based on our current understanding.
Change 1 is very interesting, and I'd like to learn more about this
I agree, it'd be good to have more data from others to see if it's the same. If I had to guess, I'd say it's a Windows vs Unix thing, but who knows!
Change 2 makes it so that every key in every table is traversed.
I'll admit I didn't fully understand what a lot of this code was doing, and I probably still don't fully get it. But now that I've more closely examined this loop, and the bytestring_less_or_equal function, I think I get why the break is there.
I think I've found a simpler solution that shouldn't slow down the runtime much if at all. What if we modify the bytestring_less_or_equal function so that space and group-separator character are both treated as "group separators"? I've made a commit as a demo, and it seems to work well for me: https://github.com/Nitemice/spotify-folders/commit/974692e32779c3c8e40a18be10b6b0ddeb4d406f
have two search keys. On macOS seek the first one first, on Windows the other one first. Then flip if nothing is found.
I think this is a reasonable solution, but if we start finding even more alternatives, we may need to rethink it (maybe in combination with the solution above?).
Only issue with this is I tried running your updated code, and it wouldn't work for me. Apparently there isn't a reverse() function for tuples, so I had to turn the key_templates from a tuple to a list.
--- key_templates = (LEVELDB_ROOTLIST_KEY_1, LEVELDB_ROOTLIST_KEY_2)
+++ key_templates = [LEVELDB_ROOTLIST_KEY_1, LEVELDB_ROOTLIST_KEY_2]
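Python tuples are immutable and have no reverse() method, but slicing (or reversed()) returns a flipped copy, so the tuple can stay a tuple. Below is a minimal sketch of the "seek one key first, then flip" ordering. The two template values are assumed from this thread (group separator vs. space after the prefix), and ordered_key_templates is a hypothetical helper name, not a function from the script.

```python
import sys

# Assumed values: the two templates discussed in this thread, differing only
# in the byte that follows "!pl#slc#" (group separator vs. space).
LEVELDB_ROOTLIST_KEY_1 = b"!pl#slc#\x1dspotify:user:{}:rootlist#"
LEVELDB_ROOTLIST_KEY_2 = b"!pl#slc# spotify:user:{}:rootlist#"


def ordered_key_templates(platform=sys.platform):
    """Return the templates in the order to try them (hypothetical helper)."""
    key_templates = (LEVELDB_ROOTLIST_KEY_1, LEVELDB_ROOTLIST_KEY_2)
    if platform == "win32":
        # Tuples have no .reverse(); slicing returns a reversed copy.
        key_templates = key_templates[::-1]
    return key_templates
```

Calling ordered_key_templates("win32") puts the space variant first; any other platform string keeps the original order.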
make a regular (fast) search; if that fails, then make the slower search.
Like I mentioned, I think the above solution is cleaner and simpler. But it's really up to you.
If you don't know how to output the keys in a ldb file (in their stored order), I can write you a script.
I spent a large chunk of time over the last few days trying to dump the ldb various ways, but nothing has worked for me on Windows. If you've got a script that you think'll work (and doesn't depend on a dozen extra things), I'd be happy to give it a go.
I've noticed the pattern of the keys here seem to be:
Yeah, I think PREFIX {SPACER} SUFFIX is accurate, although each section can vary wildly. The details that seem most fixed to me are:
I've also seen the following as spacers:
I think I've found a simpler solution that shouldn't slow down the runtime much if at all. What if we modify the bytestring_less_or_equal function so that space and group-separator character are both treated as "group separators"? I've made a commit as a demo, and it seems to work well for me: Nitemice@974692e
I like simple solutions! :) In this specific case, this seems a bit speculative for me (and prone to false negatives), since we don't quite know yet what that "SPACER" symbol means (including its usage in greenbase.KeyComparator). There may be times where it may have another function. I'd opt to be more conservative here and do the "slow search" until we have more clarity on greenbase.KeyComparator.
have two search keys. On macOS seek the first one first, on Windows the other one first. Then flip if nothing is found.
I think this is a reasonable solution, but if we start finding even more alternatives, we may need to rethink it (maybe in combination with the solution above?). Only issue with this is I tried running your updated code, and it wouldn't work for me. Apparently there isn't a reverse() function for tuples, so I had to turn the key_templates from a tuple to a list.
This is an error. Thank you. I pushed a fix.
make a regular (fast) search; if that fails, then make the slower search.
Like I mentioned, I think the above solution is cleaner and simpler. But it's really up to you.
I much prefer cleaner and simpler. I see this as a transitory solution until we have clarity on how greenbase.KeyComparator works. Then we can massively clean up the code and only need one attempt.
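The "fast search first, slow search as fallback" idea can be sketched as follows. This is an illustration only, not the script's code: plain byte order (via bisect) stands in for the unknown greenbase.KeyComparator, and the function names are hypothetical.

```python
from bisect import bisect_left


def fast_search(sorted_keys, target):
    """Binary search; only correct if sorted_keys really is in byte order."""
    index = bisect_left(sorted_keys, target)
    if index < len(sorted_keys) and sorted_keys[index] == target:
        return index
    return None


def slow_search(keys, target):
    """Linear scan; makes no assumption about how the keys are ordered."""
    for index, key in enumerate(keys):
        if key == target:
            return index
    return None


def find_key(keys, target):
    """Try the fast search first and fall back to the full scan."""
    index = fast_search(keys, target)
    if index is None:
        index = slow_search(keys, target)
    return index
```

If the assumed ordering is wrong, fast_search can miss a key that is present; find_key still finds it at the cost of visiting every key.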
If you don't know how to output the keys in a ldb file (in their stored order), I can write you a script.
I spent a large chunk of time over the last few days trying to dump the ldb various ways, but nothing has worked for me on Windows. If you've got a script that you think'll work (and doesn't depend on a dozen extra things), I'd be happy to give it a go.
https://gist.github.com/mikez/dcd8cd65319049e434820c3c9459cc4a
No dependencies; except for the folders.py script. Put this into the same directory as that file. Read the description in the beginning for a basic primer on which keys are where in what order.
As for greenbase.KeyComparator: I haven't understood the ordering. Sometimes it seems alphanumeric; at other times, it seems the SPACER symbol has higher weight than the rest, even the characters before it.
In this specific case, this seems a bit speculative for me (and prone to false negatives)
I don't disagree. You're probably right; this is the best we can do for now, until we have a much better handle on what's going on with the key's structure and sorting.
I've run the script you provided (thanks for that!) on the file that contains the 'rootlist' keys, because really, that's the only one we care about. I've attached an anonymised/summarised version of the output. 042129_summ.txt
Basically, it looks to be in alphanumeric order, with group separators (in this case space chars) treated as greater than other chars. We need some people on other platforms to confirm that it's the same for them, and we should be good from there.
Thank you, @Nitemice, this is very helpful. I now figured out the spacer symbol. It encodes the byte length of the upcoming data. The same pattern seems to be used for all !-keys:
!{prefix}#{len}{data}#{len}{data}#…#
Sometimes the data can contain #, ! and such symbols too, but the "len" prefix makes that clear. I pushed an update making the code simpler.
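A minimal sketch of building such a key, assuming {len} is a standard little-endian base-128 varint (a single byte for lengths below 128). Both encode_varint and build_key are hypothetical helper names, not functions from the script.

```python
def encode_varint(value):
    """Encode a non-negative int as a little-endian base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def build_key(prefix, *fields):
    """Build b"!{prefix}#{len}{data}#..." from a prefix and data fields."""
    key = b"!" + prefix + b"#"
    for data in fields:
        key += encode_varint(len(data)) + data + b"#"
    return key
```

Under this reading, build_key(b"pl#slc", b"spotify:user:0123456789:rootlist") has a 32-byte data field, so the length byte is 0x20 (a space). A 29-byte field, for example with a seven-character username, gives \x1d (the group separator).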
Greenbase comparator is still unclear.
Ok, I think I figured out the comparator approximately too now. It's essentially alphanumeric, but ignores the varint {length} sections. Pushed new version. Leaving in slow_search as a backup in case DB gets corrupted.
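One way to picture "alphanumeric, but ignoring the varint {length} sections" is to strip the length bytes and compare what remains in plain byte order. This is a rough sketch under that assumption, not the actual greenbase.KeyComparator. The helper names are hypothetical, the prefix must be known up front, and keys are assumed to be well formed.

```python
def read_varint(buf, pos):
    """Decode a base-128 varint at buf[pos]; return (value, next position)."""
    value = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def strip_lengths(key, prefix):
    """Return key with every {len} section removed (prefix must be known)."""
    if not key.startswith(prefix):
        return key  # not a key of this family; compare it unchanged
    pos = len(prefix)
    out = bytearray(prefix)
    while pos < len(key):
        length, pos = read_varint(key, pos)
        out += key[pos:pos + length]   # the data field
        pos += length
        out += key[pos:pos + 1]        # the "#" that closes the field
        pos += 1
    return bytes(out)
```

Sorting with key=lambda k: strip_lengths(k, b"!pl#slc#") orders keys by their data alone, which can differ from raw byte order whenever the length bytes differ.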
@Nitemice Can you confirm that this works on your end?
@Nitemice Can you confirm that this works on your end?
Yes, working for me now. Thank you!
Spotify changed the cache storage tech as of 2023-11-30. The code was substantially rewritten. You're invited to test the new setup and report any issues here.
So far, I've tested it on one macOS machine. More testing is especially needed for Linux and Windows.
If you were able to run this and verify your folder structure, please give a thumbs up here 👍; also, feel free to report the operating system you tested it on.
Installation instructions (unchanged)
Note: If you haven't changed your folder hierarchy recently, it may be stored in a compressed format. In that case, you may receive additional instructions on how to install the "snappy" decompression library. You can circumvent this by making a tiny change to your folder structure (which you can immediately change back again); this will store the folder hierarchy in an uncompressed cache.
(/cc @technomorph, @Nitemice, @inferrinizzard, @chocolateboy, @AlexSWall, @sydoracle, @ShiromMakkad)