Open mdeff opened 4 years ago
Branches:

- `master` branch contains the code that produced the latest released data. The usage code and documentation can be updated but should work with the released data.
- `next` branch contains the code to produce the hypothetical next release of the data. The usage code is updated to any new data format.
- `outputs` branch is based on `master` and contains generated data (e.g., notebook outputs and figures) for convenience (most notably to run on binder).

Potential todos for a dataset update:
- Multiple tables (`tracks.csv`, `artists.csv`, `albums.csv`) instead of a single huge `tracks.csv`? Consider standards like JAMS.
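As a rough illustration of the multi-table idea, here is a minimal sketch that normalizes flat rows into separate track, artist, and album tables. The column names and sample rows are hypothetical and do not reflect the dataset's real schema; only the standard library is used.

```python
import csv
import io

# Hypothetical flat rows standing in for a single wide tracks.csv.
# Column names are illustrative, not the dataset's real schema.
FLAT_ROWS = [
    {"track_id": 1, "track_title": "A", "artist_id": 10,
     "artist_name": "X", "album_id": 100, "album_title": "First"},
    {"track_id": 2, "track_title": "B", "artist_id": 10,
     "artist_name": "X", "album_id": 100, "album_title": "First"},
    {"track_id": 3, "track_title": "C", "artist_id": 11,
     "artist_name": "Y", "album_id": 101, "album_title": "Second"},
]


def split_tables(rows):
    """Normalize flat rows into tracks, artists and albums tables."""
    tracks, artists, albums = [], {}, {}
    for row in rows:
        # Tracks keep their own fields plus foreign keys only.
        tracks.append({
            "track_id": row["track_id"],
            "track_title": row["track_title"],
            "artist_id": row["artist_id"],
            "album_id": row["album_id"],
        })
        # Artist and album metadata is stored once per ID.
        artists.setdefault(row["artist_id"], {
            "artist_id": row["artist_id"],
            "artist_name": row["artist_name"],
        })
        albums.setdefault(row["album_id"], {
            "album_id": row["album_id"],
            "album_title": row["album_title"],
            "artist_id": row["artist_id"],
        })
    return tracks, list(artists.values()), list(albums.values())


def to_csv(table):
    """Serialize a non-empty list of dicts to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(table[0].keys()))
    writer.writeheader()
    writer.writerows(table)
    return buffer.getvalue()


tracks, artists, albums = split_tables(FLAT_ROWS)
print(to_csv(artists))
```

The trade-off is that users would need to join tables on `artist_id` and `album_id` to recover the flat view, in exchange for smaller files without repeated artist and album metadata.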
Below are issues affecting the `rc1` data release that cannot be fixed without a data update. As updating is disruptive (it'll break code and make results non-comparable), it should be done sparingly, e.g., to fix a fatal flaw or many small ones discovered over time.

- `master`: note in README to try with 7zip (5700859)
- `next`: zip with deflate (instead of bzip2) (#5) or zstd (#32)
- `master`: small subset's list, medium subset's list (#8)
- `next`: metadata from mp3 not API, ensure 30s (8077afe, 00d5b71, 840b337)
- `master`: list (#27)
- `next`: dump ID3 tags with technical metadata and remove from mp3
- `master`: list the 937 duplicates
- `next`: remove them (try other methods and detect near duplicates)

Workarounds are explained in more detail in the wiki.
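For the "zip with deflate" item, a minimal sketch using Python's standard `zipfile` module could look like the following. The function names and directory layout are assumptions made for illustration, not the repository's actual packaging code, and the zstd option is not covered here.

```python
import os
import tempfile
import zipfile


def zip_with_deflate(src_dir, zip_path):
    """Archive every file under src_dir using deflate compression.

    zip_path should live outside src_dir, otherwise the archive
    being written would be picked up by the directory walk.
    """
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for root, _dirs, files in os.walk(src_dir):
            for name in sorted(files):
                full_path = os.path.join(root, name)
                # Store paths relative to src_dir so the archive is relocatable.
                archive.write(full_path, os.path.relpath(full_path, src_dir))


def compression_methods(zip_path):
    """Return the set of compression methods used by the archive members."""
    with zipfile.ZipFile(zip_path) as archive:
        return {info.compress_type for info in archive.infolist()}


# Small demonstration on a throwaway directory.
with tempfile.TemporaryDirectory() as work:
    src = os.path.join(work, "data")
    os.makedirs(src)
    with open(os.path.join(src, "example.txt"), "w") as handle:
        handle.write("hello " * 100)
    out = os.path.join(work, "data.zip")
    zip_with_deflate(src, out)
    print(compression_methods(out) == {zipfile.ZIP_DEFLATED})
```

Deflate is the method that virtually every unzip tool supports, which is the motivation for preferring it over bzip2 here; `compression_methods` offers a quick check that no member was stored with another method.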