microsoft / fadtk

A simple library for Fréchet Audio Distance (FAD) calculation
MIT License

More lines than songs in fma_pop_tracks.csv #16

PabloPeso closed this issue 9 months ago

PabloPeso commented 9 months ago

Hi,

I had some issues retrieving the tracks from FMA used by FMA-Pop. The main issue is that some of the records in fma_pop_tracks.csv contain line breaks; that is, what should be a single line is split across two or more lines.

For example, in https://github.com/microsoft/fadtk/blob/main/datasets/fma_pop_tracks.csv, lines 13, 14, and 15 seem to belong to line 12. Is that correct?

I found almost 600 extra lines (either continuations of a split record or empty lines, like line 118).

Is this intended?

Thanks,

hykilpikonna commented 9 months ago

Yes, this is how the CSV standard escapes special characters such as line breaks.

For example, if a row contains the text Hello\n\nWorld, a proper CSV does not backslash-escape the newlines; instead, it wraps the whole field in quotation marks:

[image: example CSV with the multi-line field wrapped in quotation marks]
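The quoting behaviour described above can be reproduced with Python's standard `csv` module. This is only a minimal sketch: the helper `write_rows` and the column names `id` and `note` are made up for illustration and are not taken from fma_pop_tracks.csv.

```python
import csv
import io


def write_rows(rows):
    """Serialize rows to CSV text (hypothetical helper, for illustration only)."""
    buf = io.StringIO()
    # With the default QUOTE_MINIMAL setting, the writer quotes any field
    # that contains the delimiter, the quote character, or a line break.
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical two-column data; the second field contains two newlines.
text = write_rows([["id", "note"], [1, "Hello\n\nWorld"]])
print(text)
```

The single data record ends up spread over three physical lines, one of them empty, which is probably what the "extra" and empty lines in the file are.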

Please don't load a CSV line by line; that's not how CSV files are designed to be read. Use a proper CSV library or Pandas instead.
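To illustrate the difference between counting lines and parsing records, here is a small sketch using the standard `csv` module. The sample text and the column names `track_id` and `title` are assumptions made for the example; the real columns of fma_pop_tracks.csv may differ.

```python
import csv
import io

# Hypothetical sample: record 1 has a title that spans two physical lines.
SAMPLE = 'track_id,title\n1,"First line\nsecond line"\n2,Plain title\n'


def count_physical_lines(text):
    """Naive count: every line break starts a new 'row'."""
    return len(text.splitlines())


def read_records(text):
    """Parse with a CSV reader, which keeps quoted line breaks inside a field."""
    # newline="" leaves line endings untouched so the csv module can handle them.
    return list(csv.DictReader(io.StringIO(text, newline="")))


records = read_records(SAMPLE)
print(count_physical_lines(SAMPLE))  # header + 3 physical data lines
print(len(records))                  # 2 actual records
print(repr(records[0]["title"]))     # the newline is part of the field value
```

For a file on disk, the same idea applies with `open(path, newline="", encoding="utf-8")` passed to `csv.DictReader`. `pandas.read_csv` also handles quoted multi-line fields by default.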