riwanahas / runreco

MIT License

Midterm feedback #29

Open miguelgfierro opened 6 months ago

miguelgfierro commented 6 months ago
juan-yu commented 6 months ago

@miguelgfierro the reason for adding the Million Song Dataset code is to read the .h5 files: they have a special format, and using the getters that the dataset's creator provides is more convenient. The code is Python 2 and very old, but it only needs slight edits before we can use it. That is faster than writing our own getters from scratch.

miguelgfierro commented 6 months ago

@lgljht90 have you tried h5py?

juan-yu commented 6 months ago

> @lgljht90 have you tried h5py?

Yes, I tried using h5py to iterate over the rows, but this dataset seems to have a special structure, different from a normal .h5 file, so I ended up using the creator's getters. I consulted the team that used this dataset last year, and they advised that it's not worth spending time on the structure of the Million Song Dataset. Professor, may I ask why you suggest writing our own getters from scratch? Could it improve performance? Reading is really slow right now.

Thank you!
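
For reference, here is a minimal sketch of how the layout of such a file could be inspected with h5py before deciding whether custom getters are worth it. The helper names `describe_h5` and `read_field` are hypothetical and not part of the dataset's code. The table path `metadata/songs` and the field `artist_name` in the usage note are assumptions based on the dataset's published getters, so confirm them against the output of `describe_h5` first.

```python
import h5py


def describe_h5(path):
    """Return {dataset path: (shape, field names or None)} for every dataset in the file."""
    layout = {}

    def visit(name, obj):
        # visititems calls this for every group and dataset below the root;
        # returning None lets the traversal continue.
        if isinstance(obj, h5py.Dataset):
            # dtype.names is a tuple of field names for compound (table-like)
            # datasets and None for plain arrays.
            layout[name] = (obj.shape, obj.dtype.names)

    with h5py.File(path, "r") as f:
        f.visititems(visit)
    return layout


def read_field(path, table, field, row=0):
    """Read one field of one row from a compound dataset."""
    with h5py.File(path, "r") as f:
        value = f[table][row][field]
    # Fixed-length string fields come back as bytes, so decode them.
    return value.decode("utf-8") if isinstance(value, bytes) else value
```

Assuming the layout above, `describe_h5("some_track.h5")` would list each table with its field names, and `read_field("some_track.h5", "metadata/songs", "artist_name")` would return the artist name of the first row. Whether this is faster than the existing getters would need to be measured.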