Closed by willfrey 8 years ago
I've deduced that it's because I'm using a ton of different labels. I'm trying to use this for speech data, so I'm using the label column to store the transcription text.
Do you have any suggestions on a better way to store the text?
Perhaps I could do it by looking up the filenames returned from the sampler?
Okay, that worked!
I created a big JSON file to store the transcriptions, keyed by filename. I can look them up using batch.item[i].url.
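For anyone landing here later, the lookup can be sketched roughly like this. It is written in Python only for illustration and is not this library's API: the helper names, the JSON layout (filename mapped to transcription text), and the choice to key on the last path component of the URL are all assumptions.

```python
import json
import posixpath


def load_transcriptions(json_path):
    """Load a {filename: transcription} mapping from a JSON file (assumed layout)."""
    with open(json_path, encoding="utf-8") as f:
        return json.load(f)


def transcription_for(url, transcriptions):
    """Look up the transcription for one sampled item by its URL.

    Assumes the mapping is keyed by the last path component of the URL.
    """
    key = posixpath.basename(url)
    return transcriptions[key]


# Hypothetical example data; real keys would come from the dataset index.
example = {"utt_0001.wav": "hello world"}
text = transcription_for("data/train/utt_0001.wav", example)
```

This keeps the label column free for a small set of values, with the free-form text stored outside the index.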
Closing this. :)
I'm trying to split a dataset with 12800 examples across four nodes. Instead of each node receiving 3200 examples, it appears that they receive 0, 12771, 27, and 2, respectively.
Can you help me understand this behavior and try to resolve it?
Thanks.
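To make the expectation concrete, here is a minimal sketch of the even split I had in mind, in Python for illustration. The round-robin assignment and the function name are my own assumptions, not how the library actually partitions.

```python
def partition_indices(num_examples, num_partitions, partition):
    """Return the example indices for one partition (0-based), round-robin."""
    return list(range(partition, num_examples, num_partitions))


# Expected: 12800 examples over four nodes gives 3200 per node.
expected_sizes = [len(partition_indices(12800, 4, p)) for p in range(4)]

# Observed per-node counts from the report above.
observed_sizes = [0, 12771, 27, 2]
total_observed = sum(observed_sizes)
```

The observed counts still add up to 12800, so it looks like examples are being assigned unevenly rather than dropped.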