Closed rohanchn closed 1 year ago
In 4.2.0, ketos segtrain and ketos train with -f alto are working as expected.
However, I got an empty text lines exception when I tried to train with a binary dataset. There are indeed a few empty text lines in my dataset.
My command was:

ketos train --augment -d cuda:0 -f binary --base-dir R --normalization NFD --min-epochs 30 -w 0 -s '[1,120,0,1 Cr3,13,32 Do0.1,2 Mp2,2 Cr3,13,32 Do0.1,2 Mp2,2 Cr3,9,64 Do0.1,2 Mp2,2 Cr3,9,64 Do0.1,2 S1(1x0)1,3 Lbx200 Do0.1,2 Lbx200 Do.1,2 Lbx200 Do]' -r 0.0001 -o models/bl/21_uATR all.arrow
I set this https://github.com/mittagessen/kraken/blob/ecb47081d64eb42fdb66ce344f26576ed54ab480/kraken/lib/dataset.py#L570 to True, and now training with a binary dataset is working as expected.
Opening this issue to understand this behavior better.
The semantics of the line skipping flag in the dataset being reversed is a known issue in 4.2.0 and has been fixed in master for a while. I should probably tag a new release.
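
For anyone reading along, here is a minimal sketch of how a reversed skip flag can produce this kind of symptom. It is not kraken's code: `load_lines`, `EmptyTextLineError`, and the `reversed_semantics` switch are hypothetical names made up for illustration, and the sketch does not try to reproduce which flag value triggers which behavior in 4.2.0.

```python
class EmptyTextLineError(ValueError):
    """Hypothetical stand-in for an 'empty text lines' exception."""


def load_lines(lines, skip_empty_lines=True, reversed_semantics=False):
    """Return the non-empty lines, either skipping or rejecting empty ones.

    reversed_semantics=True simulates a flag whose meaning has been inverted
    (an assumption for illustration, not the actual kraken implementation).
    """
    # When the semantics are reversed, the flag is effectively negated.
    skip = (not skip_empty_lines) if reversed_semantics else skip_empty_lines
    kept = []
    for text in lines:
        if not text.strip():
            if skip:
                continue  # silently drop the empty line
            raise EmptyTextLineError("empty text line in dataset")
        kept.append(text)
    return kept


sample = ["first line", "", "second line"]

# Intended behavior: asking to skip empty lines drops them.
print(load_lines(sample, skip_empty_lines=True))

# Inverted behavior: the same request now raises instead of skipping.
try:
    load_lines(sample, skip_empty_lines=True, reversed_semantics=True)
except EmptyTextLineError as exc:
    print("raised:", exc)
```

With an inversion like this, flipping the flag by hand makes the empty lines get skipped again, which would be consistent with the workaround described above.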