microsoft / CNTK

Microsoft Cognitive Toolkit (CNTK), an open source deep-learning toolkit
https://docs.microsoft.com/cognitive-toolkit/

[BrainScript] Using frameMode with sequences loaded by HTKFeatureDeserializer #3238

Open timolohrenz opened 6 years ago

timolohrenz commented 6 years ago

Hello CNTK team,

For my reader setup I use the HTKFeatureDeserializer to read features from my .scp file. The head of the .scp file looks like this:

0=../../data/TIMIT/feats/TIMIT_16kHz_HTK_FBANK_ENERGY_Timo_NoNORM/white/clean/DR1_FAKS0_SI1573.mfc[0,494]
1=../../data/TIMIT/feats/TIMIT_16kHz_HTK_FBANK_ENERGY_Timo_NoNORM/white/clean/DR1_FAKS0_SI2203.mfc[0,348]
2=../../data/TIMIT/feats/TIMIT_16kHz_HTK_FBANK_ENERGY_Timo_NoNORM/white/clean/DR1_FAKS0_SI943.mfc[0,373]
3=../../data/TIMIT/feats/TIMIT_16kHz_HTK_FBANK_ENERGY_Timo_NoNORM/white/clean/DR1_FAKS0_SX133.mfc[0,329]
4=../../data/TIMIT/feats/TIMIT_16kHz_HTK_FBANK_ENERGY_Timo_NoNORM/white/clean/DR1_FAKS0_SX223.mfc[0,307]
5=../../data/TIMIT/feats/TIMIT_16kHz_HTK_FBANK_ENERGY_Timo_NoNORM/white/clean/DR1_FAKS0_SX313.mfc[0,351]
6=../../data/TIMIT/feats/TIMIT_16kHz_HTK_FBANK_ENERGY_Timo_NoNORM/white/clean/DR1_FAKS0_SX403.mfc[0,333]
7=../../data/TIMIT/feats/TIMIT_16kHz_HTK_FBANK_ENERGY_Timo_NoNORM/white/clean/DR1_FAKS0_SX43.mfc[0,242]

For the input labels I use the CNTKTextFormatDeserializer, with my sparse labels in the following form:

0 |l 147:1
0 |l 148:1
0 |l 148:1
0 |l 148:1
0 |l 148:1
0 |l 148:1
0 |l 148:1
0 |l 148:1
0 |l 148:1
0 |l 148:1
0 |l 148:1
0 |l 148:1
0 |l 148:1
0 |l 148:1
0 |l 148:1
0 |l 148:1
0 |l 148:1
0 |l 149:1
0 |l 149:1
0 |l 149:1
0 |l 149:1
0 |l 149:1
0 |l 149:1
0 |l 149:1
0 |l 149:1
0 |l 149:1
0 |l 149:1
0 |l 149:1
0 |l 149:1
0 |l 149:1
0 |l 149:1
0 |l 149:1
0 |l 149:1
0 |l 149:1
0 |l 149:1
0 |l 149:1

... and so on, with one line per frame. The first field on each line is the corresponding sequence ID.
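A quick way to rule out a length mismatch between the two inputs is to count frames per sequence on both sides. The following is a minimal sketch, not part of CNTK: the function names are hypothetical, and it assumes that the `[start,end]` range in an .scp entry is inclusive (so `[0,494]` means 495 frames) and that the key before `=` is the same sequence ID used in the label file.

```python
import re
from collections import Counter

# Matches lines such as: 0=path/to/file.mfc[0,494]
SCP_RE = re.compile(r"^(?P<key>[^=]+)=(?P<path>.+)\[(?P<start>\d+),(?P<end>\d+)\]$")


def scp_frame_counts(lines):
    """Return {sequence key: frame count} for .scp lines (inclusive range assumed)."""
    counts = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        match = SCP_RE.match(line)
        if match is None:
            raise ValueError(f"unrecognized scp line: {line!r}")
        counts[match["key"]] = int(match["end"]) - int(match["start"]) + 1
    return counts


def ctf_frame_counts(lines):
    """Return {sequence ID: number of lines} for a text-format label file."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Everything before the first '|' is the sequence ID.
        seq_id = line.split("|", 1)[0].strip()
        counts[seq_id] += 1
    return dict(counts)


def mismatches(scp_counts, ctf_counts):
    """Return {key: (scp frames, label lines)} for every key where the two differ."""
    keys = set(scp_counts) | set(ctf_counts)
    return {
        k: (scp_counts.get(k), ctf_counts.get(k))
        for k in sorted(keys)
        if scp_counts.get(k) != ctf_counts.get(k)
    }
```

An empty result from `mismatches` means every sequence has as many label lines as feature frames, which would point away from the data and toward the reader configuration.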

For non-recurrent networks I want to run my training in frameMode. As I understand it, all sequences are then split and processed as single frames. However, I get the following error message:

[CALL STACK]
[0x81d9dc]
[0x7f27a9e1f91f] Microsoft::MSR::CNTK::FramePacker:: CreateMBLayout (std::vector<std::shared_ptr,std::allocator<std::shared_ptr>> const&) + 0xcaf
[0x7f27a9e110a0] Microsoft::MSR::CNTK::SequencePacker:: PackSparseStream (std::vector<std::shared_ptr,std::allocator<std::shared_ptr>> const&, unsigned long) + 0x110
[0x7f27a9e10bc4] Microsoft::MSR::CNTK::SequencePacker:: ReadMinibatch () + 0x384
[0x7f27a9e2092e] Microsoft::MSR::CNTK::ReaderBase:: ReadMinibatch () + 0xe
[0x7f27a9e0a81f] Microsoft::MSR::CNTK::ReaderShim:: PrefetchMinibatch (unsigned long) + 0xcf
[0x7f27a9e0ab53] std::_Function_handler<Microsoft::MSR::CNTK::ReaderShim::PrefetchResult (),std::reference_wrapper<std::_Bind_simple<Microsoft::MSR::CNTK::ReaderShim::StartEpoch(Microsoft::MSR::CNTK::EpochConfiguration const&,std::unordered_set<Microsoft::MSR::CNTK::InputStreamDescription,std::hash,std::equal_to,std::allocator> const&)::{lambda()#3} ()>>>:: _M_invoke (std::_Any_data const&) + 0x13
[0x7f27a9e01c0e] std::_Function_handler<std::unique_ptr<std::future_base::_Result_base,std::future_base::_Result_base::_Deleter> (),std::future_base::_Task_setter<std::unique_ptr<std::future_base::_Result<Microsoft::MSR::CNTK::ReaderShim::PrefetchResult>,std::future_base::_Result_base::_Deleter>,Microsoft::MSR::CNTK::ReaderShim::PrefetchResult>>:: _M_invoke (std::_Any_data const&) + 0x1e
[0x974c6e] std::__future_base::_State_base:: _M_do_set (std::function<std::unique_ptr<std::future_base::_Result_base,std::future_base::_Result_base::_Deleter> ()>&, bool&) + 0x1e
[0x7f27ab5bde20] pthread_once + 0x50
[0x7f27a9e00a2c] + 0x96fa2c
[0x7f27a9e07a0d] std::thread::_Impl<std::_Bind_simple<std::future_base::_Async_state_impl<std::_Bind_simple<Microsoft::MSR::CNTK::ReaderShim::StartEpoch(Microsoft::MSR::CNTK::EpochConfiguration const&,std::unordered_set<Microsoft::MSR::CNTK::InputStreamDescription,std::hash,std::equal_to,std::allocator> const&)::{lambda()#3} ()>,Microsoft::MSR::CNTK::ReaderShim::PrefetchResult>::_Async_state_impl(Microsoft::MSR::CNTK::ReaderShim::StartEpoch(Microsoft::MSR::CNTK::EpochConfiguration const&,std::unordered_set<Microsoft::MSR::CNTK::InputStreamDescription,std::hash,std::equal_to,std::allocator> const&)::{lambda()#3} (&&)())::{lambda()#1} ()>>:: _M_run () + 0x6d
[0x7f27a86a72b0] + 0xb52b0
[0x7f27ab5b8e25] + 0x7e25
[0x7f27a7e0f34d] clone + 0x6d

EXCEPTION occurred: Detected a non-frame sequence of size 266 in frame mode.

Am I missing some configuration, or might this be a missing feature in the readers? Thanks in advance for your help and time.

PS: This is what my reader section looks like:

reader = [
        traceLevel= 2
        randomize = true
        keepDataInMemory = true
        frameMode = true

        # A list of deserializers the reader uses.
        deserializers = (
            [
                type = "HTKFeatureDeserializer"
                module = "HTKDeserializers"
                input = [
                    # Description of input stream to feed the Input node named "features"
                    features = [
                        dim= 1845       # no context for BLSTMs
                        scpFile = "train_files_ctc_len.scp"
                        definesMBSize = true
                    ]
                ]
            ]:
            [
                type = "CNTKTextFormatDeserializer" 
                module = "CNTKTextFormatReader"
                file = "tgts_dnn_train.seq"
                input = [
                    labels = [
                        alias = "l"
                        dim = 183
                        format = "sparse"
                    ]
                ]
            ]
        )
    ]

    cvReader=[
        randomize = true
        crossValidationInterval = 1
        keepDataInMemory = true
        keepDataInMemory = true

        # A list of deserializers the reader uses.
        deserializers = (
            [
                type = "HTKFeatureDeserializer"
                module = "HTKDeserializers"
                input = [
                    # Description of input stream to feed the Input node named "features"
                    features = [
                        dim = 1845
                        scpFile = "dev_files_ctc_len.scp"
                        definesMBSize = true
                    ]
                ]  
            ]:
            [
                type = "CNTKTextFormatDeserializer" 
                module = "CNTKTextFormatReader"
                file = "tgts_dnn_dev.seq"
                input = [
                    labels = [
                        alias = "l"
                        dim = 183
                        format = "sparse"
                    ]
                ]
            ]
        )
    ]           
}

jaliyae commented 6 years ago

What you are hitting is a validation check that, in frame mode, every sequence consists of a single sample. In this case the reader sees a sequence of 266 samples, hence the error. Can you make the labels file one line per label?

timolohrenz commented 6 years ago

Hmm, sorry for the bad formatting of my issue. My labels actually are one label per line, as I have now corrected in my previous post. The label and feature sequences have exactly the same lengths.

As I initially guessed this might be an issue with the HTKFeatureDeserializer, I also converted my features to CTF files, again with one feature frame per line. Nevertheless, I still get this error. The only workaround I found was to skip the sequence IDs for both the feature and label files.

jaliyae commented 6 years ago

Sure. Is there any possibility for me to get a minimal repro? I can debug it to see what is really happening.

timolohrenz commented 6 years ago

Yeah, for sure. Thanks in advance for your efforts.

reproissue#3238.zip

I tested the repro on CNTK binary versions 2.0 and 2.3, both compiled with GPU support. Simply start it with: cntk configFile=timit_dnn_conf.cntk

As you will see, my data consists of 1101 frames from 4 sequences (= speech utterances) stored in the _featssmall.ctf. Each speech utterance has its own sequence ID. The labels in _tgtssmall.ctf have a parallel structure with one label per line/frame.

Please note the commented-out skipSequenceIds options in the reader section. When skipping IDs, the error does not occur. However, I think this workaround is quite risky, as features might easily get assigned to the wrong labels/sequences.
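One alternative to dropping the IDs entirely would be to rewrite both text-format files so that every frame becomes its own length-1 sequence with a running ID. The sketch below is an untested idea rather than a confirmed fix: `one_sequence_per_frame` is a hypothetical helper, and it assumes that the feature and label files list their frames in exactly the same order, so that applying the same renumbering to both keeps each feature frame paired with its label.

```python
def one_sequence_per_frame(lines):
    """Renumber text-format lines so that each line gets a unique running ID.

    Input lines look like '0 |l 148:1'; the original sequence ID before the
    first '|' is discarded and replaced by the line's running index.
    """
    out = []
    next_id = 0
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue  # skip blank lines without consuming an ID
        _, sep, payload = stripped.partition("|")
        if not sep:
            raise ValueError(f"line has no '|' stream marker: {stripped!r}")
        out.append(f"{next_id} |{payload}")
        next_id += 1
    return out
```

Running the same function over the feature file and the label file yields matching IDs line by line, at the cost of losing the utterance boundaries that sequence training needs, so the original files would still have to be kept for that purpose.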

In the end this may simply be a misinterpretation of the frameMode option on my part, but maybe you can shed some light on it. It would be very convenient for me to use the same labels and features for sequence and non-sequence training.


As I said, thanks a lot and keep up the great work on this toolkit!

jaliyae commented 6 years ago

Hi, thank you for the update. One of my colleagues quickly looked at this issue and suggested that frameMode and truncation need to be set outside the reader section:

frameMode = true
reader = [ ... ]

Could you please give this a try and let me know?

timolohrenz commented 6 years ago

Oh, I would have been ashamed if this was the case.

Unfortunately it's not. I tried the frameMode setting at every possible level and still get the same error. It seems the reader still treats the input as sequences even though frameMode is activated.