guillaumegenthial / tf_ner

Simple and Efficient Tensorflow implementations of NER models with tf.estimator and tf.data
Apache License 2.0

Batch size is confusing when compared with the research paper #57

Open ahmadshabbir2468 opened 5 years ago

ahmadshabbir2468 commented 5 years ago

The research paper says:

"Each batch contains a list of sentences which is determined by the parameter of batch size. In our experiments, we use batch size of 100 which means to include sentences whose total length is no greater than 100."

As I understand it, "total length is no greater than 100" means that the sentence length in characters, including whitespace, should be at most 100. For example, the sentence "LSTM CRF" has length 8 because it contains 8 characters. Am I right?

[Research paper link](https://arxiv.org/pdf/1508.01991.pdf)
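
To make the two possible readings of "length" concrete, here is a small Python check. The quoted passage does not say whether length is counted in characters or in tokens, so both are shown. The helper names `char_length` and `token_length` are hypothetical and only serve this illustration.

```python
def char_length(sentence: str) -> int:
    """Length in characters, whitespace included."""
    return len(sentence)


def token_length(sentence: str) -> int:
    """Length in whitespace-separated tokens (assumed tokenization)."""
    return len(sentence.split())


sentence = "LSTM CRF"
print(char_length(sentence))   # 8
print(token_length(sentence))  # 2
```

Under the character reading, a limit of 100 would fit very few sentences per batch; under the token reading it would fit noticeably more.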

What should we do for long sentences?

The params in lstm_crf main.py:

    params = {
        'dim': 300,
        'dropout': 0.5,
        'num_oov_buckets': 1,
        'epochs': 25,
        'batch_size': 20,
        'buffer': 15000,
        'lstm_size': 100,
        'words': str(Path(DATADIR, 'vocab.words.txt')),
        'chars': str(Path(DATADIR, 'vocab.chars.txt')),
        'tags': str(Path(DATADIR, 'vocab.tags.txt')),
        'glove': str(Path(DATADIR, 'glove.npz'))
    }
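
For comparison, here is a minimal sketch of two ways to form batches: a fixed number of sentences per batch (one plausible reading of `'batch_size': 20` above, which is an assumption on my side) and a cap on the total length per batch, as the quoted passage describes. The function names `batch_by_count` and `batch_by_total_length` are hypothetical, length is assumed to be counted in whitespace-separated tokens, and putting an over-long sentence alone in its own batch is just one possible policy, not something the paper specifies.

```python
from typing import Callable, List


def batch_by_count(sentences: List[str], batch_size: int) -> List[List[str]]:
    """Group sentences into batches of at most `batch_size` sentences."""
    return [sentences[i:i + batch_size]
            for i in range(0, len(sentences), batch_size)]


def batch_by_total_length(
    sentences: List[str],
    max_total: int,
    length_fn: Callable[[str], int] = lambda s: len(s.split()),
) -> List[List[str]]:
    """Group sentences so that the summed length of each batch stays
    within `max_total`. A single sentence longer than `max_total`
    is placed alone in its own batch (assumed policy)."""
    batches: List[List[str]] = []
    current: List[str] = []
    current_len = 0
    for sentence in sentences:
        n = length_fn(sentence)
        # Close the current batch if adding this sentence would exceed the cap.
        if current and current_len + n > max_total:
            batches.append(current)
            current, current_len = [], 0
        current.append(sentence)
        current_len += n
    if current:
        batches.append(current)
    return batches


sentences = ["a b c", "d e", "f g h i j k", "l"]
print(batch_by_count(sentences, 2))
print(batch_by_total_length(sentences, 5))
```

With the first function every batch has the same number of sentences regardless of how long they are. With the second, the number of sentences per batch varies and a long sentence simply ends up in a smaller batch.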