huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

loss is nan, for training on MNLI dataset #1704

Closed antgr closed 4 years ago

antgr commented 4 years ago

❓ Questions & Help

Recently I read a tutorial https://medium.com/tensorflow/using-tensorflow-2-for-state-of-the-art-natural-language-processing-102445cda54a which you can also see in this notebook https://colab.research.google.com/drive/16ClJxutkdOqXjBm_PKq6LuuAuLsInGC-

The tutorial uses the MRPC dataset.

I changed the dataset from MRPC to MNLI; you can see the changes in the corresponding code cells of this notebook: https://colab.research.google.com/drive/1mzYkrAW5XUwey4FJIU9SN0Q0RSn0tiDb

With the MNLI dataset, I see the following issue (*):

print("Fine-tuning BERT on MNLI")
bert_history = bert_model.fit(bert_train_dataset, epochs=3, validation_data=bert_validation_dataset)

Fine-tuning BERT on MNLI
Epoch 1/3
352/Unknown - 511s 1s/step - loss: nan - accuracy: 0.3416

(*) Note that the loss is nan.

With MRPC, the corresponding output is:

print("Fine-tuning BERT on MRPC")
bert_history = bert_model.fit(bert_train_dataset, epochs=3, validation_data=bert_validation_dataset)

Fine-tuning BERT on MRPC
Epoch 1/3
15/Unknown - 44s 3s/step - loss: 0.6623 - accuracy: 0.6183

The only differences I see between the two notebooks are the following: for MNLI there are matched and mismatched validation datasets, and I pass label_list=['0', '1', '2'] to glue_convert_examples_to_features.

Could someone help me understand why this issue occurs? Thanks

antgr commented 4 years ago

Extra details:

example = list(bert_validation_matched_dataset.__iter__())[0]
example

{'hypothesis': <tf.Tensor: id=4914985, shape=(64,), dtype=string, numpy=
 array([b'The St. Louis Cardinals have always won.',
        b'The fortress was built a number of years after the caravanserai.',
        b'Mastihohoria is a collection of twenty mastic villages built be the genoese.',
        b'Reggae is the most popular music style in Jamaica.',
        b'I am able to receive mail at my workplace.',
        b'Men have lower levels of masculinity than in the decade before now.',
        b'Clinton has several similarities to Whitewater or Flytrap.',
        b'A search has been conducted for an AIDS vaccine.',
        b'We can acknowledge there is fallout from globalization around the world.',
        b'My feelings towards pigeons are filled with animosity.',
        b'She could see through the ghosts with ease.',
        b'Leading organizations want to be sure their processes are successful.',
        b'The Postal Service spends considerable sums on cost analysis.',
        b'Indeed we got away from the original subject.',
        b'Economic growth continued apace, with many people employed by the railroad repair shop.',
        b'Neither side is actually interested in a settlement at this time.',
        b'The rooms are opulent, and used for formal, elegant events.',
        b"The East side of the square is where the Old King's House stands.",
        b'The islands are part of France now instead of just colonies.',
        b'A stoichiometry of 1.03 is typical when the FGD process is not producing gypsum by-product',
        b'You can hire the equipment needed for windsurfing at Bat Galim Beach. ',
        b"There isn't enough room for an airport on the island.",
        b'The setup rewards good business practices.',
        b'People sacrifice their lives for farmers and slaves.',
        b"She doesn't like people like me. ",
        b"It's nothing like a drug hangover.",
        b'The philosophy was to seize opportunities when the economy is doing poorly.',
        b"Bill Clinton isn't a rapist",
        b'Various episodes depict that he is a member.',
        b"Bellagio's water display was born from this well received show.",
        b'Fannie Mae had terrible public-relations.',
        b"Gododdin's accomplishments have been recorded in a Welsh manuscript.",
        b'I can imagine how you are troubled by insects up there',
        b'Howard Berman is a Democrat of the House.',
        b'Gore dodged the draft.', b"Jon was glad that she wasn't. ",
        b'Section 414 helps balance allowance allocations for units.',
        b'Reducing HIV is important, but there are also other worthy causes.',
        b'I think there are some small colleges that are having trouble.',
        b'The best hotels in the region are in Hassan.  ',
        b'She mentioned approaching the law with a holistic approach/',
        b"Select this option for Finkelstein's understanding of why this logic is expedient.",
        b"It's impossible to have a plate hand-painted to your own design in Hong Kong.",
        b"We could annex Cuba, but they wouldn't like that.",
        b'She really needs to mention it',
        b"The basics don't need to be right first.",
        b'Standard Costing was applied to the ledger.',
        b'The exhibition was too bare and too boring. ',
        b'The uncle had no match in administration; certainly not in his inefficient and careless nephew, Charles Brooke.',
        b'The Legacy Golf Club is just inside city limits.',
        b'They do not give money to legal services.',
        b'Do you want some coffee?',
        b'In 1917, the Brittish General Allenby surrendered the city using a bed-sheet.',
        b'Daniel explained what was happening.',
        b'That never happened to me.', b'You would have a prescription.',
        b"It's lovely speaking with you. ",
        b'Each Pokemon card pack is filled with every rare card a kid could want.',
        b'He generally reports very well on all kinds of things.',
        b'The final rule was declared not to be an economically significant regulator action.',
        b'Dana, this conversation bored me.',
        b'Andratx is on the northwest coast and the Cape of Formentor is further east.',
        b'Sunblock is an unnecessary precaution if you are in the water.',
        b'U.S. consumers and factories in East Asia benefit from imports.'],
       dtype=object)>,
 'idx': <tf.Tensor: id=4914986, shape=(64,), dtype=int32, numpy=
 array([3344, 3852, 5009, 5398, 2335,  647, 7823, 8927, 2302, 4800, 8628,
         637, 7756, 2189, 3146, 8990, 4759, 2592,   96, 5144, 2373, 7698,
        2862, 1558, 7639, 3860,  416, 5768, 9299, 3149, 2927, 5914, 4960,
        2880, 8203, 7787, 7556, 6465, 9781, 4053, 1217, 7178,   39, 8885,
        6666, 8157, 3995, 1758, 5552, 4476, 3325, 7537, 7940, 8409, 7899,
        4104, 2874, 4845, 3934, 5351, 2982, 5235, 2614, 6318], dtype=int32)>,
 'label': <tf.Tensor: id=4914987, shape=(64,), dtype=int64, numpy=
 array([2, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 2, 0, 2, 0, 1,
        2, 1, 1, 2, 2, 1, 0, 0, 2, 0, 0, 0, 2, 0, 0, 0, 0, 1, 0, 0, 2, 0,
        2, 2, 1, 2, 2, 2, 2, 2, 2, 0, 2, 0, 0, 2, 1, 2, 2, 1, 2, 0])>,
 'premise': <tf.Tensor: id=4914988, shape=(64,), dtype=string, numpy=
 array([b"yeah well losing is i mean i'm i'm originally from Saint Louis and Saint Louis Cardinals when they were there were uh a mostly a losing team but",
        b'Beside the fortress lies an 18th-century caravanserai, or inn, which has been converted into a hotel, and now hosts regular folklore evenings of Turkish dance and music.',
        b'The twenty mastic villages known collectively as mastihohoria were built by the Genoese in the 14 15th centuries.',
        b'Jamaican music ska and, especially, reggae has since the 1970s been exported and enjoyed around the world.',
        b'for me now the address is the same you know my my office address',
        b'[W]omen mocking men by calling into question their masculinity is also classified as sexual harassment, the paper added.',
        b"Watergate remains for many an unhealed wound, and Clinton's critics delight in needling him with Watergate comparisons--whether to Whitewater or Flytrap.",
        b'The search for an AIDS vaccine currently needs serious help, with the U.S. government, the biggest investor in the effort, spending less than 10 percent of its AIDS-research budget on the problem.',
        b'First, we can acknowledge, and maybe even do something about, some of the disaffecting fallout from globalization, such as pollution and cultural dislocation.',
        b'I hate pigeons.',
        b'From that spot she could see all of them and, should she need to, she could see through them as well.',
        b'We also have found that leading organizations strive to ensure that their core processes efficiently and effectively support mission-related outcomes.',
        b'Also, considerable sums are spent by the Postal Service analyzing the costs associated with worksharing, and mailers/competitors incur considerable expense litigating their positions on worksharing before the Postal Rate Commission.',
        b'yeah well we veered from the subject',
        b'Growth continued for ten years, and by 1915 the town had telephones, round-the-clock electricity, and a growing population many of whom worked in the railroad repair shop.',
        b"And if, as ultimately happened, no settlement resulted, we could shrug our shoulders, say, 'Hey, we tried,' and act like unsuccessful brokers to an honorable peace.",
        b'Lavishly furnished and decorated, with much original period furniture, the rooms are used for ceremonial events, visits from foreign dignitaries, and EU meetings.',
        b"On the west side of the square is Old King's House (built in 1762), which was the official residence of the British governor; it was here that the proclamation of emancipation was issued in 1838.",
        b'All of the islands are now officially and proudly part of France, not colonies as they were for some three centuries.',
        b'8 A stoichiometry of 1.03 is typical when the FGD process is producing gypsum by-product, while a stoichiometry of 1.05 is needed to produce waste suitable for a landfill.',
        b' The equipment you need for windsurfing can be hired from the beaches at Tel Aviv (marina), Netanya, Haifa (at Bat Galim beach), Tiberias, and Eilat.',
        b'Since there is no airport on the island, all visitors must arrive at the port, Skala, where most of the hotels are located and all commercial activity is carried out.',
        b'The entire setup has an anti-competitive, anti-entrepreneurial flavor that rewards political lobbying rather than good business practices.',
        b'Why bother to sacrifice your lives for dirt farmers and slavers?',
        b'She hates me."',
        b'and the same is true of the drug hangover you know if you',
        b'In the meantime, the philosophy is to seize present-day opportunities in the thriving economy.',
        b'Most of the Clinton women were in their 20s at the time of their Clinton encounter',
        b'On various episodes he is a member, along with Bluebeard and the Grim Reaper, of the Jury of the Damned; he takes part in a snake-bludgeoning (in a scandal exposed by a Bob Woodward book); his enemies list is used for dastardly purposes; even his dog Checkers is said to be bound for hell.',
        b'This popular show spawned the aquatic show at the Bellagio.',
        b"Not surprisingly, then, Fannie Mae's public-relations operation is unparalleled in Washington.",
        b'Little is recorded about this group, but they were probably the ancestors of the Gododdin, whose feats are told in a seventh-century Old Welsh manuscript.',
        b'i understand i can imagine you all have much trouble up there with insects or',
        b'Howard Berman of California, an influential Democrat on the House International Relations Committee.',
        b'An article explains that Al Gore enlisted for the Vietnam War out of fealty to his father and distaste for draft  Gore deplored the inequity of the rich not having to serve.',
        b"I am glad she wasn't, said Jon.",
        b'If necessary to meeting the restrictions imposed in the preceding sentence, the Administrator shall reduce, pro rata, the basic Phase II allowance allocations for each unit subject to the requirements of section 414.',
        b"Second, reducing the rate of HIV transmission is in any event not the only social goal worth  If it were, we'd outlaw sex entirely.",
        b"yes well yeah i am um actually actually i think that i at the higher level education i don't think there's so much of a problem there it's pretty much funded well there are small colleges that i'm sure are struggling",
        b'The most comfortable way to see these important Hoysala temples is to visit them on either side of an overnight stay at Hassan, 120 km (75 miles) northwest of Mysore.',
        b'We saw a whole new model develop - a holistic approach to lawyering, one-stop shopping, she said. ',
        b"Click here for Finkelstein's explanation of why this logic is expedient.",
        b'In Hong Kong you can have a plate, or even a whole dinner service, hand-painted to your own design.',
        b"of course you could annex Cuba but they wouldn't like that a bit",
        b'She hardly needs to mention it--the media bring it up anyway--but she invokes it subtly, alluding (as she did on two Sunday talk shows) to women who drive their daughters halfway across the state to shake my hand, a woman they dare to believe in.',
        b'First, get the basics right, that is, the blocking and tackling of financial reporting.',
        b'STANDARD COSTING - A costing method that attaches costs to cost objects based on reasonable estimates or cost studies and by means of budgeted rates rather than according to actual costs incurred.',
        b'NEH-supported exhibitions were distinguished by their elaborate wall panels--educational maps, photomurals, stenciled treatises--which competed with the objects themselves for space and attention.',
        b'More reserved and remote but a better administrator and financier than his uncle, Charles Brooke imposed on his men his own austere, efficient style of life.',
        b'Also beyond city limits is the Legacy Golf Club in the nearby suburb of Henderson.',
        b'year, they gave morethan a half million dollars to Western Michigan Legal Services.',
        b"'Would you like some tea?'",
        b'On a December day in 1917, British General Allenby rode up to Jaffa Gate and dismounted from his horse because he would not ride where Jesus walked; he then accepted the surrender of the city after the Ottoman Turks had fled (the flag of surrender was a bed-sheet from the American Colony Hotel).',
        b'Daniel took it upon himself to explain a few things.',
        b'yep same here',
        b"because then they'll or you have a prescription",
        b"well it's a pleasure talking with you",
        b'By seeding packs with a few high-value cards, the manufacturer is encouraging kids to buy Pokemon cards like lottery tickets.',
        b"He reported masterfully on the '72 campaign and the Hell's Angels.",
        b'The final rule was determined to be an economically significant regulatory action by the Office of Management and Budget and was approved by OMB as complying with the requirements of the Order on March 26, 1998.',
        b"well Dana it's been really interesting and i appreciate talking with you",
        b'The dramatic cliffs of the Serra de Tramuntana mountain range hug the coastline of the entire northwest and north, from Andratx all the way to the Cape of Formentor.',
        b'Keep young skins safe by covering them with sunblock or a T-shirt, even when in the water.',
        b'In the short term, U.S. consumers will benefit from cheap imports (as will U.S. multinationals that use parts made in East Asian factories).'],
       dtype=object)>}

and

example1 = list(bert_train_dataset.__iter__())[0]
example1

({'attention_mask': <tf.Tensor: id=4915606, shape=(32, 128), dtype=int32, numpy=
  array([[1, 1, 1, ..., 0, 0, 0],
         [1, 1, 1, ..., 0, 0, 0],
         [1, 1, 1, ..., 0, 0, 0],
         ...,
         [1, 1, 1, ..., 0, 0, 0],
         [1, 1, 1, ..., 0, 0, 0],
         [1, 1, 1, ..., 0, 0, 0]], dtype=int32)>,
  'input_ids': <tf.Tensor: id=4915607, shape=(32, 128), dtype=int32, numpy=
  array([[  101,  1105,  1128, ...,     0,     0,     0],
         [  101,  1448,  2265, ...,     0,     0,     0],
         [  101, 17037, 20564, ...,     0,     0,     0],
         ...,
         [  101,   178,  1274, ...,     0,     0,     0],
         [  101,  6249,  1107, ...,     0,     0,     0],
         [  101,   146,  1354, ...,     0,     0,     0]], dtype=int32)>,
  'token_type_ids': <tf.Tensor: id=4915608, shape=(32, 128), dtype=int32, numpy=
  array([[0, 0, 0, ..., 0, 0, 0],
         [0, 0, 0, ..., 0, 0, 0],
         [0, 0, 0, ..., 0, 0, 0],
         ...,
         [0, 0, 0, ..., 0, 0, 0],
         [0, 0, 0, ..., 0, 0, 0],
         [0, 0, 0, ..., 0, 0, 0]], dtype=int32)>},
 <tf.Tensor: id=4915609, shape=(32,), dtype=int64, numpy=
 array([1, 0, 1, 1, 0, 1, 0, 0, 2, 0, 0, 0, 2, 2, 2, 1, 0, 1, 2, 0, 2, 2,
        0, 1, 0, 1, 1, 2, 1, 1, 2, 2])>)
antgr commented 4 years ago

From the output above, it seems that the format of bert_validation_matched_dataset is wrong. I would expect it to be similar to bert_train_dataset. bert_validation_matched_dataset is produced with the following code:

bert_validation_matched_dataset = glue_convert_examples_to_features(validation_matched_dataset, bert_tokenizer, 128, 'mnli', label_list=['0', '1', '2'])
bert_validation_matched_dataset = validation_matched_dataset.batch(64)

Any idea why that didn't work?
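A quick way to spot this kind of mismatch is to look at the structure of one element of each dataset. The sketch below is a hypothetical helper (describe_batch is not part of transformers or TensorFlow) and assumes only what the two dumps above show: a converted batch is a (features, labels) pair whose feature dict has input_ids, attention_mask and token_type_ids, while an unconverted batch is a single dict of raw fields. It works on plain Python containers, so it can be tried without TensorFlow.

```python
# Keys carried by the feature dict of the bert_train_dataset batch shown above.
EXPECTED_FEATURE_KEYS = {"input_ids", "attention_mask", "token_type_ids"}


def describe_batch(element):
    """Describe the structure of one dataset element.

    `element` is a single item obtained by iterating over a dataset.
    Hypothetical helper for illustration only.
    """
    # Converted data: a (features, labels) pair with a feature dict.
    if isinstance(element, tuple) and len(element) == 2:
        features, _labels = element
        if isinstance(features, dict):
            missing = EXPECTED_FEATURE_KEYS - set(features)
            if not missing:
                return "converted"
            return "missing keys: " + ", ".join(sorted(missing))
    # Raw data: one dict holding the untokenized fields.
    if isinstance(element, dict):
        return "raw: " + ", ".join(sorted(element))
    return "unknown"


# Placeholder values stand in for the tensors in the dumps above.
train_like = ({"input_ids": [], "attention_mask": [], "token_type_ids": []}, [])
validation_like = {"hypothesis": [], "idx": [], "label": [], "premise": []}

print(describe_batch(train_like))       # converted
print(describe_batch(validation_like))  # raw: hypothesis, idx, label, premise
```

Applied to the two examples above, the validation element would be reported as raw, which is what points to the conversion result not being used.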

antgr commented 4 years ago

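OK, I found the problem.

bert_validation_matched_dataset = glue_convert_examples_to_features(validation_matched_dataset, bert_tokenizer, 128, 'mnli', label_list=['0', '1', '2'])
bert_validation_matched_dataset = validation_matched_dataset.batch(64)

The second line batches the raw validation_matched_dataset and overwrites the converted dataset. I have to write bert_validation_matched_dataset there instead:

bert_validation_matched_dataset = bert_validation_matched_dataset.batch(64)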
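The slip can be reproduced without TensorFlow. The following is a minimal sketch with plain Python lists: convert_examples and batch are hypothetical stand-ins for glue_convert_examples_to_features and Dataset.batch, not the real APIs, and the two examples are copied from the validation output earlier in the thread.

```python
def convert_examples(raw_examples):
    """Hypothetical stand-in for glue_convert_examples_to_features.

    Turns each raw dict into a (features, label) pair.
    """
    return [
        ({"text": ex["premise"] + " [SEP] " + ex["hypothesis"]}, ex["label"])
        for ex in raw_examples
    ]


def batch(items, size):
    """Hypothetical stand-in for Dataset.batch: split a list into chunks."""
    return [items[i:i + size] for i in range(0, len(items), size)]


# Two raw examples taken from the validation dump above.
validation_matched = [
    {"premise": "I hate pigeons.",
     "hypothesis": "My feelings towards pigeons are filled with animosity.",
     "label": 0},
    {"premise": "yeah well we veered from the subject",
     "hypothesis": "Indeed we got away from the original subject.",
     "label": 0},
]

converted = convert_examples(validation_matched)

# The slip: batching the raw list, so the conversion result is never used.
wrong = batch(validation_matched, 2)

# The fix: batch the converted data.
right = batch(converted, 2)

print(type(wrong[0][0]).__name__)  # dict
print(type(right[0][0]).__name__)  # tuple
```

Keeping separate names for the raw and the converted data, as in this sketch, makes it harder to batch the wrong one by accident.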