nlplab / nersuite

http://nersuite.nlplab.org/

nersuite_gtagger crash on very long token #28

Open spyysalo opened 9 years ago

Due to a preprocessing error, a document contained the following line:

MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaadaqadaqaaiabigdaXiabgUcaRmaalaaabaacciGae8xYdCNaeuiLdqKaem4CamhabaGaeuiLdqKaemiDaq3aaSbaaSqaaiabdAgaMbqabaaaaaGccaGLOaGaayzkaaGaemivaq1aa0baaSqaaiabdQgaQbqaaiabd6gaUjabgUcaRiabigdaXaaakmaabmaabaGaemiDaq3aaSbaaSqaaiabd6eaonaaBaaameaacqWG0baDcqWGPbqAcqWGTbqBcqWGLbqzcqWGZbWCaeqaaaWcbeaaaOGaayjkaiaawMcaaiabgkHiTmaalaaabaGae8xYdCNaeuiLdqKaem4CamhabaGaeuiLdqKaemiDaq3aaSbaaSqaaiabdAgaMbqabaaaaOGaemivaq1aa0baaSqaaiabdQgaQbqaaiabd6gaUjabgUcaRiabigdaXaaakmaabmaabaGaemiDaq3aaSbaaSqaaiabd6eaonaaBaaameaacqWG0baDcqWGPbqAcqWGTbqBcqWGLbqzcqWGZbWCaeqaaSGaeyOeI0IaeGymaedabeaaaOGaayjkaiaawMcaaiabg2da9iabgkHiTiabfs5aejabdohaZnaalaaabaGaeyOaIyRaemyraueabaGaeyOaIyRaemivaq1aa0baaSqaaiabdQgaQbqaaiabd6gaUbaakmaabmaabaGaemiDaq3aaSbaaSqaaiabd6eaonaaBaaameaacqWG0baDcqWGPbqAcqWGTbqBcqWGLbqzcqWGZbWCaeqaaaWcbeaaaOGaayjkaiaawMcaaaaacqGHRaWkcqWGubavdaqhaaWcbaGaemOAaOgabaGaemOBa4gaaOWaaeWaaeaacqWG0baDdaWgaaWcbaGaemOta40aaSbaaWqaaiabdsfaujabdAeagbqabaaaleqaaaGccaGLOaGaayzkaaGaeiOla4IaaCzcaiaaxMaadaqadaqaaiabbgeabjabc6caUiabigdaXiabicdaWaGaayjkaiaawMcaaaaa@93A9@

This line contains a run of 1078 consecutive alphanumeric characters.

When nersuite_gtagger attempted to tag this line, it crashed with a buffer overflow at the call to bidir_postag in run.gtagger.cpp.

The input is obviously broken, but nersuite should at least fail gracefully in cases like this, not just crash.
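
As a rough illustration of what "failing gracefully" could look like, here is a minimal sketch of a length check that could run before a line is handed to the tagger. Everything in it is an assumption made for illustration: the helper names `longest_alnum_run` and `oversized_tokens` and the limit `kMaxTokenLen` are hypothetical and do not come from the nersuite sources, and the actual buffer size involved in the bidir_postag call is not known from this report.

```cpp
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical limit, chosen only for illustration. The real buffer size
// used by the tagger is not stated in this report.
constexpr std::size_t kMaxTokenLen = 256;

// Length of the longest run of consecutive alphanumeric characters in `s`.
std::size_t longest_alnum_run(const std::string& s) {
    std::size_t best = 0;
    std::size_t cur = 0;
    for (unsigned char c : s) {  // unsigned char keeps std::isalnum well-defined
        if (std::isalnum(c)) {
            ++cur;
            if (cur > best) best = cur;
        } else {
            cur = 0;
        }
    }
    return best;
}

// Returns the whitespace-separated tokens of `line` longer than `max_len`.
// A caller could skip such a line, or report it and exit cleanly, instead
// of passing it on to the tagger.
std::vector<std::string> oversized_tokens(const std::string& line,
                                          std::size_t max_len = kMaxTokenLen) {
    std::istringstream in(line);
    std::vector<std::string> bad;
    std::string tok;
    while (in >> tok) {
        if (tok.size() > max_len) bad.push_back(tok);
    }
    return bad;
}
```

With a check like this, the caller could print a warning naming the offending line and either skip it or exit with a non-zero status. Whether whitespace tokens or alphanumeric runs are the right unit to check depends on how the tagger tokenizes its input, which this sketch does not attempt to reproduce.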