quetzai / wikimodel

Automatically exported from code.google.com/p/wikimodel

MediaWikiScanner#docElements() raises unexpected ParseException when encountering <D_TABLE_CAPTION> tokens #185

GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
*What steps will reproduce the problem?
1. Process the page provided as an attachment

*What is the expected output? What do you see instead?
We were not able to pinpoint the cause of this error, so we could not patch it.
We did find a few things, listed below:
- The grammar reference file doesn't mention table captions anywhere.
- The getTABLE_CAPTION() method isn't called anywhere in the MediaWikiScanner class.
- In the docElements() method, when a D_TABLE_CAPTION token is encountered, we don't step into the table() method. This is probably why the token isn't consumed, which leads to the call to jj_consume_token(-1) that raises the exception.
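
To illustrate the hypothesis above, here is a minimal, self-contained Java sketch of a top-level dispatch that does not route a caption token to the table rule. It is not the generated MediaWikiScanner code: the class `CaptionDispatchSketch`, the `Kind` enum, the `captionStartsTable` flag and the local `ParseException` are all hypothetical stand-ins, and only the names `docElements`, `table` and `D_TABLE_CAPTION` are borrowed from the report. The final `else` branch plays the role that the report attributes to `jj_consume_token(-1)`.

```java
import java.util.Arrays;
import java.util.List;

class CaptionDispatchSketch {
    // Hypothetical token kinds; only D_TABLE_CAPTION is a name taken from the report.
    enum Kind { TEXT, D_TABLE_ROW, D_TABLE_CAPTION }

    // Local stand-in for the exception type raised by the real parser.
    static class ParseException extends Exception {
        ParseException(String message) { super(message); }
    }

    // Stand-in for the table rule: consumes consecutive table tokens
    // (rows and captions) and returns the index of the first token after them.
    static int table(List<Kind> tokens, int pos) {
        while (pos < tokens.size() && tokens.get(pos) != Kind.TEXT) {
            pos++;
        }
        return pos;
    }

    // Stand-in for the top-level dispatch. When captionStartsTable is false,
    // a caption token matches no branch and is rejected, as in the report.
    static int docElements(List<Kind> tokens, boolean captionStartsTable) throws ParseException {
        int pos = 0;
        while (pos < tokens.size()) {
            Kind kind = tokens.get(pos);
            boolean startsTable = kind == Kind.D_TABLE_ROW
                    || (captionStartsTable && kind == Kind.D_TABLE_CAPTION);
            if (kind == Kind.TEXT) {
                pos++; // plain text is consumed directly
            } else if (startsTable) {
                pos = table(tokens, pos); // hand the token over to the table rule
            } else {
                // No branch consumed the token, so parsing cannot continue.
                throw new ParseException("unexpected token " + kind + " at index " + pos);
            }
        }
        return pos; // number of tokens consumed
    }

    public static void main(String[] args) {
        List<Kind> page = Arrays.asList(
                Kind.TEXT, Kind.D_TABLE_CAPTION, Kind.D_TABLE_ROW, Kind.TEXT);
        for (boolean captionStartsTable : new boolean[] {false, true}) {
            try {
                int consumed = docElements(page, captionStartsTable);
                System.out.println("captionStartsTable=" + captionStartsTable
                        + ": consumed " + consumed + " tokens");
            } catch (ParseException e) {
                System.out.println("captionStartsTable=" + captionStartsTable
                        + ": " + e.getMessage());
            }
        }
    }
}
```

If the hypothesis holds, the fix would amount to letting the caption token enter the table rule, which is what the `captionStartsTable` flag toggles in this sketch.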

*What version of the product are you using? On what operating system?

We use the latest version checked out from the SVN repository, on Mac OS X Snow Leopard with the 1.6 JVM.

*Please provide any additional information below.

The input data is an extract of the French Wikipedia export collected through MWDumper. We use the WEM component of your project to generate a CAS data structure to be supplied to the Apache UIMA framework. We encounter the problem with a particular page, supplied as an attachment.

Original issue reported on code.google.com by Maxime.B...@gmail.com on 4 Jun 2010 at 3:31

Attachments:

GoogleCodeExporter commented 8 years ago
A much simpler example that causes the same bug.

Original comment by mki...@portolancs.com on 5 Aug 2010 at 10:01

Attachments:

GoogleCodeExporter commented 8 years ago
This is the exception thrown when parsing the "simple_table_with_caption.txt" example.

Original comment by mki...@portolancs.com on 5 Aug 2010 at 10:03

Attachments:

GoogleCodeExporter commented 8 years ago
This patch fixes the handling of the D_TABLE_CAPTION token.
A JUnit test is also included.

After applying the patch you have to run "RebuildScanners.launch" in order
to rebuild all the Scanners.

Original comment by mki...@portolancs.com on 5 Aug 2010 at 10:07

Attachments:

GoogleCodeExporter commented 8 years ago
Thanks a lot Maxime, I will test your patch right now.

Original comment by thomas.m...@gmail.com on 5 Aug 2010 at 12:18

GoogleCodeExporter commented 8 years ago
/me is not called Maxime, but You're welcome ;-)

Original comment by mki...@portolancs.com on 5 Aug 2010 at 12:23

GoogleCodeExporter commented 8 years ago
Indeed, I only looked at the first name and thought it was the same one in each message. Sorry about that ;)

Original comment by thomas.m...@gmail.com on 5 Aug 2010 at 12:30

GoogleCodeExporter commented 8 years ago
Patch applied and committed without any modification. Thanks.

Original comment by thomas.m...@gmail.com on 5 Aug 2010 at 12:31

GoogleCodeExporter commented 8 years ago

Original comment by thomas.m...@gmail.com on 5 Aug 2010 at 12:35

GoogleCodeExporter commented 8 years ago
Sorry, but my patch was 'cross project'.
Could you please check that 'MediaWikiParserTest' is in the correct project?
I think it's better to put it into "org.wikimodel.wem.test".

Original comment by mki...@portolancs.com on 5 Aug 2010 at 12:51

GoogleCodeExporter commented 8 years ago
All the tests have actually been in org.wikimodel.wem for a very long time now; I think org.wikimodel.wem.test is more of a leftover. Also, these unit tests are there to validate the parser, which is in org.wikimodel.wem, so I don't see what moving them to org.wikimodel.wem.test would bring.

Original comment by thomas.m...@gmail.com on 5 Aug 2010 at 12:59

GoogleCodeExporter commented 8 years ago
All right. Thanks.

Original comment by mki...@portolancs.com on 5 Aug 2010 at 1:15

GoogleCodeExporter commented 8 years ago
<offtopic>
Hi Thomas,
I've made one more patch for wikimodel and put it into an issue.
I have some more in my workspace (e.g. inline macro support), but
it's getting harder to separate them, and I want to avoid huge monster patches.
Do you have any chance to commit them?
Thanks Martin
</offtopic>

Original comment by mki...@portolancs.com on 10 Aug 2010 at 11:22