searope / jwpl

Automatically exported from code.google.com/p/jwpl

escape is needed in sql parser #102

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Some entities whose names contain "'" are missing from the page_inlinks.txt file.

This happens because the parser drops the "\" escape character when it reads 
these entities from the "XXX-pagelinks.sql" file. For example, 
"Women\'s_National_Basketball_Association" is parsed as 
"Women's_National_Basketball_Association". The matching keys in 
pNamePageIdMap, however, still contain the "\", so these entities are rejected by the check:
if (skipPage && !pPageIdNameMap.containsKey(pl_from)|| 
!pNamePageIdMap.containsKey(pl_to))
at line 273 of SingleDumpVersionOriginal.java. As a result, they are not written to the 
page_inlinks.txt file.
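
The mismatch can be reproduced outside JWPL. The following is only a minimal sketch, not the project's code: it assumes the parser reads quoted values through a java.io.StreamTokenizer (which the st.sval field in the parser suggests), and namePageIdMap with its page id of 1 is a hypothetical stand-in for pNamePageIdMap.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

public class EscapeMismatchDemo {

    // Reads the first single-quoted string from the input, the way a
    // StreamTokenizer-based SQL parser would see it in st.sval.
    static String readQuoted(String sql) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(sql));
        st.quoteChar('\'');
        try {
            if (st.nextToken() != '\'') {
                throw new IllegalArgumentException("input does not start with a quoted string");
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return st.sval;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for pNamePageIdMap: the key keeps the backslash.
        Map<String, Integer> namePageIdMap = new HashMap<>();
        namePageIdMap.put("Women\\'s_National_Basketball_Association", 1); // placeholder id

        // The tokenizer resolves \' to a plain ' while reading the quoted value.
        String plTo = readQuoted("'Women\\'s_National_Basketball_Association'");

        System.out.println(plTo);
        // The lookup misses because the parsed value no longer has the backslash.
        System.out.println(namePageIdMap.containsKey(plTo));
    }
}
```

With these assumptions the parsed value and the map key differ by exactly one backslash, which is enough for containsKey to miss.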

I suggest modifying line 66 in PagelinksParser.java

plTo = st.sval;

into

plTo = SQLEscape.escape(st.sval);

and line 71 in CategorylinkParser.java

clTo = st.sval;

into

clTo = SQLEscape.escape(st.sval);
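
To illustrate why re-escaping the parsed value makes the lookup succeed, here is a rough sketch. The reEscape method is a hypothetical stand-in written for this example; it is not the implementation of SQLEscape.escape, which may handle a different set of characters. The map and its page id are placeholders as well.

```java
import java.util.HashMap;
import java.util.Map;

public class ReEscapeSketch {

    // Hypothetical stand-in for SQLEscape.escape: puts a backslash in front of
    // single quotes, double quotes and backslashes. The real JWPL method may
    // cover other characters.
    static String reEscape(String value) {
        StringBuilder sb = new StringBuilder(value.length() + 8);
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c == '\'' || c == '"' || c == '\\') {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Placeholder for pNamePageIdMap: the key keeps the backslash.
        Map<String, Integer> namePageIdMap = new HashMap<>();
        namePageIdMap.put("Women\\'s_National_Basketball_Association", 1); // placeholder id

        // Value as it would arrive in st.sval, with the backslash already dropped.
        String parsed = "Women's_National_Basketball_Association";

        System.out.println(namePageIdMap.containsKey(parsed));           // lookup on the raw value
        System.out.println(namePageIdMap.containsKey(reEscape(parsed))); // lookup after re-escaping
    }
}
```

Re-escaping at the point where plTo and clTo are assigned keeps the change local to the two parsers, so the maps and the check in SingleDumpVersionOriginal.java can stay as they are.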

Original issue reported on code.google.com by astronau...@gmail.com on 3 Sep 2012 at 12:18

GoogleCodeExporter commented 9 years ago
Thank you. That makes sense. I'll look into that.

Original comment by oliver.ferschke on 3 Sep 2012 at 12:42

GoogleCodeExporter commented 9 years ago
This issue was closed by revision r679.

Original comment by oliver.ferschke on 3 Sep 2012 at 12:48

GoogleCodeExporter commented 9 years ago

Original comment by oliver.ferschke on 3 Sep 2012 at 12:49

GoogleCodeExporter commented 9 years ago

Original comment by oliver.ferschke on 11 Sep 2014 at 1:36