Closed. GoogleCodeExporter closed this issue 8 years ago.
Some minimal heuristic for deciding which one to show when there are several.
Original comment by bpgergo
on 2 Oct 2009 at 5:46
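One possible "minimal heuristic" is sketched below in Python. The rule is entirely hypothetical and not taken from this thread: among rows judged to be duplicates, show the one with the highest net vote (upvotes minus downvotes, with NULL counted as 0) and break ties by the lowest id. The `Bisen` dataclass and `pick_representative` are illustrative names; only the field names mirror columns of the `bisen` table.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Bisen:
    """One sentence pair; field names mirror columns of the bisen table."""
    id: int
    en_sentence: str
    hu_sentence: str
    upvotes: Optional[int] = None    # MySQL NULL is represented as None
    downvotes: Optional[int] = None


def pick_representative(candidates: Sequence[Bisen]) -> Bisen:
    """Hypothetical rule: highest net vote score wins; ties go to the lowest id."""
    if not candidates:
        raise ValueError("no candidates to choose from")

    def sort_key(b: Bisen):
        score = (b.upvotes or 0) - (b.downvotes or 0)
        # Negate the score so that min() prefers the highest score, then the lowest id.
        return (-score, b.id)

    return min(candidates, key=sort_key)
```

For the two example rows in the MySQL output, both vote columns are NULL, so the tie-break keeps id 1291330.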
This is done.
It consists of two parts: updateHashCode in Bisen.java, and flagDuplicates in controll_harness.py.
There is still one bug: it does not filter out the "-" character.
For example:
http://kozel.mokk.bme.hu:8080/hunglish/search?huSentence=csod%C3%A1latos&enSentence=beautiful&doc.genre=-10
mysql> select * from bisen where id in (1291330, 1332913);
+---------+---------+-----------+--------------+---------------+------------+-------------+---------+------+------------------+------------------+--------------+---------------------+-----------+----------+
| id      | version | downvotes | en_sentence  | hu_sentence   | is_indexed | line_number | upvotes | doc  | en_sentence_hash | hu_sentence_hash | is_duplicate | indexed_timestamp   | copyright | approved |
+---------+---------+-----------+--------------+---------------+------------+-------------+---------+------+------------------+------------------+--------------+---------------------+-----------+----------+
| 1291330 |       1 |      NULL | Beautiful.   | - Csodálatos. |            |         977 |    NULL |  317 |       -625700480 |      -1975168042 |              | 2011-01-20 15:12:52 | C         | N        |
| 1332913 |       1 |      NULL | - Beautiful. | - Csodálatos. |            |          73 |    NULL |  373 |       1272378659 |      -1975168042 |              | 2011-01-20 15:27:32 | C         | N        |
+---------+---------+-----------+--------------+---------------+------------+-------------+---------+------+------------------+------------------+--------------+---------------------+-----------+----------+
Proposed fix:
the stripPunctuation method in Bisen.java needs to be fixed.
It currently uses:
http://download.oracle.com/javase/1.5.0/docs/api/java/lang/Character.html#isLetterOrDigit(char)
Original comment by bpgergo
on 23 Jan 2011 at 6:50
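The intended behaviour of the fix can be sketched in Python, assuming the goal is that only letters and digits contribute to the sentence hash. `strip_punctuation` mirrors the name of the Java method; `java_string_hash` and `sentence_hash` are hypothetical helpers. The hash helper assumes the `*_sentence_hash` columns hold Java `String.hashCode()` values of the stripped text; the signed 32-bit values in the table are consistent with that, but the thread does not confirm it. Python's `str.isalnum()` is only a close stand-in for `Character.isLetterOrDigit`: it also accepts some numeric symbols such as "½" that the Java method rejects.

```python
def strip_punctuation(sentence: str) -> str:
    """Keep only letters and digits, approximating a filter built on
    Java's Character.isLetterOrDigit(char)."""
    return "".join(ch for ch in sentence if ch.isalnum())


def java_string_hash(s: str) -> int:
    """Reproduce Java's String.hashCode(): h = 31*h + c over UTF-16 code
    units, wrapped to a signed 32-bit int."""
    h = 0
    data = s.encode("utf-16-be")
    for i in range(0, len(data), 2):
        code_unit = (data[i] << 8) | data[i + 1]
        h = (31 * h + code_unit) & 0xFFFFFFFF
    # Map the unsigned 32-bit value into Java's signed int range.
    return h - (1 << 32) if h >= (1 << 31) else h


def sentence_hash(sentence: str) -> int:
    """Hypothetical: hash of the punctuation-stripped sentence."""
    return java_string_hash(strip_punctuation(sentence))
```

With this filter, "Beautiful." and "- Beautiful." both reduce to "Beautiful", so the two English sentences from the example get the same hash.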
The duplicate filtering is now done entirely by the harness.
Original comment by bpgergo
on 1 Mar 2011 at 5:41
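A minimal sketch of harness-side duplicate flagging, assuming rows arrive as `(id, en_sentence, hu_sentence)` tuples. `normalize` and `flag_duplicates` are hypothetical names, not the actual code in controll_harness.py. Rows are grouped by the pair of normalized sentences, the lowest id in each group stays visible, and the rest are returned as candidates for `is_duplicate`. Grouping on the normalized strings rather than on the 32-bit hash columns avoids false matches from hash collisions; the hash columns could still serve as an index to narrow the search.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def normalize(sentence: str) -> str:
    # Keep letters and digits only, so "- Beautiful." and "Beautiful." match.
    return "".join(ch for ch in sentence if ch.isalnum())


def flag_duplicates(rows: Iterable[Tuple[int, str, str]]) -> Set[int]:
    """rows: (id, en_sentence, hu_sentence) tuples.
    Returns the ids to mark as duplicates; the lowest id per group is kept."""
    groups: Dict[Tuple[str, str], List[int]] = defaultdict(list)
    for row_id, en, hu in rows:
        groups[(normalize(en), normalize(hu))].append(row_id)

    duplicates: Set[int] = set()
    for ids in groups.values():
        keep = min(ids)
        duplicates.update(i for i in ids if i != keep)
    return duplicates


rows = [
    (1291330, "Beautiful.", "- Csodálatos."),
    (1332913, "- Beautiful.", "- Csodálatos."),
]
print(flag_duplicates(rows))  # {1332913}
```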
Original comment by Varga.Da...@gmail.com
on 1 Mar 2011 at 5:54
Original issue reported on code.google.com by
bpgergo
on 2 Oct 2009 at 5:46