manami-project / modb-anisearch

This lib contains a downloader and a converter for downloading raw data from anisearch.com and converting it to an anime object.
GNU Affero General Public License v3.0

Some synonyms are lists of synonyms #53

Closed: jilljenn closed 2 years ago

jilljenn commented 2 years ago

Here are the top 5 longest synonyms, with their number of characters:

[(234,
  'Extra One Room: Second Season: Hanasaka Yui wa Tameshite Miru / Hanasaka Yui wa Okurarete Kuru / Nanahashi Minori wa Kaigi Suru / Nanahashi Minori wa Oomono ni Naru / Amatsuki Mashiro wa Neko ni Naru / Amatsuki Mashiro wa Chiryou Suru'),
 (249,
  ', Danmachi: Is It Wrong to Try to Pick Up Girls in a Dungeon? Ist es falsch, im Dungeon in heißen Quellen zu baden?, Dungeon ni Deai wo Motomeru no wa Machigatte Iru Darou ka: Familia Myth - Dungeon ni Onsen wo Motomeru no wa Machigatte Iru Darou ka'),
 (272,
  ', Danmachi: Is It Wrong to Try to Pick Up Girls in a Dungeon? Ist es falsch, auf einer unbewohnten Insel nach Heilkräutern zu suchen?, Dungeon ni Deai wo Motomeru no wa Machigatte Iru Darou ka: Familia Myth II: Mujintou ni Yakusou wo Motomeru no wa Machigatte Iru Darou ka'),
 (290,
  ', Yahari Ore no Seishun Love Comedy wa Machigatteiru.: Kochira to Shite mo Karera Kanojora no Yukusue ni Sachi Ookaran Koto wo Negawazaru wo Enai., Yahari Ore no Seishun Lovecome wa Machigatte Iru.: Kochira to Shite mo Karera Kanojora no Yukusue ni Sachi Ookaran Koto wo Negawazaru wo Enai.'),
 (314,
  ', Inu x Boku Secret Service: Miketsukami‘s Metamorphosis / Switch / Playing House, Inu × Boku Secret Service: Miketsukami-kun‘s Transformations / Switch / Playing House, Inu x Boku Secret Service: Miketsukami-kun‘s Transformations / Switch / Playing House, Inu x Boku SS: Miketsukami-kun Henka / Switch / Omamagoto')]

Perhaps there is a problem with the parsing.
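For reference, a ranking like the one above can be reproduced with a few lines of Python. This is only a minimal sketch: the function name `longest_synonyms` and the sample list are illustrative assumptions, not part of the dataset tooling or of this library.

```python
def longest_synonyms(synonyms, n=5):
    """Return the n longest synonyms as (length, synonym) tuples, shortest first."""
    # Sort (length, text) tuples in descending order, keep the first n,
    # then reverse so the longest entry comes last, as in the list above.
    ranked = sorted(((len(s), s) for s in synonyms), reverse=True)
    return ranked[:n][::-1]


# Hypothetical sample data, not real database entries.
sample = ["Danmachi", "Oregairu", "Inu x Boku SS", "One Room"]
print(longest_synonyms(sample, n=2))
```

Run over all synonyms of the dataset, unusually long entries surface at the end of the result, which makes concatenated titles easy to spot.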

On another note, I managed to find approximately 1419 entries that should be merged, so I guess I won't open one issue per pair. May I send you a list of these by mail? Otherwise I can publish it in a single issue (with an attached file).

manami-project commented 2 years ago

Thank you for reaching out. All of these synonyms come from anisearch.com.

Let's have a look.

Extra One Room: Second Season: Hanasaka Yui wa Tameshite Miru / Hanasaka Yui wa Okurarete Kuru / Nanahashi Minori wa Kaigi Suru / Nanahashi Minori wa Oomono ni Naru / Amatsuki Mashiro wa Neko ni Naru / Amatsuki Mashiro wa Chiryou Suru. To me it looks like the parts separated by / are different named parts within this special. This is kinda weird, but from my perspective it doesn't look incorrect.

, Danmachi: Is It Wrong to Try to Pick Up Girls in a Dungeon? Ist es falsch, im Dungeon in heißen Quellen zu baden?, Dungeon ni Deai wo Motomeru no wa Machigatte Iru Darou ka: Familia Myth - Dungeon ni Onsen wo Motomeru no wa Machigatte Iru Darou ka. Based on the start of this title you can see that there is definitely something wrong.

Same goes for: , Danmachi: Is It Wrong to Try to Pick Up Girls in a Dungeon? Ist es falsch, auf einer unbewohnten Insel nach Heilkräutern zu suchen?, Dungeon ni Deai wo Motomeru no wa Machigatte Iru Darou ka: Familia Myth II: Mujintou ni Yakusou wo Motomeru no wa Machigatte Iru Darou ka.

And for , Yahari Ore no Seishun Love Comedy wa Machigatteiru.: Kochira to Shite mo Karera Kanojora no Yukusue ni Sachi Ookaran Koto wo Negawazaru wo Enai., Yahari Ore no Seishun Lovecome wa Machigatte Iru.: Kochira to Shite mo Karera Kanojora no Yukusue ni Sachi Ookaran Koto wo Negawazaru wo Enai. as well.

For , Inu x Boku Secret Service: Miketsukami‘s Metamorphosis / Switch / Playing House, Inu × Boku Secret Service: Miketsukami-kun‘s Transformations / Switch / Playing House, Inu x Boku Secret Service: Miketsukami-kun‘s Transformations / Switch / Playing House, Inu x Boku SS: Miketsukami-kun Henka / Switch / Omamagoto it seems that we have both cases here. The extraction is having a problem, but there are also named parts within the special.

I'll have to check the tests with these entries to see what is going on in detail.
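In the meantime, the affected entries could be flagged by the leading comma that four of the five reported synonyms share. The following is a rough heuristic sketch, not the check used in this library; `looks_concatenated` is a hypothetical helper name, and the sample strings are shortened versions of the titles quoted above.

```python
def looks_concatenated(synonym: str) -> bool:
    """Heuristic: a synonym that starts with a comma is probably a joined list."""
    # Ignore leading whitespace, then test for the stray separator.
    return synonym.lstrip().startswith(",")


# Shortened versions of the reported synonyms (assumption: prefixes are enough).
reported = [
    "Extra One Room: Second Season: Hanasaka Yui wa Tameshite Miru",
    ", Danmachi: Is It Wrong to Try to Pick Up Girls in a Dungeon?",
    ", Inu x Boku Secret Service: Miketsukami-kun Henka / Switch / Omamagoto",
]
suspicious = [s for s in reported if looks_concatenated(s)]
print(len(suspicious))
```

This only catches entries where the stray separator ended up at the start. Titles with named parts separated by / are deliberately not flagged, since those appear to be legitimate.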

There is just one thing that already stands out to me.

Danmachi: Is It Wrong to Try to Pick Up Girls in a Dungeon? Ist es falsch, im Dungeon in heißen Quellen zu baden?, Dungeon ni Deai wo Motomeru no wa Machigatte Iru Darou ka: Familia Myth - Dungeon ni Onsen wo Motomeru no wa Machigatte Iru Darou ka. I would split it like this:

The comma in the first split could be problematic.
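To illustrate why that comma is a concern, here is a small sketch of what a naive split on ", " does to this entry. It is not the fix that was eventually applied, only a demonstration under the assumption that the titles were joined with ", " during extraction.

```python
# The concatenated synonym from above, without the leading comma.
title = (
    "Danmachi: Is It Wrong to Try to Pick Up Girls in a Dungeon? "
    "Ist es falsch, im Dungeon in heißen Quellen zu baden?, "
    "Dungeon ni Deai wo Motomeru no wa Machigatte Iru Darou ka: Familia Myth - "
    "Dungeon ni Onsen wo Motomeru no wa Machigatte Iru Darou ka"
)

# Naive approach: treat every ", " as a separator between titles.
parts = title.split(", ")

for part in parts:
    print(repr(part))
```

The split yields three pieces, and the German title is cut in two at its own comma ("Ist es falsch" ends the first piece, "im Dungeon in heißen Quellen zu baden?" becomes the second). A separator-based split alone therefore cannot distinguish a comma inside a title from a comma between titles.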

manami-project commented 2 years ago

Looks like I managed to fix this, including the edge case.