manga-download / haruneko

Prototype of HakuNeko based on NW.js + TypeScript
https://haruneko-docs.pages.dev
142 stars 32 forks source link

Duplicate Mangas/chapters ID handling #112

Open MikeZeDev opened 1 year ago

MikeZeDev commented 1 year ago

Case : https://raw.senmanga.com/mou-hatarakitakunaindesu-boukensha-nanka-yamete-yaru-imasara-taiguu-wo-kaerukara-to-onegai-sarete-mo-okotowaridesu-boku-wa-zettai-hatarakimasen

Its not caused by a bad CSS selector in Haru. There are 2 chapters 3.1, with the same id.

When selecting the manga it cause errors, spinning stays, and subsequent click on others mangas keeps the spinning thingie, chapters are added BELOW IT

image

image

MikeZeDev commented 1 year ago

I see no reasons why a chapter id would be duplicated. Different language but same ID? Website must be able to differentiate what to load therefore Haru plugin should compose an identifier using id and language to make it unique anyway.

1) Would it be doable to filter dupes by default? 2) Or since it an edge case UI should at least handle it properly when we switch to another manga.

Sheepux commented 1 year ago

Oh interresting unit test that i should add in SAM Is that an engine issue or an UI issue ? (going to test)

MikeZeDev commented 1 year ago

Error is throwed in node_modules/svelte/internal/index.mjs.

Is that an engine issue or an UI issue ? (going to test)

Both i guess. Error occurs because engine dont filter duplicate chapters, but UI should have be cleaned after we switched manga.

Sheepux commented 1 year ago

That's because i'm using a unique key identifier: https://github.com/manga-download/haruneko/blob/master/web/src/frontend/classic/components/MediaItemSelect.svelte#L244 {#each filteredItems as item (item.Identifier)} looks like that having the same item.Identifier is a big issue for svelte (seems logicical tho)

MikeZeDev commented 1 year ago

You're not wrong, its the website that is wrong ofc. Nonetheless, maybe something can be done to clean MedialistItem list UI (or whatever you call it) when changing manga (like, force remove spinner??)

Or remove spinner in case of error when fetching chapters? not sure if this one is easily catchable.

Sheepux commented 1 year ago

The only way to not get the internal svelte to crash is to dedupe the list. This is something that MUST be dore by the engine.

ronny1982 commented 1 year ago

This is an easy fix, but there are some open questions:

image
MikeZeDev commented 1 year ago

If the website shows the same chapter twice, should HaruNeko do the same?

I think the answer is obvious. If the chapter id is the same, it means links leads to the same resource. Even if the link set a cookie for language and link to the same chapter (url) but displayed differently according to the cookie then it should be treated as 2 different chapters, Haruneko should generate 2 different ID. If its the same ID, then those are identical, there is no reasons to display 2.

If duplicate chapters are to be filtered out, which chapter shall be kept in case the title is different?

We are talking about identical ID, referencing the same resource. In other words we are creating a workaround for a website error. I see no harm in arbitrarily choosing any of them, being the first or the second, as long as we stick to our choice. "Keep the longest title" is not a really good criteria after all.

Sheepux commented 1 year ago

This is an easy fix, but there are some open questions:

  • If the website shows the same chapter twice, should HaruNeko do the same?
  • If duplicate chapters are to be filtered out, which chapter shall be kept in case the title is different? We have 2 issues, Titles and Identifiers (i'm using the identifier as key, but perhaps that's wrong)

T = Title, L = Link

T1/L1 + T1/L1 = Same

Ignore duplicate (show only 1)

T1/L1 + T1/L2 = different target but cannot have same name (would create the same folder).

Append "(x)" (x being the current count of duplicates of the same title)

T1/L1 + T2/L1 = same target but different name.

Leave it as it is (yes it would duplicate the content) but that's up to the website. But that means that I shouldn't be using the identifier as key but the name ?

MikeZeDev commented 1 year ago

Case : https://w31.holymanga.net/manga-list/page-331/

image

It ends the Manga list loop 🤷

A whole page of dupe manga

MikeZeDev commented 1 year ago

T1/L1 + T1/L1 = Same Ignore duplicate (show only 1) 👍

T1/L1 + T1/L2 = different target but cannot have same name (would create the same folder). Append "(x)" (x being the current count of duplicates of the same title)

Those will get unique id anyway, so its not a filtering problem but a download sanitizing problem Therefore its not a plugin issue but something to be settled in download manager.

And ofc there is the issue of recognizing duplicate folders like that in UI, since we display downloaded folders i think, idk? Also users can different name convention when downloading :D

T1/L1 + T2/L1 = same target but different name. Leave it as it is (yes it would duplicate the content) but that's up to the website. But that means that I shouldn't be using the identifier as key but the name ?

Ultimately if target is the same, its the same content. I see no use case where the content behind would be different with the same url. If its different content (like, url is the same but there is a dataset parameter on the link for different language idk, and script put a cookie idk before redirecting) then the plugin have to generate an unique identifier no matter what aka JSON.stringify(id, title, language, whatever) as unique ID.

MikeZeDev commented 1 year ago

I suppose we can code a default filtering after FetchMangas & FetchChapters in the engine?

Something that could be overwritten for particular websites.

As decorator functions maybe? idk

My point is :

ronny1982 commented 1 year ago

That's because i'm using a unique key identifier: https://github.com/manga-download/haruneko/blob/master/web/src/frontend/classic/components/MediaItemSelect.svelte#L244 {#each filteredItems as item (item.Identifier)} looks like that having the same item.Identifier is a big issue for svelte (seems logicical tho)

Is it possible to use the item (object reference) itself as identifier instead of its ID?

{#each filteredItems as item (item)}
ronny1982 commented 1 year ago

There are other places as well, e.g. in Bookmarks.

It is a valid use case to have duplicate Identifiers in bookmarks (a user may have the same manga bookmarked from different websites which are using the same path style, such as seen in various WordPressMangaStream websites).

Sheepux commented 1 year ago

Is it possible to use the item (object reference) itself as identifier instead of its ID?

Looks like it's valid (tested it quickly on the tutorial) https://learn.svelte.dev/tutorial/keyed-each-blocks

You can use any object as the key, as Svelte uses a Map internally — in other words you could do (thing) instead of (thing.id). Using a string or number is generally safer, however, since it means identity persists without referential equality, for example when updating with fresh data from an API server.