sfu-dhil / wilde

eXist/XQuery app for detecting copying in a collection of XHTML documents.
GNU General Public License v3.0

Categorisation / Foldering problem #122

Closed ccolliga closed 3 years ago

ccolliga commented 3 years ago

Describe the bug

On the website, different newspapers that share the same name are being categorised together as the same newspaper. As a result, when browsing by newspaper, different newspapers that share the same name are jumbled together.

For example: Le Reveil (Paris) and Le Reveil (Montreal)

For example: La Presse (Paris) and La Presse (Montreal)

To Reproduce

Steps to reproduce the behavior:

  1. Go to the header menu and choose Browse, then By Newspaper.
  2. Click on Le Réveil.
  3. Scroll down to the list of newspapers; the Montreal and Paris newspapers are grouped together.

Expected behavior

Individual newspapers should not be grouped with other newspapers that share the same name.

I think I might be able to solve this issue by retitling some of my folders in the wilde repository. In cases where newspapers share the same name, I could add a number to the folder or identify the city. However, I don't want that number or city name to appear on the website. Happy to hear other suggestions for solutions to this problem that I can make on my end.

Screenshots

[Screenshot, 2021-04-01 at 16:55:12]

joeytakeda commented 3 years ago

Oh interesting! Good catch. I don't think this is a foldering issue, though. Right now, the application relies on the @content of the dc.publisher meta element to create the list:

  <meta content="La Presse" name="dc.publisher" data-sortable="presse" />

But that is, as you note, a bit fragile. There is a publisher id field in the reports, however, which looks like it would be a better thing for the application to use for generating the list of reports.

  <meta content="c_lp_61" name="dc.publisher.id" />

That id looks like it is generated from the region and the paper's name, so it should be unique. I don't think the application uses it anywhere, though (as far as I can tell, at least); I'm going to re-assign this to @ubermichael since he'll know more about how that id is generated.

ubermichael commented 3 years ago

That publisher ID is generated based on @content in dc.publisher. So if two papers are both called "La Presse" they get the same ID.
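To illustrate the collision described here, a minimal Python sketch follows. It is not the actual processing tool: `slug`, `id_from_name`, and `id_from_name_and_region` are hypothetical helpers, and the ID format they produce is invented for illustration (it does not match real IDs such as `c_lp_61`). It only shows that an ID derived from the name alone cannot tell two same-named papers apart, while one that also includes a region can.

```python
import re
import unicodedata


def slug(text):
    """Hypothetical normaliser: strip accents, lowercase, join words with underscores."""
    ascii_text = (
        unicodedata.normalize("NFKD", text)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    return re.sub(r"[^a-z0-9]+", "_", ascii_text.lower()).strip("_")


def id_from_name(publisher):
    """ID built from the publisher name only (assumed scheme, for illustration)."""
    return slug(publisher)


def id_from_name_and_region(publisher, region):
    """ID built from region plus publisher name (assumed scheme, for illustration)."""
    return f"{slug(region)}_{slug(publisher)}"


# Two distinct newspapers from the issue that share a name.
papers = [("La Presse", "Paris"), ("La Presse", "Montreal")]

name_only_ids = {id_from_name(name) for name, _ in papers}
combined_ids = {id_from_name_and_region(name, region) for name, region in papers}

print(len(name_only_ids))  # one ID shared by both papers
print(len(combined_ids))   # one ID per paper
```

With name-only IDs both papers collapse into a single entry; adding the region keeps them distinct.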

I can adjust the processing tools to take the foldering into account, but that may also be fragile.

ccolliga commented 3 years ago

Ok, let me know what the best solution is.

Dr. Colette Colligan, Professor of English

Simon Fraser University, Canada



ubermichael commented 3 years ago

Checking up on this: dc.publisher.id is generated from dc.publisher and dc.region, so I think we're safe to use it for grouping the publishers.
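As a rough sketch of what grouping on that field could look like, the Python below reads the `meta` elements from a few reports and groups them first by `dc.publisher` and then by `dc.publisher.id`. The application itself is written in XQuery, so this is only an illustration of the idea, not the app's code. The file names, the sample ID values, and the helpers `read_meta` and `group_reports` are all assumptions made for the example.

```python
from collections import defaultdict
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collects name/content pairs from <meta> elements."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing <meta ... /> tags are also routed here by HTMLParser.
        if tag == "meta":
            attributes = dict(attrs)
            if "name" in attributes and "content" in attributes:
                self.meta[attributes["name"]] = attributes["content"]


def read_meta(xhtml):
    """Return a dict of meta name -> content for one report."""
    parser = MetaCollector()
    parser.feed(xhtml)
    parser.close()
    return parser.meta


def group_reports(docs, key):
    """Group report names by the value of the given meta field."""
    groups = defaultdict(list)
    for doc_name, xhtml in docs.items():
        groups[read_meta(xhtml).get(key)].append(doc_name)
    return dict(groups)


# Hypothetical reports; the ID values are made up for this example.
docs = {
    "report-a.xml": '<head><meta content="La Presse" name="dc.publisher" />'
                    '<meta content="paris_la_presse" name="dc.publisher.id" /></head>',
    "report-b.xml": '<head><meta content="La Presse" name="dc.publisher" />'
                    '<meta content="montreal_la_presse" name="dc.publisher.id" /></head>',
}

by_name = group_reports(docs, "dc.publisher")    # both reports land in one group
by_id = group_reports(docs, "dc.publisher.id")   # one group per newspaper
```

The display name can still come from `dc.publisher`, so the region never has to appear on the website; only the grouping key changes.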