locdb / locdb-frend

Fr(ont-)end for the Linked Open Citation Database.
https://locdb.github.io
GNU General Public License v3.0

Internal suggestions iffy #261

Open LauraErhard opened 6 years ago

LauraErhard commented 6 years ago

We know you are busy with the remodeling of our system but we encountered a problem with the internal suggestions. In the last few articles we worked on, we noticed that references in the reference list looked familiar but didn’t come up in the internal suggestions. We then checked older articles to be sure and noticed that the references were indeed linked in the previous articles. We then tried out different search terms and sooner or later we found the right suggestion. We probably don’t notice every time if we linked a reference before and can’t try different search strings for every reference. To avoid potentially creating duplicates we are relying on a working internal search.

Do you have any idea why this happens? Is it because the database gets bigger? Is the internal search not working correctly? Is there a plan to fine-tune the search and can we help in any way?

In addition, it would be very helpful if we could use the browse function to check which references the respective internal suggestions are linked to. If a duplicate occurs we would have the opportunity to check the linked references again for possible errors.

lgalke commented 6 years ago

Thanks for the feedback. Concerning the search functionality, @anlausch can probably help. I also noticed that it is quite strict.

Regarding the feature request: do you want to know which other resources point to the current resource, the other way round, or both? We would need to think about that. Out-going edges could already be shown now. In-going edges, on the other hand, would probably require an additional back-end service.
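To make the two edge directions concrete, here is a minimal sketch. The `CitationLink` shape and both function names are hypothetical and are not taken from the LOC-DB code base; the real data model may look different. An out-going edge can be read directly from the reference that stores it, while in-going edges need a reverse index (or a query) over all references, which is why they would need extra back-end support.

```typescript
// Hypothetical, simplified shape: one reference-list entry and the
// internal resource it has been linked to (if any).
interface CitationLink {
  referenceId: string;
  linkedResourceId?: string;
}

// Out-going edge: the link is stored on the reference itself.
function outgoingEdge(link: CitationLink): string | undefined {
  return link.linkedResourceId;
}

// In-going edges: for every resource, collect the references pointing at it.
// This requires looking at all references, not only the current one.
function buildIncomingIndex(links: CitationLink[]): Record<string, string[]> {
  const index: Record<string, string[]> = {};
  for (const link of links) {
    if (link.linkedResourceId === undefined) {
      continue; // unlinked references contribute no edge
    }
    const existing = index[link.linkedResourceId] ?? [];
    existing.push(link.referenceId);
    index[link.linkedResourceId] = existing;
  }
  return index;
}
```

With such an index, "which references are linked to this internal resource" becomes a single lookup by resource id.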

lgalke commented 6 years ago

related to #215

LauraErhard commented 6 years ago

I think we want the first one. Just to clarify we want to see which references are linked to an internal resource. Sometimes there are small differences in resources that lead to an ambiguous situation where we can't be sure if it was an error in a previous reference list or if the current reference really is a different resource. Another case is that if we find a duplicate we want to make sure that we switch all linked references to one internal resource and not spread them between the two duplicates.

Just an example for the second case: here the internal suggestions show two nearly identical resources (one has the journal issue information and the other one doesn't).

[Screenshot: screenshot-2018-4-13 loc-db frontend]

We would commit the internal resource with more information. But the other internal resource has to have at least one linked reference as well, which we would like to switch to the one with more information. At the moment we don't have an easy way to find that reference again to switch it over.

Do you have an easier way how to do this or can you identify this in the backend and switch all linked references to one resource and delete the duplicates automatically?

I don't quite understand what you mean by out-going or in-going edges, but I hope I have clarified what functionality we would like to have.

lgalke commented 6 years ago

> Do you have an easier way how to do this or can you identify this in the backend and switch all linked references to one resource and delete the duplicates automatically?

Not yet, but I agree that it would make sense to have such a functionality.
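The requested merge could look roughly like the sketch below. It assumes a hypothetical `LinkedReference` shape and a hypothetical `mergeDuplicate` helper; nothing here reflects an existing LOC-DB service. All references that point at the duplicate are re-pointed to the resource that is kept, after which the duplicate has no in-going links left and could be deleted.

```typescript
// Hypothetical shape of a reference that may be linked to an internal resource.
interface LinkedReference {
  id: string;
  linkedResourceId?: string;
}

// Re-point every reference linked to `dropId` so that it links to `keepId`.
// Returns a new array (the input is not mutated) and the number of moved links.
function mergeDuplicate(
  refs: LinkedReference[],
  keepId: string,
  dropId: string
): { refs: LinkedReference[]; moved: number } {
  let moved = 0;
  const updated = refs.map(ref => {
    if (ref.linkedResourceId !== dropId) {
      return ref; // not linked to the duplicate, leave untouched
    }
    moved++;
    return { ...ref, linkedResourceId: keepId };
  });
  return { refs: updated, moved };
}
```

Returning the count of moved links gives the user a way to confirm that nothing stayed attached to the duplicate before it is removed.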

anlausch commented 6 years ago

Thanks for your feedback. Could you collect the cases here in which the internal search does not return the results you expect? Otherwise I don't really know how to test.

LauraErhard commented 6 years ago

Do you only need the right side (=search field + suggestions) or do you want the article information and reference as well?

LauraErhard commented 6 years ago

Reference: Berger, P.L.: The homeless mind: Modernization and consciousness 1974

Searches

Case 1.1: Searching with the automatically generated search string [screenshot: 2018-04-17 internal search_noresults_author_end] --> the resource doesn't show up

Case 1.2: Searching with only the title [screenshot: 2018-04-17 internal search_noresults_noauthor] --> the resource doesn't show up

Case 1.3: Searching with the author at the front [screenshot: 2018-04-17 internal search_results_withauthorfront] --> the resource shows up

Case 1.4: Searching without "the" [screenshot: 2018-04-17 internal search_results_withoutthe] --> the resource shows up

Reference: DOI: 10.1177/000312240807300502

Searches

Case 2.1: Searching with the title [screenshot: 2018-04-17 internal search_results3_7thplace] --> the resource shows up, but at the 7th place (you have to expand the list to see the resource)

Reference: DOI: 10.1177/0268580906065299

Searches

Case 3.1: Searching with the title [screenshot: 2018-04-17 internal search_5noresults_title] --> the resource doesn't show up

Case 3.2: Searching with the author at the end [screenshot: 2018-04-17 internal search_5noresults_authorend] --> the resource doesn't show up

Case 3.3: Searching with the author at the front [screenshot: 2018-04-17 internal search_5results_authorfront] --> the resource shows up, but at the 2nd place (the first suggestion has nothing in common with the search string, see below)

First suggestion: [screenshot: 2018-04-17 internal search_5results_firstplace]

Summary: I think we have a few major problems:

  1. "the", "a", "of" ... at the beginning of a string don't seem to work very well (this is not consistent: sometimes it works, sometimes it doesn't).
  2. Searching with the author at the beginning seems to work best for internal suggestions and Crossref suggestions. But SWB suggestions only arrive without the author. The author at the back is not that reliable.
  3. The internal suggestions only show 10 results. I think you set it this way to avoid getting large lists, but as the database grows and the search is not yet precise enough, it kicks out our right suggestion.
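The first problem in the list above could be worked around on the client side by dropping leading stop words before the query is sent, which mirrors what Case 1.4 did by hand. This is a minimal sketch: the stop-word list and the helper name `stripLeadingStopWords` are illustrative assumptions, not part of LOC-DB. The thread does not establish why leading stop words fail, so this does not address the root cause.

```typescript
// Illustrative list only; a real deployment would need a list per language.
const LEADING_STOP_WORDS = ['the', 'a', 'an', 'of'];

// Remove stop words from the start of a query, but always keep the last word
// so that a query consisting only of stop words does not become empty.
function stripLeadingStopWords(query: string): string {
  const words = query.trim().split(/\s+/).filter(word => word.length > 0);
  let start = 0;
  while (
    start < words.length - 1 &&
    LEADING_STOP_WORDS.indexOf(words[start].toLowerCase()) !== -1
  ) {
    start++;
  }
  return words.slice(start).join(' ');
}
```

Only the beginning of the string is touched, so stop words inside a title are still sent to the search unchanged.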

Our wishes/hopes:

  a. Maybe instead of putting the author at the end by default, you could put the author at the front of the string? (That would also help with "the", "a" ... at the beginning.) To check SWB suggestions we would have to alter the search term anyway, and we don't care whether we have to delete the author at the back or the front.
  b. If the reference list only holds the DOI, we manually fill in the title. We would love it if the DOI were searchable, but we would be okay if at least the title alone delivered the right suggestions. On top of that, the automatically generated search string for references with only the DOI available is "undefined". That is a little annoying because a search with the term "undefined" always starts before we can put in the real string. Maybe you could just leave the search field empty in this case.
  c. Maybe allow a little more than 10 results for the internal suggestions?
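For the DOI-only references, one option would be to recognise a DOI in the search field and route it to an exact lookup instead of a text search. The sketch below only covers the recognition step. The `SearchPlan` type and `planSearch` function are hypothetical, the pattern is a commonly used heuristic for modern DOIs rather than a full validator, and whether the back end offers a DOI lookup is an assumption.

```typescript
// Heuristic pattern for modern DOIs (prefix "10.", 4-9 digit registrant, suffix).
const DOI_PATTERN = /^10\.\d{4,9}\/[-._;()\/:A-Z0-9]+$/i;

type SearchPlan =
  | { kind: 'doi'; doi: string }   // exact identifier lookup
  | { kind: 'text'; text: string } // ordinary full-text search
  | { kind: 'none' };              // nothing to search for yet

function planSearch(input: string | undefined): SearchPlan {
  const value = (input ?? '').trim();
  if (value === '') {
    return { kind: 'none' };
  }
  // Strip a leading "DOI:" label or a doi.org resolver URL, if present.
  const bare = value.replace(/^(doi:\s*|https?:\/\/(dx\.)?doi\.org\/)/i, '');
  if (DOI_PATTERN.test(bare)) {
    return { kind: 'doi', doi: bare };
  }
  return { kind: 'text', text: value };
}
```

The `none` case lets the caller skip the request entirely, so no search is started before a real string has been entered.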

Sorry for the long post, but I hope the examples and my analysis help with your tests. If you need any more examples, please tell me; I think I can easily find more.

lgalke commented 6 years ago

Thanks, these are concrete requirements. Some of them can also be tackled in the front-end. Do I understand correctly that you almost always prefer searching with only the title? Then I could make this the default generated query string. The "undefined" search string is indeed a small bug that should be resolved.

lgalke commented 6 years ago

Created issue #268 for those adjustments specifically.

LauraErhard commented 6 years ago

Maybe we need the author to have better search results?! If an author is there maybe it is better to put it at the front of the search string?! But you should probably discuss this with Anne. We are okay either way.

anlausch commented 6 years ago

Thanks for the overview. I am working on it.

lgalke commented 6 years ago

Any updates here? Due to precalculated suggestions, you are now generating a query string in the backend. It would make sense if we also used the same one in the front-end. The cleanest way would probably be an exposed service getQueryStringForEntry(entry). What do you say @anlausch ?
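A shared query-string function could be as small as the sketch below. Only the name `getQueryStringForEntry` comes from this thread; the `BibliographicEntry` shape and the choice of "first author, then title" are assumptions based on the requests above, and the back end's actual query generation is not shown here. Missing fields are skipped, so an entry without author and title yields an empty string rather than "undefined".

```typescript
// Hypothetical, simplified entry shape; the real model has more fields.
interface BibliographicEntry {
  authors?: string[];
  title?: string;
}

// Build the default search string: first author (if any), then the title.
// Absent or blank parts are dropped instead of being stringified.
function getQueryStringForEntry(entry: BibliographicEntry): string {
  const firstAuthor =
    entry.authors && entry.authors.length > 0 ? entry.authors[0] : undefined;
  return [firstAuthor, entry.title]
    .filter((part): part is string => typeof part === 'string')
    .map(part => part.trim())
    .filter(part => part.length > 0)
    .join(' ');
}
```

If both the precalculated suggestions and the front-end call the same function, the search field would always show the string that actually produced the suggestions.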