scijava / scijava-search

Search framework for SciJava applications :mag:
BSD 2-Clause "Simplified" License

Forum search is broken after switch to image.sc #15

Closed: haesleinhuepf closed this issue 3 years ago

haesleinhuepf commented 5 years ago

Hey guys,

I'm trying to fix the forum search in ImageJ/Fiji, which apparently broke after the forum moved to image.sc: https://github.com/haesleinhuepf/scijava-search/commit/69478676bff71f23f9e3b158e1abe03612f170f9

It's not just a matter of changing the URL. The new forum server detects that our scijava-search is a crawler. I saw this by comparing the HTML source code of the search result page in Chrome (1) with the HTML response our search plugin receives (2). While (1) contains a list of search results, (2) does not. The body tag, for example, also differs:

(1) Chrome:

<body class="crawler">

(2) scijava-search:

<body class="">
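The comparison of (1) and (2) can be automated. The following is a minimal, hypothetical Java sketch (the names BodyClassCheck, bodyClass and isCrawlerView are illustrative and not part of scijava-search) that reads the class attribute of the body tag, assuming the "crawler" token is what distinguishes the two responses. A regex is used only for brevity; a proper HTML parser would be more robust.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: inspects the <body> class attribute of a Discourse HTML response. */
public class BodyClassCheck {

    // Matches <body ... class="..."> case-insensitively; other attributes may come first.
    private static final Pattern BODY_CLASS =
        Pattern.compile("<body\\b[^>]*?\\sclass=\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    /** Returns the body's class attribute, "" if it has none, or null if there is no <body> tag. */
    public static String bodyClass(String html) {
        Matcher m = BODY_CLASS.matcher(html);
        if (m.find()) {
            return m.group(1);
        }
        return html.toLowerCase().contains("<body") ? "" : null;
    }

    /** True if "crawler" is one of the whitespace-separated body classes. */
    public static boolean isCrawlerView(String html) {
        String cls = bodyClass(html);
        if (cls == null) {
            return false;
        }
        for (String token : cls.trim().split("\\s+")) {
            if (token.equals("crawler")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("(1) Chrome: " + isCrawlerView("<body class=\"crawler\">"));
        System.out.println("(2) scijava-search: " + isCrawlerView("<body class=\"\">"));
    }
}
```

Feeding the saved HTML of both responses into isCrawlerView gives a quick yes/no on which view the server delivered.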

Can anybody help me debug this on the server side? It would be nice to reactivate the forum search. If not, I would vote for temporarily removing it from the distribution.

Cheers, Robert

ctrueden commented 5 years ago

@haesleinhuepf I suggest you post this question on https://meta.discourse.org. The Discourse team monitors it and can answer questions about what is going on here and the best way forward.

imagejan commented 3 years ago

@haesleinhuepf did you make a post on https://meta.discourse.org in the end? If so, can you link it here? It would be nice to get the forum search working again 🙃

haesleinhuepf commented 3 years ago

Hey @imagejan,

IMHO this is clearly a configuration issue in the Discourse system: it worked on the old server but not on the new one. It would be cool to compare the configuration files, but we need someone with access to them to debug this. Do you have access to the config files?

Thanks!

Cheers, Robert

haesleinhuepf commented 3 years ago

The crawler settings, for example, would be interesting to look at: https://github.com/discourse/discourse/blob/e0d9232259f6fb0f76bca471c4626178665ca24a/spec/components/crawler_detection_spec.rb#L22
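The linked spec appears to exercise crawler detection with User-Agent strings, so one plausible (unconfirmed) way to check what image.sc keys on is to request the same page with different User-Agent headers and compare the body tags. This minimal sketch assumes Java 11+ (java.net.http.HttpClient); the class name UserAgentProbe, the example User-Agent strings and the default search URL are illustrative assumptions, not values from this issue or from scijava-search.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical debugging sketch: fetch one URL with several User-Agent headers and print
 * the status code and <body> tag of each response. Assumes detection keys on User-Agent.
 */
public class UserAgentProbe {

    /** Illustrative User-Agent strings (assumptions). A null value means "use the client default". */
    static Map<String, String> agents() {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("browser", "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0 Safari/537.36");
        m.put("bot", "Googlebot/2.1 (+http://www.google.com/bot.html)");
        m.put("java-default", null); // HttpClient then sends its own default User-Agent
        return m;
    }

    /** Builds a GET request, setting User-Agent only when one is given. */
    static HttpRequest request(String url, String userAgent) {
        HttpRequest.Builder b = HttpRequest.newBuilder(URI.create(url)).GET();
        if (userAgent != null) {
            b.header("User-Agent", userAgent);
        }
        return b.build();
    }

    /** Returns the opening <body ...> tag, or a marker if none is found. */
    static String bodyTag(String html) {
        int start = html.indexOf("<body");
        if (start < 0) {
            return "(no <body> tag)";
        }
        int end = html.indexOf('>', start);
        return end < 0 ? html.substring(start) : html.substring(start, end + 1);
    }

    public static void main(String[] args) throws Exception {
        // Assumed search page URL; pass another URL as the first argument to override it.
        String url = args.length > 0 ? args[0] : "https://forum.image.sc/search?q=Threshold";
        HttpClient client = HttpClient.newBuilder()
            .followRedirects(HttpClient.Redirect.NORMAL)
            .build();
        for (Map.Entry<String, String> e : agents().entrySet()) {
            HttpResponse<String> resp =
                client.send(request(url, e.getValue()), HttpResponse.BodyHandlers.ofString());
            System.out.println(e.getKey() + ": HTTP " + resp.statusCode() + " " + bodyTag(resp.body()));
        }
    }
}
```

If only some User-Agent values yield the response with search results, that points to User-Agent based detection rather than a network or URL problem.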

imagejan commented 3 years ago

The image.sc forum instance is hosted by Discourse, so we don't have direct access to the config files. But in the admin interface of the forum, we have these settings:

[Screenshot of the forum admin settings]

I just added image.sc to the "exclude rel nofollow domains" setting. How can I test whether that makes a difference? @haesleinhuepf, do you think other settings are required, and if so, which ones would you suggest?

imagejan commented 3 years ago

I quickly browsed some topics on meta.discourse.org, and it seems the best way to submit a search query is via the JSON endpoint:

https://forum.image.sc/search/query.json?term=Threshold

That way we avoid having to parse HTML (which is how it's currently implemented, right, @haesleinhuepf?).
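A minimal sketch of the JSON approach, assuming Java 11+: it URL-encodes the search term, calls the search/query.json endpoint from the link above and prints the raw response. The names DiscourseJsonSearch, queryUri and topicUrl are hypothetical. The expectation that results arrive as "topics" entries with id, slug and title fields is an assumption to verify against an actual response, and the choice of JSON parser is left open.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: query Discourse's JSON search endpoint instead of scraping HTML. */
public class DiscourseJsonSearch {

    static final String FORUM = "https://forum.image.sc";

    /** Builds e.g. https://forum.image.sc/search/query.json?term=Threshold with the term URL-encoded. */
    static URI queryUri(String base, String term) {
        String encoded = URLEncoder.encode(term, StandardCharsets.UTF_8);
        return URI.create(base + "/search/query.json?term=" + encoded);
    }

    /** Discourse topic pages live at /t/{slug}/{id}. */
    static String topicUrl(String base, String slug, long id) {
        return base + "/t/" + slug + "/" + id;
    }

    public static void main(String[] args) throws Exception {
        String term = args.length > 0 ? String.join(" ", args) : "Threshold";
        HttpRequest req = HttpRequest.newBuilder(queryUri(FORUM, term))
            .header("Accept", "application/json")
            .GET()
            .build();
        HttpResponse<String> resp =
            HttpClient.newHttpClient().send(req, HttpResponse.BodyHandlers.ofString());
        System.out.println("HTTP " + resp.statusCode());
        // The body is JSON. Parsing it (e.g. reading an assumed "topics" array and turning
        // each entry's slug and id into a link via topicUrl) needs a JSON library, not shown here.
        System.out.println(resp.body());
    }
}
```

Running it with a term such as Threshold prints the raw JSON, which can then be inspected to decide which fields the searcher should map to search results.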