theCrag / website

theCrag.com: Add your voice and help guide the development of the world's largest collaborative rock climbing & bouldering platform
https://www.thecrag.com/

CSV importer - suboptimal list of candidate routes when the sector/crag/name don't match very well or partially #3697

Open FreddieChopin opened 4 years ago

FreddieChopin commented 4 years ago

What happened?

Try to import this CSV file:

"Ascent Type","Ascent Grade","Route Name","Country","Crag Name","Crag 2","Crag 3","Comment","quality","Ascent Date"
"Pink point","VI.5","Atak Glonów","Poland","Jastrzębnik","-","-","w deszczu (;","Very Good",2015-08-16

There are two "problems" with this entry; in my opinion both are very minor:

The problem with the importer here is that, for the route "Atak glonów" on the "Jastrzębnik" crag, it does not suggest "Atak glonów - epizod I" in the "Jastrzębnik" sector of the "Podlesice" crag. Instead it suggests some totally crazy routes that are hundreds of kilometers away in completely different parts of Poland (; It suggests "Atak gibbonów" in Szklarska Poręba or "Atak gibbonów" in "Góry Izerskie".

If I now select "none" and enter "Atak glonów" into the search field (so exactly the same name as in the CSV file), then the correct route is the first entry found, the second one is its extension, and only the third and fourth are the "Atak gibbonów" routes that the importer suggested previously.

Screenshot from 2020-06-05 17-58-20

What you expected:

I would expect the algorithm behind the importer to at least consider "Atak glonów - epizod I" as one of the possible matches for "Atak glonów", if not the best one.
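To illustrate the expectation, here is a minimal sketch of a ranking that also looks at the crag columns of the CSV row. It is not theCrag's importer: the candidate structure (route name plus a list of enclosing area names), the `area_weight` parameter and all function names are assumptions made up for this example. Only the CSV row and the route and area names come from this report.

```python
import csv
import io
from difflib import SequenceMatcher

# The CSV from this report, embedded so the sketch is self-contained.
CSV_TEXT = '''"Ascent Type","Ascent Grade","Route Name","Country","Crag Name","Crag 2","Crag 3","Comment","quality","Ascent Date"
"Pink point","VI.5","Atak Glonów","Poland","Jastrzębnik","-","-","w deszczu (;","Very Good",2015-08-16
'''

# Hypothetical candidate records: (route name, names of enclosing areas).
CANDIDATES = [
    ("Atak gibbonów", ["Szklarska Poręba"]),
    ("Atak gibbonów", ["Góry Izerskie"]),
    ("Atak glonów - epizod I", ["Podlesice", "Jastrzębnik"]),
]


def name_similarity(a, b):
    """Case-insensitive similarity of two route names, between 0 and 1."""
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()


def area_overlap(row, areas):
    """Return 1.0 if any crag column of the row names one of the areas."""
    wanted = {
        row[key].casefold()
        for key in ("Crag Name", "Crag 2", "Crag 3")
        if row[key] not in ("", "-")  # "-" is the placeholder used in this CSV
    }
    have = {area.casefold() for area in areas}
    return 1.0 if wanted & have else 0.0


def score(row, candidate, area_weight=0.5):
    """Name similarity plus an (assumed) bonus for a matching crag or sector."""
    name, areas = candidate
    return name_similarity(row["Route Name"], name) + area_weight * area_overlap(row, areas)


row = next(csv.DictReader(io.StringIO(CSV_TEXT)))
ranked = sorted(CANDIDATES, key=lambda c: score(row, c), reverse=True)
for name, areas in ranked:
    print(name, "/", " > ".join(areas))
```

With this weighting the route in the "Jastrzębnik" sector comes first, because the crag name from the CSV matches one of its enclosing areas. The weight of 0.5 is arbitrary and would need tuning against real data.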

Mdemaillard commented 4 years ago

I am not sure whether the import matching process uses the common search algorithm, but the common search itself has issues with missing hits; see e.g. #3679.

scd commented 4 years ago

This search should be using fuzzy matching.
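As a side note on fuzzy matching, a plain whole-string similarity can by itself prefer "Atak gibbonów" over "Atak glonów - epizod I", because the longer name is penalised for its extra suffix. The sketch below only demonstrates that effect with Python's `difflib`; whether the importer behaves this way is an assumption, and `prefix_aware_score` with its `prefix_floor` value is a hypothetical adjustment, not existing code.

```python
import unicodedata
from difflib import SequenceMatcher


def normalize(text):
    """Normalise Unicode form and case so "ó" and capitals compare equal."""
    return unicodedata.normalize("NFC", text).casefold().strip()


def plain_ratio(query, candidate):
    """Whole-string similarity between 0 and 1."""
    return SequenceMatcher(None, normalize(query), normalize(candidate)).ratio()


def prefix_aware_score(query, candidate, prefix_floor=0.95):
    """Like plain_ratio, but never below prefix_floor when the candidate
    starts with the full query (e.g. a route and its extension)."""
    q, c = normalize(query), normalize(candidate)
    base = SequenceMatcher(None, q, c).ratio()
    if q and c.startswith(q):
        return max(base, prefix_floor)
    return base


def rank(query, candidates, scorer):
    """Sort candidates from best to worst according to scorer."""
    return sorted(candidates, key=lambda c: scorer(query, c), reverse=True)


names = ["Atak gibbonów", "Atak glonów - epizod I"]
by_ratio = rank("Atak glonów", names, plain_ratio)
by_prefix = rank("Atak glonów", names, prefix_aware_score)

print("plain ratio: ", by_ratio)
print("prefix aware:", by_prefix)
```

The plain ratio is about 0.83 for "Atak gibbonów" and about 0.67 for "Atak glonów - epizod I", so the wrong route ranks first; with the prefix rule the order flips.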

rouletout commented 3 years ago

@nicHoch has anything changed since this was reported?

FreddieChopin commented 3 years ago

Please note that since the report I've made some changes to the data on theCrag, so the original example may no longer show the problem. Most importantly, I've added the shorter name "Atak glonów" as an AKA (or some other type) to the proper route.