pwyf / iati-decipher

📊 Browser plugins for deciphering IATI organisation files.
https://iati-decipher.publishwhatyoufund.org
MIT License

Make the dataset search search on more fields #93

Open andylolz opened 6 years ago

andylolz commented 6 years ago

Reported by Matt Geddes on discuss:

[…] because it is so easy to search for a publisher via the browser plugin, I tend to use that rather than going to the registry, but often it means I need to know the ‘correct name’ that the publisher is using. Perhaps this could search a few more fields e.g. the publisher country? For example - to find the German Foreign Ministry file, I tried BMZ, Bundes…, Deutschland, German…and finally ‘Germany’ before I got it - whereas if it could return all publishers based in Germany, it would be easier in many cases.

matmaxgeds commented 6 years ago

Thanks for this @andylolz, let me know if there is anything I can help with - explanations, testing etc

andylolz commented 6 years ago

Great! So the issue is with how I’m using the registry API. The query that’s run at the moment is: https://github.com/pwyf/iati-decipher/blob/ccb6e01ee297031232eca4f2c31a96d2d03df6b7/src/js/popup.js#L11

…so e.g.: https://iatiregistry.org/api/3/action/package_search?fq=extras_filetype:organisation&qf=title&q=germany

So that’s just looking at the title. If you can figure out a better query, let me know! Or if you have a list of the fields that should be searched.
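For reference, the title-only query above can be sketched as a small URL builder. This is an illustration only: the helper name `buildTitleSearchUrl` is hypothetical and is not taken from popup.js; the endpoint and the `fq`, `qf` and `q` parameters are the ones in the example URL.

```javascript
// Endpoint taken from the example URL in this thread.
const REGISTRY_SEARCH_URL = "https://iatiregistry.org/api/3/action/package_search";

// Hypothetical helper (not the plugin's actual code): builds the
// title-only search URL for a user-supplied search term.
function buildTitleSearchUrl(term) {
  const params = new URLSearchParams({
    fq: "extras_filetype:organisation", // restrict results to organisation files
    qf: "title",                        // query field: match the title only
    q: term,                            // the user's search text
  });
  return `${REGISTRY_SEARCH_URL}?${params.toString()}`;
}

console.log(buildTitleSearchUrl("germany"));
```

Because `qf` is fixed to `title`, a term that only appears elsewhere in the record (for example in the publisher name) would not be expected to match.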

The CKAN docs are here: https://docs.ckan.org/en/2.8/api/index.html The solr docs are here: https://lucene.apache.org/solr/guide/7_3/common-query-parameters.html

matmaxgeds commented 6 years ago

Just a note to say that we are making some progress on this, e.g. https://iatiregistry.org/api/3/action/package_search?fq=extras_filetype:organisation&q=BMZ returns files where the search string appears outside the title field. We now need to narrow down what it returns, though: e.g. https://iatiregistry.org/api/3/action/package_search?fq=extras_filetype:organisation&q="DE" also returns files with the word 'description' in the returned data, which we don't want. More soon...
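The trade-off described here (no `qf` gives broad matches, a `qf` list narrows them) can be sketched as one builder with an optional field list. The function name `buildSearchUrl` and its `fields` argument are hypothetical, and which field names the registry's index actually accepts is an assumption that would need checking; Solr's `qf` parameter takes a space-separated list of fields.

```javascript
const SEARCH_ENDPOINT = "https://iatiregistry.org/api/3/action/package_search";

// Hypothetical helper: with no fields, the query is left unrestricted
// (as in the q=BMZ example); with fields, they are passed as Solr's
// space-separated qf list to narrow where the term may match.
function buildSearchUrl(term, fields = []) {
  const params = new URLSearchParams({
    fq: "extras_filetype:organisation", // organisation files only
    q: term,
  });
  if (fields.length > 0) {
    params.set("qf", fields.join(" "));
  }
  return `${SEARCH_ENDPOINT}?${params.toString()}`;
}

// Broad search, no qf parameter at all.
console.log(buildSearchUrl("BMZ"));
// Narrower search; "title" and "name" are assumed field names.
console.log(buildSearchUrl("BMZ", ["title", "name"]));
```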

matmaxgeds commented 6 years ago

Hi Andy - there's a pull request for this coming from @kndm, a programmer I work with. We have done a few quick tests of a modified plugin and think it works well: e.g. "bmz" now picks up the Germany file, "GB" returns all the orgs based in the UK etc., without information overload or too many false positives. But you might decide that it is better with a narrower search field, or perhaps some other way... let us know.

andylolz commented 6 years ago

Oh, nice! Looks cool – I’ll test it out very shortly.

andylolz commented 6 years ago

Fixed in #96.

matmaxgeds commented 6 years ago

@andylolz @kndm - in 1.3.1, a search for 'asdb' isn't bringing up the Asian Development Bank file, i.e. https://iatiregistry.org/api/3/action/package_show?id=asdb-org, which includes the string 'asdb' several times. Have we missed a search field?

andylolz commented 6 years ago

Just checked, and this was the case at 1.3.0 too – so (thankfully!) unrelated to that change.

organization_name should work here, since that exactly equals “asdb” in this case.

https://github.com/pwyf/iati-decipher/blob/2d964532a563d422f197219ec421697ec7798a6b/src/js/popup.js#L17

I’m not sure why that isn’t working! I was a bit suspicious of the underscore separator before merging this PR, but I tested it and it did seem to be doing the right thing. So I’m at a loss, I’m afraid!

It might be worth us checking with CKAN developers (or even on a solr mailing list) to find out the best search string here, since this is not an IATI-specific problem.

kndm commented 6 years ago

I'm not sure if it is related to this string in particular, but the dash ("-") seems to be an issue in some searches, e.g. the following query:

https://iatiregistry.org/api/3/action/package_search?fq=extras_filetype:organisation&q=name:asdb

returns nothing, whereas

https://iatiregistry.org/api/3/action/package_search?fq=extras_filetype:organisation&q=name:asdb*

returns the expected result by using the wildcard operator (*), which is strange because the first query should already list it as a result.
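A minimal sketch of building such a field query, with optional trailing wildcard and with Solr's special characters (including the dash) backslash-escaped. The helper names `escapeSolrTerm` and `nameQuery` are hypothetical, and whether escaping the dash changes the registry's results is untested here; it only rules out the dash being parsed as an operator.

```javascript
// Characters and operators with special meaning in the Solr/Lucene
// query syntax; each match is prefixed with a backslash.
const SOLR_SPECIAL = /[+\-!(){}\[\]^"~*?:\\\/]|&&|\|\|/g;

// Hypothetical helper: escape a raw user term for use in a Solr query.
function escapeSolrTerm(term) {
  return term.replace(SOLR_SPECIAL, "\\$&");
}

// Hypothetical helper: build a query against the "name" field,
// optionally appending the wildcard operator for prefix matching.
function nameQuery(term, { prefix = false } = {}) {
  const escaped = escapeSolrTerm(term);
  return `name:${escaped}${prefix ? "*" : ""}`;
}

console.log(nameQuery("asdb"));                   // exact term
console.log(nameQuery("asdb", { prefix: true })); // prefix match with wildcard
console.log(nameQuery("asdb-org"));               // dash escaped
```

The resulting string would be passed as the `q` parameter; the wildcard is appended after escaping so that it keeps its special meaning.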

kndm commented 6 years ago

Upon further inspection, it seems some of the fields are not mapped as expected, i.e. organization_name may not be the name of the field for organization -> name (key).

@andylolz do you happen to have any leads on where I could find better documentation for the fields returned? :)

andylolz commented 6 years ago

@kndm I added a couple of links above:

The CKAN docs are here: https://docs.ckan.org/en/2.8/api/index.html The solr docs are here: https://lucene.apache.org/solr/guide/7_3/common-query-parameters.html

But I think you’d be better off asking a solr mailing list, or possibly a CKAN mailing list.

If you’re happy to keep looking into this, that would be great – I’ll be happy to review and merge a PR.

andylolz commented 5 years ago

Reopening this, since it still needs work (though thanks for the improvements so far, @kndm and @matmaxgeds!)

andylolz commented 5 years ago

This was raised again recently by two users separately, both times with the specific example of AfDB / African Development Bank. The former search works, the latter doesn’t (despite it being listed as the organisation name).

andylolz commented 5 years ago

The CKAN-dev mailing list page suggests searching the archive via: https://www.google.com/search?q=%22%5Bckan-dev%5D%22+site%3Alists.okfn.org

So e.g. this or perhaps this.

There’s plenty of reading material there – I’d bet the answer lies within!

matmaxgeds commented 5 years ago

@kndm happy for you to skip a bit of Somalia work to have another look at this - maybe you also got a reply to your post on the CKAN forums?

andylolz commented 5 years ago

I’ve posted the following: https://lists.okfn.org/pipermail/ckan-dev/2018-December/023005.html

Does that look okay?

Fingers crossed for a response!

kndm commented 5 years ago

Thanks for the very detailed info, Andy. I will be looking at this (and the mailing list, though I got no response) over the next few hours!

andylolz commented 5 years ago

Okay, so it seems like the answer is: this isn’t possible without changes to the registry API :(

matmaxgeds commented 5 years ago

Ooof/thanks for the detective work - is that something we can request changes to, as from my unenlightened position it is hard to understand why the org title field can't be queried? I guess the alternative is to download all the org files ourselves which isn't particularly appealing.

andylolz commented 5 years ago

from my unenlightened position it is hard to understand why the org title field can't be queried

Yep, same.

download all the org files ourselves which isn't particularly appealing

That would work, but I’m really not keen to do it because I think the registry API should be able to handle it. I’ve raised a ticket on the registry github (IATI/ckanext-iati#226), asking about the possibility of a plugin.
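For comparison, the client-side alternative mentioned above (fetching the organisation dataset metadata once and filtering it locally) could look roughly like the sketch below. It is not something the plugin does. The function names are hypothetical, and the record shape (`title`, `name`, and a nested `organization` with `title` and `name`) is assumed from the usual CKAN package format.

```javascript
// Hypothetical helper: collect the text fields to search for one
// dataset record. Missing or non-string fields are skipped.
function searchableText(dataset) {
  const org = dataset.organization || {};
  return [dataset.title, dataset.name, org.title, org.name]
    .filter((value) => typeof value === "string")
    .join(" ")
    .toLowerCase();
}

// Hypothetical helper: case-insensitive substring match across all of
// the fields above. An empty search term returns no results.
function filterDatasets(datasets, term) {
  const needle = term.trim().toLowerCase();
  if (needle === "") return [];
  return datasets.filter((dataset) => searchableText(dataset).includes(needle));
}
```

This avoids depending on how the registry's search index maps fields, at the cost of downloading and caching the metadata in the plugin.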