younginnovations / resourcecontracts-api

The API component of resourcecontracts
MIT License

company_name search does not always return results #7

Open jerico opened 8 years ago

jerico commented 8 years ago

I'm listing companies from the company_name key. Some company names work; others don't.

If I try "company_name=OceanaGold (Philippines), Inc. (Contractor/Operator)", it does not return any results. It should return the one contract associated with that company.

http://api.resourcecontracts.org/contracts/search?from=0&per_page=1000&group=metadata&country=ph&company_name=OceanaGold%20(Philippines),%20Inc.%20(Contractor%2FOperator)&
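To reproduce the request outside the browser, the query can be built with Python's standard library. This is a minimal sketch: the endpoint and parameters are copied from the URL above, and `build_search_url` is a hypothetical helper name. It also percent-encodes the comma and parentheses (`%2C`, `%28`, `%29`), which the hand-written URL leaves literal; a standard query-string parser decodes these back to the same characters, so this changes how the request is written, not the company name the API receives.

```python
from urllib.parse import quote, urlencode

# Endpoint copied from the request URL in the report.
BASE_URL = "http://api.resourcecontracts.org/contracts/search"


def build_search_url(company_name: str, country: str = "ph") -> str:
    """Build the search URL with every parameter value percent-encoded.

    quote_via=quote encodes spaces as %20 instead of '+', and safe=""
    means '/' is encoded as %2F, as in the original request.
    """
    params = {
        "from": 0,
        "per_page": 1000,
        "group": "metadata",
        "country": country,
        "company_name": company_name,
    }
    return BASE_URL + "?" + urlencode(params, quote_via=quote, safe="")


print(build_search_url("OceanaGold (Philippines), Inc. (Contractor/Operator)"))
```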

anderspeders commented 8 years ago

@anjesh Can you follow up here?

anjesh commented 8 years ago

I think this has to do with the punctuation characters. We need to look into Elasticsearch's indexing features a bit more. I removed the comma from one of the companies (http://contracts.ph-eiti.org/contract/101), and now it appears in the results: http://contracts.ph-eiti.org/search?q=&year=&resource=&company_name=Adnama%20Mining%20Resources%20Incorporated

anderspeders commented 8 years ago

Also, would it make sense to push this repo to NRGI/RC as well?

anjesh commented 8 years ago

Yes, definitely. I do plan to move this repo and two more (subsite and elasticsearch) to the NRGI GitHub.

anderspeders commented 8 years ago

Great, thanks.


anjesh commented 8 years ago

The presence of a comma in the company name is preventing those contracts from appearing in the results. One thing we could do here is remove the comma from the company name; as you can see, the result appears once the comma is removed.

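The comma-removal workaround can be sketched as a small normalization step applied before names are saved or re-indexed. This is an illustration under assumptions, not code from resourcecontracts-api: `strip_commas` and `names_needing_update` are hypothetical helpers, and whether they would run at save time or as a one-off cleanup is left open.

```python
import re


def strip_commas(company_name: str) -> str:
    """Remove commas from a company name and tidy the remaining whitespace."""
    without_commas = company_name.replace(",", "")
    # Collapse any runs of whitespace and trim the ends.
    return re.sub(r"\s+", " ", without_commas).strip()


def names_needing_update(names: list[str]) -> list[str]:
    """Return the distinct company names that still contain a comma, sorted."""
    return sorted({name for name in names if "," in name})


print(strip_commas("Forum Exploration, Incorporated"))  # Forum Exploration Incorporated
```

Listing the affected names first makes it easy to update each contract and its annexes together, so a principal document and its supporting documents end up with the same spelling.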

However, the system doesn't currently differentiate supporting contracts from main contracts (apart from the relationship between them), so the company names returned by the API include those from supporting documents as well. I see that Jerico has hidden all the supporting contracts, perhaps using a text-pattern search for "annex". If the company name in a supporting document differs from the one in the principal document, that company name will still be listed, but selecting it won't display any results.

@charlesyoung
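A minimal sketch of the mismatch described above, using made-up contract records: if the company list is built from every document while the page hides supporting documents by matching "annex" in the title, any name that occurs only in an annex appears in the list but matches nothing visible. The `Contract` class, `ANNEX_PATTERN`, and the example records are all hypothetical.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Contract:
    """Hypothetical, simplified contract record."""
    name: str
    company_names: list[str] = field(default_factory=list)


# Assumed rule for spotting supporting documents by title; the thread only
# guesses that the site hides them with an "annex" text search.
ANNEX_PATTERN = re.compile(r"annex", re.IGNORECASE)


def is_supporting(contract: Contract) -> bool:
    """Treat any contract whose name mentions "annex" as a supporting document."""
    return ANNEX_PATTERN.search(contract.name) is not None


def company_names(contracts: list[Contract], principal_only: bool) -> set[str]:
    """Collect company names, optionally skipping supporting documents."""
    names: set[str] = set()
    for contract in contracts:
        if principal_only and is_supporting(contract):
            continue
        names.update(contract.company_names)
    return names


# Made-up records: the annex spells the company name with a comma.
contracts = [
    Contract("OceanaGold FTAA", ["OceanaGold (Philippines) Incorporated"]),
    Contract("OceanaGold FTAA, Annex A", ["OceanaGold (Philippines), Incorporated"]),
]

# Names that appear in the list but belong only to hidden supporting documents.
orphans = company_names(contracts, principal_only=False) - company_names(contracts, principal_only=True)
print(orphans)  # {'OceanaGold (Philippines), Incorporated'}
```

Building the company list from the same filtered set that the page displays (`principal_only=True` here) would keep the list and the results in sync.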

charlesyoung commented 8 years ago

Yep, it's a comma issue; the same goes for Forum Exploration, Incorporated.

There shouldn't be a comma in the name, so can we build some form of validation when capturing the company name in the admin module?

For now, I will update the contracts and annexes linked with any company that has a comma in its name, as I just did for Far Southeast Gold Resources Incorporated.
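The requested validation could be as narrow as rejecting commas at capture time. A minimal sketch, assuming the admin form calls a check like this before saving; `validate_company_name` is a hypothetical function, and the actual admin module may be written in another language. The rule deliberately leaves hyphens and brackets alone, since names such as Rapu-Rapu Minerals need them.

```python
def validate_company_name(name: str) -> list[str]:
    """Return validation errors for a company name; an empty list means valid.

    Hypothetical rule set: only commas are rejected, so hyphens and
    brackets remain allowed.
    """
    errors: list[str] = []
    cleaned = name.strip()
    if not cleaned:
        errors.append("Company name must not be empty.")
    if "," in cleaned:
        errors.append("Company name must not contain a comma.")
    return errors


print(validate_company_name("Forum Exploration, Incorporated"))
```

Returning a list of messages rather than a boolean lets the form show every problem at once.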

charlesyoung commented 8 years ago

Busy updating; found more issues.

Another issue is that the system crashes when the name includes a bracket (OceanaGold (Philippines), Incorporated - FTAA No. 001, 1994). I have asked Jerico to speak to Joy to update.

It also doesn't support hyphens (Rapu-Rapu Minerals, Incorporated - MPSA No. 163-2000-V, 2000), which is a problem because in this example the company name needs a hyphen.

charlesyoung commented 8 years ago

Scratch the above; that was related to the site going down.

charlesyoung commented 8 years ago

Interesting that the contract name isn't updated when I remove the comma.
