open-contracting / cardinal-rs

Measure red flags and procurement indicators using OCDS data
MIT License

indicators: add R044 Business similarities between suppliers (or bidders): common addresses, personnel, phone numbers, etc. #94

Open yolile opened 3 months ago

yolile commented 3 months ago

Methodology

Required OCDS fields: parties/roles IN 'supplier' OR 'tenderer', parties/identifier/id, (parties/contactPoint/telephone OR parties/address/streetAddress OR parties/address/postalCode OR parties/contactPoint/name OR parties/contactPoint/email)

Calculation method:

For suppliers k, j bidding in the same procedure i, flag the procedure if the bidders have the same address (or phone number, contact point, email, etc.):

$R044_i = 1$ if $\text{parties/address/streetAddress}_{k,i} = \text{parties/address/streetAddress}_{j,i}$
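The calculation above can be sketched in Rust as follows. This is only a minimal illustration under stated assumptions: the `Party` struct and the `r044_matching_bidders` function are hypothetical names, not cardinal-rs types, and the comparison is an exact match on the street address alone.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical, simplified view of an OCDS party (not a cardinal-rs type).
struct Party {
    id: String,                     // parties/identifier/id
    roles: Vec<String>,             // parties/roles
    street_address: Option<String>, // parties/address/streetAddress
}

/// Returns the IDs of the bidders in one procedure that share a street
/// address with at least one other bidder. Under this sketch, the procedure
/// would be flagged (R044 = 1) when the returned set is non-empty.
fn r044_matching_bidders(parties: &[Party]) -> HashSet<String> {
    // Group bidder IDs by exact street address.
    let mut by_address: HashMap<&str, HashSet<&str>> = HashMap::new();
    for party in parties {
        let is_bidder = party
            .roles
            .iter()
            .any(|role| role == "supplier" || role == "tenderer");
        if let (true, Some(address)) = (is_bidder, party.street_address.as_deref()) {
            by_address.entry(address).or_default().insert(party.id.as_str());
        }
    }
    // Keep only the addresses used by two or more distinct bidders.
    by_address
        .values()
        .filter(|ids| ids.len() > 1)
        .flat_map(|ids| ids.iter().map(|id| id.to_string()))
        .collect()
}
```

Grouping by address avoids comparing every pair of bidders explicitly, and the same grouping could be repeated for telephone, email or contact point name.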

yolile commented 3 months ago

Ecuador publishes:

For address, I guess we want to compare country, locality, region and street address all together, and not street address alone.

And do we want to calculate this for bidders in the same process only or in general?

@Camilamila @jpmckinney

jpmckinney commented 3 months ago

Based on https://colab.research.google.com/drive/1q38GlyG7B_uPCsqaFBt1UvT5FnNvEtbM#scrollTo=yg8SFe-09kvD I think this indicator is within the same process only, but I haven't compared to the methodologies in the academic sources.

I think it makes sense to combine fields into a full address, yes.
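As a rough sketch of what combining fields into a full address could look like: the `Address` struct and `full_address` function below are hypothetical, and the choice of `|` as a separator is an assumption. Empty components keep their position so that, for example, a locality is never compared against a region.

```rust
/// Hypothetical subset of the OCDS Address object (not a cardinal-rs type).
#[derive(Default)]
struct Address {
    street_address: Option<String>, // parties/address/streetAddress
    locality: Option<String>,       // parties/address/locality
    region: Option<String>,         // parties/address/region
    postal_code: Option<String>,    // parties/address/postalCode
    country_name: Option<String>,   // parties/address/countryName
}

/// Builds one comparison key from all address components.
/// Returns `None` when every component is missing or blank, so that
/// parties without any address are never matched with each other.
fn full_address(address: &Address) -> Option<String> {
    let parts: Vec<&str> = [
        &address.street_address,
        &address.locality,
        &address.region,
        &address.postal_code,
        &address.country_name,
    ]
    .into_iter()
    // Missing components become empty strings, preserving their position.
    .map(|field| field.as_deref().unwrap_or("").trim())
    .collect();

    if parts.iter().all(|part| part.is_empty()) {
        None
    } else {
        Some(parts.join("|"))
    }
}
```

With a key like this, two bidders on the same street name in different cities would no longer match.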

We might discover that we need to do some normalization (e.g. normalize whitespace, lowercase, maybe normalize punctuation). There's more that can be done (#33), but I think we'll limit to basics for now.
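The basic normalization mentioned here might look something like the sketch below. The `normalize` function is a hypothetical helper, and treating every non-alphanumeric character as a space is one possible reading of "normalize punctuation", not a settled design. Accents are left untouched.

```rust
/// Basic normalization for comparing addresses or names:
/// lowercase, replace punctuation with spaces, collapse whitespace.
fn normalize(value: &str) -> String {
    value
        .to_lowercase()
        .chars()
        // Anything that is not a letter or digit is treated as a separator.
        .map(|c| if c.is_alphanumeric() { c } else { ' ' })
        .collect::<String>()
        // Splitting on whitespace and re-joining trims and collapses runs of spaces.
        .split_whitespace()
        .collect::<Vec<_>>()
        .join(" ")
}
```

This would make `"Av. Amazonas N34-451"` and `"AV AMAZONAS, n34 451"` compare as equal. It is probably too aggressive for phone numbers or email addresses, which would need their own rules.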

yolile commented 3 months ago

> I think this indicator is within the same process only

True, you are right, because this one is about detecting collusion. The example that the notebook refers to, however, is Control Ciudadano from Paraguay, where we did the exercise with all the bidders, regardless of whether they were bidding in the same process. But for this indicator, we can implement the original, documented methodology, which covers the same process only.

yolile commented 3 months ago

> We might discover that we need to do some normalization (e.g. normalize whitespace, lowercase, maybe normalize punctuation). There's more that can be done https://github.com/open-contracting/cardinal-rs/issues/33, but I think we'll limit to basics for now.

I tested, and even without any normalization and with exact-match comparison, I got a lot of matches in Ecuador's data (at least when comparing all bidders, regardless of the ocid).

yolile commented 3 months ago

What should be the output of the indicator? Besides flagging the OCID, do we want to output the matching bidders along with why they are similar?

jpmckinney commented 3 months ago

We would flag the bidders like in R024, etc. (using set_result! and set_tenderer_map!). We can give a score of 1.0 for exact match. I don't think we have any way to add additional metadata about why they are similar.
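To make the proposed output concrete, here is a sketch using plain standard-library collections. It does not reproduce the `set_result!` or `set_tenderer_map!` macros, whose definitions are not shown in this thread. The `R044Results` struct, its fields and `record_match` are hypothetical stand-ins for whatever those macros populate.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical result container (not the cardinal-rs output type).
#[derive(Default)]
struct R044Results {
    /// ocid -> score; 1.0 means an exact match was found in that process.
    ocid: HashMap<String, f64>,
    /// bidder ID -> the ocids in which that bidder was flagged.
    tenderer: HashMap<String, HashSet<String>>,
}

/// Records a group of bidders that share contact details within one process.
/// Groups with fewer than two bidders are ignored.
fn record_match(results: &mut R044Results, ocid: &str, bidder_ids: &[&str]) {
    if bidder_ids.len() < 2 {
        return;
    }
    // Exact matches get a score of 1.0, as suggested above.
    results.ocid.insert(ocid.to_string(), 1.0);
    for id in bidder_ids {
        results
            .tenderer
            .entry(id.to_string())
            .or_default()
            .insert(ocid.to_string());
    }
}
```

Note that nothing in this shape records which field matched, which is the limitation raised in the comment above.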

yolile commented 2 months ago

> I don't think we have any way to add additional metadata about why they are similar.

But should we?

jpmckinney commented 2 months ago

Maybe open a new issue with R044 as an example, since we could add more metadata to any indicator. Right now we don’t have any user research telling us that users want more metadata.