tenders-exposed / elvis-backend-node

Search and visualize public procurements for EU countries http://tenders.exposed/
MIT License

Deal with old CPVs #110

Open georgiana-b opened 5 years ago

georgiana-b commented 5 years ago

We are starting to get old CPVs (from the 2003 CPV standard) in Opentender data. These currently cause an error in Elvis because we have no way to handle them.

The solution we have proposed so far is to create a new CPV field called standardCPV and match on that. For tenders from 2008 onward, standardCPV would equal their normal CPV. For earlier tenders, standardCPV would be the corresponding new CPV according to the correspondence table.
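A minimal sketch of how the proposed field could be derived, assuming each old code has exactly one new counterpart. The names `CORRESPONDENCE_2003_TO_2007` and `toStandardCPV`, and the `year` and `cpv` tender properties, are hypothetical and not taken from the Elvis codebase; the single table entry is only an illustrative excerpt.

```javascript
// Hypothetical excerpt of the 2003 -> 2007 correspondence table.
// A real implementation would load the full official table instead.
const CORRESPONDENCE_2003_TO_2007 = new Map([
  ['23100000-8', '09100000-0'], // Refined petroleum products -> Fuels
]);

// Returns the CPV to match on, or undefined if an old code is not in the table.
function toStandardCPV(tender) {
  // Assumption: tenders from 2008 onward already use the current standard.
  if (tender.year >= 2008) {
    return tender.cpv;
  }
  // Earlier tenders are translated through the correspondence table.
  return CORRESPONDENCE_2003_TO_2007.get(tender.cpv);
}

console.log(toStandardCPV({ year: 2012, cpv: '09100000-0' })); // 09100000-0
console.log(toStandardCPV({ year: 2005, cpv: '23100000-8' })); // 09100000-0
```

Returning `undefined` for unknown codes lets the caller decide whether to skip the tender or log it, rather than failing outright.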

georgiana-b commented 5 years ago

@zufanka The issue with the solution above is that, as you will see in the correspondence table, there is no one-to-one mapping between the 2003 and the 2007 CPVs.

The 2007 CPVs are more fine-grained, so some 2003 CPVs have two or more new CPVs corresponding to them. For example:

| 2003 CPV | 2003 CPV name | 2007 CPV | 2007 CPV name |
| --- | --- | --- | --- |
| 23100000-8 | Refined petroleum products. | 09100000-0 | Fuels. |
| | | 09210000-4 | Lubricating preparations. |
| | | 09220000-7 | Petroleum jelly, waxes and special spirits. |

From a technical point of view, the right way would be to make this new standardCPV column an array containing all the corresponding new CPVs. After all, there is no easy way to know which of the new CPVs is the right one for the tender. To continue with the example above, a tender with the 2003 CPV Refined petroleum products would appear in the results for all three 2007 CPVs (Fuels; Lubricating preparations; Petroleum jelly, waxes and special spirits). This could lead to awkward situations where a user who selects Fuels gets results for petroleum jelly as well.

Having it as an array would also make things more complicated and, most importantly, slow. If you have any other idea for how to handle this, please bring it forward.
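To make the array variant concrete, here is a rough sketch of what matching would look like. `CORRESPONDENCE_MULTI`, `toStandardCPVs`, `filterByCPV`, and the tender shape are hypothetical names used only for illustration, and the in-memory filter stands in for whatever query the backend actually runs.

```javascript
// Hypothetical one-to-many excerpt of the correspondence table.
const CORRESPONDENCE_MULTI = new Map([
  ['23100000-8', ['09100000-0', '09210000-4', '09220000-7']],
]);

// Returns every current CPV a tender could correspond to.
function toStandardCPVs(tender) {
  // Assumption: tenders from 2008 onward already use the current standard.
  if (tender.year >= 2008) {
    return [tender.cpv];
  }
  return CORRESPONDENCE_MULTI.get(tender.cpv) || [];
}

// A tender matches if any of its candidate CPVs equals the selected one.
function filterByCPV(tenders, selectedCPV) {
  return tenders.filter((t) => toStandardCPVs(t).includes(selectedCPV));
}

const sampleTenders = [
  { id: 'a', year: 2005, cpv: '23100000-8' }, // old Refined petroleum products
  { id: 'b', year: 2012, cpv: '09210000-4' }, // new Lubricating preparations
];

// Selecting Petroleum jelly still returns the old tender 'a'.
console.log(filterByCPV(sampleTenders, '09220000-7').map((t) => t.id)); // [ 'a' ]
```

The old tender shows up under Fuels, Lubricating preparations, and Petroleum jelly alike, which is the awkward behavior described above.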

zufanka commented 5 years ago

@georgiana-b: I think it's OK to just take the most general one, in this case '09100000'. We have no way of knowing which more specific one to use anyway and, as you pointed out, the other solution would be awkward.
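One way to sketch "take the most general one" is to treat the code with the most trailing zeros in its 8-digit part as the most general, since CPV codes pad the unused lower levels with zeros. The helper names `generality` and `pickMostGeneral` are hypothetical, and returning `null` on a tie is an assumption about how to signal that no single candidate stands out.

```javascript
// Counts trailing zeros in the 8-digit part; more zeros means a higher level.
function generality(cpv) {
  const digits = cpv.slice(0, 8); // drop the "-N" check digit if present
  return digits.match(/0*$/)[0].length;
}

// Returns the single most general candidate, or null if there is none
// (empty input, or several candidates tied at the same level).
function pickMostGeneral(candidates) {
  let best = null;
  let bestScore = -1;
  let tie = false;
  for (const cpv of candidates) {
    const score = generality(cpv);
    if (score > bestScore) {
      best = cpv;
      bestScore = score;
      tie = false;
    } else if (score === bestScore) {
      tie = true;
    }
  }
  return tie ? null : best;
}

console.log(pickMostGeneral(['09100000-0', '09210000-4', '09220000-7'])); // 09100000-0
```

Note that this only compares hierarchy levels; it does not check that the chosen code is actually a parent of the others.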

georgiana-b commented 5 years ago

@zufanka I like the idea, but unfortunately it's not applicable to all the CPVs. The one I presented was just a fortunate example; some correspondences look like this:

| 2003 CPV | 2003 CPV name | 2007 CPV | 2007 CPV name |
| --- | --- | --- | --- |
| 23110000-1 | Light and medium oils and derivate products. | 09130000-9 | Petroleum and distillates. |
| | | 09210000-4 | Lubricating preparations. |
| | | 09220000-7 | Petroleum jelly, waxes and special spirits. |

zufanka commented 5 years ago

@georgiana-b I see! I guess then we put them under the highest-level CPV that they all have in common, in this case CPV 09000000 - Petroleum products, fuel, electricity and other sources of energy.
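One reading of this suggestion is to take the longest digit prefix shared by all candidate codes and pad it back to 8 digits with zeros. The sketch below does that; `commonAncestorCPV` is a hypothetical helper, it returns the code without a check digit (which cannot be derived from the prefix alone), and it assumes the padded result is a valid code in the current standard, which should be verified against the official list.

```javascript
// Returns the 8-digit code formed by the longest shared prefix of all codes,
// or null if the codes do not even share a division (the first two digits).
function commonAncestorCPV(codes) {
  if (codes.length === 0) {
    return null;
  }
  const digits = codes.map((c) => c.slice(0, 8)); // drop check digits
  let prefixLength = 0;
  while (
    prefixLength < 8 &&
    digits.every((d) => d[prefixLength] === digits[0][prefixLength])
  ) {
    prefixLength += 1;
  }
  // Fewer than two shared digits means no common division.
  if (prefixLength < 2) {
    return null;
  }
  return digits[0].slice(0, prefixLength).padEnd(8, '0');
}

console.log(commonAncestorCPV(['09130000-9', '09210000-4', '09220000-7'])); // 09000000
console.log(commonAncestorCPV(['09210000-4', '09220000-7'])); // 09200000
```

Because the result is a single value per old code, it could be precomputed once for the whole correspondence table, so standardCPV stays a plain scalar field rather than an array.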