ices-tools-dev / esas

European Seabirds at Sea (ESAS) data model
https://esas-docs.ices.dk
Creative Commons Zero v1.0 Universal

Review trips: platform_code (now platform_name) #17

Closed: peterdesmet closed this issue 2 years ago

peterdesmet commented 3 years ago

trips: platform_code identifies the platform (ship call sign, etc.) with an integer code:

https://github.com/ices-tools-dev/esas/blob/0d294b4ea678e34f52409a16602f9133ac179dcf/_data/table-schemas/trips.yaml#L119-L130

Values are available here: https://github.com/ices-tools-dev/esas/blob/main/_data/vocabularies/platform_code.tsv

peterdesmet commented 3 years ago

The first question is answered in https://github.com/ices-tools-dev/esas/issues/12#issuecomment-737909821, where it is suggested to use ShipC. I will investigate how easy it is to map these.

peterdesmet commented 3 years ago

Summary of mapping:

Questions:

HjalteParner commented 3 years ago

@peterdesmet you should use the ICES platform code as the identifier/key in your system, and use an integer surrogate (virtual) key internally in your database only if you find that useful. The platform name is not unique and therefore cannot be used as an identifier for a given platform. The ICES platform request system is an international collaboration used globally. As described at http://vocab.ices.dk/request, you can contact accessions@ices.dk to get access to the platform request application, where you can search for platforms and request new codes for platforms not already in the system. To request a code for a new platform, all you need to do is provide enough metadata to identify the platform uniquely. A data manager will then validate your information and assign a code, which will be the key for all future references to that platform across the globe. No need to reinvent the wheel here.
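A minimal sketch of the keying approach described above, using Python's built-in sqlite3 module. The table and column names (platform, platform_id, platform_code, platform_name) and the codes XX01/XX02 are hypothetical; the point is only that the ICES platform code carries a UNIQUE constraint while the name does not, and the integer key stays internal.

```python
import sqlite3

# Hypothetical schema: the ICES platform code is the shared, unique identifier;
# platform_id is an internal surrogate key only.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE platform (
        platform_id   INTEGER PRIMARY KEY,   -- internal surrogate key
        platform_code TEXT NOT NULL UNIQUE,  -- ICES platform code (SHIPC)
        platform_name TEXT                   -- informative only, not unique
    )
    """
)

# Two hypothetical platforms that share a name but have different codes.
conn.executemany(
    "INSERT INTO platform (platform_code, platform_name) VALUES (?, ?)",
    [("XX01", "Example Vessel"), ("XX02", "Example Vessel")],
)


def lookup_by_code(code):
    """Resolve a platform by its ICES code (the only reliable key)."""
    return conn.execute(
        "SELECT platform_id, platform_name FROM platform WHERE platform_code = ?",
        (code,),
    ).fetchone()


# The name matches more than one platform, so it cannot serve as a key...
n_same_name = conn.execute(
    "SELECT COUNT(*) FROM platform WHERE platform_name = ?", ("Example Vessel",)
).fetchone()[0]

# ...whereas a second row with an existing code is rejected.
try:
    conn.execute(
        "INSERT INTO platform (platform_code, platform_name) VALUES ('XX01', 'Other')"
    )
    duplicate_code_rejected = False
except sqlite3.IntegrityError:
    duplicate_code_rejected = True
```

Here n_same_name is 2, and the attempt to reuse XX01 raises sqlite3.IntegrityError.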

neil-ices-dk commented 3 years ago

@ices-tools-dev/data-and-information I notice many duplicate values in the SHIPC list, such as 14AT, 32A7, 572N, 90A7, CUAN for Antares. How should we differentiate between those?

Just to clarify, these are not duplicates: the governance model distinguishes between instances of a vessel/hull. So although the name is not unique, the combination of key attributes will be. In most cases, the commission and decommission dates are what distinguish the instances (platform codes) of a vessel with the same name/call sign.
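To illustrate how commission/decommission dates separate instances that share a name, here is a small sketch. The PlatformInstance fields, the codes XA01/XA02 and their dates are invented for illustration and are not taken from the SHIPC list; real SHIPC records may store these attributes differently.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class PlatformInstance:
    """One instance of a vessel/hull, i.e. one platform code (fields hypothetical)."""
    code: str
    name: str                       # may be shared by several instances
    commissioned: date
    decommissioned: Optional[date]  # None means still in service

    def active_on(self, day: date) -> bool:
        """True if `day` falls within this instance's service period (inclusive)."""
        if day < self.commissioned:
            return False
        return self.decommissioned is None or day <= self.decommissioned


def codes_active_on(instances: List[PlatformInstance], name: str, day: date) -> List[str]:
    """Codes of all instances with this name that were in service on `day`."""
    return [i.code for i in instances if i.name == name and i.active_on(day)]


# Invented example: two instances sharing one name, told apart by their dates.
INSTANCES = [
    PlatformInstance("XA01", "Example", date(1970, 1, 1), date(1989, 12, 31)),
    PlatformInstance("XA02", "Example", date(1990, 1, 1), None),
]
```

With these invented records, codes_active_on(INSTANCES, "Example", date(1985, 6, 1)) returns ["XA01"], while a date in 2005 returns ["XA02"].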

nicolasvanermen commented 3 years ago

Quite unlikely, I'm afraid... I have received (some) lookup tables from JNCC, and in the case of platform_code the missing values are simply not there. I'm not sure how this is possible, but the original ESAS database is somehow locked and Mark Lewis can't access it. This should be solved soon, and maybe there is some additional information there, but I would not count on it.

Nicolas Vanermen, Scientific staff member, Instituut voor Natuur- en Bosonderzoek (Research Institute for Nature and Forest), Havenlaan 88, bus 73, 1000 Brussel, 0486/361.582

(Reply by email to Peter Desmet's comment of Wed, Jan 6, 2021, 12:26 PM.)

peterdesmet commented 3 years ago

@HjalteParner @neil-ices-dk thanks. Then the core issue is: given that we have a historic database that only contains platform/ship names such as Alkor, Alluchio, Alsfeld (BG 16) and no extra metadata (except maybe the country), how do we map these to ShipC codes?
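One possible first pass, sketched under assumptions: normalize the historic names, match them against SHIPC names within the same country, and flag every result for manual review, since names are not unique. The (code, name, country) row layout, the country values and the codes below are hypothetical, and dropping parenthesized parts such as "(BG 16)" is a simplifying choice that discards whatever information they carry.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Key = Tuple[str, str]  # (normalized name, country)


def normalize_name(name: str) -> str:
    """Lowercase, drop parenthesized parts such as '(BG 16)', collapse whitespace."""
    name = re.sub(r"\([^)]*\)", " ", name)
    return " ".join(name.lower().split())


def build_index(shipc_rows: Iterable[Tuple[str, str, str]]) -> Dict[Key, List[str]]:
    """Index hypothetical (code, name, country) rows by normalized name and country."""
    index: Dict[Key, List[str]] = defaultdict(list)
    for code, name, country in shipc_rows:
        index[(normalize_name(name), country)].append(code)
    return index


def match(index: Dict[Key, List[str]], esas_name: str, country: str) -> Tuple[str, List[str]]:
    """Return a status and the candidate codes; every status still needs manual review."""
    candidates = index.get((normalize_name(esas_name), country), [])
    if not candidates:
        return "unmapped", []
    if len(candidates) == 1:
        return "single candidate (verify)", candidates
    return "ambiguous", candidates
```

A "single candidate" is still only a name match; the commission/decommission dates discussed above would be needed to confirm it.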

nicolasvanermen commented 3 years ago

In the case of Ter Streep, code 2465 is not used in the database. The same goes for other obvious duplicates such as Prins Filip and Prins Albert. Can you supply me with a list of duplicates?
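A list of duplicates could be produced along these lines. This sketch assumes the vocabulary is a TSV with code and name columns (the actual column names in platform_code.tsv may differ) and that the set of codes used in trips is available; it groups codes by case-insensitive name and marks which ones are actually used. The sample rows and codes are hypothetical.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping, Set, Tuple


def duplicate_names(
    vocab_rows: Iterable[Mapping[str, str]], used_codes: Set[str]
) -> Dict[str, List[Tuple[str, bool]]]:
    """Names occurring more than once, with each code and whether trips use it."""
    by_name: Dict[str, List[str]] = defaultdict(list)
    for row in vocab_rows:
        by_name[row["name"].strip().lower()].append(row["code"])
    return {
        name: [(code, code in used_codes) for code in codes]
        for name, codes in by_name.items()
        if len(codes) > 1
    }


# Hypothetical vocabulary extract; real column names may differ.
SAMPLE_TSV = "code\tname\n1001\tTer Streep\n1002\tter streep\n1003\tAlkor\n"
USED_CODES = {"1001", "1003"}  # hypothetical: codes referenced in trips

result = duplicate_names(csv.DictReader(io.StringIO(SAMPLE_TSV), delimiter="\t"), USED_CODES)
```

For the sample above, result is {"ter streep": [("1001", True), ("1002", False)]}, i.e. one duplicated name whose second code is unused.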


(Reply by email to Peter Desmet's comment of Wed, Jan 6, 2021, 6:42 PM.)

neil-ices-dk commented 3 years ago

We've actually done a similar exercise for our trawl survey datasets (@Osanna123 will remember this fondly). In that case, we primarily looked at the date ranges of the data associated with each vessel name to map it to the probable instance(s) of that vessel in the platform codes.
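The date-range approach can be sketched as follows: for one vessel name, take the first and last trip dates in the data and rank the candidate instances by how much of that range falls within their service period. The Instance fields and the overlap-share score are assumptions made for this example, not the procedure used for the trawl survey datasets.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Instance:
    code: str
    start: date           # commission date
    end: Optional[date]   # decommission date; None means still in service


def overlap_days(a_start: date, a_end: date, b_start: date, b_end: date) -> int:
    """Number of days (inclusive) shared by two date ranges; 0 if disjoint."""
    start, end = max(a_start, b_start), min(a_end, b_end)
    return max(0, (end - start).days + 1)


def probable_instances(
    instances: List[Instance], first_trip: date, last_trip: date
) -> List[Tuple[str, float]]:
    """Rank instances by the share of the trip date range they cover."""
    span = (last_trip - first_trip).days + 1
    ranked = []
    for inst in instances:
        days = overlap_days(inst.start, inst.end or date.max, first_trip, last_trip)
        if days:
            ranked.append((inst.code, days / span))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)
```

A result with several instances of similar share (for example, trips spanning a decommissioning date) is exactly the case that needs manual verification.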

Osanna123 commented 3 years ago
  • [ ] @ices-tools-dev/data-and-information I notice many values for UNKNOWN (like we have too). How can we assign these correctly?

'UNKNOWN' platforms can be mapped to the AAxx codes or to ZZ99 in the SHIPC list.

  • [ ] @ices-tools-dev/data-and-information How would we add the 315 unmapped ships? How are codes assigned?

This would have to be a separate exercise; any additional info would be useful, at least the years with data. Please note that codes mapped by name should also be verified, as many platforms can bear the same name.

peterdesmet commented 3 years ago

Discussed this with @nicolasvanermen and @ericstienen. Decided to make this a purely informal field named platform_name with the name of the ship or aviation call sign (see c6c2e0f). It will not be mapped to ShipC, because:

@nicolasvanermen in your export:

Osanna123 commented 3 years ago

The field is not mandatory, but if it is to be 'informal', it is better to move it to the notes. If platforms are to remain a field in the format, they should be linked to the SHIPC controlled vocabulary. Since platform mapping (and creating missing platforms) is time-consuming, old data could be mapped to the AA-codes in SHIPC, with more specific information such as name/call sign moved to the notes. Future data submissions can then report the exact platform reference.

Osanna123 commented 3 years ago

List of AA-codes:

SHIPC code  Description
AA00  UNSPECIFIED PLATFORM
AA11  UNSPECIFIED Fixed benthic node
AA12  UNSPECIFIED Sea bed vehicle
AA13  UNSPECIFIED BEACH/INTERTIDAL ZONE STRUCTURE
AA14  UNSPECIFIED LAND/ONSHORE STRUCTURE
AA15  UNSPECIFIED LAND/ONSHORE VEHICLE
AA16  UNSPECIFIED OFFSHORE STRUCTURE
AA17  UNSPECIFIED COASTAL STRUCTURE
AA18  UNSPECIFIED River station
AA20  UNSPECIFIED Submersible
AA21  UNSPECIFIED Propelled manned submersible
AA22  UNSPECIFIED Propelled unmanned submersible
AA23  UNSPECIFIED Towed unmanned submersible
AA24  UNSPECIFIED Drifting manned submersible
AA25  UNSPECIFIED Drifting manned submersible
AA26  UNSPECIFIED Lowered unmanned submersible
AA30  UNSPECIFIED SHIP
AA31  UNSPECIFIED RESEARCH VESSEL
AA32  UNSPECIFIED VESSEL OF OPPORTUNITY
AA33  UNSPECIFIED SELF-PROPELLED SMALL BOAT
AA34  UNSPECIFIED Vessel at fixed position
AA35  UNSPECIFIED VESSEL OF OPPORTUNITY ON FIXED ROUTE
AA36  UNSPECIFIED FISHING VESSEL
AA39  UNSPECIFIED NAVAL VESSEL
AA3A  UNSPECIFIED MAN-POWERED SMALL BOAT
AA41  UNSPECIFIED MOORED SURFACE BUOY
AA42  UNSPECIFIED DRIFTING SURFACE FLOAT
AA46  UNSPECIFIED DRIFTING SUBSURFACE PROFILING FLOAT
AA61  UNSPECIFIED RESEARCH AEROPLANE
AA67  UNSPECIFIED HELICOPTER
AA71  UNSPECIFIED HUMAN
AA72  UNSPECIFIED DIVER
AA95  UNSPECIFIED amphibious vehicle self-propelled
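A sketch of the fallback suggested above: use a verified name-to-SHIPC mapping where one exists, otherwise assign the generic AA-code for the platform type and keep the name in the notes. The record fields (platform_name, platform_type, notes), the type labels and the mapping dictionary are hypothetical; only the AA-codes come from the list.

```python
from typing import Dict, Optional

Record = Dict[str, Optional[str]]

# A subset of the AA-codes listed above, keyed by hypothetical platform-type labels.
AA_FALLBACK = {
    "ship": "AA30",
    "research vessel": "AA31",
    "vessel of opportunity": "AA32",
    "fishing vessel": "AA36",
    "naval vessel": "AA39",
    "research aeroplane": "AA61",
    "helicopter": "AA67",
}
UNSPECIFIED_PLATFORM = "AA00"


def assign_platform(record: Record, name_to_shipc: Dict[str, str]) -> Record:
    """Return a copy of `record` with platform_code set.

    Uses a verified name -> SHIPC mapping when available; otherwise falls back to
    the generic AA-code for the platform type and keeps the name in the notes.
    """
    out = dict(record)
    name = out.pop("platform_name", None)
    code = name_to_shipc.get(name) if name else None
    if code is None:
        platform_type = (out.get("platform_type") or "").lower()
        code = AA_FALLBACK.get(platform_type, UNSPECIFIED_PLATFORM)
        if name:
            notes = out.get("notes")
            out["notes"] = f"{notes}; platform: {name}" if notes else f"platform: {name}"
    out["platform_code"] = code
    return out
```

Keeping the name in the notes preserves the historic information without presenting it as a controlled value, which is the trade-off described in the comment above.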

peterdesmet commented 3 years ago

I have now reviewed the whole mapping list. The best approach to move forward is still to be discussed.

peterdesmet commented 3 years ago

Decisions after May 11 meeting with @Osanna123 and @nicolasvanermen:

peterdesmet commented 3 years ago

Some codes that are not yet mapped (i.e. not marked ok) do seem to be used a lot or recently, making them valuable candidates to be added to SHIPC. Here are the numbers of those codes per usage threshold (the year and trip thresholds are combined with OR, e.g. >=2000 OR >=20 trips):

year threshold   no trip filter   >=20 trips   >=50 trips   >=100 trips
no year filter              281           95           47            28
>=2000                       95          145          114           103
>=2005                       89          130           93            79
>=2010                       39          119           76            59
>=2015                       15          108           62            43

@Osanna123 without truly knowing how much work is involved, >=2010 and >=100 trips seems reasonable: 59 codes to add.
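The counts in the table combine a year threshold and a trip threshold with OR, and "no filter" means that threshold is not applied (with neither applied, all unmapped codes are counted). A sketch of that logic, assuming each unmapped code has a last year of use and a trip count (both hypothetical field names), and reading ">=2000" as "last used in or after 2000":

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class CodeUsage:
    code: str
    last_year: int  # most recent year the code occurs in trips (assumed field)
    n_trips: int    # number of trips using the code (assumed field)


def count_candidates(
    usages: List[CodeUsage], min_year: Optional[int], min_trips: Optional[int]
) -> int:
    """Count codes passing the year filter OR the trip filter.

    A threshold of None means that filter is not applied; with both None,
    every code is counted.
    """
    if min_year is None and min_trips is None:
        return len(usages)
    count = 0
    for u in usages:
        recent = min_year is not None and u.last_year >= min_year
        frequent = min_trips is not None and u.n_trips >= min_trips
        if recent or frequent:
            count += 1
    return count


YEARS = [None, 2000, 2005, 2010, 2015]
TRIPS = [None, 20, 50, 100]


def threshold_table(usages: List[CodeUsage]) -> Dict[Tuple[Optional[int], Optional[int]], int]:
    """One count per (year threshold, trip threshold) cell, as in the table."""
    return {(y, t): count_candidates(usages, y, t) for y in YEARS for t in TRIPS}
```

Because of the OR, a cell with both thresholds set is at least as large as either of its "no filter" counterparts, which is why the filtered cells can exceed the "no filter" row and column.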

nicolasvanermen commented 3 years ago

OK, thanks! But I don't really get how the 'no filter' row and column can contain fewer ships than their filtered counterparts.


(Reply by email to Peter Desmet's comment of Tue, May 11, 2021, 1:49 PM.)

peterdesmet commented 3 years ago

"No filter" is maybe a confusing term: read >=2000 with no trip filter as 95 ship codes used in or after 2000, while >=2000 and >=20 trips should be read as 145 ship codes used in or after 2000 or in at least 20 trips. Anyway, the important point to decide is how many SHIPC codes are reasonable to add.

peterdesmet commented 2 years ago

PlatformCode implemented.