inbo / etn-occurrences

Acoustic telemetry data 🔊🐟
MIT License

Move durif index info to age + age_unit #15

Closed peterdesmet closed 5 years ago

peterdesmet commented 5 years ago

590 animals mention durif_index in comments, e.g.:

; durif_index : 
; durif_index : 100.1274
; durif_index : 100.130445
without calliper ; durif_index : 104.4134   

What is durif_index? Is it important? Can we structure this information better, e.g. by creating an extra field in ETN or by using one of the length fields:

lenght4: 100.1274
lenght4_type: durif_index

Projects with durif_index in the comments:

| projectName | animals |
| --- | --- |
| 2011 Rivierprik | 39 |
| 2012 Leopoldkanaal | 104 |
| 2013 Albertkanaal | 162 |
| 2014 Demer | 16 |
| 2015 Dijle | 26 |
| 2015 Fint | 8 |
| 2015 PhD Verhelst | 193 |
| Homarus | 2 |
| PhD Jan Reubens | 40 |

PieterjanVerhelst commented 5 years ago

This is the life stage index of eels, identified using the protocol by Durif et al. (2005), 'The silvering process of Anguilla anguilla: a new classification from the yellow resident to the silver migrating stage'. Specifically, based on morphometric measurements, we can identify whether the tagged eel is a yellow (Durif life stages I, FII, FIII) or silver (stages FIV, FV and MII) eel. As such, this field is only relevant for European eel. Considering the large number of tagged eels, I would advise adding a separate field.

peterdesmet commented 5 years ago

Using a separate field will be necessary, as for a lot of eels all 4 length fields are already used.

peterdesmet commented 5 years ago

Note that there is also an age field that currently doesn't seem to be used. It even has an age_unit which we could set to Durif index...

jreubens commented 5 years ago

I agree with @peterdesmet. You can use this field for the durif index. Durif is limited to only one species, thus it is not recommended to add extra fields at the animal level. @filipWaumans: can you do this at DB level, or should this be done manually?

PieterjanVerhelst commented 5 years ago

Indeed, the age field serves the purpose very well.

peterdesmet commented 5 years ago

Ok, please confirm:

PieterjanVerhelst commented 5 years ago

@peterdesmet @jreubens, there is some ambiguity as to whether we want to change a column name ('age_unit' into 'Durif index') or change the levels of a column. What I would suggest is that we have at least a column with 'life stage', as this is applicable to many species (e.g. juvenile versus adult; we did this post-data collection for wels catfish, for instance). Specifically for eels, we can use the Durif life stages (I, FII, FIII, FIV, FV and MII as valid life stages). Next, the headers 'age' and 'age_unit' could be filled in with the numeric Durif index (e.g. '100.69925') and 'durif index' (no capital, for consistency throughout the format) respectively, so that the unit explains the number in 'age'.

An example:

| life stage | age | age unit |
| --- | --- | --- |
| FV | 137.72662 | durif index |

peterdesmet commented 5 years ago

Great, aligns with what I was thinking (i.e. not changing the column names).

jreubens commented 5 years ago

@fwaumans so here we need to provide an extra metadata field 'life stage'. Can you add this?

peterdesmet commented 5 years ago

Not needed: the field life_stage is already available in the database.

peterdesmet commented 5 years ago

But not directly available in the overview table or download. I have created a separate issue for this: #22.

peterdesmet commented 5 years ago

@fwaumans, actions to be taken in database:

  1. Filter on COMMENTS containing durif
  2. Verify AGE and AGE_UNIT are not yet populated
  3. Use regex to extract durif index value (a double) and place in AGE
  4. Use regex to remove durif comment from COMMENTS
  5. Set AGE_UNIT to durif index for non-empty AGE
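
Steps 1 and 2 above could be sketched roughly as follows. This is a minimal base R illustration, not the actual database procedure: the data frame `animals`, its lowercase column names, and the helper `is_filled` are all hypothetical stand-ins, and the sample comments are copied from the examples at the top of this issue.

```r
# Hypothetical stand-in for the animals table (names assumed for illustration)
animals <- data.frame(
  comments = c(
    "; durif_index : 100.1274",
    "without calliper ; durif_index : 104.4134",
    NA
  ),
  age = rep(NA_real_, 3),
  age_unit = rep(NA_character_, 3),
  stringsAsFactors = FALSE
)

# Hypothetical helper: a value counts as populated if it is neither NA nor ""
is_filled <- function(x) !is.na(x) & x != ""

# Step 1: rows whose comments mention durif (grepl returns FALSE for NA)
has_durif <- grepl("durif", animals$comments, fixed = TRUE)

# Step 2: among those rows, flag any that already carry an age or age_unit
already_used <- has_durif & (is_filled(animals$age) | is_filled(animals$age_unit))

# Number of rows that would be overwritten; 0 means the fields are free
sum(already_used)
```

If the count is not zero, the flagged rows would need a manual look before anything is written to AGE or AGE_UNIT.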

Or - easier - you can use durif.txt with the new values that I created with the following R code:

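library(dplyr)
library(stringr)

# `animals` is a placeholder name for the data frame holding the `comments` column
animals %>%
  mutate(
    # Pull out "durif_index : <value>", then keep only the value
    age = str_extract(comments, "durif_index : [0-9.A-Z]+"),
    age = str_remove(age, "durif_index : "),
    age = na_if(age, ""),
    age = as.double(age),
    # Drop the durif_index fragment, plus any preceding semicolons and whitespace
    comments_clean = str_remove(comments, "[;\\s]*durif_index\\s:[\\s]*[0-9.A-Z]*\\s*"),
    comments_clean = na_if(comments_clean, ""),
    # Only rows with an extracted value get a unit
    age_unit = case_when(
      !is.na(age) ~ "durif index",
      TRUE ~ NA_character_
    )
  )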

And use the columns comments_clean, age, and age_unit. Do check though that the fields AGE and AGE_UNIT are not used yet.
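
To see what these patterns do on a few real comments, here is a small worked sketch. It is an assumption-laden rendering, not the code that produced durif.txt: it uses base R with `perl = TRUE` instead of stringr so that it runs without extra packages, and the four input strings are copied from comments quoted in this thread.

```r
# Sample comments taken from this thread
comments <- c(
  "; durif_index : 135.69285",
  "Moved during surgery ; durif_index : 100.45528",
  "durif_index : 104.333 Although FIII, had clear silver eel morphology",
  "ging niet meteen onder bij uitzetten ; durif_index : "
)

# Same patterns as in the mutate() call, applied with base R (PCRE)
extract_pattern <- "durif_index : [0-9.A-Z]+"
clean_pattern <- "[;\\s]*durif_index\\s:[\\s]*[0-9.A-Z]*\\s*"

# Extract the "durif_index : <value>" fragment; stays NA where no value follows
m <- regexpr(extract_pattern, comments, perl = TRUE)
age_chr <- rep(NA_character_, length(comments))
age_chr[m > 0] <- regmatches(comments, m)

# Strip the label and convert the remaining value to a double
age <- as.double(sub("durif_index : ", "", age_chr, fixed = TRUE))

# Remove the durif fragment from the comment; empty results become NA
comments_clean <- sub(clean_pattern, "", comments, perl = TRUE)
comments_clean[comments_clean == ""] <- NA

# A unit is set only where a value was extracted
age_unit <- ifelse(is.na(age), NA_character_, "durif index")

result <- data.frame(comments, comments_clean, age, age_unit)
print(result)
```

The last comment has no value after the colon, so its `age` and `age_unit` stay empty while the free-text remark is kept in `comments_clean`.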

peterdesmet commented 5 years ago

@PieterjanVerhelst @jreubens fyi, here's what (a sample of) the clean data looks like. Note that life_stage was already populated:

| comments | comments_clean | age | age_unit | life_stage | |
| --- | --- | --- | --- | --- | --- |
| ; durif_index : 135.69285 | | 135.69285 | durif index | FV | 1 |
| ; durif_index : 84.802235 | | 84.802235 | durif index | FIV | 1 |
| ; durif_index : 92.167155 | | 92.167155 | durif index | FIV | 1 |
| ; durif_index : 92.5504 | | 92.5504 | durif index | FIII | 1 |
| ; durif_index : 97.5762 | | 97.5762 | durif index | FIII | 1 |
| ; durif_index : 97.85408 | | 97.85408 | durif index | FIII | 1 |
| ; durif_index : 99.18314 | | 99.18314 | durif index | FIII | 1 |
| 1 draadje los (nog 1tje over) ; durif_index : | 1 draadje los (nog 1tje over) | | | | 1 |
| Blind at one side (see photo) ; durif_index : | Blind at one side (see photo) | | | | 1 |
| durif_index : 100.094 | | 100.094 | durif index | FIII | 1 |
| durif_index : 100.339 | | 100.339 | durif index | FIV | 1 |
| durif_index : 101.23166 | | 101.23166 | durif index | FIII | 1 |
| durif_index : 103.852 | | 103.852 | durif index | FIII | 1 |
| "durif_index : 104.333 Although FIII, had clear silver eel morphology" | Although FIII, had clear silver eel morphology | 104.333 | durif index | FIII | 1 |
| durif_index : 104.6 | | 104.6 | durif index | FIV | 1 |
| durif_index : 105.161 | | 105.161 | durif index | FIII | 1 |
| durif_index : 105.35 | | 105.35 | durif index | FIII | 1 |
| durif_index : 106.21458 | | 106.21458 | durif index | FIII | 1 |
| durif_index : 99.816 | | 99.816 | durif index | FIII | 1 |
| durif_index : 99.999 | | 99.999 | durif index | FIII | 1 |
| ging niet meteen onder bij uitzetten ; durif_index : | ging niet meteen onder bij uitzetten | | | | 1 |
| Moved during surgery ; durif_index : 100.45528 | Moved during surgery | 100.45528 | durif index | FIII | 1 |

fwaumans commented 5 years ago

The durif index is now stored in the age and age_unit fields. The comment field has been cleaned.

peterdesmet commented 5 years ago

Nice!

LienReyserhove commented 5 years ago

Just to check: based on this discussion and the data, I assume that life_stage and age_unit are linked, as they both refer to the durif index. I would therefore expect each record with a life stage to have an age_unit of durif index. However, 159 records have a life stage but no value for age_unit. Is this an error or am I missing something?

peterdesmet commented 5 years ago

What are those values without age_unit?

LienReyserhove commented 5 years ago

life stages FIII, FIV and FV

peterdesmet commented 5 years ago

That is ok, these do not have an age unit.

peterdesmet commented 5 years ago

My mistake: lifeStage does not have a unit. Sometimes an age is identified. These should always have an age_unit.

LienReyserhove commented 5 years ago

yes, ok, age always has an age_unit
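
The rule agreed on here (an age always has an age_unit, while a life stage needs no unit) could be checked with something like the sketch below. It is only an illustration in base R: the data frame `animals` and its rows are hypothetical, apart from the first row, which reuses the FV example given earlier in the thread.

```r
# Hypothetical sample; only the first row comes from the example in this thread
animals <- data.frame(
  life_stage = c("FV", "FIII", "FIV", NA),
  age = c(137.72662, NA, NA, NA),
  age_unit = c("durif index", NA, NA, NA),
  stringsAsFactors = FALSE
)

# Violation: an age is present but its unit is missing
missing_unit <- !is.na(animals$age) & is.na(animals$age_unit)

# Allowed: a life stage without an age_unit (life stage has no unit)
stage_only <- !is.na(animals$life_stage) & is.na(animals$age_unit)

sum(missing_unit)  # rows that break the rule
sum(stage_only)    # rows that are fine despite having no age_unit
```

Records counted by `stage_only` correspond to the kind of record raised above (a life stage such as FIII, FIV or FV with no age_unit) and are not errors under this rule.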