hugomflavio / actel

Standardised analysis of acoustic telemetry data from fish moving through receiver arrays
https://hugomflavio.github.io/actel-website

Data does not match to any of the supported hydrophone file formats #103

Closed rihamilan closed 1 month ago

rihamilan commented 2 months ago

Hello Flavio,

Actel looks great, has good features and can help me a lot in processing detections. But I have tried to apply actel to Lotek data and I have not succeeded. The problem is that the Lotek protocol does not contain Signal or CodeSpace columns, because the Lotek encoding is different and the output only contains the detected tag map id, timestamp, TOA and sensor information. The raw output from a receiver looks like this:

Date      Time      TOA      Tag ID  Type  Value  Power
04/17/24  03:40:31  0.99708  10049                1470
04/19/24  16:17:43  0.62854  43200   T     12.4   1247
04/19/24  16:18:13  0.62542  43200   P     4.0    1450
04/19/24  16:18:36  0.25875  37100   P     5.0    1419

I tried to assign the tagId column as a signal, but that didn't work. Is there any way to adapt actel to the Lotek output? Thank you for your reply.

All the best Milan

hugomflavio commented 2 months ago

Hi Milan, thanks for reaching out!

Is this the table format you get directly from the Lotek receiver/software, or has it been altered in any way? I am surprised that a codespace is not included in the output. On which frequency/protocol were these tags working?

rihamilan commented 2 months ago

Hi Hugo, thank you for the very quick reply! This is the direct output you get from the Lotek receivers (the full raw log exported directly from the Lotek software is attached). The tag works on 76 kHz with CDMA encoding, but honestly I am not sure what protocol it is exactly, as Lotek does not specify it anywhere (tag specification here: https://www.lotek.com/products/map-series/). I can try to find out more, but in any case the output gives only the detected map id, so perhaps Signal or CodeSpace columns are not necessary.

WHS3K-3010065_20240828_185537.TXT

Milan

hugomflavio commented 2 months ago

The file format is new to me, but it does contain everything we should need to automatically import the data into actel. Let me see if I can make a function to import this raw log directly. I'll use "FSK" as the code space. e.g. "FSK-37900"
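As a rough illustration of the naming scheme mentioned above, the code space and the Lotek tag ID can be joined into a single transmitter label. This is only a sketch: `tag_ids`, `code_space` and `transmitters` are names made up for this example, and the IDs are taken from the sample rows earlier in the thread.

```r
# Tag IDs as they appear in the "Tag ID" column of the raw log (sample rows)
tag_ids <- c(10049L, 43200L, 37100L)

# Code space proposed in the thread for these logs
code_space <- "FSK"

# Join code space and tag ID with a hyphen, e.g. "FSK-10049"
transmitters <- paste(code_space, tag_ids, sep = "-")
print(transmitters)
```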

rihamilan commented 2 months ago

Great thank you very much!

hugomflavio commented 2 months ago

Does the Lotek software give you the possibility to save the dates in a friendlier format, e.g. yyyy-mm-dd?

rihamilan commented 2 months ago

No, there is no way to customise the output. The detection file (Lotek jst format) is converted to txt (the format I sent before) without any possibility to change the format of the txt.

hugomflavio commented 2 months ago

hm... that complicates things quite a bit! I've also just learned that this type of log is not in UTC, but rather in the timezone of the computer that created the log... these things make this kind of log very unconventional :sweat_smile:. The ideal solution here is for Lotek to fix their export system, but in the meantime, I'll work on a function that can help the user convert the logs into a format that plays nicely with everything else.
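To make the two complications concrete (the mm/dd/yy dates and the local-time stamps), here is a minimal base R sketch of the conversion involved. It is not the code that ended up in actel; the timezone `"Europe/Prague"` is a placeholder assumption, and the timestamp is one of the sample rows from this thread.

```r
# One timestamp as written in the raw log (date as mm/dd/yy, local time)
local_stamp <- "04/19/24 16:17:43"

# Assumed timezone of the computer that created the log (placeholder)
log_tz <- "Europe/Prague"

# Parse the text using the log's date format and the assumed timezone
parsed <- as.POSIXct(local_stamp, format = "%m/%d/%y %H:%M:%S", tz = log_tz)

# Express the same instant in UTC
utc_stamp <- format(parsed, tz = "UTC", usetz = TRUE)
print(utc_stamp)
```

The key point is that the timezone has to be supplied from outside the timestamp itself, which is why a log that silently uses the exporting computer's timezone is awkward to handle.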

hugomflavio commented 1 month ago

PR #116 is running against the package tests now. It contains a function to convert those Lotek logs into a more friendly format, which can then be used in the actel analyses.

hugomflavio commented 1 month ago

@rihamilan the new function is now ready for you to try out. To install the latest dev version of actel, follow these instructions:

# If you don't have the package "remotes" installed, start with:
install.packages("remotes")

# This will install actel's development version:
remotes::install_github("hugomflavio/actel",
                        build_opts = c("--no-resave-data", "--no-manual"),
                        build_vignettes = TRUE)

library("actel")
# The displayed version should now be 1.3.0.9008
# If that is not the case, unload actel and load it again.

Once you have version 1.3.0.9008 running, you can see the help for the new function with ?convertLotekCDMAFile. Essentially, you need to provide it with the file name, the format of the dates, and the timezone of the study area. The function will return your detections as a data.table object, which you can then save as a csv file and use with actel.

Something like this:

x <- convertLotekCDMAFile(file = "WHS3K-3010065_20240828_185537.TXT",
                          date_format = "%m/%d/%y",
                          tz = "Continent/City") # <- change this line
head(x)
write.csv(x, "lotek_log.csv", row.names = FALSE)

You may also want to look into the preload() function, seeing as your detections will already be in R anyway :)

Please let me know if it works for you!

rihamilan commented 1 month ago

Hi Hugo,

thank you very much, that was quick and looks great :). I found a small bug: if there is an NA in the "value" column, the function takes the value from the "power" column. The rest works perfectly. Example here:

Raw txt:

Decoded Tag Data:
Date      Time      TOA      Tag ID  Type  Value  Power
04/17/24  03:40:31  0.99708  10049                1470
04/19/24  16:17:43  0.62854  43200   T     12.4   1247
04/19/24  16:18:13  0.62542  43200   P     4.0    1450

Obtained csv:

head(x)
             Timestamp Receiver CodeSpace Signal Sensor.Value Sensor.Unit
1: 2024-04-17 03:40:31  3010043       FSK  10049           NA        1470
2: 2024-04-19 16:17:43  3010043       FSK  43200         12.4           T
3: 2024-04-19 16:18:13  3010043       FSK  43200          4.0           P

hugomflavio commented 1 month ago

hm... I am not quite sure how we could address that one... the columns are split by a variable number of spaces, so data.table seems to just be treating any run of spaces as a column delimiter. Since there's nothing but spaces between the tag number and the power level for that row, data.table figures the power level must be the next column... I'll discuss with the folks at OTN to see if there's any way around this.
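The effect described above can be reproduced in base R without data.table. The sketch below splits two sample rows on runs of whitespace; the exact spacing in `rows` is an approximation of the raw log, not a copy of it.

```r
# Two rows from the sample log: the first has no Type or Value entries
rows <- c(
  "04/17/24  03:40:31     0.99708        10049                        1470",
  "04/19/24  16:17:43     0.62854        43200       T      12.4      1247"
)

# Treat any run of whitespace as a single delimiter
tokens <- strsplit(trimws(rows), "\\s+")

# The sensorless row yields 5 tokens instead of 7 ...
print(lengths(tokens))

# ... so its power level ends up in the fifth position, where Type belongs
print(tokens[[1]][5])
print(tokens[[2]][5])
```

Once the blanks are collapsed, nothing in the first row indicates that two columns were skipped, so the shift cannot be detected from the delimiter alone.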

benjaminhlina commented 1 month ago

Considering the first detection is from a tag that I'm assuming doesn't have a sensor, couldn't you still split on spaces as the column delimiter, but write an if statement that checks the Sensor.Value column and, if it is NA, replaces Sensor.Unit with either NA or R or something to indicate that the tag is sensorless? Does this make sense? I haven't dived deep into this, but this is my first stab/thought.
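A minimal sketch of that idea, using a small hand-made data frame that mimics the faulty output shown earlier (the data frame `x` here is illustrative and is not produced by actel):

```r
# Mimic the converted table, with the power level misplaced in Sensor.Unit
x <- data.frame(
  Signal = c(10049, 43200, 43200),
  Sensor.Value = c(NA, 12.4, 4.0),
  Sensor.Unit = c("1470", "T", "P"),
  stringsAsFactors = FALSE
)

# Where no sensor value was read, blank out the sensor unit as well
x$Sensor.Unit[is.na(x$Sensor.Value)] <- NA_character_
print(x)
```

This assumes that a missing Sensor.Value always means a sensorless tag; a sensor tag with a genuinely missing reading would lose its unit too.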

jdpye commented 1 month ago

The Tidyverse package readr:: has the read_fwf() function. Once you tokenize the data file by new line, you could apply read_fwf() to the ones that are dataframes and hopefully it will infer correctly what's going on with those columns. You may have to do some extra magic before passing to read_fwf to drop the ========= rows as well but then the result should label up properly.

credit to @yingniu for pointing out the format, finding the Pandas read_fwf() function and starting me on this learning journey :D
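The fixed-width idea can be sketched in base R with `substring()`, used here in place of `readr::read_fwf()` so the example runs without extra packages. The column widths, the `make_row()` helper and the sample rows are assumptions made for this illustration; real logs may use different widths.

```r
# Assumed column widths for this illustration (not taken from a real log)
widths <- c(Date = 10, Time = 10, TOA = 10, Signal = 13,
            Type = 8, Value = 10, Power = 10)

# Hypothetical helper: build one fixed-width row so the spacing is exact
make_row <- function(date, time, toa, tag, type, value, power) {
  sprintf("%-10s%-10s%10s%13s%8s%10s%10s",
          date, time, toa, tag, type, value, power)
}

raw <- c(
  "Decoded Tag Data:",
  strrep("=", sum(widths)),
  make_row("04/17/24", "03:40:31", "0.99708", "10049", "", "", "1470"),
  make_row("04/19/24", "16:17:43", "0.62854", "43200", "T", "12.4", "1247")
)

# Keep only the lines that start with a date, dropping titles and "====" rows
data_lines <- raw[grepl("^[0-9]{2}/[0-9]{2}/[0-9]{2}", raw)]

# Cut every line at the same character positions
ends <- cumsum(widths)
starts <- ends - widths + 1
fields <- lapply(seq_along(widths), function(i) {
  trimws(substring(data_lines, starts[i], ends[i]))
})
names(fields) <- names(widths)
parsed <- as.data.frame(fields, stringsAsFactors = FALSE)

# Empty cells become NA; numeric columns are converted
parsed$Type[parsed$Type == ""] <- NA_character_
parsed$Value <- as.numeric(parsed$Value)
parsed$Power <- as.numeric(parsed$Power)
print(parsed)
```

Because each column is read from fixed character positions, the power level of the sensorless row stays in the Power column and the empty Type and Value cells become NA.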

rihamilan commented 1 month ago

Actually, I am using this piece of code to convert the data to csv. It is a bit lame and not universal: the number of skipped lines is not always the same, so I need to adjust it quite often. But some small piece of it might be useful to incorporate into your code to fix the bug.

library(data.table)
library(readr)

# Detections: fixed-width columns, skipping the first 47 lines of the file
y <- data.table(read_fwf(
  file = paste(projectPath, fns.det[i], sep = ""),
  skip = 47,
  col_positions = fwf_widths(
    c(10, 10, 10, 13, 10, 10, 8),
    col_names = c("Date", "Time", "TOA", "Tag_ID", "Type", "Value", "Power")
  ),
  col_types = cols(Type = col_character(), Value = col_double())
))

# Receiver serial number and node id, read from single header lines
receiver_sn <- read_fwf(
  file = paste(projectPath, fns.det[i], sep = ""),
  skip = 20, n_max = 1,
  col_positions = fwf_widths(c(15, 18))
)
receiver_sn <- substring(receiver_sn$X2, 11, 13)
node_id_c <- read_fwf(
  file = paste(projectPath, fns.det[i], sep = ""),
  skip = 21, n_max = 1,
  col_positions = fwf_widths(c(15, 20))
)

# Drop rows without a numeric TOA, then assemble the output table
y[, TOA := as.numeric(TOA)]
y <- y[!is.na(TOA)]
y[, timestamp := paste(Date, Time, sep = " ")]
y <- y[, .(timestamp, TOA, tag_id = Tag_ID, type = Type, value = Value,
           power = Power, rec_sn = receiver_sn, node_id = node_id_c$X2)]

hugomflavio commented 1 month ago

actel already depends on readr, so this could be a nice way to solve the issue. Working on it now. Also, it seems that the date format, unconventional as it is, is not dependent on the computer, which means we should be able to generalize that as well for these txt logs.

@rihamilan I had a meeting with a person from Lotek. I've found that there is a way to extract these detections as csv using their software, it is just buried further in the options. However, doing so has its own set of problems, so this txt file might be our best bet for now.

hugomflavio commented 1 month ago

PR #118 is testing now.

We may be getting closer to a situation where these TXT logs could be automatically imported and processed by one of the three main actel analyses. Some doubts remain before I'm sure that would work, so for now I kept convertLotekCDMAFile() as a separate function from the rest.

@jdpye thanks for the readr::read_fwf() pointer!

hugomflavio commented 1 month ago

latest updates merged. @rihamilan if you could reinstall the dev version with the code I posted above and let me know if the function works better now, that would be great :) Note that I removed the timezone argument.

rihamilan commented 1 month ago

I really appreciate the effort you put into this topic :). Actually, I had completely forgotten that there is a way to convert the data to csv, since, as you wrote, this is done in a very inconvenient way. The default setting for Lotek is txt.

I have reinstalled the package with the new development version, but it still shows the same version 1.3.0.9008 and the output is the same.

hugomflavio commented 1 month ago

the version still being 1.3.0.9008 is fine (I didn't bump it up again yet). But the output being the same is unexpected... Are you still getting power values assigned to the Sensor.Unit column? When I try the new function on the file you provided, and I look up the rows that have no sensor data, I get:

image

It also works fine if I forcefully remove some data at the start of the file. E.g. this file:

Decoded Tag Data:
Date      Time             TOA       Tag ID    Type     Value     Power
=======================================================================
04/09/24  22:50:03     0.43875        37910                          12
08/21/24  12:45:18     0.99646        55606       M         0         1
08/23/24  15:01:04     0.76042        55778       P       0.0         2

results in:

image

Could you restart your workspace to ensure the latest install of actel is loaded? You can check it by looking for these lines when you run convertLotekCDMAFile:

image

If your function still uses data.table to import the file, it is still the old version.

rihamilan commented 1 month ago

Thanks, it works now! Resetting the workspace helped. And I got this output:

x <- convertLotekCDMAFile(file = "WHS3K-3010043_20240827_070715.TXT",
                            date_format = "%m/%d/%y") 
head(x)

             Timestamp Receiver CodeSpace Signal Sensor.Value Sensor.Unit
                <POSc>    <num>    <char>  <num>        <num>      <char>
1: 2024-04-17 03:40:31  3010043       FSK  10049           NA        <NA>
2: 2024-04-19 16:17:43  3010043       FSK  43200         12.4           T
3: 2024-04-19 16:18:13  3010043       FSK  43200          4.0           P
4: 2024-04-19 16:18:36  3010043       FSK  37100          5.0           P
5: 2024-04-19 16:19:13  3010043       FSK  43200          4.0           P
6: 2024-04-19 16:19:21  3010043       FSK  37100          6.0           P

The interesting thing is that I checked the function and it seems to be using the old version of the code:

> convertLotekCDMAFile
function (file, date_format = "%m/%d/%y") 
{
    file_raw <- readLines(file)
    serial_n <- file_raw[grep("^Serial Number:", file_raw)]
    serial_n <- extractSignals(serial_n)
    code_type <- file_raw[grep("^Code Type:", file_raw)]
    code_type <- sub("Code Type:\\s*", "", code_type)
    if (code_type == "") {
        code_type <- NA
    }
    gmt_cor <- file_raw[grep("^GMT Correction:", file_raw)]
    gmt_cor <- sub("GMT Correction:\\s*", "", gmt_cor)
    gmt_cor <- decimalTime(gmt_cor)
    det_start <- grep("=========", file_raw)[1]
    det_end <- grep("Receiver Sensor Messages:", file_raw)[1] - 
        2
    det_names <- file_raw[det_start - 1]
    det_names <- sub("Tag ID", "Signal", det_names)
    output <- readr::read_fwf(file, skip = det_start, n_max = det_end - 
        det_start, show_col_types = FALSE)
    output <- as.data.table(output)
    colnames(output) <- unlist(strsplit(det_names, "\\s\\s*"))
    output$CodeSpace <- code_type
    output$Receiver <- as.numeric(serial_n)
    output$Date <- as.Date(output$Date, format = date_format)
    output$Timestamp <- paste(output$Date, output$Time)
    output$Signal <- suppressWarnings(as.numeric(output$Signal))
    output <- data.table::setnames(output, c("Type", "Value"), 
        c("Sensor.Unit", "Sensor.Value"))
    std_cols <- c("Timestamp", "Receiver", "CodeSpace", "Signal", 
        "Sensor.Value", "Sensor.Unit")
    output <- output[, std_cols, with = FALSE]
    output$Timestamp <- fasttime::fastPOSIXct(as.character(output$Timestamp), 
        tz = "UTC")
    output$Timestamp <- output$Timestamp - (gmt_cor * 3600)
    if (any(is.na(output$Timestamp))) {
        warning(paste0("Some timestamp values are NA. This must be fixed before these ", 
            "detections are used in an actel analysis."), call. = FALSE)
    }

This is despite reinstalling the package with the code you sent me and trying the same thing on another computer.

hugomflavio commented 1 month ago

I can see the readr::read_fwf there, so you are using the new function :)

Here:

image

So, is this problem solved, for now? If yes, I'll update the vignettes etc and close the issue.

Thanks for bringing it up!

hugomflavio commented 1 month ago

Made some further adjustments to the new function based on another Lotek log I received. Included a mention of the new function in the package website. With more testing, this function may become a part of the regular data-import pipeline (i.e. without needing to be called separately). Closing :)