isoverse / isoreader

Read IRMS (Isotope Ratio Mass Spectrometry) data files into R
http://isoreader.isoverse.org
GNU General Public License v2.0

iso_read_scn #36

Closed sebkopf closed 4 years ago

sebkopf commented 5 years ago

implement reader for isodat's scn file format (see #25 for similar request)

japhir commented 5 years ago

Do you already have an intended output format in mind for this? Basically each .scn file holds:

We usually combine 5 scans going from 25V down to 5V, but they all produce different files, so working with that would be isoprocessor/clumpedr's job.
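Since each voltage setting lands in its own file, one way to line the scans up later is to recover the setting from the file name. This is a minimal sketch in base R, assuming names that end in `<number>V.scn` (as in `191212_BG10V.scn`); `parse_scan_voltage()` is a hypothetical helper, not part of isoreader, and it does not handle other naming schemes.

```r
# Hypothetical helper: pull the nominal voltage out of names like "191212_BG10V.scn".
parse_scan_voltage <- function(paths) {
  file_names <- basename(paths)
  # lazy prefix, then the digits directly in front of "V.scn"
  pattern <- "^.*?(\\d+)V\\.scn$"
  has_voltage <- grepl(pattern, file_names, perl = TRUE)
  voltage <- rep(NA_real_, length(file_names))
  voltage[has_voltage] <- as.numeric(
    sub(pattern, "\\1", file_names[has_voltage], perl = TRUE)
  )
  voltage
}

# file names that do not follow the assumed scheme come back as NA
voltages <- parse_scan_voltage(
  c("dat/191212_BG10V.scn", "160504_5V.scn", "full_scan_example.scn")
)
```

Names that do not match the pattern yield `NA`, so they can be inspected or dropped before scans are combined.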

I'd like to have a target format in mind, so that I can already put some of our excel sheets (shudder) in the right format and work on the processing part without worrying that I'll have to redo everything once we come up with a format for this and it turns out different from what our excel method produced.

The format could be in wide form:

| file_id | file_name | file_directory | acc_voltage | m44.mV | m45.mV | m46.mV | m47.mV | m48.mV | m49.mV | m54.mV |

or maybe in long form:

| file_id | file_name | file_directory | acc_voltage | mass | intensity |

What do you think?
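For reference, the two layouts carry the same information and can be converted into each other. A minimal sketch with tidyr, using made-up numbers and only two of the mass columns; the column names follow the tables above, everything else is an assumption for illustration.

```r
library(tidyr)

# Made-up values in the proposed wide layout (one column per mass).
wide <- data.frame(
  file_id = "example.scn",
  acc_voltage = c(9.40, 9.41),
  m44.mV = c(1.67, 1.66),
  m45.mV = c(0.17, 0.22)
)

# Wide -> long: one row per (voltage step, mass) pair.
long <- pivot_longer(
  wide,
  cols = matches("^m\\d+\\.mV$"),
  names_to = "mass",
  values_to = "intensity"
)

# Keep only the digits of the column name, e.g. "m44.mV" -> 44.
long$mass <- as.integer(gsub("\\D", "", long$mass))
```

The wide form is convenient for per-step arithmetic between masses, while the long form is what ggplot2 expects when colouring by mass.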

sebkopf commented 5 years ago

good question. The format is similar to the continuous flow files with file_info holding the usual information and raw_data the actual raw data so you can pull it all out together with iso_get_raw_data(include_file_info = ...) and can pull into long format (if so desired) using the gather = TRUE parameter of iso_get_raw_data().

The part I'm struggling with is that the x-axis is not necessarily acceleration voltage in scan files. It can be a bunch of different things including acc. voltage, magnet steps, time, etc. Somehow we need to cover all use cases, and I'm not sure whether the raw data should just have a generic x-column that holds that parameter (plus of course the usual m44.mV | m45.mV, etc.) and then the file_info specifies what the x-column actually is (in your case "accelerating voltage"). What do you think?

so basically:

file_info = file_id | file_datetime | etc. | scan_type = ("magnet", "time", "voltage")
raw_data = scan_x | m44.mV | m45.mV | etc.
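To make the split concrete, here is a minimal base R sketch of that layout with made-up values: the generic x column lives in the raw data, and its meaning is looked up from the file info. The file names and numbers are hypothetical, and this is not how isoreader stores things internally.

```r
# One row per file: what the generic x column means for that file.
file_info <- data.frame(
  file_id = c("bg_10V.scn", "time_check.scn"),
  scan_type = c("voltage", "time"),
  stringsAsFactors = FALSE
)

# One row per scan step: generic x value plus the usual intensity columns.
raw_data <- data.frame(
  file_id = rep(c("bg_10V.scn", "time_check.scn"), each = 2),
  scan_x = c(9.40, 9.41, 0.5, 1.0),
  m44.mV = c(1.67, 1.66, 1.70, 1.71),
  stringsAsFactors = FALSE
)

# Attach the scan type to every raw data row, then keep only voltage scans.
combined <- merge(raw_data, file_info, by = "file_id")
voltage_scans <- combined[combined$scan_type == "voltage", ]
```

Keeping the x values generic means downstream code only needs one extra filter on `scan_type` rather than a different column name per scan type.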

btw. could you provide a few example files for your typical scan file output? we don't actually use acc. voltage much in our continuous flow applications (usually just time and magnet scans)

@brettdavidheiser: what's your thinking on this? + could you also provide a few example files to double check we got the structure resolved across isodat versions?

japhir commented 5 years ago

Thanks, I'll start putting our excel-stuff in this format and then work from there.

> The part I'm struggling with is that the x-axis is not necessarily acceleration voltage in scan files. It can be a bunch of different things including acc. voltage, magnet steps, time, etc. Somehow we need to cover all use cases, and I'm not sure whether the raw data should just have a generic x-column that holds that parameter (plus of course the usual m44.mV | m45.mV, etc.) and then the file_info specifies what the x-column actually is (in your case "accelerating voltage"). What do you think?

Ah of course, didn't think about other scan types. So they all save as .scn files and there is no metadata telling you what's on the x-column within the files? I'm not sure, I think we do use the other types of scans every now and then to monitor for leaks etc., but we don't need them for data processing.

> btw. could you provide a few example files for your typical scan file output? we don't actually use acc. voltage much in our continuous flow applications (usually just time and magnet scans)

Here are two days' worth of scans. We scan with different acceleration voltages, ranging from 25 V to 5 V across a specific HV range to monitor negative background artefacts.

backgrounds_UU.zip

japhir commented 4 years ago

Hey @sebkopf, any updates on this? Anything I can contribute now that I understand isoreader a little bit better?

I've been getting our excel file data into R with some nice map and excel_sheets and read_excel calls (just 8 lines!) but I'd rather not rely on our excel sheets, especially since I have some measurements from another laboratory for which I only have the .scn files.

I'm back from a month's holiday and have just had my first paper accepted, so I can pick up some clumpedr stuff again now :).

japhir commented 4 years ago

Ok so it turns out we didn't save the excel output for some of the early 2018 runs, when we were using Easotope for a short while. This means I either have to copy-paste the .scn manually into excel, then get those into R, or I should put some effort into getting this working. I've attempted the latter, but it's really quite a bit above my level of understanding. I've installed wxHexEditor to have a look at some example scan files, but honestly I have no idea what I'm doing.

So I tried to get a single binary file into R using your helper functions, to see if some of your general isodat parse functions would give me any luck. It looks like it recognises some blocks that may have data, but I'm not sure how to access it. I think I could probably figure out some of the higher level stuff myself (setting up a new allowed format scan, registering it, setting correct warning messages etc.) but I don't know how to interface with the raw data part of the process. Could you point me in the right direction if you have time?

I've also tried to run the old sebkopf/isoread code, which still appears to work for the scan data.

Here are some new scans: 191212_BG.zip

library(dplyr)
library(isoreader)
file_path <- "~/SurfDrive/PhD/programming/pressure_baseline/dat/191212_BG10V.scn"
# this happens in  make_iso_file_data_structure()
struct <- isoreader:::make_iso_file_data_structure(file_path)
class(struct) <- c("scan", class(struct))
ds <- struct
# ds is a data structure that we want to fill in, so we have to create that first.
# for now we just manually set some options and hope for the best:
ds$read_options$file_info <- TRUE
ds$read_options$raw_data <- TRUE
ds$file_info$file_id <- "191212_BG10V.scn"
ds$file_info$file_root <- "~/SurfDrive/PhD/programming/pressure_baseline"
ds$file_info$file_path <- "dat/191212_BG10V.scn"
ds$file_datetime <- lubridate::today() %>% as.POSIXct() %>% as.integer()

# test if this works
isoreader:::get_ds_file_path(ds)
# then read the binary file
ds$binary <- isoreader:::get_ds_file_path(ds) %>% isoreader:::read_binary_file()

# It looks like we can work with these blocks:
ds$binary$C_blocks$block
#>  [1] "CScanStorage"                 "CBinary"                      "CPlotInfo"                   
#>  [4] "CTraceInfo"                   "CTraceInfoEntry"              "CPlotRange"                  
#>  [7] "CScaleHvScanPart"             "CScaleHvHardwarePart"         "CFinniganInterface"          
#> [10] "CVisualisationData"           "CIntegrationUnitScanPart"     "CIntegrationUnitHardwarePart"
#> [13] "CIntegrationUnitGasConfPart"  "CChannelGasConfPart"          "CCalibration"                
#> [16] "CCalibrationPoint"            "CCupHardwarePart"             "CBasicInterface"             
#> [19] "CChannelHardwarePart"         "CGasConfiguration"            "CBasicScan"                  
#> [22] "CGpibInterface"               "CBlockData"                   "CDioTransferPart"            
#> [25] "CPeakCenterOffset"            "CMagnetCurrentTransferPart"   "CScaleHvTransferPart"        
#> [28] "CCalculatingDacTransferPart"  "CMolecule"                   

# does this mean it doesn't have the datetime info?
isoreader:::exec_func_with_error_catch(isoreader:::extract_isodat_datetime, ds)
# fails with warning that "CTimeObject" doesn't exist

# have a look at what it does
isoreader:::extract_isodat_datetime

# try to copy what the above function does
ds$binary %>%
  isoreader:::set_binary_file_error_prefix("cannot identify scan data") %>%
  isoreader:::move_to_C_block("CBinary") %>%
  isoreader:::move_to_next_C_block_range("CTraceInfoEntry", "CPlotRange")

# raw data
# let's just try all the extract functions
isoreader:::extract_did_raw_voltage_data(ds)
isoreader:::extract_caf_raw_voltage_data(ds)
isoreader:::extract_cf_raw_voltage_data(ds)
isoreader:::extract_dxf_raw_voltage_data(ds)
isoreader:::extract_isodat_datetime(ds)
# none work

# at this point I'm just randomly trying out functions
ds$binary$raw %>% isoreader:::parse_raw_data(type = "double")
# this gets me a single vector of numeric data!

# ok so let's have another look at the old isoread function
scn <- isoread::isoread(file_path)
#> Reading file ~/SurfDrive/PhD/programming/pressure_baseline/dat/191212_BG10V.scn
#> Warning: file creation date could not be determined on this operating system (unix/Linux), 
#> recovering 'last modified date' instead.

scn
#> Showing summary of IsodatScanFile 
#> 
#> Binary File information:
#> rawdata number of bytes: 0
#> number of found text keys: 0
#> number of assigned data fields: 1
#> current read position: 31590
#> data information:
#>        Property                                                                         Value
#> 1 File location /home/japhir/SurfDrive/PhD/programming/pressure_baseline/dat/191212_BG10V.scn
#> 2          Date                                                           2019-12-12 17:28:03
#> 3       n_steps                                                                           525
#> 
#> 
#> Mass data (first 5 rows):
#>    step   mass44    mass45     mass46      mass47    mass54    mass48    mass49
#> 1 61470 1.674651 0.1733054 -0.8935735   47.589202 -147.2779 -198.8207 -143.3445
#> 2 61472 1.663211 0.2226863 -0.9735991    1.918295 -145.6726 -205.0113 -143.9589
#> 3 61474 1.670837 0.1998939 -0.9584081  -65.796793 -144.8993 -210.6509 -144.6003
#> 4 61476 1.655584 0.1922969 -0.9773967 -108.858128 -144.4642 -217.2027 -148.8764
#> 5 61478 1.647958 0.1809018 -0.9849917 -137.734020 -146.9312 -221.2637 -154.7198
#> 6 61480 1.647958 0.1809018 -0.9963838 -146.028725 -146.0319 -222.0348 -157.2804

sebkopf commented 4 years ago

Hi @japhir ,

My apologies this has taken so long, too much teaching and proposaling took higher priority while isoread still works. Finally working on this in the sk_scn branch and your observations have been very helpful. You are indeed correct that the structure is quite different and, for example, the date and time are not stored in the .scn files (I had to pull them out from the operating system file information, which is much less reliable). Generally tested 3 different scan types: High Voltage scans (your application), magnet scans (full sweeps), and time scans (troubleshooting). If there are other scan types you or @brettdavidheiser can think of, would love to get some more example files.

So far what works is pulling the scan type and any file comments into file_info and all the raw data into raw_data. What should be possible but I have not yet figured out is the resistor information, which seems to be stored differently than in did, caf, cf, dxf files (yay isodat....). Unresolved is how to treat data that has no mass associated (e.g. from Magnet scans where it's just the cups and the masses are changing) so at the moment those are just v2.mV, v3.mV, etc. where the number is the cup instead of mass number.
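Until that naming question is settled, cup-numbered columns can be relabelled by hand when the cup configuration is known. A minimal base R sketch with made-up data; the cup-to-mass mapping below is purely hypothetical, and `rename_cups()` is not an isoreader function.

```r
# Made-up raw data with cup-numbered columns, as currently returned for magnet scans.
raw_data <- data.frame(
  x = c(1, 2),
  v2.mV = c(1.67, 1.66),
  v3.mV = c(0.17, 0.22)
)

# Hypothetical mapping: which mass each cup was collecting for this configuration.
cup_to_mass <- c(v2.mV = "m44.mV", v3.mV = "m45.mV")

# Rename only the columns that appear in the mapping; leave everything else untouched.
rename_cups <- function(df, mapping) {
  hit <- names(df) %in% names(mapping)
  names(df)[hit] <- unname(mapping[names(df)[hit]])
  df
}

renamed <- rename_cups(raw_data, cup_to_mass)
```

For a magnet scan the mass on each cup changes along the sweep, so a fixed mapping like this only makes sense for scans where the masses stay put.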

Plotting functionality will be next in isoprocessor but first gotta do some more testing on the basic read functionality. @japhir can you try some of your files installing from the sk_scn branch?

devtools::install_github("isoverse/isoreader", ref = "sk_scn")

# read example files
library(isoreader)
iso_files <- 
  iso_read_scan(
    iso_get_reader_example("peak_shape_scan_example.scn"),
    iso_get_reader_example("background_scan_example.scn"),
    iso_get_reader_example("full_scan_example.scn"),
    iso_get_reader_example("time_scan_example.scn"),
    read_cache = FALSE
  )

# file info
iso_files %>% iso_get_file_info()

# plotting (manual)
iso_files %>% 
  iso_get_raw_data(include_file_info = type) %>%
  dplyr::mutate(panel = sprintf("%s [%s]", type, units)) %>% 
  tidyr::pivot_longer(
    matches("v\\d+"),
    names_to = "mass",
    values_to = "value",
    values_drop_na = TRUE
  ) %>% 
  ggplot2::ggplot() +
  ggplot2::aes(x, value, color = mass) + 
  ggplot2::geom_line() +
  ggplot2::facet_wrap(~ panel + file_id, scales = "free")

japhir commented 4 years ago

Cool, thanks for taking the time to work on this!

> If there are other scan types you or @brettdavidheiser can think of, would love to get some more example files.

I've asked some colleagues and we can't think of any additional scan types. Even for the time scans, for example, we don't save the results. We just use them to monitor things and don't even bother saving the files.

I ran your code chunk, and it looks like it works as expected!

# First run your example code to check if it works on my system as well:
## devtools::install_github("isoverse/isoreader", ref = "sk_scn")

# read example files
library(isoreader)
#> 
#> Attaching package: 'isoreader'
#> The following object is masked from 'package:stats':
#> 
#>     filter
iso_files <-
  iso_read_scan(
    iso_get_reader_example("peak_shape_scan_example.scn"),
    iso_get_reader_example("background_scan_example.scn"),
    iso_get_reader_example("full_scan_example.scn"),
    iso_get_reader_example("time_scan_example.scn"),
    read_cache = FALSE
  )
#> Info: preparing to read 4 data files (all will be cached)...
#> Info: reading file 'peak_shape_scan_example.scn' with '.scn' reader
#> Warning: file creation date could not be determined on this operating syste...
#> Info: reading file 'background_scan_example.scn' with '.scn' reader
#> Warning: file creation date could not be determined on this operating syste...
#> Info: reading file 'full_scan_example.scn' with '.scn' reader
#> Warning: file creation date could not be determined on this operating syste...
#> Info: reading file 'time_scan_example.scn' with '.scn' reader
#> Warning: file creation date could not be determined on this operating syste...
#> Info: finished reading 4 files in 1.10 secs
#> Info: encountered 4 problems in total
#> # A tibble: 4 x 4
#>   file_id           type   func       details                                   
#>   <chr>             <chr>  <chr>      <chr>                                     
#> 1 peak_shape_scan_… warni… get_from_… file creation date could not be determine…
#> 2 background_scan_… warni… get_from_… file creation date could not be determine…
#> 3 full_scan_exampl… warni… get_from_… file creation date could not be determine…
#> 4 time_scan_exampl… warni… get_from_… file creation date could not be determine…

# file info
iso_files %>% iso_get_file_info()
#> Info: aggregating file info from 4 data file(s)
#> # A tibble: 4 x 7
#>   file_id  file_root   file_path  file_subpath file_datetime       type  comment
#>   <chr>    <chr>       <chr>      <chr>        <dttm>              <chr> <chr>  
#> 1 peak_sh… /home/japh… peak_shap… <NA>         2020-02-10 09:53:35 High… test t…
#> 2 backgro… /home/japh… backgroun… <NA>         2020-02-10 09:53:35 High… <NA>   
#> 3 full_sc… /home/japh… full_scan… <NA>         2020-02-10 09:53:35 Magn… <NA>   
#> 4 time_sc… /home/japh… time_scan… <NA>         2020-02-10 09:53:35 Clock <NA>

# plotting (manual)
iso_files %>%
  iso_get_raw_data(include_file_info = type) %>%
  dplyr::mutate(panel = sprintf("%s [%s]", type, units)) %>%
  tidyr::pivot_longer(
           matches("v\\d+"),
           names_to = "mass",
           values_to = "value",
           values_drop_na = TRUE
         ) %>%
  ggplot2::ggplot() +
  ggplot2::aes(x, value, color = mass) +
  ggplot2::geom_line() +
  ggplot2::facet_wrap(~ panel + file_id, scales = "free")
#> Info: aggregating raw data from 4 data file(s), including file info 'type'


# now let's list all the scan files we could potentially read:
# these are from our 253 plus + Kiel III and Kiel IV set-up
files_2018 = list.files(path = "/run/user/1000/gvfs/smb-share:server=geodc01.ad.geo.uu.nl,share=gml/rawdata/253pluskiel/BG Folder",
                        pattern = "\\.scn$", recursive = TRUE, full.names = TRUE)
files_2019 = list.files(path = "/run/user/1000/gvfs/smb-share:server=geodc01.ad.geo.uu.nl,share=gml/rawdata/253pluskiel/BG 2019",
                        pattern = "\\.scn$", recursive = TRUE, full.names = TRUE)

# these are the 253 with a Kiel III
files_pacman_kielIII = list.files(path = "/run/user/1000/gvfs/smb-share:server=geodc01.ad.geo.uu.nl,share=gml/rawdata/Kiel 253/clumped/Scans",
                        pattern = "\\.scn$", recursive = TRUE, full.names = TRUE)
# these are the 253 with a Kiel IV
files_pacman_kielIV = list.files(path = "/run/user/1000/gvfs/smb-share:server=geodc01.ad.geo.uu.nl,share=gml/rawdata/Kiel 253/Background Scans",
                        pattern = "\\.scn$", recursive = TRUE, full.names = TRUE)

# and then just read/plot 3 of each group of files.
our_isos <- iso_read_scan(c(sample(files_2018, size = 3),
                            sample(files_2019, size = 3),
                            sample(files_pacman_kielIII, size = 3),
                            sample(files_pacman_kielIV, size = 3)))
#> Info: preparing to read 12 data files (all will be cached)...
#> Info: reading file '253pluskiel/BG Folder/13June2018_BG15V.scn' with '.scn'...
#> Warning: file creation date could not be determined on this operating syste...
#> Info: reading file '253pluskiel/BG Folder/23August2018_BG5V.scn' with '.scn...
#> Warning: file creation date could not be determined on this operating syste...
#> Info: reading file '253pluskiel/BG Folder/22June2018_BG25V.scn' with '.scn'...
#> Warning: file creation date could not be determined on this operating syste...
#> Info: reading file '253pluskiel/BG 2019/190807_BG15V.scn' with '.scn' reader
#> Warning: file creation date could not be determined on this operating syste...
#> Info: reading file '253pluskiel/BG 2019/190926_BG15V.scn' with '.scn' reader
#> Warning: file creation date could not be determined on this operating syste...
#> Info: reading file '253pluskiel/BG 2019/190207_BG20V.scn' with '.scn' reader
#> Warning: file creation date could not be determined on this operating syste...
#> Info: reading file 'Kiel 253/clumped/Scans/BG_scans_Anne/160504_5V.scn' wit...
#> Warning: file creation date could not be determined on this operating syste...
#> Warning: caught error - cannot identify scan units - could not find '[re-di...
#> Info: reading file 'Kiel 253/clumped/Scans/BG_scans_Anne/170524_10V.scn' wi...
#> Warning: file creation date could not be determined on this operating syste...
#> Warning: caught error - cannot identify scan units - could not find '[re-di...
#> Info: reading file 'Kiel 253/clumped/Scans/BG_scans_Anne/151210_10V.scn' wi...
#> Warning: file creation date could not be determined on this operating syste...
#> Warning: caught error - cannot identify scan units - could not find '[re-di...
#> Info: reading file 'Kiel 253/Background Scans/190204_10V.scn' with '.scn' r...
#> Warning: file creation date could not be determined on this operating syste...
#> Warning: caught error - 'names' attribute [10] must be the same length as t...
#> Info: reading file 'Kiel 253/Background Scans/190702_20V.scn' with '.scn' r...
#> Warning: file creation date could not be determined on this operating syste...
#> Warning: caught error - 'names' attribute [10] must be the same length as t...
#> Info: reading file 'Kiel 253/Background Scans/190715_5V.scn' with '.scn' re...
#> Warning: file creation date could not be determined on this operating syste...
#> Warning: caught error - 'names' attribute [10] must be the same length as t...
#> Info: finished reading 12 files in 3.90 secs
#> Info: encountered 18 problems in total
#> # A tibble: 18 x 4
#>    file_id        type   func            details                                
#>    <chr>          <chr>  <chr>           <chr>                                  
#>  1 13June2018_BG… warni… get_from_mdate  "file creation date could not be deter…
#>  2 23August2018_… warni… get_from_mdate  "file creation date could not be deter…
#>  3 22June2018_BG… warni… get_from_mdate  "file creation date could not be deter…
#>  4 190807_BG15V.… warni… get_from_mdate  "file creation date could not be deter…
#>  5 190926_BG15V.… warni… get_from_mdate  "file creation date could not be deter…
#>  6 190207_BG20V.… warni… get_from_mdate  "file creation date could not be deter…
#>  7 160504_5V.scn  warni… get_from_mdate  "file creation date could not be deter…
#>  8 160504_5V.scn  error  extract_scn_ra… "cannot identify scan units - could no…
#>  9 170524_10V.scn warni… get_from_mdate  "file creation date could not be deter…
#> 10 170524_10V.scn error  extract_scn_ra… "cannot identify scan units - could no…
#> 11 151210_10V.scn warni… get_from_mdate  "file creation date could not be deter…
#> 12 151210_10V.scn error  extract_scn_ra… "cannot identify scan units - could no…
#> 13 190204_10V.scn warni… get_from_mdate  "file creation date could not be deter…
#> 14 190204_10V.scn error  extract_scn_ra… "'names' attribute [10] must be the sa…
#> 15 190702_20V.scn warni… get_from_mdate  "file creation date could not be deter…
#> 16 190702_20V.scn error  extract_scn_ra… "'names' attribute [10] must be the sa…
#> 17 190715_5V.scn  warni… get_from_mdate  "file creation date could not be deter…
#> 18 190715_5V.scn  error  extract_scn_ra… "'names' attribute [10] must be the sa…

# file info
our_isos %>% iso_get_file_info()
#> Info: aggregating file info from 12 data file(s)
#> # A tibble: 12 x 7
#>    file_id  file_root   file_path file_subpath file_datetime       type  comment
#>    <chr>    <chr>       <chr>     <chr>        <dttm>              <chr> <chr>  
#>  1 13June2… /run/user/… 253plusk… <NA>         2018-06-13 14:28:40 High… <NA>   
#>  2 23Augus… /run/user/… 253plusk… <NA>         2018-08-23 11:23:00 High… <NA>   
#>  3 22June2… /run/user/… 253plusk… <NA>         2018-06-22 18:05:06 High… <NA>   
#>  4 190807_… /run/user/… 253plusk… <NA>         2019-08-07 12:10:31 High… <NA>   
#>  5 190926_… /run/user/… 253plusk… <NA>         2019-09-26 10:09:15 High… <NA>   
#>  6 190207_… /run/user/… 253plusk… <NA>         2019-02-07 08:30:53 High… <NA>   
#>  7 160504_… /run/user/… Kiel 253… <NA>         2016-05-04 20:25:46 High… <NA>   
#>  8 170524_… /run/user/… Kiel 253… <NA>         2017-05-24 14:03:58 High… <NA>   
#>  9 151210_… /run/user/… Kiel 253… <NA>         2015-12-10 11:39:48 High… <NA>   
#> 10 190204_… /run/user/… Kiel 253… <NA>         2019-02-04 12:16:50 High… <NA>   
#> 11 190702_… /run/user/… Kiel 253… <NA>         2019-07-02 17:45:51 High… <NA>   
#> 12 190715_… /run/user/… Kiel 253… <NA>         2019-07-15 09:26:38 High… <NA>

# plotting (manual)
our_isos %>%
  iso_get_raw_data(include_file_info = type) %>%
  dplyr::mutate(panel = sprintf("%s [%s]", type, units)) %>%
  tidyr::pivot_longer(
           matches("v\\d+"),
           names_to = "mass",
           values_to = "value",
           values_drop_na = TRUE
         ) %>%
  ggplot2::ggplot() +
  ggplot2::aes(x, value, color = mass) +
  ggplot2::geom_line() +
  ggplot2::facet_wrap( ~ panel + file_id, scales = "free")
#> Info: aggregating raw data from 12 data file(s), including file info 'type'

Created on 2020-02-10 by the reprex package (v0.3.0)

So it looks like it can't read the scans on our older mass spec. I'll upload some files later today, but for now I have to go teach!

Cheers!

japhir commented 4 years ago

Here are the six example scan files that don't read in properly, so you can replicate and try to find the problem:

scan_problems_uu.zip

sebkopf commented 4 years ago

quick question, are you reading these on a linux box? Trying to figure out how to deal better with the file creation date. Basically on linux you can't always figure out the actual creation date, only the last modified date. Do you think it's worth the live warning or just skip it and keep it in the problems?

japhir commented 4 years ago

Yeah this is on my own linux laptop, connected to the backup drive via samba. I think the warnings are not very informative, maybe give a warning once, the first time? Something like: "Warning: on Linux, file_datetime cannot be accessed from file creation date time, using last modified time instead"
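The warn-once idea could look roughly like this in base R. This is only a sketch of the pattern, not isoreader's implementation: `get_file_datetime()` and the `.warn_state` environment are hypothetical names, and the function simply falls back to the last modified time reported by `file.info()`.

```r
# Session-level flag that remembers whether the warning was already shown.
.warn_state <- new.env()

# Hypothetical helper: return the last modified time, warning only on first use.
get_file_datetime <- function(path) {
  if (!isTRUE(.warn_state$mdate_warned)) {
    warning(
      "file creation date cannot be accessed on this system, ",
      "using last modified time instead",
      call. = FALSE
    )
    .warn_state$mdate_warned <- TRUE
  }
  file.info(path)$mtime
}
```

Every file would still get its fallback timestamp, but the console only shows the message once per session; the per-file note could stay in the problems table.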

sebkopf commented 4 years ago

Hi @japhir : try again with the newest devtools::install_github("isoverse/isoreader", ref = "sk_scn") (be warned though that this branch currently breaks isoprocessor because of its tidyselect 1.0.0 prereq).

Should now be able to read these problem files and also pull out the resistors.

japhir commented 4 years ago

Awesome! Seems to work on my example files! Will now run it on all files and see how it goes :). Thanks!

# since we know example code runs, just run it for the previously failing scans
## devtools::install_github("isoverse/isoreader", ref = "sk_scn")

# read example files
library(isoreader)
#> 
#> Attaching package: 'isoreader'
#> The following object is masked from 'package:stats':
#> 
#>     filter

# these are the 253 with a Kiel III
files_pacman_kielIII = list.files(path = "/run/user/1000/gvfs/smb-share:server=geodc01.ad.geo.uu.nl,share=gml/rawdata/Kiel 253/clumped/Scans",
                                  pattern = "\\.scn$", recursive = TRUE, full.names = TRUE)
# these are the 253 with a Kiel IV
files_pacman_kielIV = list.files(path = "/run/user/1000/gvfs/smb-share:server=geodc01.ad.geo.uu.nl,share=gml/rawdata/Kiel 253/Background Scans",
                                 pattern = "\\.scn$", recursive = TRUE, full.names = TRUE)

our_isos <- iso_read_scan(c(sample(files_pacman_kielIII, size = 6),
                            sample(files_pacman_kielIV, size = 6)))
#> Info: preparing to read 12 data files (all will be cached)...
#> Info: reading file 'clumped/Scans/BG_scans_old_Anne/CO2clumped peakshape 13...
#> Info: reading file 'clumped/Scans/150612_20V.scn' with '.scn' reader
#> Info: reading file 'clumped/Scans/BG_scans_old_Anne/161003_40extr_5V.scn' w...
#> Info: reading file 'clumped/Scans/BG_scans_old_Anne/170315_25V.scn' with '....
#> Info: reading file 'clumped/Scans/BG_scans_Anne/160421_40extract_15V.scn' w...
#> Info: reading file 'clumped/Scans/170202_10V.scn' with '.scn' reader
#> Info: reading file 'Background Scans/190121_5V.scn' with '.scn' reader
#> Info: reading file 'Background Scans/191018_13_6V.scn' with '.scn' reader
#> Info: reading file 'Background Scans/190204_25V.scn' with '.scn' reader
#> Info: reading file 'Background Scans/190909_10V.scn' with '.scn' reader
#> Info: reading file 'Background Scans/191202_10V.scn' with '.scn' reader
#> Info: reading file 'Background Scans/190909_25V.scn' with '.scn' reader
#> Info: finished reading 12 files in 2.78 secs
#> Info: encountered 12 problems in total
#> # A tibble: 12 x 4
#>    file_id                  type   func      details                            
#>    <chr>                    <chr>  <chr>     <chr>                              
#>  1 CO2clumped peakshape 13… warni… get_crea… file creation date cannot be acces…
#>  2 150612_20V.scn           warni… get_crea… file creation date cannot be acces…
#>  3 161003_40extr_5V.scn     warni… get_crea… file creation date cannot be acces…
#>  4 170315_25V.scn           warni… get_crea… file creation date cannot be acces…
#>  5 160421_40extract_15V.scn warni… get_crea… file creation date cannot be acces…
#>  6 170202_10V.scn           warni… get_crea… file creation date cannot be acces…
#>  7 190121_5V.scn            warni… get_crea… file creation date cannot be acces…
#>  8 191018_13_6V.scn         warni… get_crea… file creation date cannot be acces…
#>  9 190204_25V.scn           warni… get_crea… file creation date cannot be acces…
#> 10 190909_10V.scn           warni… get_crea… file creation date cannot be acces…
#> 11 191202_10V.scn           warni… get_crea… file creation date cannot be acces…
#> 12 190909_25V.scn           warni… get_crea… file creation date cannot be acces…

# file info
our_isos %>% iso_get_file_info()
#> Info: aggregating file info from 12 data file(s)
#> # A tibble: 12 x 7
#>    file_id  file_root  file_path  file_subpath file_datetime       type  comment
#>    <chr>    <chr>      <chr>      <chr>        <dttm>              <chr> <chr>  
#>  1 CO2clum… /run/user… clumped/S… <NA>         2013-07-22 12:05:38 High… <NA>   
#>  2 150612_… /run/user… clumped/S… <NA>         2017-06-12 15:07:18 High… <NA>   
#>  3 161003_… /run/user… clumped/S… <NA>         2016-10-03 12:47:44 High… <NA>   
#>  4 170315_… /run/user… clumped/S… <NA>         2017-03-15 13:20:06 High… <NA>   
#>  5 160421_… /run/user… clumped/S… <NA>         2016-04-21 20:17:56 High… <NA>   
#>  6 170202_… /run/user… clumped/S… <NA>         2017-02-02 11:46:13 High… <NA>   
#>  7 190121_… /run/user… Backgroun… <NA>         2019-01-21 11:21:40 High… <NA>   
#>  8 191018_… /run/user… Backgroun… <NA>         2019-10-18 16:18:50 High… <NA>   
#>  9 190204_… /run/user… Backgroun… <NA>         2019-02-04 12:05:34 High… <NA>   
#> 10 190909_… /run/user… Backgroun… <NA>         2019-09-09 09:10:51 High… <NA>   
#> 11 191202_… /run/user… Backgroun… <NA>         2019-12-02 08:05:59 High… <NA>   
#> 12 190909_… /run/user… Backgroun… <NA>         2019-09-09 08:57:39 High… <NA>

# plotting (manual)
our_isos %>%
  iso_get_raw_data(include_file_info = type) %>%
  dplyr::mutate(panel = sprintf("%s [%s]", type, units)) %>%
  tidyr::pivot_longer(
           matches("v\\d+"),
           names_to = "mass",
           values_to = "value",
           values_drop_na = TRUE
         ) %>%
  ggplot2::ggplot() +
  ggplot2::aes(x, value, color = mass) +
  ggplot2::geom_line() +
  ggplot2::facet_wrap( ~ panel + file_id, scales = "free")
#> Info: aggregating raw data from 12 data file(s), including file info 'type'

Created on 2020-02-11 by the reprex package (v0.3.0)

japhir commented 4 years ago

PS: what does the step column mean? It's an integer that starts at something like 61470 for most of our measurements, but doesn't seem to relate to the high voltage very logically.

cubessil commented 4 years ago

it is probably the ADC reading for the magnet?

japhir commented 4 years ago

Update: managed to run the code on all our scans (!), it's much faster than raw measurement data. Great stuff. It manages to get all the relevant metadata correctly for all files, except for these weird scans from ages ago. In this case it creates resistor columns:

more_scan_problems_uu.zip

I don't know what the settings were for these scans, but this seems to be a bug? Not important for me because they're only eight files from ages ago, and I can easily filter them out based on their filename before getting out the raw data, but I thought I should report :)
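Filtering by file name before reading can be done on the vector of paths itself. A minimal base R sketch; the file names and the `"old_test"` pattern are made up for illustration and would need to be replaced by whatever identifies the odd scans.

```r
# Made-up paths standing in for the result of list.files().
scan_files <- c(
  "Scans/170202_10V.scn",
  "Scans/old_test_scan.scn",
  "Background Scans/190909_25V.scn"
)

# Flag the odd files by a fixed substring of their file name (hypothetical pattern).
is_problem <- grepl("old_test", basename(scan_files), fixed = TRUE)

# Only these paths would be handed on to the reader.
files_to_read <- scan_files[!is_problem]
```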

japhir commented 4 years ago

> it is probably the ADC reading for the magnet?

Cool, after some googling I learned something about Analogue-to-Digital Converters! (see this Mass Spectrometry book)

sebkopf commented 4 years ago

@japhir those are some funny masses, I think that might indeed have been a strange configuration. Typically the masses are just what is set in the configurator, same for the resistor values. Let me know if you find more that are problematic like this, otherwise I'll call this closed for now 👍

sebkopf commented 4 years ago

@japhir heads up that I will have to rename the units columns to x_units, otherwise it gets tricky with the raw data gathering procedure and would require inconsistent treatment of the dual inlet and continuous flow vs. scan. Sorry about that, I hope it doesn't break your pipelines! Not many other people are using scan files yet, but you might have to re-read them with read_cache = FALSE.
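For pipelines that may still see data read before the rename, a small shim can accept either column name. A minimal base R sketch; `standardize_units_column()` is a hypothetical helper and the data are made up.

```r
# Hypothetical helper: rename a legacy `units` column to `x_units` if needed.
standardize_units_column <- function(df) {
  if ("units" %in% names(df) && !("x_units" %in% names(df))) {
    names(df)[names(df) == "units"] <- "x_units"
  }
  df
}

# Made-up raw data in the old and the new layout.
old_style <- data.frame(x = c(9.40, 9.41), units = "V", stringsAsFactors = FALSE)
new_style <- data.frame(x = c(9.40, 9.41), x_units = "V", stringsAsFactors = FALSE)

from_old <- standardize_units_column(old_style)
from_new <- standardize_units_column(new_style)
```

Re-reading the files with `read_cache = FALSE`, as suggested above, remains the cleaner fix; the shim only helps with intermediate results that were already saved.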

japhir commented 4 years ago

Thanks for the heads up. Just re-read all the files and only had to tweak my code in two places, so that was fine :)