GegznaV closed this issue 3 years ago.
@r-hyperspec/r-hyperspec These schemes are for discussion at Wednesday's meeting. They show how I understand our vision of the r-hyperspec family (two versions of the vision are presented). Please study them and prepare your suggestions.
UPDATE: I updated the schemes and moved them to the message below.
The graphs are implemented with GraphViz via DiagrammeR in RStudio. Sources: r-hyperspec-schemes-GraphViz.zip. Unzip, open in RStudio, and press the "Preview" button. After making changes, press "Preview" again.
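For anyone who wants to experiment with such a scheme without the zip file, here is a minimal sketch in R. It uses only base R to build the GraphViz DOT source; the edge list is an illustrative guess (each import package depending on hyperSpec), not the agreed scheme, and edges_to_dot() is a hypothetical helper. If DiagrammeR is installed and the session is interactive, the DOT source is rendered with DiagrammeR::grViz().

```r
# Hypothetical sketch: build GraphViz DOT source for a package dependency
# scheme from a simple edge list. The edges below are illustrative only.
edges <- data.frame(
  from = c("hySpc.read.spc", "hySpc.read.ENVI", "hySpc.read.txt"),
  to   = c("hyperSpec",      "hyperSpec",       "hyperSpec"),
  stringsAsFactors = FALSE
)

edges_to_dot <- function(edges, graph_name = "r_hyperspec") {
  # One DOT statement per dependency: "dependent" -> "dependency"
  stmts <- sprintf('  "%s" -> "%s";', edges$from, edges$to)
  paste(
    c(sprintf("digraph %s {", graph_name),
      "  rankdir = BT;",          # draw dependencies bottom-to-top
      "  node [shape = box];",
      stmts,
      "}"),
    collapse = "\n"
  )
}

dot <- edges_to_dot(edges)
cat(dot, "\n")

# Render in the RStudio Viewer only when DiagrammeR is available
if (interactive() && requireNamespace("DiagrammeR", quietly = TRUE)) {
  print(DiagrammeR::grViz(dot))
}
```

Keeping the edges in a data frame makes it easy to regenerate the graph after the package list changes, instead of editing DOT by hand.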
Great job @GegznaV
A couple of comments.
hySpc.io
previously, but I may have forgotten. If it is a wrapper for importing, should it be positioned between hyperSpec and hySpc.read.*?
hySpc.pkgs was meant to serve the data packages that are too large for CRAN. We could of course have it serve anything, but it would be simpler, and certainly more standard, if develop versions were installed from their respective repos via remotes::install_github.
I corrected some issues in the schemes.
Here's my proposal, edited from @GegznaV's list on Slack:
... connect hyperSpec with other packages where interaction does not work automatically. They can go on CRAN since we don't need huge test data sets.
r-hyperspec.github.io/
hySpc.pkgs/
This is where things are more complicated...
If we want to cut down dependencies (https://github.com/cbeleites/hyperSpec/issues/215), at least some file import packages should go by file format rather than manufacturer:
.mat-based file formats (read.mat.Witec() and read.mat.Cytospec())
There are import filters for binary formats that do not add dependencies:
read.ENVI.*() -> hySpc.read.ENVI
read.spc.*() -> hySpc.read.spc
These two file formats are sufficiently widespread and well-known that I believe they should each go into their own package.
There are import filters for a large variety of ASCII/text-based formats:
.asc ending: read.asc.Andor(), read.asc.PerkinElmer()
.txt ending: read_txt_Witec(), read_txt_Witec_TrueMatch(), read.txt.Horiba(), read.txt.Renishaw(), read.txt.Shimadzu()
read_dat_Witec(), read_txt_Witec_Graph()
Should these be bundled into, say, hySpc.read.txt?
Last but not least, there are a number of file formats for which we have example data but no import functions yet. At least some of them will have their own dependencies.
Import of Shimadzu .spc files (https://github.com/cbeleites/hyperSpec/issues/102) will require OLE reading.
(Unfortunately, Shimadzu uses a file ending here that coincides with that of the well-known and widely used Thermo Galactic .spc format, but the formats are completely different.)
.jaz: ASCII, I don't think we'll have dependencies here.
.pz2: ASCII, I don't think we'll have dependencies here (we have some import code here that would need polishing).
Diffrac .uxd: ASCII, I don't think we'll have dependencies here.
Perkin Elmer .sp: binary. Not sure about inner structure or dependencies.
Gasmet .spe: binary, not the same as Princeton Instruments/Winspec .spe. Not sure about inner structure or dependencies.
Trivista .tvf: XML-based.
Renishaw WiRE .wdf: binary, probably doable without dependencies.
Witec .WIP: binary, not sure about dependencies. Witec refused to discuss their file format with me, but see e.g. Gwyddion.
Bruker Opus .0, .1, ...: binary (we don't actually have example data, but I could easily obtain some). Bruker has let me have their file format whitepaper in the past, but not recently. I.e.
@bryanhanson, @GegznaV, @eoduniyi, @ximeg: What do you think?
It may be better to have the file import packages named consistently and have them all by file format name.
This would mean that we drop hySpc.read.Witec (or rather, rename it to hySpc.read.txt). For example, several manufacturers export in the Thermo Galactic .spc format, and their files differ slightly, so we have not only read.spc() but also read.spc.KaiserMap() etc. Putting the latter into a package hySpc.read.Kaiser would make that package depend on hySpc.read.spc, which I'd like to avoid.
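To make the by-file-format naming concrete, here is a hypothetical lookup from file extension to import package. The mapping and the helper import_package_for() are assumptions for illustration only (including treating the ENVI .hdr header file as the entry point); tools::file_ext() is the only real function it relies on.

```r
# Hypothetical lookup: file extension -> import package, following the
# "one package per file format" naming idea. Not an existing API.
format_packages <- c(
  spc = "hySpc.read.spc",
  hdr = "hySpc.read.ENVI",  # assumption: ENVI header file used as entry point
  txt = "hySpc.read.txt",
  asc = "hySpc.read.txt"
)

import_package_for <- function(path) {
  # Compare extensions case-insensitively (".TXT" and ".txt" are the same)
  ext <- tolower(tools::file_ext(path))
  # Unknown or missing extensions map to NA rather than to a guess
  unname(format_packages[ext])
}

import_package_for(c("map.spc", "scan.TXT", "image.wdf"))
```

This also shows the limitation raised above: a Shimadzu .spc file would be routed to hySpc.read.spc as well, so the extension alone cannot always select the right reader.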
As long as the end user can easily install all r-hyperSpec packages and easily (automatically) load all of them, we can split the file format functions between packages however we want. It is important to remove this burden from the end user. I like and support the idea of packaging based on dependencies, trying to minimize them.
My point is that as an end user (data analyst/spectroscopist) I want to be able to
install.packages('hyperSpec-EVERYTHING')
library(hyperSpec)
as opposed to
library(hySpc.ggplot2)
library(hySpc.chondro)
library(hySpc.matrixStats)
library(hySpc.baseline)
library(hySpc.read.ENVI)
library(hySpc.read.spc)
library(hySpc.read.txt)
# Now I can finally write a line of code that reads a file, subtracts a baseline, and makes a plot
...
???
# Wait, I forgot to load a package that provides the `filter()` function...
# What was its name? ... Google it... Ah, `dplyr`
library(hySpc.dplyr)
...
# works!
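One way to get the one-line experience asked for above, similar in spirit to how library(tidyverse) attaches its core packages, is a helper that attaches every installed family package and reports the missing ones. This is a hypothetical sketch: attach_family() and the package list are illustrative, and some listed names (e.g. hySpc.dplyr) come from the example above and may not exist as packages.

```r
# Hypothetical sketch of a "load everything" helper for the family.
# The package list is illustrative; this is not an existing hyperSpec API.
hyspc_family <- c(
  "hyperSpec", "hySpc.ggplot2", "hySpc.chondro", "hySpc.matrixStats",
  "hySpc.baseline", "hySpc.read.ENVI", "hySpc.read.spc", "hySpc.read.txt",
  "hySpc.dplyr"
)

attach_family <- function(pkgs = hyspc_family) {
  loaded <- vapply(pkgs, function(pkg) {
    # require() attaches the package and returns FALSE (with a warning,
    # suppressed here) instead of failing when it is not installed
    suppressWarnings(require(pkg, character.only = TRUE, quietly = TRUE))
  }, logical(1))
  if (any(!loaded)) {
    message("Not installed: ", paste(pkgs[!loaded], collapse = ", "))
  }
  invisible(loaded)
}
```

attach_family() returns a named logical vector invisibly, so a script can check which packages were actually attached; a meta-package could call it when it is loaded.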
@GegznaV this is still useful:
@cbeleites It sounds like hySpc.read.Witec will be turned into hySpc.read.txt, which means it will be a larger package that supports import filters for Witec, Renishaw, Andor, PerkinElmer, and Horiba. The remaining file I/O packages will support reading spectral data from MATLAB, Winspec, Shimadzu(?), and JCAMP-DX. Additionally, there will be dedicated packages for ENVI and spc.
@ximeg I totally agree with you on this; I wonder about other ways to improve user-friendliness and the overall experience for typical spectroscopic work.
@bryanhanson I think the documentation on functionality and on contributing/style has made the project easier to maintain.
Looking back, I think we've gotten more specific about the implementation details over time: @cbeleites' 2011 hyperSpec figure -> RGSOC_2020 figures -> @GegznaV's figures.
Closing, as we have pretty much settled on a naming scheme and the issue is old.
For me, it is a bit unclear where we are going with this project and how it should look at the end of this summer. A vision in the form of a scheme/flowchart would be helpful. The scheme should contain the names of the r-hyperspec family packages and other non-package repositories we are going to create, as well as the dependencies between them. The scheme may change later. I'm preparing a draft scheme that could be a starting point for the discussion.