Scheme of r-hyperspec packages

GegznaV commented 4 years ago

For me, it is a bit unclear where we are going with this project and how it should look at the end of this summer. A scheme/flowchart of the vision would be helpful. It should contain the names of the r-hyperspec family packages and the other non-package repositories we are going to create, as well as the dependencies between them. The scheme may change later.

I'm preparing a draft scheme that could serve as a starting point for the discussion.

GegznaV commented 4 years ago

@r-hyperspec/r-hyperspec These schemes are for discussion at Wednesday's meeting. They show how I understand our vision of the r-hyperspec family (two versions of the vision are presented). Please study them and prepare your suggestions.

UPDATE: I updated the schemes and moved them to the message below.

The graphs are implemented with GraphViz via DiagrammeR in RStudio. Sources: r-hyperspec-schemes-GraphViz.zip. Unzip, open in RStudio, and press the "Preview" button. Modify and press "Preview" again.

eoduniyi commented 4 years ago

Great job @GegznaV

bryanhanson commented 4 years ago

A couple of comments.

First, and this applies globally to the project: the key goal for GSOC is to make things easier to maintain. The overall idea of breaking into pieces is a good one. At the same time, to me, another way to conceptualize making things easier to maintain is to make things simpler. I think we should aim for as few pieces as possible.
Version 1 is closer to how I understand what we were/are aiming for.
I don't recall hySpc.io being discussed previously, but I may have forgotten. If it is a wrapper for importing, should it be positioned between hyperSpec and hySpc.read.*?
To date, the concept of hySpc.pkgs has been to serve the data packages that are too large for CRAN. We could of course have it serve anything, but it would be simpler, and certainly more standard, to install development versions from their respective repos via remotes::install_github.
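A minimal sketch of what installing development versions via `remotes::install_github` could look like. The helper names `hyspc_dev_specs()` and `install_hyspc_dev()` are hypothetical, and the assumptions that every package lives at `r-hyperspec/<package>` on GitHub and that the development branch is called `develop` should be checked against the organisation's actual repositories.

```r
# Sketch only. ASSUMPTIONS: repos are at "r-hyperspec/<package>" and the
# development branch is named "develop"; both helper names are hypothetical.
hyspc_dev_specs <- function(pkgs, owner = "r-hyperspec", ref = "develop") {
  # remotes accepts repository addresses of the form "owner/repo@ref"
  paste0(owner, "/", pkgs, "@", ref)
}

install_hyspc_dev <- function(pkgs, owner = "r-hyperspec", ref = "develop") {
  if (!requireNamespace("remotes", quietly = TRUE)) {
    stop("Please install the 'remotes' package first.")
  }
  # Install each development version straight from its own repo
  for (spec in hyspc_dev_specs(pkgs, owner, ref)) {
    remotes::install_github(spec)
  }
  invisible(NULL)
}

# Usage (requires network access):
# install_hyspc_dev(c("hySpc.ggplot2", "hySpc.dplyr"))
```

This keeps hySpc.pkgs reserved for the large data packages, while code packages follow the usual GitHub install route.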

GegznaV commented 4 years ago

I corrected some issues in the schemes.

Scheme 1

![image](https://user-images.githubusercontent.com/12725868/87557771-000c1100-c6c1-11ea-9bc0-16b9ce58a1ab.png)

Scheme 2

![image](https://user-images.githubusercontent.com/12725868/87558457-cdaee380-c6c1-11ea-83ff-aedd57f015c6.png)

Legend

Font color:
- Black: already implemented packages
- Red: not implemented yet
- Green: non-package repos

Lines/arrows:
- Red: automatic relationship via CI
- Blue: package dependencies (e.g., via "Imports")
- Dashed purple: package dependencies (e.g., via "Suggests"; only if installed on the user's computer): (a) the destination package is used to load other installed packages; (b) the destination package may also re-export functions from the other packages.

cbeleites commented 4 years ago

Here's my proposal, edited from @GegznaV's list on Slack:

Main package

hyperSpec CRAN

Bridge packages

These connect hyperSpec with other packages where the interaction does not work automatically. They can go on CRAN since they don't need huge test data sets.

hySpc.ggplot2 CRAN
hySpc.dplyr CRAN
hySpc.matrixStats CRAN (future)
hySpc.baseline CRAN (future)
hySpc.EMSC CRAN (future)

Data packages:

hySpc.chondro GH only

Helper/Utility packages:

hySpc.skeleton GH only/no actual publication as package
hySpc.testthat CRAN

Helper GH repos:

r-hyperspec.github.io/
hySpc.pkgs/

Input/Output packages:

This is where things are more complicated...

If we want to cut down dependencies (https://github.com/cbeleites/hyperSpec/issues/215), at least some file import packages should be organized by file format rather than by manufacturer:

hySpc.read.mat, for MATLAB .mat-based file formats (read.mat.Witec() and read.mat.Cytospec()), depends on R.matlab.
hySpc.read.spe (for Princeton Instruments/Winspec) depends on xml2.
hySpc.read.JCAMP.DX will rather be a bridge package to @bryanhanson's readJDX package, so it obviously depends on readJDX.

There are import filters for binary formats that do not add dependencies:

read.ENVI.*() -> hySpc.read.ENVI
read.spc.*() -> hySpc.read.spc

These two file formats are widespread and well-known enough that I believe each should go into its own package.

There are import filters for a large variety of ASCII/text-based formats:

files that usually have .asc ending: read.asc.Andor(), read.asc.PerkinElmer()
files that usually have .txt ending: read_txt_Witec(), read_txt_Witec_TrueMatch(), read.txt.Horiba(), read.txt.Renishaw(), read.txt.Shimadzu()
the Witec multi-ASCII-file formats: read_dat_Witec(), read_txt_Witec_Graph()

Should these be bundled into, say, hySpc.read.txt?

Last but not least, there are a number of file formats for which we have example data but no import functions yet. At least some of them will have their own dependencies.

Import of Shimadzu .spc files (https://github.com/cbeleites/hyperSpec/issues/102) will require OLE reading.
(Unfortunately, Shimadzu uses a file extension here that coincides with the well-known and widely used Thermo Galactic .spc extension, but the formats are completely different.)
.jaz: ASCII; I don't think we'll have dependencies here.
.pz2: ASCII; I don't think we'll have dependencies here (we have some import code that would need polishing).
Diffrac .uxd: ASCII; I don't think we'll have dependencies here.
PerkinElmer .sp: binary; not sure about the inner structure or dependencies.
Gasmet .spe: binary, not the same as Princeton Instruments/Winspec .spe; not sure about the inner structure or dependencies.
Trivista .tvf: XML-based.
Renishaw WiRE .wdf: binary, probably doable without dependencies.
Witec .WIP: binary, not sure about dependencies. Witec refused to discuss their file format with me, but see e.g. Gwyddion.
Bruker Opus .0, .1, ...: binary (we don't actually have example data, but I could easily obtain some). Bruker has let me have their file format whitepaper in the past, but not recently. I.e.

@bryanhanson, @GegznaV, @eoduniyi, @ximeg: what do you think?

It may be better to name all file import packages consistently, by file format. This would mean that we drop hySpc.read.Witec (or rather, rename it to hySpc.read.txt). For example, several manufacturers export in the Thermo Galactic .spc format, and their files differ slightly, so we have not only read.spc() but also read.spc.KaiserMap() etc. Putting the latter into a package hySpc.read.Kaiser would make that package depend on hySpc.read.spc, which I'd like to avoid.
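To make the by-format naming concrete, here is a hypothetical lookup from file extension to the proposed package. The function `hyspc_reader_pkg()` is illustrative only; the package names come from the proposal above, and the `.jdx`/`.dx` extensions for JCAMP-DX are an assumption about common usage. As noted above, the extension alone is ambiguous (Shimadzu `.spc`, Gasmet `.spe`), so a real dispatcher would also need to inspect the file contents.

```r
# Hypothetical sketch: map a file extension to the proposed by-format package.
# Extension-based lookup cannot distinguish Shimadzu .spc from Thermo Galactic
# .spc, or Gasmet .spe from Princeton Instruments/Winspec .spe.
hyspc_reader_pkg <- function(path) {
  ext <- tolower(tools::file_ext(path))
  pkgs <- c(
    mat = "hySpc.read.mat",      # MATLAB-based formats (depends on R.matlab)
    spe = "hySpc.read.spe",      # Princeton Instruments/Winspec (depends on xml2)
    jdx = "hySpc.read.JCAMP.DX", # JCAMP-DX via readJDX
    dx  = "hySpc.read.JCAMP.DX",
    spc = "hySpc.read.spc",      # Thermo Galactic .spc
    asc = "hySpc.read.txt",      # ASCII formats, if bundled as proposed
    txt = "hySpc.read.txt",
    dat = "hySpc.read.txt"
  )
  # Unknown or missing extension: no proposed package
  if (!ext %in% names(pkgs)) return(NA_character_)
  pkgs[[ext]]
}
```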

ximeg commented 3 years ago

As long as the end user can easily install all r-hyperSpec packages and easily (automatically) load all of them, we can split the file format functions between packages however we want. It is important to remove this burden from the end user. I like and support the idea of packaging based on dependencies, trying to minimize them.

My point is that, as an end user (data analyst/spectroscopist), I want to be able to

set up and update my working environment easily, with something like install.packages('hyperSpec-EVERYTHING')
load all available functions into my environment without thinking about all these granularities; I'd prefer to just call library(hyperSpec) as opposed to

```r
library(hySpc.ggplot2)
library(hySpc.chondro)
library(hySpc.matrixStats)
library(hySpc.baseline)
library(hySpc.read.ENVI)
library(hySpc.read.spc)
library(hySpc.read.txt)

# Now I can finally write a line of code that reads a file, subtracts a baseline, and makes a plot
# ...
# ???
# Wait, I forgot to load a package that provides the `filter()` function...
# What was its name? ... Google it... Ah, `dplyr`
library(hySpc.dplyr)
# ...
# works!
```
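A minimal sketch of the "load everything" entry point described above, in the spirit of meta-packages such as tidyverse. The function name `attach_hyspc_family()` is hypothetical, and it assumes that all family members share the `hySpc.` prefix; it only attaches whichever matching packages happen to be installed, so it does nothing if none are.

```r
# Hypothetical helper: attach every installed package whose name matches
# `pattern` (by default, the assumed "hySpc." prefix of the family).
attach_hyspc_family <- function(pattern = "^hySpc\\.") {
  installed <- unique(rownames(utils::installed.packages()))
  family <- sort(grep(pattern, installed, value = TRUE))
  for (pkg in family) {
    # library() attaches to the global search path even when called here
    suppressPackageStartupMessages(library(pkg, character.only = TRUE))
  }
  # Return the names of the attached packages without printing them
  invisible(family)
}
```

A real meta-package could call something like this from its `.onAttach()` hook, listing the family packages under "Suggests" so that only installed ones are loaded, which matches the dashed-purple relation in the scheme legend.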

eoduniyi commented 3 years ago

vision-model

@GegznaV this is still useful:

via RGSOC_2020_Proposal

io

@cbeleites It sounds like hySpc.read.Witec will be turned into hySpc.read.txt, which means it will be a larger package with import filters for Witec, Renishaw, Andor, PerkinElmer, and Horiba files. The remaining file I/O packages will support reading spectral data from MATLAB, Winspec, Shimadzu(?), and JCAMP-DX files. Additionally, there will be dedicated packages for ENVI and spc.

ux/ui

@ximeg I totally agree with you on this; I wonder what other ways there are to improve user-friendliness and the overall experience for typical spectroscopic work.

maintainability

@bryanhanson I think the documentation on functionality and on contributing/style has made the project easier to maintain.

eoduniyi commented 3 years ago

Looking back, I think we've become more specific about the implementation details over time: @cbeleites 2011 hyperSpec figure -> RGSOC_2020 figures -> @GegznaV figures

bryanhanson commented 3 years ago

Closing, as we have pretty much settled on a naming scheme and the issue is old.
