nfdi4plants / ARCTokenization

Definition of controlled vocabulary tokens and library to tokenize ARC metadata into these tokens
https://nfdi4plants.github.io/ARCTokenization/
MIT License

`Study.parseMetadataSheetfromFile` does not parse metadata sheet: says worksheet is not present #35

omaus closed 10 months ago

omaus commented 1 year ago
System.Exception: No worksheet named 'Study' or 'isa_study' found in the workbook
   at ARCTokenization.Workbook.getStudyMetadataSheet(Boolean useLastSheetOnIncorrectName, FsWorkbook study)
   at ARCTokenization.Study.parseMetadataRowsFromFile(String path, FSharpOption`1 UseLastSheetOnIncorrectName)
   at ARCTokenization.Study.parseMetadataSheetfromFile(String path, FSharpOption`1 UseLastSheetOnIncorrectName)
   at ArcValidation.ArcGraph.fromXlsxFile(Dictionary`2 onto, FSharpFunc`2 xlsxParsing, String xlsxPath) in C:\Repos\nfdi4plants\arc-validate\src\ARCValidation\ArcGraph.fs:line 229
   at <StartupCode$FSI_0010>.$FSI_0010.main@() in C:\Repos\nfdi4plants\arc-validate\prototype.fsx:line 194
   at System.RuntimeMethodHandle.InvokeMethod(Object target, Void** arguments, Signature sig, Boolean isConstructor)
   at System.Reflection.MethodInvoker.Invoke(Object obj, IntPtr* args, BindingFlags invokeAttr)
Stopped due to error

For the attached file, Study.parseMetadataSheetfromFile fails even though the workbook contains a worksheet with one of the expected names.

isa.study.xlsx

(File taken from https://git.nfdi4plants.org/muehlhaus/ArcPrototype)

kMutagene commented 1 year ago

No code from this library is executed to get the worksheet (also true for #36) - it is just FsWorkbook.tryGetWorksheetByName and getWorksheetByName guarded with a try ... with. So this looks like an FsSpreadsheet issue to me.

https://github.com/nfdi4plants/ARCTokenization/blob/4c93bfc64644494e2fdd62f3d1e2c6bfabbe6410/src/ARCTokenization/Workbook.fs#L20-L40

We should definitely add integration tests for these functions though.

kMutagene commented 1 year ago

On another note, the possible worksheet names should all be defined by the structural ontology.

omaus commented 1 year ago

Hm...

Try this.

#r "nuget: FsSpreadsheet"
#r "nuget: FsSpreadsheet.ExcelIO"

open FsSpreadsheet
open FsSpreadsheet.ExcelIO

let wb = FsWorkbook.fromXlsxFile @"C:\Users\<you>\Downloads\isa.study.xlsx"

FsWorkbook.tryGetWorksheetByName "Study" wb 

The output:

val it: FsWorksheet option =
  Some
    FsSpreadsheet.FsWorksheet
      {CellCollection = FsSpreadsheet.FsCellsCollection;
       Columns = seq
                   [seq
                      [A1 : STUDY METADATA | String;
                       A2 : Study Identifier | String;
                       A3 : Study Title | String;
                       A4 : Study Description | String; ...];
                    seq
                      [B2 : experiment1_material | String;
                       B3 : Prototype for experimental data | String;
                       B4 : In this a devised study to have an exemplary experimental material description. | String;
                       B5 :  | String; ...]];
       Name = "Study";
       Rows = seq
                [seq [A1 : STUDY METADATA | String];
                 seq
                   [A2 : Study Identifier | String;
                    B2 : experiment1_material | String];
                 seq
                   [A3 : Study Title | String;
                    B3 : Prototype for experimental data | String];
                 seq
                   [A4 : Study Description | String;
                    B4 : In this a devised study to have an exemplary experimental material description. | String];
                 ...];
       Tables = seq [];}

Then try this:

FsWorkbook.tryGetWorksheetByName "Study" wb 
|> Option.defaultValue (FsWorkbook.getWorksheetByName "isa_study" wb)

outputs:

> FsWorkbook.tryGetWorksheetByName "Study" wb 
- |> Option.defaultValue (FsWorkbook.getWorksheetByName "isa_study" wb);;
System.Exception: FsWorksheet with name isa_study is not present in the FsWorkbook.
   at FsSpreadsheet.FsWorkbook.GetWorksheetByName(String sheetName)
   at <StartupCode$FSI_0012>.$FSI_0012.main@() in C:\Users\revil\Untitled-1:line 13
   at System.RuntimeMethodHandle.InvokeMethod(Object target, Void** arguments, Signature sig, Boolean isConstructor)
   at System.Reflection.MethodInvoker.Invoke(Object obj, IntPtr* args, BindingFlags invokeAttr)
Stopped due to error

The fuck is this?! 😳

omaus commented 1 year ago

Ahh, nvm.

It evaluates FsWorkbook.getWorksheetByName "isa_study" wb regardless of whether FsWorkbook.tryGetWorksheetByName "Study" wb returns Some or None, because F# evaluates function arguments before the function is called.

So, careful: Option.defaultValue does not short-circuit like && or ||. Its default argument is evaluated eagerly, even when the option is Some.
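For anyone landing here later, a minimal sketch of the difference, plus two lazy alternatives. To keep it runnable without FsSpreadsheet or the .xlsx file, `sheets`, `tryGetSheet` and `getSheet` are hypothetical stand-ins for the workbook, FsWorkbook.tryGetWorksheetByName and FsWorkbook.getWorksheetByName; they are not the library's API, and this is not necessarily how the fix in ARCTokenization was done. Option.defaultWith and List.tryPick are regular FSharp.Core functions.

```fsharp
// Hypothetical stand-in for a workbook that only contains a "Study" sheet.
let sheets = Map.ofList [ "Study", "study sheet contents" ]

// Stand-in for FsWorkbook.tryGetWorksheetByName: returns an option.
let tryGetSheet name = Map.tryFind name sheets

// Stand-in for FsWorkbook.getWorksheetByName: throws if the sheet is missing.
let getSheet name =
    match tryGetSheet name with
    | Some sheet -> sheet
    | None -> failwithf "Worksheet with name %s is not present." name

// Eager: `getSheet "isa_study"` is an argument, so it is evaluated (and throws)
// before Option.defaultValue gets a chance to look at the option.
let eagerResult =
    try
        tryGetSheet "Study"
        |> Option.defaultValue (getSheet "isa_study")
        |> Ok
    with ex ->
        Error ex.Message

// Lazy: Option.defaultWith takes a function and only calls it on None,
// so the fallback lookup never runs when "Study" exists.
let lazyResult =
    tryGetSheet "Study"
    |> Option.defaultWith (fun () -> getSheet "isa_study")

// Alternative: try a list of candidate names in order and stop at the first hit.
let firstMatch =
    [ "Study"; "isa_study" ]
    |> List.tryPick tryGetSheet
```

Here `eagerResult` ends up as an Error although the "Study" sheet exists, while `lazyResult` and `firstMatch` both find it. The List.tryPick variant also fits the idea of keeping all accepted worksheet names in one place.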