hoijui / osh-dir-std

Open Source Hardware directory standard(s)
https://gitlab.fabcity.hamburg/software/template-osh-repo-structure-minimal/
Other

Where to put different (tabular) data files? #6

Open hoijui opened 1 year ago

hoijui commented 1 year ago

Type of files

Two specific types I am thinking of:

Storage locations - Ideation

  1. A separate data folder
    • data/measurement/forcesX.csv
    • data/simulation/stress1.csv
    • data/simulation/stress2.csv
    • data/survey1.csv
  2. A sub-folder under res
    • res/data/measurement/forcesX.csv
    • res/data/simulation/stress1.csv
    • res/data/simulation/stress2.csv
    • res/data/survey1.csv
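
The two options differ only in the prefix placed before the same relative paths. A minimal Python sketch of that, using the example files listed above (the helper name `candidate_paths` is hypothetical and not part of the standard):

```python
from pathlib import PurePosixPath

# Example data files from the lists above, relative to the data folder.
DATA_FILES = [
    "measurement/forcesX.csv",
    "simulation/stress1.csv",
    "simulation/stress2.csv",
    "survey1.csv",
]

# The two storage locations under discussion.
OPTIONS = {
    "separate data folder": PurePosixPath("data"),
    "sub-folder under res": PurePosixPath("res/data"),
}


def candidate_paths(base):
    """Return the example files placed under the given base directory."""
    return [str(base / rel) for rel in DATA_FILES]


if __name__ == "__main__":
    for label, base in OPTIONS.items():
        print(label)
        for path in candidate_paths(base):
            print("  " + path)
```

Whichever prefix is chosen, the sub-structure below it (measurement/, simulation/) stays the same, so the decision can be made independently of how the data files are grouped.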

General or specific folder name(s)

data is of course a very general term that, strictly speaking, would apply to almost anything in a repo anyway. Something like sheet or tabular, on the other hand, seems too specific and overly focused on the format, which is assumed to be a 2D table, while we might also want to include data with more (or fewer) than 2 dimensions.

hoijui commented 1 year ago

In another practical example, I have slightly different data:

I wrote a script that takes a git repo web URL (e.g. https://github.com/hoijui/osh-dir-std/) and, by looking at that page's HTML source, decides whether the repo is public or not. To come up with the code, I had to do some "research": going to different git repo hosting sites and looking at the HTML source of their repos, both public and non-public (e.g. private) ones. I then copied out the relevant parts and collected them in a Markdown file, or rather two: public.md and private.md. Where do these belong?
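
A rough sketch of the kind of check described above, for illustration only. It is not the actual script: the function name is hypothetical, and the marker strings are made-up placeholders standing in for whatever snippets were collected in public.md and private.md.

```python
# Hypothetical placeholder markers; the real ones would come from the
# HTML snippets collected per hosting site (public.md / private.md).
PUBLIC_MARKERS = ["<placeholder-public-marker>"]
PRIVATE_MARKERS = ["<placeholder-private-marker>"]


def classify_repo_page(html):
    """Classify a repo page's HTML as "public", "private" or "unknown"."""
    # Private markers are checked first, so a page matching both kinds
    # of marker is treated as private (an assumption of this sketch).
    if any(marker in html for marker in PRIVATE_MARKERS):
        return "private"
    if any(marker in html for marker in PUBLIC_MARKERS):
        return "public"
    return "unknown"
```

Fetching the page is left out on purpose; the point is only that the collected snippets act as reference input for the code.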

timmwille commented 1 year ago

I think it is a very relevant question to answer. Maybe it helps to check again what higher-level structure we have: https://github.com/hoijui/osh-dir-std/blob/main/mod/unixish/definition.csv

Let me collect my thoughts, just a sec

PS: I don't fully understand your "scraped" use case yet, but will come back to that too

timmwille commented 1 year ago

Let's see where the datasets can go

So apart from Licenses and mods we have:

(@hoijui consider organizing the definition.csv alphabetically, it would really help)

doc/
gen/
run/
res/
src/

existing options

let's go through one by one to clarify where datasets would go

what about new directories?

I see only three options here:

Pro/Con and resulting open questions:

I'll evaluate this now

timmwille commented 1 year ago

Basically that means we're discussing:

  1. Where to put it?
    • res/
    • src/
    • <new>/

and

  1. How to name it?
    • data/
    • datasets/
    • records/

hoijui commented 1 year ago

other possibly useful words:

I like records a lot though! It fits well for tabular data of whatever dimensionality. An issue with it is that it describes the data format, while (most) other dir names describe the data (content). For example, we have a directory called doc/; it is not called text/. Then again, src/ is kind of in both categories.

timmwille commented 1 year ago

Ok I suggest:

This would also help (at least me) to better understand res/media/ and res/datasets/ as resources in source format, whilst all binary resources sit under res/assets/ :bulb:

* I think maybe even survey data should go there? What about TSdCs-related technical specs of the overall machine, or of external parts/modules that are proprietary?


Final thought

Example A

I want to collect data from a machine to evaluate the precision and have this as reference data in my repository, so what would I do?

Example B

I want to create a reference data sheet for measurements out of a 3D analysis of a physical object, from which I'll generate a parametric design. What would I do?

Example C

I want to scrape metadata from other, similar hardware projects as a reference for my calculations and design, and to compare it with my own metadata/specs, even for documentation purposes. What would I do?

Example D

I want to create a realistic image of my wind turbine rotor blade design, using data points from an external airfoil generator software. What would I do?


as reference I used this tree view:

run/
res/
res/conf/
res/media/
res/media/img/
res/assets/
res/assets/media/
res/assets/media/img/
res/assets/media/vid/
res/assets/var/
src/
src/anim/
src/calc/
src/sim/
src/elec/
src/firmware/
src/mech/
src/software/
src/test/
gen/
gen/site/
gen/anim/
gen/calc/
gen/sim/
gen/software/
gen/firmware/
gen/elec/
gen/mech/
gen/doc/
gen/doc/assembly/
gen/doc/manuf/
gen/doc/usr/
gen/doc/recycling/
doc/
doc/assembly/
doc/manuf/
doc/usr/

timmwille commented 1 year ago

Here also #8 for easier communication

hoijui commented 1 year ago

I figured, file is actually a very good fit according to its definition:

  1. a folder, cabinet, or other container in which papers, letters, etc., are arranged in convenient order for storage or reference.
  2. a collection of papers, records, etc., arranged in convenient order: to make a file for a new account.

would it really be an option though? :/

src/files/bla.csv

... too general, right?

hoijui commented 1 year ago

other options:

timmwille commented 1 year ago

Hey, sorry, I totally missed this, but I actually like src/input/ very much. It indicates source files that are simply input for other design files/processes and might come from external/physical sources/measurements. It is then also not limited to datasets or records but could also be something else.

src/files/ is too generic!! So go with src/input/
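
To illustrate the outcome, here is a sketch of how the example files from the start of this issue might end up under src/input/. Keeping the measurement/ and simulation/ sub-folders is an assumption (the thread does not settle that), and `to_input_path` is a hypothetical helper, not part of the standard.

```python
from pathlib import PurePosixPath

# Directory name suggested at the end of this thread.
INPUT_DIR = PurePosixPath("src/input")


def to_input_path(old_path, old_base="data"):
    """Re-root a path from its old base directory under src/input/."""
    # relative_to() raises ValueError if old_path is not below old_base.
    rel = PurePosixPath(old_path).relative_to(old_base)
    return str(INPUT_DIR / rel)


if __name__ == "__main__":
    for old in ["data/measurement/forcesX.csv", "data/survey1.csv"]:
        print(old, "->", to_input_path(old))
```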