hoijui commented 1 year ago

Type of files

Two specific types I am thinking of:

gathered data from:
- measurements
- simulations which are too costly to run on the fly or (often) in CI
- interviews
- survey
manually assembled data, like:
- a list of file extensions, with info denoting whether they are text or binary formats
- A CSV table describing a standard, like the ones in the repo
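To make the "manually assembled" case concrete, here is a minimal sketch of what such a table could look like and how a script might read it. The column names (`extension`, `kind`) and the sample rows are assumptions for illustration, not an existing file in the repo.

```python
import csv
import io

# Hypothetical content of a manually assembled table; the column names
# and rows are illustrative assumptions, not a file from the repo.
SAMPLE_CSV = """extension,kind
csv,text
md,text
png,binary
zip,binary
"""

def load_extension_kinds(text):
    """Return a dict mapping a file extension to 'text' or 'binary'."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["extension"].lower(): row["kind"] for row in reader}

kinds = load_extension_kinds(SAMPLE_CSV)
print(kinds["png"])  # binary
```

Wherever such a file ends up living, the consuming code only needs the path to change, which is one reason the directory name matters more to humans than to scripts.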

Storage locations - Ideation

A separate data folder
- data/measurement/forcesX.csv
- data/simulation/stress1.csv
- data/simulation/stress2.csv
- data/survey1.csv
A sub-folder under res
- res/data/measurement/forcesX.csv
- res/data/simulation/stress1.csv
- res/data/simulation/stress2.csv
- res/data/survey1.csv

General or specific folder name(s)

data is of course a very general term that, strictly speaking, would apply to almost anything in a repo anyway. Something like sheet or tabular, on the other hand, seems too specific and overly focused on the format, which is assumed to be a 2D table, while we might also want to include data with more (or fewer) than 2 dimensions.

hoijui commented 1 year ago

In another practical example, I have slightly different data:

I wrote a script that takes a git repo web URL (e.g. https://github.com/hoijui/osh-dir-std/) and, by looking at that page's HTML source, decides whether the repo is public or not. To come up with the code, I had to do some "research": going to different git repo hosting sites and looking at the HTML source of their repos, both public and non-public (e.g. private) ones. I then copied out the relevant parts and collected them in a Markdown file, or rather two: public.md and private.md.
Where do these belong?

src/scraped/
doc/scraped/
res/data/scraped/
data/scraped/
...
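For context on why the scraped snippets matter, this is a rough sketch of how such a check could work. It is not the actual script: the function name and the marker strings are made-up placeholders and do not reflect what any hosting site really serves. The idea is only that the snippets collected in public.md and private.md would supply the real markers.

```python
def classify_repo_page(html, public_markers, private_markers):
    """Guess repo visibility from a page's HTML source.

    Returns "public", "private", or "unknown" if no marker matches.
    """
    text = html.lower()
    # Check private markers first, so a login wall wins over generic page text.
    if any(marker.lower() in text for marker in private_markers):
        return "private"
    if any(marker.lower() in text for marker in public_markers):
        return "public"
    return "unknown"

# Placeholder markers; in practice these would come from the scraped snippets.
PUBLIC_MARKERS = ["<!-- example-public-marker -->"]
PRIVATE_MARKERS = ["<!-- example-private-marker -->"]

page = "<html><body><!-- EXAMPLE-PUBLIC-MARKER --></body></html>"
print(classify_repo_page(page, PUBLIC_MARKERS, PRIVATE_MARKERS))  # public
```

Seen this way, the scraped files are input data for developing and testing the code rather than documentation.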

timmwille commented 1 year ago

I think it is a very relevant question to answer, maybe it helps to check again what higher level structure we have: https://github.com/hoijui/osh-dir-std/blob/main/mod/unixish/definition.csv

Let me collect my thoughts, just a sec

PS: I don't fully understand your "scraped" use case yet, but will come back to that too

timmwille commented 1 year ago

Let's see where the datasets can go

So apart from Licenses and mods we have:

(@hoijui consider organizing the definition.csv alphabetically, really would help)

doc/
gen/
run/
res/
src/

existing options

let's go through one by one to clarify where datasets would go

doc/ : NO → this is where we want to put explanatory documentation that embeds from res/ (though I don't fully understand the difference between res/media/ and res/assets/media/)
gen/ : NO → only generated files/outputs go here
run/ : NO → only for automation, helping build and keep the repo organized (to my understanding so far)
res/ : MAYBE → it makes sense if the data is not SOURCE data that is constantly improved and worked with, and if it is used across doc/ and src/ equally (as we always want a "single source of truth") → I'll write some examples in a bit
src/ : MAYBE → all files that are part of the true "source" of the project should sit here (no binaries! no explanatory data apart from #comments in the code). It is the first place to look, and it is where the CAB review according to DIN SPEC 3105 will look (apart from the docs it goes through to help with understanding)!

what about new directories?

I see only three options here:

data/ → very generic, but would cover a lot (not only a good thing)
datasets/ → very clear, might be a bit long as a name
records/ → a bit more open than datasets/, all data records would go here, even scraped data

Pro/Con and resulting open questions:

Is data/ (or one of the others: datasets/, records/) a new main directory or part of another one?
Is records clear enough not to be confused with generated?
How to differentiate collected or externally generated data from internally generated data that sits in gen/?

I'll evaluate this now

timmwille commented 1 year ago

Basically that means we're discussing:

Where to put it?
- res/
- src/
- <new>/

and

How to name it?:
- data/
- datasets/
- records/

hoijui commented 1 year ago

other possibly useful words:

gather
collect
recordings
collections

I like records a lot though! It fits well for tabular data of whatever dimensionality. An issue with it is that it describes the data format, while (most) other dir names describe the data (content). For example, we have a directory called doc/; it is not called text/. Then again, src/ is kind of in both categories.

timmwille commented 1 year ago

Ok I suggest:

res/datasets/ : for scraped datasets and other data that is just there as a resource for other parts of the documentation and references*
src/records/ : for all source-related work data that is compiled manually or via external sources to help with development

this would also help (at least me) to better understand res/media/ and res/datasets/ as resources in source format, whilst all binary resources sit under res/assets/ :bulb:

* I think maybe even Survey data should go there? What about TSdCs related Technical specs of the overall Machine or external parts/modules that are proprietary?

Final thought

in case (for a reason I can only estimate slightly right now) we only talk about resources and not at all about source of the project

Example A

I want to collect data from a machine to evaluate the precision and have this as reference data in my repository, so what would I do?

I would write a script-a in src/software/ with a src/calc/ logic file behind it (isn't that also a kind of software?) and some output generated through a simulation in src/sim/ that uses that calculation as well.
I would want to send this simulation output to ...?
→ Would this go to datasets/records too, or is this a gen/sim/ output?
Now I take src/software/script-a to run the test with the machine, talking through an API of src/firmware/, and collect the data records in ...?
→ Would this go to datasets/records too, or is this a src/test/ source now?
This data now counts as my real life reference for further src/sim/ simulation runs to improve the src/mech/ and src/elec/ design (maybe even to improve the script, the software or firmware as well).

Example B

I want to create a reference data sheet for measurements out of a 3D analysis of a physical object. From there I'll generate a parametric design. What would I do?

Example C

I want to scrape metadata from other, similar hardware projects as a reference for my calculations and design, and to compare it with my own metadata/specs, even for documentation purposes. What would I do?

Example D

I want to create a realistic image of my wind turbine rotor blade design, by using data-points from an external Airfoil generator software, what would I do?

[Concept Design step] I would go to the generator and input my preset rotor blade metadata from ...?
→ Would this sit in datasets/records, or in gen/calc/, as it was calculated based on power/wind/size, so other machine config metadata?
[Mech Design step] I would take the data points from the generator for a specific 2D profile and, with some help from a src/calc/ mathematical logic file (which might also be embedded in the CAD program I'm using), create a nice 3D CAD model.
[Simulation Design step] Then I import that CAD model from src/mech to create a src/sim simulation, improve the design a bit and send it to src/anim/ for creating a photo-realistic image that will be sent to ...?
→ Does this then go to gen/anim/, or is this image a file that will sit under res/assets/media/img/?

as reference I used this tree view:

run/
res/
res/conf/
res/media/
res/media/img/
res/assets/
res/assets/media/
res/assets/media/img/
res/assets/media/vid/
res/assets/var/
src/
src/anim/
src/calc/
src/sim/
src/elec/
src/firmware/
src/mech/
src/software/
src/test/
gen/
gen/site/
gen/anim/
gen/calc/
gen/sim/
gen/software/
gen/firmware/
gen/elec/
gen/mech/
gen/doc/
gen/doc/assembly/
gen/doc/manuf/
gen/doc/usr/
gen/doc/recycling/
doc/
doc/assembly/
doc/manuf/
doc/usr/
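One way to play through the examples against this tree is a small lookup that returns the deepest listed directory a given file path falls under. This is only a sketch: the function name and the sample file names are hypothetical, and the directory list is copied from the tree view above.

```python
# Directory list copied from the tree view; each entry ends with "/".
TREE = """
run/ res/ res/conf/ res/media/ res/media/img/ res/assets/
res/assets/media/ res/assets/media/img/ res/assets/media/vid/
res/assets/var/ src/ src/anim/ src/calc/ src/sim/ src/elec/
src/firmware/ src/mech/ src/software/ src/test/ gen/ gen/site/
gen/anim/ gen/calc/ gen/sim/ gen/software/ gen/firmware/ gen/elec/
gen/mech/ gen/doc/ gen/doc/assembly/ gen/doc/manuf/ gen/doc/usr/
gen/doc/recycling/ doc/ doc/assembly/ doc/manuf/ doc/usr/
""".split()

def deepest_known_dir(path, known=TREE):
    """Return the longest listed directory that prefixes `path`, or None."""
    matches = [d for d in known if path.startswith(d)]
    return max(matches, key=len) if matches else None

# Hypothetical file names, used only to exercise the lookup.
print(deepest_known_dir("res/assets/media/img/rotor.png"))  # res/assets/media/img/
print(deepest_known_dir("src/records/forcesX.csv"))         # src/
print(deepest_known_dir("data/survey1.csv"))                # None
```

A result of `src/` or `None` shows quickly which of the proposed locations would need a new entry in the definition.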

timmwille commented 1 year ago

Here also #8 for easier communication

hoijui commented 1 year ago

I figured, file is actually a very good fit according to its definition:

a folder, cabinet, or other container in which papers, letters, etc., are arranged in convenient order for storage or reference.
a collection of papers, records, etc., arranged in convenient order: to make a file for a new account.

would it really be an option though? :/

src/files/bla.csv

... too general, right?

hoijui commented 1 year ago

other options:

timmwille commented 1 year ago

Hey, sorry, I totally missed this, but I actually like src/input/ very much. It indicates source files that are simply input for other design files/processes and might come from external/physical sources/measurements. It is then also not limited to datasets or records but could be something else as well.

src/files/ is too generic!! So go with src/input/
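To illustrate the "input for other processes" idea, here is a small sketch in which a measurement file under src/input/ is read and a derived summary is written under gen/. The function name, the file names, the column name `force_n` and the choice of gen/calc/ as the output location are all assumptions for illustration, not something agreed on in this thread.

```python
import csv
import statistics
import tempfile
from pathlib import Path

def summarize_input(repo_root, rel_input, rel_output, column):
    """Read one numeric column from an input CSV and write count and mean."""
    src = Path(repo_root) / rel_input
    dst = Path(repo_root) / rel_output
    with src.open(newline="") as fh:
        values = [float(row[column]) for row in csv.DictReader(fh)]
    # Generated output goes to its own directory, never back into the input.
    dst.parent.mkdir(parents=True, exist_ok=True)
    with dst.open("w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["column", "count", "mean"])
        writer.writerow([column, len(values), statistics.fmean(values)])
    return dst

# Demo in a throwaway directory standing in for a repo checkout.
root = Path(tempfile.mkdtemp())
inp = root / "src/input/measurement/forcesX.csv"
inp.parent.mkdir(parents=True)
inp.write_text("force_n\n1.0\n2.0\n3.0\n")

out = summarize_input(
    root,
    "src/input/measurement/forcesX.csv",
    "gen/calc/forcesX_summary.csv",
    "force_n",
)
print(out.read_text())
```

The split keeps the collected data as the single source of truth, while anything derived from it can be regenerated at any time.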

hoijui / osh-dir-std

Where to put different (tabular) data files? #6
