Open hoijui opened 1 year ago
In another practical example, I have slightly different data:
I wrote a script that takes a git repo web URL (e.g. https://github.com/hoijui/osh-dir-std/) and, by looking at that page's HTML source, decides whether the repo is public or not.
To come up with the code, I had to do some "research": going to different git repo hosting sites and looking at the HTML source of their repos, both public and non-public (e.g. private) ones.
I then copied and pasted the relevant parts out and collected them in a Markdown file, or rather two: public.md and private.md.
Where do these belong?
src/scraped/
doc/scraped/
res/data/scraped/
data/scraped/
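For context, the kind of script described above could be sketched roughly like this. It is only a minimal sketch under assumptions, not the actual script: the function name `classify_repo_html` and the marker strings are hypothetical placeholders for the snippets collected in public.md and private.md, and fetching the page is left out on purpose.

```python
from typing import Iterable, Optional

# Hypothetical marker snippets (assumptions for illustration only); the real
# ones would be the HTML fragments collected in public.md and private.md.
PUBLIC_MARKERS = ['<span class="label">Public</span>']
PRIVATE_MARKERS = ['<span class="label">Private</span>', "Sign in to continue"]


def classify_repo_html(
    html: str,
    public_markers: Iterable[str] = PUBLIC_MARKERS,
    private_markers: Iterable[str] = PRIVATE_MARKERS,
) -> Optional[bool]:
    """Return True if the page looks public, False if non-public, None if unknown."""
    # Check the non-public markers first, so an ambiguous page is not
    # reported as public by accident.
    if any(marker in html for marker in private_markers):
        return False
    if any(marker in html for marker in public_markers):
        return True
    return None
```

The point for the directory question is that the two Markdown files are hand-collected reference material that the code was derived from, not something the script generates.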
I think this is a very relevant question to answer. Maybe it helps to check again what higher-level structure we have: https://github.com/hoijui/osh-dir-std/blob/main/mod/unixish/definition.csv
Let me collect my thoughts, just a sec
PS: I don't fully understand your "scraped" use case yet, but will come back to that too
So apart from Licenses and mods we have:
(@hoijui consider organizing the definition.csv alphabetically, really would help)
doc/
gen/
run/
res/
src/
Let's go through them one by one to clarify where datasets would go:
doc/: NO → this is where we want to put explanatory documentation that embeds from res/ (though I don't fully understand the difference between res/media/ and res/assets/media/)
gen/: NO → only generated files/outputs go here
run/: NO → only for automation, helping build and keep the repo organized (to my understanding so far)
res/: MAYBE → it makes sense if the data is not SOURCE data that is constantly improved and worked with, and is used across doc/ and src/ equally, as we always want a "single source of truth" → I'll write some examples in a bit
src/: MAYBE → all files that are part of the true "Source" of the project should sit here (no binaries! no explanatory data apart from #comments in the code); it is the first place to look, and it is where the CAB review according to DIN SPEC 3105 will look (apart from the docs, which help with understanding)!
I see only three options here:
data/ → very generic, but would cover a lot (not only a good thing)
datasets/ → very clear, might be a bit long as a name
records/ → a bit more open than datasets/; all data records would go here, even scraped data
Pro/Con and resulting open questions:
Is data/ (or one of the others: datasets/, records/) a new main directory, or part of another one?
Is records clear enough to not be confused with generated (gen)?
I'll evaluate this now
Basically that means we're discussing:
res/
src/
<new>/
and
data/
datasets/
records/
other possibly useful words:
I like records a lot though!
It fits well for tabular data, for whatever dimensionality.
An issue with it is:
it describes the data format, while (most) other dir names describe the data (content). For example, we have a directory called doc/; it is not called text/. Then again, src/ is kind of in both categories.
OK, I suggest:
res/datasets/: for scraped datasets and other data that is just there as a resource for other parts of the documentation and references*
src/records/: for all source-related work data that is compiled manually or via external sources to help with development
This would also help (at least me) to better understand:
res/media/ and res/datasets/ as resources in source format, whilst all binary resources sit under res/assets/ :bulb:
* I think maybe even survey data should go there? What about TSdCs related technical specs of the overall machine or external parts/modules that are proprietary?
I want to collect data from a machine to evaluate the precision and have this as reference data in my repository, so what would I do?
script-a in src/software/ with a src/calc/ logic file behind it (isn't that also a kind of software?) and some output generated through a src/sim/ simulation using that calculation as well.
gen/sim/ output?
src/software/script-a to run the test with the machine by talking through an API of a src/firmware/ and collect the data records in ...?
→ would this go to datasets/records too? Or is this a src/test/ source now?
src/sim/ simulation runs to improve the src/mech/ and src/elec/ design (maybe even to improve the script, the software or the firmware as well).
I want to create a reference data sheet for measurements out of a 3D analysis of a physical object; from there I'll generate a parametric design. What would I do?
I want to scrape metadata from other similar hardware projects as a reference for my calculations and design, and to compare it with my own metadata/specs, even for documentation purposes. What would I do?
I want to create a realistic image of my wind turbine rotor blade design by using data points from an external airfoil generator software. What would I do?
gen/calc/ as it was calculated based on power/wind/size, so other machine config metadata?
src/calc/ mathematical logic file (might also be embedded in the CAD program I'm using) and create a nice 3D CAD model
src/mech to create a src/sim simulation, improve the design a bit and send it to src/anim/ for creating a photo-realistic image that will be sent to ...?
→ is this then to go to gen/anim/, or is this image a file that will sit under res/assets/media/img/?
As reference I used this tree view:
run/
res/
res/conf/
res/media/
res/media/img/
res/assets/
res/assets/media/
res/assets/media/img/
res/assets/media/vid/
res/assets/var/
src/
src/anim/
src/calc/
src/sim/
src/elec/
src/firmware/
src/mech/
src/software/
src/test/
gen/
gen/site/
gen/anim/
gen/calc/
gen/sim/
gen/software/
gen/firmware/
gen/elec/
gen/mech/
gen/doc/
gen/doc/assembly/
gen/doc/manuf/
gen/doc/usr/
gen/doc/recycling/
doc/
doc/assembly/
doc/manuf/
doc/usr/
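
To make questions like "where would this file end up?" easier to test against the tree view above, here is a small sketch that looks up the deepest listed directory containing a given path. It is only an illustration, not part of any osh-dir-std tooling: the function name `deepest_known_dir` and the example file names are hypothetical.

```python
from pathlib import PurePosixPath
from typing import Optional

# The tree view from above, copied verbatim.
TREE = """
run/ res/ res/conf/ res/media/ res/media/img/ res/assets/
res/assets/media/ res/assets/media/img/ res/assets/media/vid/ res/assets/var/
src/ src/anim/ src/calc/ src/sim/ src/elec/ src/firmware/ src/mech/
src/software/ src/test/
gen/ gen/site/ gen/anim/ gen/calc/ gen/sim/ gen/software/ gen/firmware/
gen/elec/ gen/mech/ gen/doc/ gen/doc/assembly/ gen/doc/manuf/ gen/doc/usr/
gen/doc/recycling/
doc/ doc/assembly/ doc/manuf/ doc/usr/
"""
KNOWN_DIRS = frozenset(entry.rstrip("/") for entry in TREE.split())


def deepest_known_dir(path: str) -> Optional[str]:
    """Return the deepest directory of the tree view that contains `path`."""
    # PurePosixPath.parents goes from the closest parent up to ".",
    # so the first match is the deepest listed directory.
    for parent in PurePosixPath(path).parents:
        candidate = parent.as_posix()
        if candidate in KNOWN_DIRS:
            return candidate
    return None
```

With this, a hypothetical res/assets/media/img/rotor.png resolves to res/assets/media/img, while data/survey1.csv resolves to nothing, which is exactly the gap being discussed here.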
Here also #8 for easier communication
I figured, file is actually a very good fit according to its definition:
would it really be an option though? :/
src/files/bla.csv
... too general, right?
Hey, sorry, I totally missed this, but I actually like src/input/ very much: it indicates source files that are simply input for other design files/processes and might come from external/physical sources/measurements. It is then also not limited to datasets or records but could also be something else.
src/files/ is too generic!! So go with src/input/
Type of files
Two specific types I am thinking of:
Storage locations - Ideation
data folder
data/measurement/forcesX.csv
data/simulation/stress1.csv
data/simulation/stress2.csv
data/survey1.csv
res
res/data/measurement/forcesX.csv
res/data/simulation/stress1.csv
res/data/simulation/stress2.csv
res/data/survey1.csv
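
The two candidate layouts above differ only in their prefix, so switching between them later would be a mechanical change. A minimal sketch of that mapping, where the function name `to_res_layout` is a hypothetical helper and not existing tooling:

```python
from pathlib import PurePosixPath


def to_res_layout(path: str) -> str:
    """Map a path from the top-level data/ layout to the res/data/ layout."""
    p = PurePosixPath(path)
    # Only paths that live under the top-level data/ directory are accepted.
    if not p.parts or p.parts[0] != "data":
        raise ValueError(f"not under data/: {path}")
    return (PurePosixPath("res") / p).as_posix()
```

So the choice between the two is about meaning and discoverability, not about migration effort.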
General or specific folder name(s)
data is of course a very general term that, strictly speaking, would apply to almost anything in a repo anyway. Something like sheet or tabular, on the other hand, seems too specific and overly focused on the format, which is assumed to be a 2D table, while we might also want to include more (or fewer) dimensions than 2.